Abstract
Object recognition is commonly described as a feedforward process, yet the tasks we carry out often affect what information in visual stimuli is diagnostic and may influence their processing. Surprisingly little is known about how task context is processed and when and how it interacts with the emerging representation of objects. Here we used magnetoencephalography (MEG) and multivariate decoding to investigate the temporal dynamics of task and object processing and their interaction. Participants viewed objects while we varied task context on a trial-by-trial basis, using both high-level conceptual and low-level perceptual tasks. Time-resolved multivariate decoding and temporal cross-classification revealed multiple distinct yet overlapping stages of task processing from the onset of a trial, likely reflecting a sequence of visual, semantic, mnemonic, and rule-related task representations. Object decoding was stronger in conceptual than perceptual tasks, with differences emerging around 530 ms after object onset. However, object decoding generalized well between task contexts, indicating brain responses that differed in strength but not in their qualitative nature. Using model-based MEG-fMRI fusion, we found that frontoparietal areas were strongly dominated by information about task context throughout the trial, while occipitotemporal regions reflected a mixture of both task and object category, indicating parallel encoding of task and category in the same brain areas. Together, our results reveal the temporal evolution of task context representations and suggest that the impact of task context during object processing occurs late in time.
Significance Statement While much work in the vision sciences has focused on perceptual processing of visual stimuli, much less is known about the task context in which these stimuli occur. Here we studied the neural dynamics of task context and how it influences object processing. Using MEG, multivariate decoding and MEG-fMRI fusion, we reveal that task context evolves in multiple distinguishable yet overlapping processing stages, affecting object processing late in time. While frontoparietal regions were dominated by task, occipitotemporal regions exhibited a mixture of both task and object information. Our findings highlight the importance of temporal information in unravelling different stages of task processing and demonstrate the value of model-based MEG-fMRI fusion for a spatiotemporal analysis of cognitive processes.
Introduction
Our behavioral goals strongly influence how we interpret and categorize the objects around us. For example, the way in which we perceive and react towards a painting can vary quite dramatically when judging its period of art, whether it is hand-painted or a print, or whether it contains an animal or not. Despite the importance of task context in our everyday life, much work on the neural processing of objects has focused on single contexts only (e.g. fixation or discrimination). In part, this reflects the common view of object recognition as a hierarchical feedforward process localized to occipitotemporal cortex (Riesenhuber and Poggio, 2002; Serre et al., 2007; DiCarlo et al., 2012), with independent task-related processing in prefrontal and parietal cortex (Duncan, 2010). However, recent work suggests that task context can impact object representations not only in frontoparietal regions, but also in occipitotemporal visual cortex (Harel et al., 2014; Erez and Duncan, 2015; Bracci et al., 2017). In particular, task context has been reported to affect the strength (Erez and Duncan, 2015; Bracci et al., 2017) and qualitative nature (Harel et al., 2014) of response patterns in occipitotemporal visual cortex.
While these studies demonstrate where in the brain task affects object processing, due to the low temporal resolution of fMRI they leave open the critical questions of (1) when the impact of task on object representations emerges and (2) how task is represented across time. Answers to these questions allow distinguishing alternative accounts for the mediating effects of task on object processing and in that way provide important insights into the neural mechanisms underlying task and object category. For example, changes in occipitotemporal cortex in response to task context could reflect an expectation-related top-down modulation of feedforward processing (Kok et al., 2012; Kok et al., 2013), potentially affecting the initial responses to visual stimuli. Alternatively, these responses may reflect a late, modulatory influence of task (McKee et al., 2014; see also Emadi and Esteky, 2014).
Here we studied the time course of the processing of task context and the effect of task on object category processing using magnetoencephalography (MEG) and time-resolved multivariate decoding (Carlson et al., 2013; Van de Nieuwenhuijzen et al., 2013; Cichy et al., 2014; Isik et al., 2014; Clarke et al., 2015; Kaiser et al., 2016). We measured the brain responses to object categories in four different task contexts, and used multivariate pattern classification to resolve the temporal dynamics of object category and task context, as well as their interaction. To locate the time-resolved task context and object category dynamics in the brain, we conducted model-based MEG-fMRI fusion (Cichy et al., 2014), using fMRI data from a previous study employing the same task (Harel et al., 2014). We found task context to be represented in a cascade of different processing stages across the time course of the trial. Task context affected the strength of object category representations late in time; however, we found no evidence for qualitatively different processing of objects between task types. MEG-fMRI fusion revealed strong task-related effects in frontoparietal regions, whereas task and category-related effects were mixed in occipitotemporal brain areas, suggesting parallel processing of task and category in these regions.
Materials and Methods
Participants
22 healthy volunteers with normal or corrected-to-normal visual acuity took part in the study. Five participants were excluded due to at least one of the following exclusion criteria: behavioral performance below 90 % correct, excessive artifacts, or incomplete or corrupted recordings. Data from the remaining 17 participants (8 female, mean age 25.12, SD = 5.16) were used in all analyses throughout the study. All participants gave written informed consent as part of the study protocol (93-M-0170, NCT00001360) prior to participation in the study. The study was approved by the Institutional Review Board of the National Institutes of Health and was conducted according to the Declaration of Helsinki.
Experimental Design and Stimuli
The goal of this study was to investigate how task context is represented across time and when and how task context modulates the processing of visual object categories. For that purpose, we chose four tasks that could be carried out on a set of object images, two targeting low-level perceptual dimensions of the images, and two high-level conceptual dimensions (Figure 1A). The perceptual dimensions were Color (red / blue) and Tilt (clockwise / counterclockwise), and the conceptual dimensions were Content (manmade / natural) and Size (real world, large / small relative to an oven). Object images were chosen from 8 different categories (Figure 1B): Butterfly, cow, dresser, flower, motorbike, skate, tree, and vase. For each of the 8 object categories, we chose five different image exemplars. To allow participants to perform the Color and Tilt tasks, each object was presented with a thin red or blue outline, and objects were either tilted 30 degrees clockwise or counterclockwise relative to the principal axis of the object. The combination of stimulus types led to 160 unique stimulus combinations (8 categories × 5 exemplars × 2 colors × 2 tilts). Each stimulus was presented once in each task context, making a total of 640 stimulus presentations per participant. The presentation order of these stimulus-task combinations was randomized. In addition, we interspersed 80 catch trials that were chosen to be random combinations of task and stimulus (see below).
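The factorial structure of the design can be sketched as follows. The factor labels are taken from the text; the enumeration itself is purely illustrative of the trial counts:

```python
from itertools import product

# Stimulus factors as described in the design
categories = ["butterfly", "cow", "dresser", "flower",
              "motorbike", "skate", "tree", "vase"]
exemplars = range(5)                        # five image exemplars per category
colors = ["red", "blue"]                    # outline color (Color task)
tilts = ["clockwise", "counterclockwise"]   # 30-degree tilt (Tilt task)
tasks = ["Color", "Tilt", "Content", "Size"]

# 8 categories x 5 exemplars x 2 colors x 2 tilts = 160 unique stimuli
stimuli = list(product(categories, exemplars, colors, tilts))

# each stimulus presented once in each task context = 640 experimental trials
presentations = list(product(tasks, stimuli))
```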
All stimuli were presented on black background with a white central fixation cross present throughout the experiment. Object images were greyscale cropped images of objects and were a subset selected from a previous fMRI study (Harel et al., 2014). Both task cues (e.g. ‘Content’) and possible responses (e.g. ‘manmade’ or ‘natural’) were shown as words in white font. Task cues were always presented centrally and possible responses were shown left and right of fixation.
Procedure
Prior to the experiment, participants were familiarized with the task by carrying out 36 randomly chosen trials outside of the MEG. For the actual experiment, participants were seated in an electromagnetically shielded MEG chamber with their head placed in the mold of the dewar while stimuli were backprojected on a translucent screen in front of them (viewing distance: 70 cm, image size: 6 degrees of visual angle). Each trial was preceded by a white fixation cross (0.5 s) that turned green (0.5 s) to prepare participants for the upcoming trial. A trial consisted of three major components: (1) A task cue which indicated the relevant task for the trial, (2) an object stimulus which was categorized according to the task, and (3) a response-mapping screen which indicated the task-relevant response options left and right of fixation (Figure 1C). Based on these components, in the following we separate each trial into three different time periods: a “Task Cue Period”, an “Object Stimulus Period”, and a “Response Mapping Period”. Each trial lasted 5 s. A trial began with the Task Cue Period consisting of the presentation of a task cue (0.5 s) followed by a fixation cross (1.5 s). This was followed by the Object Stimulus Period consisting of the presentation of an object stimulus (0.5 s) followed by another fixation cross (1.0 s). Finally, the trial ended with the Response Mapping Period during which a response-mapping screen was displayed (1.5 s). Participants responded with the left or right index finger using an MEG-compatible response box. In addition to the button press, participants were instructed to make an eye blink during the response period to minimize the contribution of eye blink artifacts to other time periods. The order of the options on the response-mapping screen was intermixed randomly to prevent the planning of motor responses before the onset of the response screen (Hebart et al., 2012).
Participants were instructed to encode the task rule as soon as being presented with the task cue and to apply it immediately to the stimulus. To encourage this strategy, they were asked to respond as fast and accurately as possible. To enforce a faster application of task to object category, we introduced catch trials for which the fixation period between stimulus offset and response-mapping screen onset was shortened from 1.0 s to 0.2 s. The experiment consisted of 20 runs of 36 trials each (32 experimental trials, 4 catch trials).
MEG Recordings and Preprocessing
MEG data were collected on a 275 channel CTF system (MEG International Services, Ltd., Coquitlam, BC, Canada) with a sampling rate of 1,200 Hz. Recordings were available from 272 channels (dead channels: MLF25, MRF43, MRO13). Preprocessing and data analysis were carried out using Brainstorm (version 02/2016, Tadel et al., 2011) and MATLAB (version 2015b, The Mathworks, Natick, MA). The specifics of preprocessing and multivariate decoding (see below) were based on previously published MEG decoding work (Cichy et al., 2014; Grootswagers et al., 2016) and fine-tuned on a pilot subject that did not enter the final data set. MEG triggers were aligned to the exact presentation time on the screen that had been recorded using an optical sensor attached to the projection mirror. Data were epoched in 5.1 s trials, starting 100 ms prior to the onset of the task cue and ending with the offset of the response-mapping screen. Then, data were bandpass filtered between 0.1 and 300 Hz and bandstop filtered at 60 Hz including harmonics to remove line noise.
To further increase SNR and to reduce computational costs, we carried out (1) PCA dimensionality reduction, (2) temporal smoothing on PCA components, and (3) downsampling of the data. For PCA, data were concatenated within each channel across all trials. After PCA, the components that together accounted for the lowest 1 % of the variance were removed (up to a maximum of 50 % of components). All further analyses were conducted on the reduced set of principal components. Then, data were normalized relative to the baseline period (for task decoding: -0.1 to 0 s, for object category decoding: 1.9 to 2.0 s). To this end, for each channel we calculated the mean and standard deviation of the baseline period and subtracted this mean from the rest of the data before dividing it by the standard deviation (univariate noise normalization). Finally, the components were temporally smoothed with a Gaussian kernel of ± 15 ms half duration at half maximum, and downsampled to 120 Hz (621 samples / trial).
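A minimal NumPy sketch of this reduction pipeline is shown below, assuming epoched data as a trials × channels × samples array. The function name and the exact variance bookkeeping are ours; the thresholds (1 % variance, 50 % of components, ~15 ms smoothing, 120 Hz) follow the description above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def reduce_and_normalize(epochs, baseline, sfreq_in=1200, sfreq_out=120):
    """Illustrative sketch: PCA reduction, baseline normalization,
    temporal smoothing, and downsampling of epoched MEG data.

    epochs   : (n_trials, n_channels, n_samples)
    baseline : slice selecting the baseline samples
    """
    n_trials, n_chan, n_samp = epochs.shape

    # PCA: concatenate trials in time within each channel
    X = epochs.transpose(1, 0, 2).reshape(n_chan, -1).T   # (obs, channels)
    X = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    var = S**2 / np.sum(S**2)
    # drop trailing components carrying the lowest 1 % of variance,
    # but never remove more than 50 % of all components
    var_from_end = np.cumsum(var[::-1])[::-1]
    n_keep = max(n_chan // 2, int(np.sum(var_from_end > 0.01)))
    comps = (X @ Vt.T[:, :n_keep]).T
    comps = comps.reshape(n_keep, n_trials, n_samp).transpose(1, 0, 2)

    # univariate noise normalization relative to the baseline period
    mu = comps[:, :, baseline].mean(axis=(0, 2), keepdims=True)
    sd = comps[:, :, baseline].std(axis=(0, 2), keepdims=True)
    comps = (comps - mu) / sd

    # Gaussian smoothing (~15 ms half duration at half maximum),
    # then decimation to 120 Hz
    sigma = 0.015 * sfreq_in / np.sqrt(2 * np.log(2))
    comps = gaussian_filter1d(comps, sigma, axis=-1)
    return comps[:, :, ::sfreq_in // sfreq_out]
```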
Time-resolved Multivariate Decoding
Multivariate decoding was carried out using custom-written code in MATLAB (Mathworks, Natick, MA), as well as functions from The Decoding Toolbox (Hebart et al., 2015), and LIBSVM (Chang and Lin, 2011) using linear support vector machine classification (C = 1). Classification was conducted for each participant separately in a time-resolved manner, i.e. independently for each time point. Each pattern that entered the classification procedure consisted of the principal component scores at a given time point. In the following we describe one iteration of the multivariate classification procedure that was carried out for the example of object category classification. In the first step, we created supertrials by averaging 10 trials of the same object category without replacement (Isik et al., 2014). In the next step, we separated these supertrials into training and testing data, with one supertrial pattern per object category serving as test data and all other supertrial patterns as training data. This was followed by one-vs-one classification of all 28 pairwise comparisons of the 8 object categories (chance-level 50 %). To test the trained classifier on the left-out data, we compared the two predicted decision values and assigned an accuracy of 100 % if the order of the two test samples was predicted correctly and an accuracy of 0 % if the order was the opposite (for two samples and two classes this is mathematically equivalent to the common area-under-the-curve measure of classification performance and represents a classification metric that is independent of the bias term of the classifier). In a last step, the resulting pairwise comparisons were averaged, leading to an estimate of the mean accuracy across all comparisons. This training and testing process was then repeated for each time point. This completes the description of one multivariate classification iteration for the decoding of object category. 
The procedure for task classification was analogous, with 4 tasks and 6 pairwise combinations. To achieve a more fine-grained and robust estimate of decoding accuracy, we ran a total of 500 such iterations of trial averaging and classification, and the final accuracy time series reflects the average across these iterations. This provided us with time-resolved estimates of MEG decoding accuracy for object category and task classification, respectively.
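One iteration of the supertrial-averaging and leave-one-supertrial-out scheme can be sketched as follows. For brevity, a least-squares linear discriminant stands in for the linear SVM (LIBSVM, C = 1) used in the study, and the decision-value ordering metric is implemented as described above; all function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_supertrials(X, n_avg=10):
    """Average disjoint groups of n_avg trials into supertrials (higher SNR)."""
    idx = rng.permutation(len(X))
    n_super = len(X) // n_avg
    return np.array([X[idx[i*n_avg:(i+1)*n_avg]].mean(0) for i in range(n_super)])

def fit_linear(X, y):
    """Least-squares linear discriminant (stand-in for the linear SVM)."""
    Xb = np.c_[X, np.ones(len(X))]                   # add bias column
    w = np.linalg.lstsq(Xb, y, rcond=None)[0]
    return lambda x: np.c_[x, np.ones(len(x))] @ w   # decision values

def ordering_accuracy(Sa, Sb):
    """Leave one supertrial per class out; score 100 if the two test
    decision values are ordered correctly (equivalent to AUC for one
    sample per class), else 0; average across folds."""
    accs = []
    for i in range(min(len(Sa), len(Sb))):
        train = np.vstack([np.delete(Sa, i, 0), np.delete(Sb, i, 0)])
        y = np.r_[np.ones(len(Sa) - 1), -np.ones(len(Sb) - 1)]
        dec = fit_linear(train, y)
        da, db = dec(np.vstack([Sa[i:i+1], Sb[i:i+1]]))
        accs.append(100.0 if da > db else 0.0)
    return float(np.mean(accs))
```

In the full analysis, this pairwise routine would run for each of the 28 category pairs at every time point, averaged over 500 iterations.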
Temporal Generalization of Task
To investigate whether the nature of task-related information remained stable across time or whether it changed, we carried out cross-classification across time, also known as the temporal generalization method (King and Dehaene, 2014). The rationale of this method is that if a classifier can generalize from one time point to another, this demonstrates that the representational format is similar for these two time points. If, however, a classifier does not generalize, then under the assumption of stable noise this indicates that the representational format is different. To carry out this analysis, we repeated the same approach as described in the previous section, but instead of testing a classifier only at a given time point, we tested the same classifier for all other time points separately. This cross-classification analysis was repeated with each time point once serving as training data, yielding a time–time decoding matrix that captures classifier generalization performance across time.
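The train-at-one-time, test-at-all-times loop can be sketched as follows. A nearest-class-mean classifier stands in for the SVM here, purely to keep the example self-contained:

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Store per-class mean patterns (stand-in for the SVM)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(0) for c in classes])
    return classes, means

def nearest_mean_predict(model, X):
    classes, means = model
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    return classes[np.argmin(d, axis=1)]

def temporal_generalization(Xtr, ytr, Xte, yte):
    """Train at each time point, test at every time point, yielding a
    time x time generalization matrix in % correct.
    Xtr, Xte : (n_trials, n_features, n_times)."""
    n_times = Xtr.shape[2]
    M = np.empty((n_times, n_times))
    for t1 in range(n_times):
        model = nearest_mean_fit(Xtr[:, :, t1], ytr)   # train once per t1
        for t2 in range(n_times):
            pred = nearest_mean_predict(model, Xte[:, :, t2])
            M[t1, t2] = 100.0 * np.mean(pred == yte)
    return M
```

If the underlying pattern changes between trial periods, the matrix shows block structure rather than uniform generalization.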
Model-based MEG-fMRI Fusion for Spatiotemporally-Resolved Information
To resolve task and category-related information both in time and space simultaneously, we carried out RSA-based MEG-fMRI fusion (Cichy et al., 2014, 2016). RSA allows comparison of brain patterns across modalities in terms of pattern dissimilarity, abstracting from the activity patterns of measurement channels (e.g. MEG sensors) to all pairwise distances of those patterns in the form of representational dissimilarity matrices (RDMs). RSA-based MEG-fMRI fusion allows a researcher to ask the following question: At what point in time does the representational structure in a given brain area (as determined from fMRI) match the representational structure determined from the time-resolved MEG signal? The reasoning for this approach is that if the fMRI RDM of a brain region and the MEG RDM of a time point show a correspondence, this suggests that there is a shared representational format in a given brain location and at a given point in time. Here we apply this approach to investigate the spatiotemporal evolution of object category and task representations.
FMRI RDMs for each combination of task and category (4 × 8 = 32 × 32 matrices) were available from five regions of interest (ROIs) in 25 participants who took part in a separate study employing the same task (Harel et al., 2014). None of these participants overlapped with the sample from the MEG study. The major differences between the MEG and fMRI experiments were (1) that the fMRI study used an extended set of six tasks and (2) that the trial timing in the fMRI study was slower and jittered. Details about data preprocessing have been described previously (Harel et al., 2014). RDMs were based on parameter estimates in a GLM for each condition which were converted to t-values (univariate noise normalization). Each entry in the matrix reflects 1 minus the correlation coefficient of the t-values across conditions, calculated separately for each ROI. RDMs were reduced to the relevant four task types. The five ROIs were early visual cortex (EVC), object-selective LO and pFS, lateral prefrontal cortex (lPFC) and posterior parietal cortex (PPC). EVC, LO and pFS were defined based on contrasts in an independent visual and object localizer session, and lPFC and PPC were defined by a combination of anatomical criteria and responses in the functional localizer session to the presence of objects.
For better comparability to this previous study, we created correlation-based MEG pattern dissimilarity matrices for all combinations of task and object category. In particular, for each combination of task and category, we created a mean pattern, yielding a total of 32 brain patterns per participant (8 categories × 4 tasks). We then computed Pearson correlations between all pairs of patterns and converted these similarity estimates to dissimilarities (1 minus correlation), providing us with a 32 × 32 RDM for each time point and participant.
Since different groups of participants were tested in the fMRI and MEG studies, we used the group average pattern dissimilarity matrices of each modality as the best estimate of the true pattern dissimilarity. These RDMs were symmetrical around the diagonal, so we extracted the lower triangular component of each pattern dissimilarity matrix and converted them to vector format for further analyses, in the following referred to as representational dissimilarity vector (RDV).
For a given brain region, we conducted MEG-fMRI fusion by calculating the squared Spearman correlation between the fMRI RDV and the MEG RDV for each time point separately. The squared correlation coefficient is mathematically equivalent to the coefficient of determination (R2) of the fMRI RDV explaining the MEG RDV. This approach was repeated for each fMRI RDV of the five ROIs, providing us with five time courses of representational similarity between MEG and fMRI.
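The RDV extraction and fusion step can be sketched as follows. For simplicity, the Spearman correlation is computed with a tie-free rank transform (real RDVs are continuous, so ties are not expected); function names are ours:

```python
import numpy as np

def rdv(rdm):
    """Lower-triangular entries of a symmetric RDM as a vector."""
    return rdm[np.tril_indices(rdm.shape[0], k=-1)]

def spearman(a, b):
    # tie-free rank transform; use a tie-aware ranking for data with ties
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def fusion_timecourse(meg_rdms, fmri_rdm):
    """Squared Spearman correlation (R2) between the fMRI RDV of one ROI
    and the MEG RDV at each time point.
    meg_rdms : (n_times, n_cond, n_cond); fmri_rdm : (n_cond, n_cond)."""
    y = rdv(fmri_rdm)
    return np.array([spearman(rdv(m), y) ** 2 for m in meg_rdms])
```

Running this once per ROI yields the five time courses of MEG-fMRI representational similarity.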
While MEG-fMRI fusion provides a temporal profile of representational similarities for a given brain region, these MEG-fMRI fusion time courses do not distinguish whether MEG-fMRI representational similarities reflect task, object category, or a mixture of the two. To disentangle task and object category-related information with MEG-fMRI fusion, we extended this approach by introducing model RDMs of the same size (32 × 32). These RDMs reflected the expected dissimilarity for the representation of task and category, respectively, with entries of 1 for high expected dissimilarity (different task / category) and 0 for low expected dissimilarity (same task / category). This model-based MEG-fMRI fusion approach was carried out using commonality analysis (Seibold and McPhee, 1979), a variance decomposition approach that allows estimating the shared variance between more than two variables (see Greene et al., 2016, for a similar approach using multiple model RDMs). For a given brain region and time point, these variables reflect (1) an MEG RDV, (2) an fMRI RDV and (3) the two model RDVs for task and object category representations.
A schematic of this model-based MEG-fMRI fusion is shown in Figure 5A. We conducted commonality analysis by comparing two squared semi-partial correlation coefficients (Spearman correlation), one reflecting the proportion of variance shared between MEG and fMRI partialling out all model variables excluding the variable of interest (e.g. task) from fMRI, and the other reflecting the proportion of shared variance when partialling out all model variables from fMRI including this variable of interest. The difference between both coefficients of determination (R2) then provides the commonality, which is the variance shared between MEG and fMRI that is uniquely explained by the variable of interest. Formally, the commonality for task at time t and location j can be described as C_Task(t, j) = R2(X, Y·B) − R2(X, Y·A,B), where X reflects MEG (at time t), Y reflects fMRI (at location j), A reflects task, B reflects object category, and Y·B denotes Y with B partialled out (Y·A,B with both A and B partialled out). Note that this variable can become slightly larger than the total R2 or slightly negative, due to numerical inaccuracies or the presence of small suppression effects (Pedhazur, 1997). In addition, commonality coefficients always reflect the shared variance relative to a target variable (in our case MEG), but depending on the relationship between the variables the estimate of shared variance can change when a different target variable is used (in our case fMRI). In the present study, the pattern of results was comparable irrespective of which variable served as a target variable.
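The difference of the two squared semi-partial correlations described above can be sketched on rank-transformed RDVs as follows. The helper names are ours, and the tie-free rank transform is a simplification (model RDVs with many ties would need a tie-aware ranking):

```python
import numpy as np

def rank(v):
    # tie-free rank transform (use a tie-aware ranking for tied data)
    return np.argsort(np.argsort(v)).astype(float)

def residualize(y, regressors):
    """Residual of y after regressing out the given vectors (plus intercept)."""
    Z = np.column_stack([np.ones(len(y))] + list(regressors))
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    return y - Z @ beta

def r2(a, b):
    """Squared correlation (coefficient of determination)."""
    return np.corrcoef(a, b)[0, 1] ** 2

def commonality_task(meg, fmri, task, category):
    """C_Task = R2(X, Y.B) - R2(X, Y.AB): the MEG-fMRI shared variance
    uniquely explained by the task model, via squared semi-partial
    correlations on ranks. Swap the two model RDVs for C_Category."""
    X, Y, A, B = (rank(v) for v in (meg, fmri, task, category))
    return r2(X, residualize(Y, [B])) - r2(X, residualize(Y, [A, B]))
```

As noted above, small negative values can occur through numerical inaccuracies or suppression effects.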
Statistical Testing
Throughout this article, we used a non-parametric, cluster-based statistical approach to test for time periods during which the group of participants showed a significant effect (Nichols and Holmes, 2002), and bootstrap sampling to determine confidence intervals for peak latencies and peak latency differences. We did not compute statistics in time periods after the onset of the response-mapping screen, because (1) these time periods were corrupted by the instructed eye blinks and (2) information about task is contained in the response-mapping screen, making it difficult to uniquely assign these responses to task or response-mapping screen. For object category-related responses we did not compute statistics for time periods prior to the onset of the object stimulus, because it was not reasonable to assume that these periods would contain information about the category before its identity is revealed. For completeness, however, we plot these results in Figures 2 and 3. Please note that the pattern of results reported is very similar when including these time periods into our statistical analyses.
Non-parametric Cluster-based Statistical Approach
The cluster-based approach consists of two steps: first defining clusters as neighboring time points that all exceed a statistical cutoff (cluster-inducing threshold), and second determining significant cluster size.
For the first step – the definition of a cluster-inducing threshold – we ran a sign-permutation test. The null hypothesis is that any directional effect in the given sample of participants simply came about by chance. To test whether our sample was part of this null distribution, for each time series of accuracies we created all possible sign-permutations of measured accuracy values (2^17 = 131,072), and for each of these permutations generated a t-value for each time point. The permutations provided a null distribution of t-values of the group effect for each time point. The cluster-inducing threshold was defined as the 95th percentile of the distribution at each time point (equivalent to p < 0.05, one-sided).
For the second step – determining significant clusters – we determined the maximum cluster size in each permutation (Nichols and Holmes, 2002). Using this null distribution, we determined whether candidate clusters in the original time series exceeded the 95th percentile of maximum cluster sizes (equivalent to p < 0.05, one-sided). This provided us with significant clusters at the pre-specified statistical cutoffs.
For temporal generalization matrices, we extended the cluster-based approach described above to two dimensions, revealing significant 2D clusters. Because of computational limitations, we ran only a subset of 10,000 permutations drawn randomly without replacement from all available permutations.
For model-based MEG-fMRI fusion, we used the two-step approach as described above. However, instead of running a sign-permutation test across participants, we conducted a randomization test for which we created 5,000 MEG similarity matrices for each of the five ROIs. These matrices were based on random permutations of the rows and columns of the group average MEG similarity matrix (Kriegeskorte et al., 2008). We then carried out model-based MEG-fMRI fusion using these matrices to create an estimated null distribution of information time courses for each ROI. For each time point in each ROI, a cluster-inducing threshold was determined by choosing the 95th percentile of this estimated null distribution (equivalent to p < 0.05, one-sided). This was followed by determining the maximum cluster sizes across all permutations as described above, but across all ROIs to correct for multiple comparisons (equivalent to p < 0.05, one-sided, corrected for multiple comparisons across ROIs).
Determining Confidence Intervals for Peak Latencies
We used bootstrap sampling to estimate the 95 % confidence intervals (CI) of peak latencies and peak latency differences, respectively. For each iteration of the bootstrap sampling approach, we calculated a time course based on the bootstrap sample. For multivariate decoding analyses, this was a time course of accuracy from an average of n=17 time courses of decoding accuracy sampled with replacement from the pool of subjects. For MEG-fMRI fusion, this was a time course of commonality coefficients, generated by sampling n=17 time courses of MEG similarity matrices from the pool of subjects with replacement, averaging them, and repeating the model-based MEG-fMRI fusion approach as described above. For each bootstrap sample time course, we then calculated timing estimates in the relevant time periods (for peak latency: timing of maximum, for peak latency difference: time difference between maxima). This process was repeated (100,000 times for multivariate decoding and 5,000 times for MEG-fMRI fusion), which generated a distribution of timing estimates. The 2.5 and 97.5 percentiles of this distribution reflect the 95 % confidence interval of the true timing estimate. Since we downsampled our data (bin width: 8.33 ms), the confidence intervals were conservative and overestimated by up to 16.67 ms.
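For the decoding analyses, the bootstrap procedure reduces to a few lines; a minimal sketch (function name ours) is:

```python
import numpy as np

def peak_latency_ci(acc, times, n_boot=5000, seed=0):
    """Bootstrap 95 % CI for the peak latency of the group-average
    decoding time course. acc : (n_subjects, n_times), times in ms."""
    rng = np.random.default_rng(seed)
    n_sub = acc.shape[0]
    peaks = np.empty(n_boot)
    for b in range(n_boot):
        # resample subjects with replacement, average, take the peak time
        sample = acc[rng.integers(0, n_sub, size=n_sub)]
        peaks[b] = times[np.argmax(sample.mean(axis=0))]
    return np.percentile(peaks, [2.5, 97.5])
```

For peak latency differences, the same resampling would be applied to the difference of the two per-sample peak times.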
Results
Behavioral Results
Participants provided correct responses on average on 97.19 % of trials (SD: 2.40) and had a mean response time of 712.2 ms (SD: 121.8), with no significant differences between tasks (accuracy: F(3,48) = 0.6938, RT: F(3,48) = 0.039) or between object categories (accuracy: F(7,112) = 0.6024, RT: F(7,112) = 0.5431). On average, participants missed responses or responded too slowly (RT > 1,600 ms) in only 1.80 % of all trials (SD: 2.26). We included all trials in further analyses because (1) imbalances in the number of trials per condition would complicate multivariate analyses, requiring subsampling which would make the results less sensitive, and (2) there were no differences in accuracy or RT across tasks or objects, making it unlikely that the results would be biased by including a very small fraction of missed or incorrect trials.
Time-resolved Representation of Task Context and Object Category
The primary aim of this study was to characterize the temporal evolution of task context representations in the human brain and to elucidate when task context affects object category processing. We thus separately analyzed task and object category-related brain signals using time-resolved multivariate decoding across the trial (see Figure 1C). This allowed us to describe and compare the temporal profiles of the two resulting classification time courses, one for object category averaged across task, and one for task averaged across object category.
In the following, we describe and report results from the “Task Cue Period” (0 to 2,000 ms) from onset of the task cue to onset of the object stimulus, and the “Object Stimulus Period” (2,000 to 3,500 ms) from onset of the object stimulus to onset of the response screen. We did not statistically analyze the ensuing “Response-Mapping Period” (3,500 ms to 5,000 ms), because it was contaminated by instructed blinks and response screen-related processes (see Materials and Methods, Statistical Testing). For completeness, we visualize results from this Response-Mapping Period in Figures 2 and 3.
Task Cue Period
Task-related information rose rapidly in response to the task cue, peaking around 100 ms (peak of group mean: 100 ms, 95 % CI: 96-121 ms). This was followed by a slow decay of information that remained significantly above chance until ~1,200 ms after cue presentation, outlasting the offset of the word cue by ~700 ms. Around 1,800 ms (~200 ms prior to onset of the object stimulus) task information was again found to be significantly above chance. As expected, during this time period – prior to the presentation of the object stimulus – classification of object category was at chance.
Object Stimulus Period
After onset of the object stimulus at 2,000 ms, object category information increased sharply, peaking around 100 ms later (peak of group mean: 2,104 ms, 95 % CI: 2,100-2,108 ms). This was followed by a gradual decline that remained significantly above chance until the onset of the response-mapping screen at 3,500 ms. This rapid increase in category-related information was accompanied by a slow rise of task-related information peaking around 600 ms after object onset (peak of group mean: 2,638 ms, 95 % CI: 2,517-2,825 ms). Information about task then remained well above-chance until the presentation of the response-mapping screen.
Together, these results show that both the presentation of task cue and object stimulus lead to rapid cortical processing. In addition, they suggest that maintained or reactivated task-related information that is present before object onset becomes gradually and increasingly relevant during object category processing.
Multiple Stages of Task Processing Revealed by Temporal Generalization Analysis
The decoding of task at different time points as described above characterizes the temporal progression of task-related information across the trial. However, these results alone cannot distinguish whether the decoding of task reflects a single cognitive process or a sequence of multiple cognitive processes across time. There are three pertinent candidates that might explain task decoding at different time periods. First, early decoding of task after task cue onset may reflect a visual representation of the task cue that is maintained in short-term memory and accessed when the object stimulus appears in order to carry out the task. Second, the task representation during object processing may reflect an abstract representation of the participant’s choice, formed after initial visual and semantic processing of the task cue. Third, the task representation may reflect a more abstract task rule that is maintained across the delay and applied to the object stimulus.
To characterize the processing stages of task, we conducted a temporal generalization analysis using multivariate cross-decoding, which reveals the temporal evolution of the representational format of task (Meyers et al., 2008; King and Dehaene, 2014). To this end, we trained a classifier at each time point during the trial to distinguish the four different tasks and then tested it at all other time points, providing us with a time × time temporal generalization matrix. The shape of this matrix is informative about similarities and differences in the way information is represented across time. If classification generalizes to all time points, i.e. if the entire matrix carries information, this indicates that task information is maintained in a similar format across the entire trial. If there is very little generalization across time, i.e. above-chance accuracies only very close to the diagonal, this indicates that the task representation is highly dynamic across the trial. And if there are several square blocks of cross-classification around the diagonal, this indicates that task representations change abruptly between different periods of the trial.
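The train-at-one-time, test-at-all-times scheme can be sketched on simulated data in which the task-specific pattern changes abruptly mid-trial, producing the block structure described above. All dimensions are hypothetical and a nearest-centroid classifier stands in for the study's actual decoder:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_sensors, n_times = 80, 32, 40
labels = np.tile([0, 1, 2, 3], n_trials // 4)   # four tasks, interleaved

# Stage 1: task pattern A during t in [5, 20); stage 2: unrelated pattern B from t=20.
X = rng.normal(size=(n_trials, n_sensors, n_times))
A = rng.normal(size=(4, n_sensors))
B = rng.normal(size=(4, n_sensors))
for c in range(4):
    X[labels == c, :, 5:20] += A[c][:, None]
    X[labels == c, :, 20:] += B[c][:, None]

def fit_centroids(X_tr, y_tr):
    return np.stack([X_tr[y_tr == c].mean(axis=0) for c in range(4)])

def accuracy(cents, X_te, y_te):
    d = ((X_te[:, None, :] - cents[None]) ** 2).sum(axis=2)
    return (d.argmin(axis=1) == y_te).mean()

# Split halves for training and testing (each half contains all four tasks).
tr = np.arange(n_trials) < n_trials // 2
te = ~tr
# Train at each time point, test at every time point -> time x time matrix.
gen = np.zeros((n_times, n_times))
for t_tr in range(n_times):
    cents = fit_centroids(X[tr, :, t_tr], labels[tr])
    for t_te in range(n_times):
        gen[t_tr, t_te] = accuracy(cents, X[te, :, t_te], labels[te])
# gen shows two square blocks of high accuracy (within each stage) and
# near-chance accuracy (0.25) when training in one stage and testing in the other.
```

Because the pattern within each simulated stage is stable, the classifier generalizes within a stage but not across stages, yielding exactly the kind of block structure that in the real data distinguishes the Task Cue Period from the Object Stimulus Period.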
The temporal cross-classification analysis revealed multiple separate, but partially overlapping stages of processing after the onset of the task cue (Figure 3A). At a coarse level, the temporal generalization matrix exhibited a block structure within the Task Cue Period and Object Stimulus Period (Within-Period Cross-Decoding, Figure 3B, left panel). This indicates a shared representational format within each time period of the trial, but a largely different representational format between those time periods, and an abrupt change in the representational format of task after onset of the object stimulus. Since visual and semantic representations of the task cue are likely to emerge early in the Task Cue Period, this result speaks against a visual or semantic representation during the Object Stimulus Period. However, this separation was not complete: some time points exhibited cross-decoding between time periods (Between-Period Cross-Decoding, Figure 3B, middle and right panel), as evidenced by the off-diagonals of the generalization matrix (i.e. training time 0 to 2,000 ms, testing time 2,000 to 3,500 ms, and vice versa). This indicates a partially shared representational format between these periods of the trial.
Next we focused on the fine-grained dynamics of task processing in the Task Cue and Object Stimulus Periods, respectively. During the Task Cue Period (0 to 2,000 ms) the results revealed a block of increased cross-classification lasting from ~100 to ~600 ms after cue onset. This block likely reflects the time the task cue was available to visual cortex (accounting for the delay between stimulus onset and cortical processing). After offset of the task cue at 500 ms and prior to the onset of the object stimulus at 2,000 ms, information continued to be present and generalized to other time points in the Task Cue Period. This reinforces the notion that information about task was actively maintained throughout the Task Cue Period, as suggested by the time-resolved decoding analysis presented above. This short-term memory representation of task generalized to most time points of the Task Cue Period, including the visual presentation of the task cue, indicating that it included visual and semantic properties of the task cue. During the Object Stimulus Period, there was a gradual build-up of task-related information until ~200 ms after object onset. From that point on, the results exhibited high levels of cross-classification that persisted until the onset of the response-mapping screen, indicating a stable representation of task context during this period.
Significant cross-classification between the Task Cue and Object Stimulus Periods was evident in two distinct phases. First, there was generalization from the Task Cue Period to the first ~200 ms of the Object Stimulus Period (training time ~300 to 2,000 ms, testing time 2,000 to ~2,200 ms, Figure 3B, middle panel), possibly reflecting a maintained short-term representation that continued until the task rule could be applied to the object. Second, there was generalization from the end of the Task Cue Period to the Object Stimulus Period (training time ~1,500 to 2,000 ms, testing time 2,000 ms to ~3,300 ms, Figure 3B, right panel), indicating that the short-term memory representation of task was similar to the representation during application of the task rule to the object. Interestingly, this cross-classification was specific to the late short-term memory representation and did not generalize to other time points of the Task Cue Period. Note that this cannot be explained by a representation of the correct response, because participants could not know the correct response prior to the presentation of the object. Together, this pattern of results suggests that the representation of task during the Object Stimulus Period likely does not reflect visual or semantic processing of the task cue (which would predict cross-classification from the early Task Cue Period); nor does it reflect only a representation of the correct response. Rather, the results indicate that participants form an abstract representation of the task rule during the short-term retention period prior to object onset, which they apply to the object stimulus when it is presented.
In summary, the results indicate separate, but overlapping stages of task context processing. These stages likely reflect a cascade of visual, semantic, and mnemonic processes including an abstract representation of task rule which during object processing is converted into an abstract behavioral choice reflected in the participant’s response.
Effects of Task Context on Object Category Representations
The robust decoding of task context that increases during object processing raises the question whether the task context representation is independent of object processing, or whether task context influences object category representations. If object category processing is influenced by task type, one prediction is that object information time courses would be different for different task types. To investigate this question, we conducted time-resolved multivariate decoding of objects separately for perceptual and conceptual task types and compared the time courses. The results of this analysis are presented in Figure 4A. The overall time course of object decoding was very similar for conceptual and perceptual tasks as compared to that reported for object decoding across tasks, as expected (see Time-resolved Representation of Task Context and Object Category and Figure 2): accuracies increased sharply after stimulus onset, followed by a gradual decline, dropping back to chance level towards the end of the Object Stimulus Period. Comparing the decoding curves for conceptual and perceptual tasks directly revealed higher accuracies for conceptual tasks, with differences emerging around 530 ms after object onset. This result reveals that task context affects object representations late in time.
In addition to these quantitative differences in object category representations across task types, we investigated whether the object representations were qualitatively similar but differently strong (more separable patterns), or qualitatively different across task types (different patterns). To this end, we compared object category classification within task to object category classification between tasks. The rationale of this approach is that if the between-task cross-classification accuracy is lower than the within-task accuracy, the classifier cannot rely on the same source of information in the two conditions, i.e. the patterns must be qualitatively different between tasks. The results of this analysis are shown in Figure 4B. We did not find any differences in object decoding accuracies within vs. between task types, indicating that the object category-related patterns were qualitatively similar across task types throughout the trial.
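The within- versus between-task comparison can be sketched on simulated data in which both task contexts share the same category pattern but at different signal-to-noise levels, i.e. the "differently strong but qualitatively similar" scenario our results suggest. The dimensions, noise levels, and nearest-centroid decoder are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_sensors = 200, 32
p = 0.5 * rng.normal(size=n_sensors)   # category pattern shared by both task contexts

def make(noise_sd):
    """Two object categories: identical pattern p, SNR set by the noise level."""
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(scale=noise_sd, size=(n, n_sensors))
    X[y == 1] += p
    return X, y

X_con, y_con = make(noise_sd=1.0)   # "conceptual" task: higher SNR
X_per, y_per = make(noise_sd=2.0)   # "perceptual" task: lower SNR, same pattern

def centroid_acc(X_tr, y_tr, X_te, y_te):
    cents = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    d = ((X_te[:, None, :] - cents[None]) ** 2).sum(axis=2)
    return (d.argmin(axis=1) == y_te).mean()

within_con = centroid_acc(X_con[::2], y_con[::2], X_con[1::2], y_con[1::2])
within_per = centroid_acc(X_per[::2], y_per[::2], X_per[1::2], y_per[1::2])
between = centroid_acc(X_con, y_con, X_per, y_per)  # train conceptual, test perceptual
```

Because the category pattern is identical in both contexts, the classifier trained in one context transfers to the other without qualitative cost (between-task accuracy matches the within-task accuracy of the noisier context), while the within-context accuracies differ with SNR, mirroring the observed pattern of differently strong but qualitatively similar responses.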
Together, our results reveal that task context affects object representations late in time and provide further evidence for the notion that task affects the strength of object category representations.
Model-based MEG-fMRI Fusion for Spatiotemporally-Resolved Neural Dynamics of Task and Object Category
To investigate the cortical origin of the task and object category-related effects, we carried out MEG-fMRI fusion based on representational similarity analysis (Cichy et al., 2014, 2016). We calculated time-resolved MEG representational dissimilarity matrices (RDMs) for all combinations of task and category and compared them to fMRI RDMs derived from brain activity patterns from five ROIs of a previously published study employing the same task (Harel et al., 2014). Similarity between an fMRI ROI RDM and MEG RDMs indicates a representational format common to that location and those time points. To separately assess the contributions of object category and task to the representational similarity between MEG and fMRI, we decomposed the shared variance between MEG and fMRI RDMs using commonality analysis (Seibold and McPhee, 1979). This procedure identifies the portion of variance shared between MEG and fMRI that is unique to either task or category (Figure 5A). The task and category model RDMs were constructed based on the expected dissimilarity matrix for task irrespective of category, and for category irrespective of task, respectively.
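The variance decomposition can be illustrated with a toy example. The condition set (4 tasks × 4 categories), the simulated RDMs, and the nested-regression formulation of the commonality coefficients below are all hypothetical simplifications of the full commonality analysis used in the study; the sketch only conveys the core idea of attributing MEG-fMRI shared variance to the task versus category model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical condition set: 4 tasks x 4 categories = 16 conditions.
tasks = np.repeat(np.arange(4), 4)
cats = np.tile(np.arange(4), 4)
iu = np.triu_indices(16, k=1)                 # vectorized upper triangle of an RDM

task_model = (tasks[:, None] != tasks[None, :]).astype(float)[iu]
cat_model = (cats[:, None] != cats[None, :]).astype(float)[iu]

# Simulated data RDMs: the fMRI RDM mixes task and category structure; the
# MEG RDM at this (hypothetical) time point carries mostly task structure.
fmri = 1.0 * task_model + 0.6 * cat_model + 0.1 * rng.normal(size=task_model.size)
meg = 1.0 * task_model + 0.3 * cat_model + 0.1 * rng.normal(size=task_model.size)

def r2(y, *preds):
    """R^2 of an OLS fit of y on the given predictor vectors."""
    X = np.column_stack([np.ones_like(y), *preds])
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Shared MEG-fMRI variance common with one model: how much adding the MEG RDM
# improves the fit beyond the other model alone, minus what it still adds once
# the model of interest is also in the regression.
c_task = (r2(fmri, meg, cat_model) - r2(fmri, cat_model)) \
       - (r2(fmri, meg, task_model, cat_model) - r2(fmri, task_model, cat_model))
c_cat = (r2(fmri, meg, task_model) - r2(fmri, task_model)) \
      - (r2(fmri, meg, task_model, cat_model) - r2(fmri, task_model, cat_model))
```

In this simulation the MEG RDM is dominated by task structure, so the task-related commonality coefficient is large while the category-related coefficient is small, the same signature that distinguishes frontoparietal from occipitotemporal ROIs in Figure 5.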
The results of this model-based MEG-fMRI fusion are shown in Figure 5B-F separately for each ROI. The grey shaded area indicates the amount of variance captured by MEG-fMRI fusion. Blue and red lines indicate the amount of variance in the MEG-fMRI fusion uniquely explained by the task and object category model respectively.
In all ROIs and at most time points, either the task or the object category model explained the majority of the shared variance between MEG and fMRI, as indicated by the close proximity of the colored lines to the upper boundary of the grey shaded area. This result demonstrates that the task and category model RDMs are good models for describing the observed spatiotemporal neural dynamics.
All regions carried information about task context and object category at some point throughout the trial, indicating overlapping representations of task and object category distributed across cortical location and time. However, regions differed in the predominance and mixture of the represented content. Both PPC and lPFC were clearly dominated by effects of task context, with much weaker object category-related commonality coefficients present in these areas. These regions exhibited high task-related commonality coefficients both during the Task Cue Period and the Object Stimulus Period. Interestingly, PPC exhibited significant task-related commonality coefficients throughout the short-term retention period that were not found in lPFC, which may point to a different functional role of these regions in the retention of task rules.
In contrast to frontoparietal regions, the occipitotemporal regions EVC, LO and pFS generally exhibited task-related commonality coefficients that were weaker than those in PPC and lPFC but still significant. All three regions displayed significant task-related commonality coefficients in the Task Cue Period. Interestingly, in the Object Stimulus Period all three regions exhibited a mixture of task- and object category-related commonality coefficients, indicating the concurrent encoding of task and object category in these brain areas. Moreover, the relative size of task-related commonalities increased gradually from EVC through LO to pFS, indicating an increasing importance of task encoding when progressing up the visual hierarchy. In all five regions, category-related commonality coefficients peaked earlier after object onset than task-related commonality coefficients (all p < 0.05, based on bootstrap CIs for differences in peaks), mirroring the results of the time-resolved multivariate decoding analysis.
Together, we found that the spatiotemporal neural dynamics as revealed by model-based MEG-fMRI fusion predominantly reflected task or object processing, with systematic differences across cortical regions: While PPC and lPFC were dominated by task and PPC carried task information throughout the Task Cue Period, EVC, LO and pFS exhibited a mixture of task and category-related information during the Object Stimulus Period, with relative increases in the size of task-related effects when moving up the visual cortical hierarchy.
Discussion
We used MEG and time-resolved multivariate decoding to unravel the representational dynamics of task context, object category, and their interaction. Information about task was found rapidly after onset of the task cue and throughout the experimental trial, which was paralleled by information about object category after onset of the object stimulus. Temporal cross-decoding revealed separate and overlapping task context-related processes, suggesting a cascade of representations including visual and semantic encoding of the task cue, the retention of the task rule, and its application to the object stimulus. Investigating the interaction of task context and object category, we found evidence for late effects of task context on object category representations, with differences in the strength rather than the quality of category-related MEG patterns. Finally, model-based MEG-fMRI fusion revealed that parietal and frontal regions were dominated by effects of task, whereas occipitotemporal regions reflected a mixture of task and object category representations following object presentation, with relative increases in task-related effects over time and along the visual cortical hierarchy.
Representational Dynamics of Task Context
While previous fMRI studies investigated the cortical location of task effects on visual object processing (Harel et al., 2014; Erez and Duncan, 2015; Bracci et al., 2017; Bugatus et al., 2017), they could not provide insight into the temporal dynamics of task context. By manipulating task context on a trial-by-trial basis we were able to (1) map out the temporal evolution of task context effects across different stages of the trial, (2) uncover different stages of processing using temporal generalization analysis, and (3) localize task context-related information to different regions of the brain using model-based MEG-fMRI fusion.
The results from multivariate decoding and temporal generalization analyses indicate that following initial encoding of visual and semantic information about task cue (Task Cue Period), there was a weak but consistent short-term memory representation of this information, paralleled by a representation of the task rule. Temporal generalization analysis additionally revealed that after onset of the object stimulus (Object Stimulus Period) the task representation changed abruptly. This result is in line with previous work in non-human primates (Sigala et al., 2008; Stokes et al., 2013) demonstrating largely distinct representations of different task phases in prefrontal cortex. Since the representation of task after object onset did not generalize to early time periods during the initial processing of the task cue, this indicates that during object category processing task context is likely not represented in a purely visual or semantic format. Instead, our temporal generalization results suggest that at least part of the task-related information after object onset reflects a representation of task rule that is applied to the visually-presented object stimulus (Wallis et al., 2001; Stoet and Snyder, 2004; Bode and Haynes, 2009; Woolgar et al., 2011). This interpretation is in line with a recent working memory study reporting a reemergence of task rule-related MEG patterns during stimulus presentation (Peters et al., 2016).
Of note, the representation of task context in monkey prefrontal cortex has been shown to be even more dynamic than described above and not to generalize at all between different periods of the task (Stokes et al., 2013). Since our results demonstrate phases of cross-classification between these time periods, this suggests that in the present study the source of the cross-classification between these periods of the task may not originate from prefrontal cortex, but from other brain regions such as posterior parietal cortex. Indeed, this interpretation is supported by our MEG-fMRI fusion results that show no significant prefrontal representations of task context during the delay period prior to the onset of the object stimulus, but a representation of task in posterior parietal cortex.
Differential Involvement of Frontoparietal and Occipitotemporal Brain Areas in Task and Object Category Representations
Previous research has suggested a dominance of parietal and prefrontal cortex in representing task context (Duncan, 2010; Woolgar et al., 2011), while the processing of object category has been attributed to occipitotemporal cortex (Grill-Spector et al., 1999; Kravitz et al., 2010; Cichy et al., 2011). More recently, this view has been challenged: First, object category representations have been found – with some dependence on task context – in both parietal (Konen and Kastner, 2008; Jeong and Xu, 2016; Bracci et al., 2017) and prefrontal cortex (Harel et al., 2014; Erez and Duncan, 2015; Bracci et al., 2017). Second, there is some evidence for task context effects in occipitotemporal cortex, although the extent of such effects remains debated (Harel et al., 2014; Erez and Duncan, 2015; Lowe et al., 2016; Bracci et al., 2017; Bugatus et al., 2017), and the time course of any such effects has remained elusive.
Our model-based MEG-fMRI fusion results provide a nuanced spatiotemporal characterization of task and object category representations in frontoparietal and occipitotemporal cortex. Frontoparietal cortex was strongly dominated by task context, with much weaker representations of object category. This result reinforces the notion that the dominant role of frontoparietal cortex is the representation of task, with a secondary role in representing object category. In contrast, in occipitotemporal cortex, responses reflected a mixture of object category and task-related effects after object onset, with an increasing dominance of task over time and along the visual cortical hierarchy from low- to high-level visual cortex (EVC, LO, pFS). These results reveal that both task and object category are encoded in parallel in the same regions of occipitotemporal cortex and suggests an increasing role of task context in high-level visual cortex.
The finding of parallel effects of category and task suggests an interaction of task context and object category already in occipitotemporal cortex. This result contrasts with the view of a “passive” role of occipitotemporal cortex in the processing of object category, according to which object representations are read out by prefrontal cortex (Freedman et al., 2003). Instead, our results suggest that task biases late components of object processing along occipitotemporal cortex, an influence that may originate in brain regions strongly dominated by task in frontoparietal cortex (Waskom et al., 2014). In addition, our results suggest that this influence may increase along the visual cortical hierarchy. Indeed, pFS but not EVC or LO was found to represent task context immediately prior to object onset, suggesting that task context has the potential to affect even the earliest stages of object processing in high-level visual cortex through a top-down bias. This bias may reflect a task-specific modulation of the representational strength of task-relevant object features after object onset. The concurrent representation of both task and category in the same brain region may be beneficial for optimizing the tuning of categorical brain responses to the demands of the task.
Interaction of Task Context and Object Category
The direct investigation of the temporal dynamics of task context and object category interactions revealed three key findings. First, we found that differences in object category processing between low-level perceptual and high-level conceptual tasks emerged late in time, suggesting a late top-down modulation of object processing after initial object processing has been completed, arguing against an early expectation-related modulation of feedforward processing. This result is consistent with a previous EEG study using natural images in an animal and vehicle detection task, finding an initial category-related signal followed by a late task-related response signaling the presence of a target stimulus (VanRullen and Thorpe, 2001). Similarly, a more recent MEG study (Ritchie et al., 2015) reported results for visual category processing in two different tasks (object categorization vs. letter discrimination) that are indicative of late differences in task-dependent stimulus processing. Overall, these combined results suggest that task representations affect late, rather than early processing of visual information.
Second, object category-related information leveled off more slowly for conceptual than perceptual tasks, indicating different neural dynamics for different task types. This suggests that for conceptual tasks encoding and maintenance of object category may be beneficial for carrying out the task, in contrast to perceptual tasks that do not necessitate categorical representations. While differences in the difficulty of the tasks could account for this pattern of results, we found no differences in response times or accuracy for the different tasks, arguing against the relevance of task difficulty. In support of this view, a previous study employing a speeded version of the same tasks and categories found no differences in response times between tasks (Harel et al., 2014).
Finally, while task context affected the separability of object-related MEG patterns between task types, object classification showed no reduction in classification accuracy between task contexts, indicating that the overall structure of those patterns did not change. This result contrasts with our prior study demonstrating qualitatively different object-related patterns in lateral prefrontal and high-level object-selective cortex (Harel et al., 2014). However, the contribution of multiple brain regions to the MEG response may be masking an interaction between object category and task context. Indeed, our MEG-fMRI fusion data suggest that both task context and object category are being processed in parallel in pFS, although future work with independent data will be needed to resolve this issue.
Conclusions
Our results suggest that task is represented in multiple distinct yet overlapping processing stages and that the impact of task context during object processing occurs late in time. Our MEG-fMRI fusion results support the view of strong task-related responses in frontoparietal regions, while demonstrating the concurrent processing of both task and object category in occipitotemporal regions. Our findings provide a nuanced spatiotemporally-resolved view of task processing throughout the human cerebral cortex.
Acknowledgements
We would like to thank Maryam Vaziri-Pashkam for helpful discussions and Matthias Guggenmos and Edward Silson for comments on our manuscript. This work was supported by the Intramural Research Program of the National Institutes of Health (ZIA-MH-002909) - National Institute of Mental Health Clinical Study Protocol 93-M-0170, NCT00001360, the German Research Foundation (Emmy Noether Grant CI241-1/1), and a Feodor-Lynen fellowship of the Humboldt Foundation to M.N.H.
Footnotes
Conflict of Interest: The authors declare no competing financial interests.