Abstract
Task episodes consist of sequences of steps that are performed to achieve a goal. The current study used fMRI to examine which regions of the brain represent full episodes, items, and current step. Participants learned 6 tasks each consisting of 4 steps. Inside the scanner, participants were cued which task to perform and then sequentially identified the target item of each step in the correct order. The multiple demand (MD) network and the visual cortex exhibited phasic responses to each task step, suggesting that they are sensitive to the fine structure of the episode. In contrast, default mode (DMN) regions showed a phasic response predominantly to onset of the entire task episode. Beyond these phasic responses, gradually increasing activity across each task episode was seen throughout most of the brain. Representational similarity analysis of episode and item coding revealed a significant dissociation between MD and DMN networks. Compared to MD regions, which showed strong coding of individual items but not the entire episode, the DMN showed representation of both item and episode, with coding for the episode localized to the parahippocampal cortex. The data hint that the most abstract level of task structure may be encoded in medial frontal cortex.
A central feature of purposeful everyday behavior is the retrieval of learned sequences of events from memory (Hsieh and Ranganath 2015) to guide our current actions. This involves parcellating a main goal (e.g., “make a stew”) into smaller achievable steps (e.g., “take food from fridge” → “wash vegetables” → “chop vegetables” → “cook on stove”) to allow progression towards the goal (Penfield and Evans 1935; Cooper and Shallice 2000; Farooqui et al. 2012). We call these temporally organized sequences of steps that occur within a given context “task episodes”. A key aspect of these task episodes is the control of extended episodes of behavior as one unit, and not as a collection of independent acts (Schneider and Logan 2006; Duncan 2010; Farooqui and Manly 2018a). Whenever a step is completed, its specific content loses relevance, but higher level task representations of the full episode must remain in behavioral control (Farooqui et al. 2012; Farooqui and Manly 2018a). This raises the question of how different brain regions work together to execute the current step of the task while keeping the overall goal in mind.
Previous literature has highlighted the importance of a set of frontal and parietal regions, known as the multiple demand (MD) network (Duncan and Owen 2000), in executing complex mental programs (Duncan 2010, 2013; Farooqui et al. 2012). It has been proposed that the MD network plays a key role in defining and controlling parts of task episodes, allowing goals to be achieved by decomposition into a structure of subgoals (Kurby and Zacks 2008; Farooqui et al. 2012; Duncan 2013). The MD network is well suited for focusing on specific contents of a current cognitive operation, dynamically encoding information relevant to a current decision (Asaad et al. 2000; Everling et al. 2002; Li et al. 2007; Woolgar et al. 2011; Stokes et al. 2013), and radically changing the pattern of activity across successive task steps (Sigala et al. 2008; Duncan 2010). In particular, Farooqui et al. (2012) investigated the role of MD activity in task episodes requiring a series of target detection steps. The authors found that target detections that completed the entire task episode elicited the greatest MD activity, followed by those completing a subtask, and finally steps within one subtask. As MD activity depended on task completion, it was suggested to be involved in directing and revising the control representations of each step of the episode.
The ability to organize sequences of events within a given context has also been a key topic in the study of episodic memory (Ezzyat and Davachi 2011; Eichenbaum 2013; Hsieh et al. 2014; Cohn-Sheehy and Ranganath 2017; Radvansky and Zacks 2017). Tulving’s original definition emphasized the importance of temporal events: “Episodic memory receives and stores information about temporally dated episodes or events, and temporal-spatial relations among these events” (Tulving 1972, p. 385). Event segmentation theory (Zacks and Tversky 2001; Zacks and Swallow 2007; Radvansky and Zacks 2017) proposes that humans can segment incoming information into temporal parts that are meaningfully related to the current situation. When important situation features change, the current event model is updated and experienced as an event boundary. Neuroimaging studies have found brain regions sensitive to event boundaries to overlap with areas associated with episodic memory retrieval including regions in the default mode network (DMN; Zacks et al. 2001; Speer et al. 2007; Ben-Yakov et al. 2014; Richmond and Zacks 2017; Baldassano et al. 2018). Furthermore, it has been suggested that these dynamics within the DMN may reflect the underlying meaning of the episode rather than simple stimulus changes (Radvansky and Zacks 2017), as coarse segmentation elicited greater DMN activity than fine-grained segmentation (Speer et al. 2007). Consistent with this observation, the DMN has been implicated in higher level cognition at a broader scale, such as encoding of schemas (Robin and Moscovitch 2017), situation models (Reagh and Ranganath 2018), and cognitive contexts (Crittenden et al. 2015).
In a study of topographic mapping of a hierarchy of temporal receptive windows (TRW), participants listened to a story scrambled at the time scales of words, sentences, and paragraphs (Lerner et al. 2011). Results showed that early sensory regions were driven by incoming sensory input and were similarly responsive in all conditions; however, MD regions exhibited intermediate TRWs, whereas DMN regions were at the apex of the TRW hierarchy, such that they responded reliably only when intact paragraphs were heard in a meaningful sequence. This evidence suggests that the DMN is well suited to representing task episodes over an extended timescale.
Although various brain networks have been implicated in the execution of task episodes, to our knowledge, no study has contrasted the roles of MD and DMN in representing different aspects of a task episode. In the current study, we aimed to examine which brain regions are involved in coding of information at various levels of abstraction within a single task: individual steps, including their content and position within an episode, whole episodes, and groups of related episodes. Prior to the experiment, participants learned 6 everyday task episodes (3 kitchen tasks and 3 bathroom tasks) that each consisted of 4 steps. Inside the scanner, participants carried out a continuous “execution” task, in which, after being cued which task to perform, they sequentially identified the target item of each step in the correct order. This design allowed us to examine which brain regions represent rooms (e.g., kitchen), full episodes (e.g., “make a stew”), items within the episode (“take food from fridge”), and current position in the episode (e.g., 1st step). We hypothesized that different regions of the brain would be sensitive to different levels of the temporal task hierarchy. We first focused on the MD and DMN networks as a priori regions of interest. We hypothesized that the MD network would be especially involved in moment-to-moment control (Duncan 2010, 2013; Farooqui et al. 2012); whereas the DMN would be especially involved in representation of full episodes (Lerner et al. 2011; Reagh and Ranganath 2018). In addition to these pre-defined networks, we examined information coding using a whole brain searchlight to localize relevant brain regions at a finer scale, both within and beyond the a priori networks. The visual cortex encodes physical properties of visual stimuli, and therefore we expected it to be involved in item coding along with the MD system.
It has been suggested that both the MD network (Dosenbach et al. 2006, 2007) and the DMN (Andrews-Hanna 2012; Ranganath and Ritchey 2012) can be divided into finer components or subsystems. One region in the DMN, the medial prefrontal cortex (mPFC), has been implicated in schema representation, capturing similarities across particular episodes at a higher level (Ghosh and Gilboa 2014; Robin and Moscovitch 2017). Another set of regions in the DMN is the posterior medial network (Ranganath and Ritchey 2012; Reagh and Ranganath 2018), including the parahippocampal cortex (PHC), proposed to encode “situation models”. A situation model is conceived as a higher-level cognitive representation of relationships between different elements of an episode. We hypothesized that selective activity in the posterior medial network and/or mPFC might encode higher aspects of task structure, including episode and/or room. We used both univariate finite impulse response (FIR) models to characterize the temporal evolution of activity through the extended episode, and representational similarity analysis (RSA) to investigate coding of cognitive representations of task structure and content.
Methods
Participants
Forty-three participants (22 male, 21 female; ages 18-39, mean = 26.54, SD = 4.93) were included in the experiment at the MRC Cognition and Brain Sciences Unit. An additional 18 participants were excluded (2 participants were discovered to have cysts, 1 participant lost several slices due to poor bounding box positioning, 9 were excluded due to poor behavioral performance with accuracies more than three scaled median absolute deviations below the median, and a further 6 were excluded due to excessive head motion of more than 5 mm). All participants were neurologically healthy, right-handed, with normal or corrected-to-normal vision. Procedures were carried out in accordance with ethical approval obtained from the Cambridge Psychology Research Ethics Committee, and participants provided written, informed consent before the start of the experiment.
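The behavioral exclusion rule can be illustrated with a minimal sketch. The scale factor 1.4826 (which makes the median absolute deviation comparable to a standard deviation under normality) is an assumption, since the text says only "scaled"; the function name and the example accuracies are hypothetical and are not data from the study.

```python
from statistics import median

# Assumed scale factor: makes the MAD consistent with the SD for normal data.
MAD_SCALE = 1.4826

def low_outliers(accuracies, n_mads=3.0):
    """Return indices of values more than n_mads scaled MADs below the median."""
    med = median(accuracies)
    scaled_mad = MAD_SCALE * median(abs(a - med) for a in accuracies)
    cutoff = med - n_mads * scaled_mad
    return [i for i, a in enumerate(accuracies) if a < cutoff]

# Hypothetical proportions correct for eight participants.
example = [0.98, 0.97, 0.99, 0.96, 0.98, 0.97, 0.80, 0.99]
flagged = low_outliers(example)  # indices of participants to exclude
```

Only values below the cutoff are flagged, matching the one-sided criterion described above.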
Stimuli and task procedures
The study consisted of a learning session outside the scanner and an execution session in the scanner. During the learning session, participants learned 6 everyday task sequences (“episodes”) each based in one of two locations (“rooms”; 3 kitchen and 3 bathroom). Each episode consisted of 4 ordered “steps”. For example, the episode “make a stew” consisted of the steps “take food from fridge”, “wash vegetables”, “chop vegetables”, “cook on stove”. Each step was associated with a unique image (“item”). The complete set of stimuli is shown in Figure 1.
In the learning session, participants viewed the names and images of the steps of each task episode in sequential order. The step images were presented simultaneously with a background image corresponding to the room they occur in (kitchen or bathroom). The learning was self-paced, in separate runs for each room. Within each room, each task sequence was presented three times, and each item within the sequence was presented until the participant decided to move on to the next item. There was a 1.5 s inter-stimulus interval between items. After viewing all six sequences, participants were tested for their memory of the task episodes by (1) sorting picture cards representing all steps of the six task episodes into the correct sequences, and (2) completing a pen-and-paper test in which they were asked to write down the names of the steps in the correct order for each task episode. Most participants performed these two tests without error. A few participants made a mistake on 1-2 items but were able to correct their answers after being told they had made a mistake. The tests ensured participants had memorized the specific step sequence of each task. Before entering the scanner, participants practiced a shortened version of the main experiment, containing one trial of each task episode. During scanning, participants performed two runs of the experiment, interleaved with shorter runs (∼5 minutes) of a localizer task that was not analyzed and is not described further.
Figure 2 illustrates the structure of the task episodes paradigm. At the start of each 45 s episode, participants were presented with a cue (e.g., “make a stew”) for 1 s, indicating which task to complete. This was followed by a fixation period lasting between 1.5 – 7.5 s before the onset of the first step. On each step, participants had to perform three visual searches. On each search, an array of 4 images was presented in a horizontal row (total left to right visual angle approximately 12.6°). These included (randomly ordered from left to right): the correct image (“target”) corresponding to the current task step; a distractor image representing an incorrect step from the correct task; a distractor representing the correct step but from an incorrect task; and an additional distractor from the same incorrect task. To ensure that each display contained 2 images from each room, incorrect-task distractors always came from the alternative room to the current task. The array remained for 2 s, and within this time, the participant’s task was to indicate the position of the target image using a 4-choice keyboard with their right hand. A 1 s fixation interval preceded onset of the next search array. Each step thus lasted for 9 s, with the participant selecting the same target in each of three search events, to allow separation of the hemodynamic response to successive task steps. At the end of the third search event, a 0.2 s presentation of the words “STEP COMPLETED” indicated the completion of that step, and was followed by a 0.8 s fixation interval. Without further cueing, the participant then moved on to the next task step. After completing the last of the 4 steps, a fixation interval of 0.5 – 6.5 s was presented before the onset of the cue for the next task. The total interval between the last step of the previous task and the first step of the next task was fixed at 9 s. 
Each run consisted of 36 task episodes (with an additional dummy episode to start), constructed so that each task appeared following each possible preceding task once. Task ordering was chosen before the start of each run by calculating the design efficiency (Dale 1999) of all pairwise contrasts between tasks. 1000 task orders were simulated, and the most efficient one was chosen. Each of the two runs lasted ∼28 min.
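One way to build a candidate order with this transition property is sketched below. It treats the 6 tasks as nodes of a complete directed graph with self-loops and finds a random Eulerian circuit with Hierholzer's algorithm, so the first element plays the role of the dummy episode and the following 36 episodes cover every (preceding task, task) pair once. This is an illustrative assumption about how such orders can be generated, not the authors' procedure, and it does not compute design efficiency; in the study, 1000 candidate orders were compared on that criterion.

```python
import random

def balanced_order(n_tasks=6, rng=None):
    """Return n_tasks**2 + 1 task indices in which every ordered pair
    (previous task, next task), repeats included, occurs exactly once."""
    rng = rng or random.Random()
    # Unused successors of each task: complete directed graph with self-loops.
    successors = {t: list(range(n_tasks)) for t in range(n_tasks)}
    for options in successors.values():
        rng.shuffle(options)
    # Hierholzer's algorithm: walk unused edges, backtrack when stuck.
    stack, circuit = [rng.randrange(n_tasks)], []
    while stack:
        current = stack[-1]
        if successors[current]:
            stack.append(successors[current].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]  # the circuit is built in reverse

order = balanced_order(rng=random.Random(0))  # seed chosen arbitrarily
transitions = list(zip(order, order[1:]))     # (preceding task, task) pairs
```

Calling `balanced_order` repeatedly with different seeds would give a pool of candidate orders that could then be ranked by whatever efficiency measure is preferred.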
fMRI data acquisition and preprocessing
Scanning took place in a 3T Siemens Prisma scanner. Functional images were acquired using a multi-band gradient-echo echo-planar imaging (EPI) pulse sequence (TR = 1373 ms, TE = 33.4 ms, flip angle = 74°, 96 × 96 matrices, slice thickness = 2 mm, no gap, voxel size 2 mm × 2 mm × 2 mm, 72 axial slices covering the entire brain, 4 slices acquired at once). The first 5 volumes served as dummy scans and were discarded to avoid T1 equilibrium effects. Field maps were collected at the end of the experiment (TR = 400 ms, TE = 5.19 ms / 7.65 ms, flip angle = 60°, 64 × 64 matrices, slice thickness = 3 mm, 25% gap, resolution 3 mm isotropic, 32 axial slices). High-resolution anatomical T1-weighted images were acquired for each participant using a 3D MPRAGE sequence (192 axial slices, TR = 2250 ms, TI = 900 ms, TE = 2.99 ms, flip angle = 9°, field of view = 256 mm × 240 mm × 160 mm, matrix dimensions = 256 × 240 × 160, 1 mm isotropic resolution).
The data were preprocessed and analyzed using the automatic analysis (aa) pipelines and modules (Cusack et al. 2015), which called relevant functions from Statistical Parametric Mapping software (SPM 12, http://www.fil.ion.ucl.ac.uk/spm) implemented in Matlab (The MathWorks, Inc., Natick, MA, USA). EPI images were realigned to correct for head motion using rigid-body transformation, unwarped based on the field maps to correct for voxel displacement due to magnetic-field inhomogeneity, and slice time corrected. The T1 image was coregistered to the mean EPI, and then coregistered and normalized to the MNI template. The normalization parameters of the T1 image were applied to all functional volumes. The model incorporated a high-pass filter with a cutoff at 1/128 Hz. Spatial smoothing of 10 mm FWHM was applied for univariate analysis, but no smoothing was done for multivariate analysis.
Regions of interest
For the primary analysis, we focused on the MD and DMN networks. The MD network was taken from Fedorenko et al. (2013), and consisted of regions within the lateral prefrontal cortex (LPFC), extending along the inferior/middle frontal gyrus, a posterior-dorsal region of lateral frontal cortex, the intraparietal sulcus (IPS), parts of the anterior insular cortex, pre-supplementary motor area and adjacent anterior cingulate cortex (ACC). The DMN was taken from Yeo et al. (2011), combining three subnetworks from the 17 network parcellation (numbers 15, 16, and 17; Andrews-Hanna 2012). These regions include the dorsal medial prefrontal cortex (dmPFC), temporoparietal junction (TPJ), lateral temporal cortex (LTC), temporal pole, ventral medial prefrontal cortex (vmPFC), posterior inferior parietal lobule (pIPL), retrosplenial cortex (Rsp), parahippocampal cortex (PHC), hippocampal formation (HF+), anterior medial prefrontal cortex (amPFC) and posterior cingulate cortex (PCC). These networks are illustrated in Figure 5 Ai and Bi.
Univariate analysis
FIR Model
Statistical analyses were performed first at the individual level, using a general linear model (GLM). To capture the BOLD timecourse throughout the task episode, a 45 s epoch starting from the onset of the first search array of every task to the first search array of the next task was modeled using a finite impulse response (FIR) basis set of 30 1.5 s boxcar regressors. In this way, the response throughout task episodes could be modelled without making any assumptions about the shape of the hemodynamic response. Error episodes (defined as episodes that had > 25% errors) were removed from the analysis using a similar but separate set of regressors. Effects of cues, and errors on individual search arrays, were also removed by modelling with epoch regressors with duration of respective events (1 s for cues and 2 s for error events), convolved with a basis function representing the canonical hemodynamic response. The estimates for each time point were extracted from the two networks of interest, averaged over voxels within the network and across the 6 tasks. These average beta estimates for individual participants were entered into a random effects group analysis. The timepoints from the first 36 seconds of the average task response in each ROI were plotted for visualization.
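The structure of the FIR basis set can be sketched as follows. This is a simplified illustration under stated assumptions: each scan is assigned to a single 1.5 s bin by its acquisition time, with none of the sub-scan resampling that an analysis package would apply, and the onsets and scan count in the example are hypothetical. Only the TR, bin width, and number of bins are taken from the text.

```python
TR = 1.373        # s, from the acquisition parameters
BIN_WIDTH = 1.5   # s, width of each FIR boxcar
N_BINS = 30       # 30 x 1.5 s = 45 s epoch

def fir_design(onsets, n_scans, tr=TR, bin_width=BIN_WIDTH, n_bins=N_BINS):
    """Return an n_scans x n_bins indicator matrix as a list of rows.

    Entry [s][b] is 1.0 when scan s is acquired during bin b of an epoch.
    """
    design = [[0.0] * n_bins for _ in range(n_scans)]
    for scan in range(n_scans):
        t = scan * tr
        for onset in onsets:
            lag = t - onset
            if 0 <= lag < n_bins * bin_width:
                design[scan][int(lag // bin_width)] = 1.0
    return design

# Hypothetical onsets: two back-to-back 45 s episodes starting at 10 s.
onsets = [10.0, 55.0]
X = fir_design(onsets, n_scans=80)
```

Because each column is estimated separately, the fitted betas trace the response bin by bin without imposing a hemodynamic response shape, which is the property the FIR model relies on.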
Event-based GLM analysis
To complement the FIR model, an event-based GLM analysis was performed. In this analysis, we aimed to separate phasic activity linked to onset of each step from tonic activity across the whole step. Accordingly, each step was modelled using two regressors, an onset regressor modelled with 0 s duration and an epoch regressor modelled with 9 s duration, each convolved with the canonical hemodynamic response function. There were accordingly 48 regressors of interest, two (onset and epoch) for each of the 4 steps in each of the 6 tasks. Error and cue activities were removed as before, with the cue also modelled using a combination of onset (0 s duration) and epoch (duration from cue onset to the onset of the first task step) components. Beta estimates were averaged across the 6 tasks for individual participants, and the contrasts against implicit baseline were entered into a random effects group analysis. To determine whether BOLD signal showed significant linear changes towards goal completion, we performed t tests on increasing ([−3 −1 +1 +3]) and decreasing ([+3 +1 −1 −3]) linear contrasts across task steps. To complement ROI analysis, this analysis was also carried out at the whole brain level, using an FDR-corrected threshold of p < 0.05.
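The linear trend test amounts to weighting each participant's four step betas by the contrast vector and testing the resulting values against zero across participants. A minimal sketch follows; the beta values and function names are hypothetical, and only the contrast weights come from the text.

```python
from math import sqrt
from statistics import mean, stdev

INCREASING = [-3, -1, 1, 3]   # contrast weights from the text (sum to zero)
DECREASING = [3, 1, -1, -3]

def contrast_value(step_betas, weights=INCREASING):
    """Weighted sum of the four step betas for one participant."""
    return sum(w * b for w, b in zip(weights, step_betas))

def one_sample_t(values):
    """t statistic for the mean of values against zero (df = n - 1)."""
    return mean(values) / (stdev(values) / sqrt(len(values)))

# Hypothetical betas for steps 1-4, one row per participant.
betas = [
    [0.1, 0.2, 0.4, 0.5],
    [0.0, 0.1, 0.1, 0.3],
    [0.2, 0.2, 0.3, 0.6],
    [0.1, 0.3, 0.2, 0.4],
]
increasing = [contrast_value(b) for b in betas]
t_increasing = one_sample_t(increasing)
```

The decreasing contrast is the sign-flipped version of the increasing one, so the two tests differ only in the direction of the effect they detect.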
RSA analysis
We performed representational similarity analysis (RSA) using linear discriminant contrast (LDC) to quantify dissimilarities between activation patterns. The analysis was done using the RSA toolbox (Nili et al. 2014), in conjunction with in-house software. The LDC was chosen because it is multivariate noise-normalized, potentially increasing sensitivity, and is a cross-validated measure which is distributed around zero when the true distance is zero (Nili et al. 2014). The LDC also allows inference on contrasts of dissimilarities across multiple pairs of stimuli. A pattern for each step of each task was obtained, by averaging the onset and epoched responses from the standard GLM described above. This resulted in 24 patterns in total in each run. For each pair of patterns, the patterns from run 1 were projected onto a Fisher discriminant fitted for run 2, with the difference between the projected patterns providing a cross-validated estimate of a squared Mahalanobis distance. This was repeated projecting run 2 onto run 1, and we took the average as the dissimilarity measure between the two patterns. All pairs of pattern dissimilarities therefore formed a symmetrical representational dissimilarity matrix (RDM) with zeros on the diagonal by definition. To compare dissimilarity magnitude across ROIs of different sizes, the LDC values were normalized by dividing by the number of voxels within each ROI.
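The cross-validation logic of the LDC can be sketched in a few lines. This is a deliberately simplified illustration: it assumes a diagonal noise covariance (one variance per voxel and run) in place of the full multivariate noise normalization, and the function and argument names are hypothetical rather than taken from the RSA toolbox.

```python
def crossval_distance(a1, b1, a2, b2, noise_var1, noise_var2):
    """Cross-validated, noise-normalized squared distance between conditions a and b.

    a1, b1: voxel patterns from run 1; a2, b2: voxel patterns from run 2.
    noise_var1, noise_var2: per-voxel noise variances for each run
    (diagonal simplification of the noise covariance; an assumption).
    """
    def fold(train_a, train_b, train_var, test_a, test_b):
        # Discriminant weights fitted on one run, applied to the other run.
        weights = [(x - y) / v for x, y, v in zip(train_a, train_b, train_var)]
        return sum(w * (x - y) for w, x, y in zip(weights, test_a, test_b))

    d = 0.5 * (fold(a2, b2, noise_var2, a1, b1) + fold(a1, b1, noise_var1, a2, b2))
    return d / len(a1)  # normalize by the number of voxels

# Consistent pattern difference across runs gives a positive distance.
consistent = crossval_distance([1, 2, 0], [0, 0, 0], [1.2, 1.8, 0.1], [0.1, -0.1, 0],
                               [1, 1, 1], [1, 1, 1])
# Inconsistent (noise-like) differences can give a negative value.
inconsistent = crossval_distance([1, -1], [0, 0], [-1, 1], [0, 0], [1, 1], [1, 1])
```

Because the pattern difference from one run is multiplied by the difference from the other run, noise that is independent across runs averages to zero, which is why the measure is distributed around zero when there is no true difference.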
Coding of information within regions of interest
We first performed this RSA analysis using activation patterns from a priori MD and DMN network ROIs. Activation pattern dissimilarity of each stimulus pair, cross-validated across the two scanning runs, was quantified by LDC. The result was a 24 × 24 representational dissimilarity matrix (RDM), as shown in Figure 3A, with each cell showing cross-validated LDC dissimilarity between the corresponding two task events. These included event pairs that shared the same room and episode but different steps (red cells); events that shared the same room and step but different episodes (white cells); events that shared the same room, but different episodes and steps (purple cells); events that shared the same step, but different rooms and episodes (blue cells); and events that differed in rooms, episodes and steps (green cells). All event pairs additionally differed in item. The cells on the diagonal (yellow) are zero by definition as they do not reflect a distance between different task events.
Based on this matrix, separate contrasts were used to examine coding of room, episode, step and item. Brain regions that coded for room should show higher dissimilarity for patterns from different rooms than from the same room, for example, the dissimilarity between “get toothpaste” and “take food from fridge” should be greater than the dissimilarity between “prepare ingredients” and “take food from fridge” (Figure 3, green vs purple cells, and blue vs white cells). To obtain an index of room coding (Figure 3, bottom), we calculated mean LDC distances for green, purple, blue and white cells, and averaged the two differences green minus purple and blue minus white. Regions that code for episodes should show higher dissimilarity for different episodes compared to same episode, for example, “take food from fridge” should show higher dissimilarity to “hand mix batter” than to “wash vegetables” (Figure 3, purple cells vs red cells). Our index of episode coding was simply the mean distance for purple minus mean distance for red cells. Regions that code for steps should show greater dissimilarity for different steps compared to same steps, for example “hand mix batter” should be more dissimilar to “take food from fridge” than to “wash vegetables” (Figure 3, purple vs white, and green vs blue). To index step coding, the two differences purple minus white and green minus blue were averaged. A more complex formula was needed to derive item coding, assuming additive effects of item, episode and step. Within a room, red cells (Figure 3) differ in item and step, white cells differ in item and episode, and purple cells differ in item, step and episode. If there were no item coding, LDC dissimilarities for purple cells should be the sum of dissimilarities for red and white cells. With item coding, the dissimilarities for purple cells will be less than this sum. 
Accordingly, we computed item coding from the sum of mean distance for red and mean distance for white minus mean distance for purple cells (Figure 3, bottom).
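The four indices can be written out compactly, which also makes the additivity assumption behind the item index easy to check. In the sketch below the event ordering, function names, and synthetic effect sizes are hypothetical; the cell colors and formulas follow the description above. The synthetic RDM is built from additive contributions of item, episode, step, and room, and the indices recover each contribution.

```python
from itertools import combinations
from statistics import mean

# 24 events as (room, episode, step): episodes 0-2 in room 0, 3-5 in room 1.
EVENTS = [(ep // 3, ep, step) for ep in range(6) for step in range(4)]

def cell_color(e1, e2):
    (room1, ep1, step1), (room2, ep2, step2) = e1, e2
    if ep1 == ep2:
        return "red"                                   # same episode, different step
    if room1 == room2:
        return "white" if step1 == step2 else "purple"
    return "blue" if step1 == step2 else "green"

def coding_indices(rdm):
    cells = {}
    for i, j in combinations(range(len(EVENTS)), 2):
        cells.setdefault(cell_color(EVENTS[i], EVENTS[j]), []).append(rdm[i][j])
    m = {color: mean(values) for color, values in cells.items()}
    return {
        "room": ((m["green"] - m["purple"]) + (m["blue"] - m["white"])) / 2,
        "episode": m["purple"] - m["red"],
        "step": ((m["purple"] - m["white"]) + (m["green"] - m["blue"])) / 2,
        "item": m["red"] + m["white"] - m["purple"],
    }

def synthetic_rdm(item=1.0, episode=0.5, step=0.25, room=0.125):
    """RDM with additive effects; every pair of distinct events differs in item."""
    n = len(EVENTS)
    rdm = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        (r1, e1, s1), (r2, e2, s2) = EVENTS[i], EVENTS[j]
        d = item + episode * (e1 != e2) + step * (s1 != s2) + room * (r1 != r2)
        rdm[i][j] = rdm[j][i] = d
    return rdm

indices = coding_indices(synthetic_rdm())
```

With real data the cell means are noisy, so each index is tested against zero across participants rather than read off directly.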
Searchlight analyses
Next, to obtain more specific localization of regions that contained information within the larger networks, and to test for additional regions outside the predefined networks, we implemented a whole brain searchlight procedure (Kriegeskorte et al. 2006) to perform pattern analyses in small spherical ROIs (radius = 10 mm) centered on every voxel of the brain in turn. The procedure was identical to that described in the ROI analysis. Pairwise dissimilarities for each cell type were derived from a 24 × 24 RDM in each sphere, and were assigned to the center voxel. Differences of dissimilarities corresponding to the strength of room, episode, step, and item coding for each sphere were calculated as before, generating whole-brain maps of information coding for each subject. These individual subject maps were smoothed with a 10 mm FWHM Gaussian filter before performing second-level random effects analyses to identify voxels that coded for these four types of information across subjects. Unless otherwise specified, all results are reported at the FDR-corrected threshold of p < 0.05.
Results
Behavioral results
Group behavioral accuracy and reaction time results are shown in Figure 4. Results show poorest performance for the first search array of each step, especially for steps 2-4 when participants were required to switch from one step to the next. Overall accuracy was 97.7% ± 0.1% (mean ± SEM) and overall reaction time was 869 ± 4 ms. A step (steps 1-4) × search array (first, second, third within each step) ANOVA for accuracy revealed a significant main effect for step (F(3,126) = 14.21, p < 0.001), a significant main effect for array (F(2,84) = 31.81, p < 0.001), and a significant step × array interaction (F(6,252) = 6.02, p < 0.001). A similar ANOVA for reaction time also showed a significant main effect for step (F(3,126) = 16.15, p < 0.001), a significant main effect for array (F(2,84) = 252.97, p < 0.001), and a significant step × array interaction (F(6,252) = 10.10, p < 0.001).
Univariate results
ROI analysis
The FIR model provided estimates of the observed BOLD response across a full task episode in successive 1.5 s windows starting from the onset of the first step. In the first analysis, we extracted these FIR responses from a priori ROIs (Figures 5Ai and Bi). The MD network exhibited clear phasic response to the execution of each task step (Figure 5Aii), with 4 peaks throughout each task episode, corresponding to the 4 steps, suggesting that it is sensitive to the fine structure of the contents within an episode. Additionally, overall MD activity gradually increased throughout the task episode, suggesting that the MD network is also sensitive to progress through the full task episode. In contrast, DMN regions showed a phasic response that was smaller and more specific to the first step, followed by a tonic activation that began below baseline but gradually increased throughout the episode (Figure 5Bii). These results suggest DMN involvement in initiation and progress of the entire task episode, with less sensitivity to its fine structure.
To quantify the phasic and tonic components contributing to the BOLD response at each task step, we performed a complementary GLM analysis with onset and epoch regressors modelling each task step. Onset regressors were designed to reflect phasic activity at the onset of each task step, while epoch regressors were designed to reflect tonic activity throughout the step. Within the MD network (Figures 5Aiii, iv), there were strong onset responses, in line with FIR results. Contrasts with baseline showed that all steps were significantly greater than baseline (all ts > 10.53, all ps < 0.001). A one-way repeated measures ANOVA showed no significant differences across the four steps (F(3,126) = 2.66, p = 0.08). In contrast, overall epoch responses showed no mean difference in comparison to baseline (t(42) = 0.49, p = 0.63), but increased across successive steps. ANOVA showed a significant main effect of step (F(3,126) = 8.71, p < 0.001), as well as a significant linear trend (F(1,42) = 12.65, p = 0.001).
Within the DMN network, there was a significant onset response only for the first step, i.e. the onset of the whole task (t(42) = 5.65, p < 0.001 for the first step; all ts < 0.91 and all ps > 0.36 for steps 2-4). ANOVA showed a significant main effect of step in the onset responses (F(3,126) = 13.66, p < 0.001), as well as a significant linear trend (F(1,42) = 15.70, p < 0.001). Post hoc pairwise comparisons showed significant differences between step 1 and the following three steps (all ts > 4.36, all ps < 0.001, Sidak corrected), with no significant differences among any other steps (all ts < 0.99, all ps > 0.90). The overall epoch responses of the DMN showed no mean difference relative to baseline (t(42) = −1.97, p = 0.06), although ANOVA showed a significant main effect of step (F(3,126) = 17.21, p < 0.001), as well as a significant linear trend (F(1,42) = 29.76, p < 0.001). As seen in the FIR time-course, this implies a gradual increase in the tonic response across the duration of the task episode.
Whole-brain analysis
Results from the whole-brain analysis, again separating onset and epoch regressors, are shown in Figure 6. Panels A and B show contrasts of average onset and epoch regressors against baseline, while lower panels show increasing (C, D) and decreasing (E, F) linear trends across task steps.
In comparison to baseline, onset effects (Figure 6A) were significant in large parts of the MD network, including regions of lateral frontal, insular, dorsomedial frontal, and lateral parietal cortex. Onset effects were also seen in large regions of visual cortex, medial parietal cortex, and subcortical structures including the cerebellum. Epoch effects (Figure 6B), in contrast, were more restricted, including parts of the DMN network (medial prefrontal cortex, hippocampus, parahippocampal cortex, and temporal pole), as well as expected regions of visual and motor cortex. Onset regressors showed a linear increase across successive task steps in a more restricted subset of MD regions. Linear decreases were extensive, including many parts of the DMN network (compare Figures 5Bi and 6E). However, the ROI analysis indicates that this decrease across the DMN network is driven by an onset effect only for the first step of the task. Epoch regressors showed a linear decrease in visual cortex, but otherwise, an extensive pattern of linear increase across much of the brain.
These findings confirm that, in comparison to DMN, MD regions were sensitive to the fine temporal structure of the task, with phasic response to onset of each new task step. This phasic response was shared with several other brain regions, most notably visual and subcortical. DMN, in contrast, showed onset activity only for the start of an entire task. Finally, the data show a pattern of gradual increase in activity across the full 36 s of the task, widespread throughout the brain.
RSA results
Results of the RSA analysis are shown in Figure 7. Rows A-D show contrasts between sets of LDC values (Figure 3) reflecting coding of different types of information: room, episode, item and step. Left panels show contrast values for a priori MD and DMN networks, with asterisks indicating values significantly greater than zero. Right panels show whole-brain searchlight results.
Room coding
Neither the MD nor DMN network showed significant room coding (ts = −0.59 and −0.39, both ps > 0.55). There were no differences in room coding between the two networks (t(42) = −0.13, p = 0.90). In the searchlight analysis, we did not find room coding in any region after FDR correction; however, at the lenient threshold of uncorrected p < 0.05, a region in the medial prefrontal cortex (mPFC) showed greater dissimilarity for events in different rooms compared to those in the same room.
Episode coding
The DMN network showed significant coding for episode (t(42) = 2.63, p = 0.01), while MD did not (t(42) = 1.72, p = 0.09); however, the difference between networks was not significant (t(42) = −1.47, p = 0.15). The whole-brain searchlight analysis revealed episode coding to be localized rather specifically to a region near the anterior fusiform gyrus and PHC bilaterally.
It is possible that the response to regressors modelling adjacent steps could be similar due to imperfect temporal separation of the signal, such that pairs of steps within the same task appear more similar than those from different tasks due to differences in temporal separation in addition to differences in task episode. We examined this possibility by taking the voxels showing significant episode coding in the RSA searchlight, and recalculating the contrast using three subsets of cells, chosen to differ in separation of one, two, or three steps. That is, we extracted LDC values from cells of the RDM that represented either one (step 1 vs. step 2, step 2 vs. step 3, and step 3 vs. step 4), two (step 1 vs. step 3 and step 2 vs. step 4), or three (step 1 vs. step 4) steps apart, and contrasted between-episode vs. within-episode cells within each subset separately. If temporal leakage were contributing to activity patterns, and hence to apparent episode coding, we should expect a stronger effect for steps closer together in time. However, we found no evidence of any difference in episode coding in these three conditions (F(2,84) = 0.11), nor a linear trend as a function of step (F(1,42) = 0.80).
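The logic of this control can be sketched as follows: cells of the RDM are grouped by how many steps apart the two conditions are, and the between-episode vs. within-episode contrast is recomputed within each group. This is a minimal sketch under assumptions, not the authors' code: the condition ordering, the function name `contrast_by_separation`, and the choice to match between-episode cells on the same step separation are all illustrative.

```python
from itertools import combinations
from statistics import mean

N_TASKS, N_STEPS = 6, 4  # design from the paper: 6 tasks x 4 steps

# Assumed condition order (hypothetical): task-major, index = task * N_STEPS + step.
conditions = [(task, step) for task in range(N_TASKS) for step in range(N_STEPS)]


def contrast_by_separation(rdm):
    """Return {separation: between-minus-within contrast} for separations 1..3."""
    cells = {d: {"within": [], "between": []} for d in range(1, N_STEPS)}
    for i, j in combinations(range(len(conditions)), 2):
        (task_i, step_i), (task_j, step_j) = conditions[i], conditions[j]
        d = abs(step_i - step_j)
        if d == 0:
            continue  # same step position: no within-episode cell to compare with
        kind = "within" if task_i == task_j else "between"
        cells[d][kind].append(rdm[i][j])
    return {d: mean(c["between"]) - mean(c["within"]) for d, c in cells.items()}


def toy_distance(ci, cj, leak):
    """Made-up distance: episode effect of 0.5, plus optional temporal leakage
    that makes nearby steps of the same task look more similar."""
    if ci == cj:
        return 0.0
    same_task = ci[0] == cj[0]
    d = abs(ci[1] - cj[1])
    return 1.0 + (0.0 if same_task else 0.5) - (leak / d if same_task else 0.0)


def toy_rdm(leak):
    return [[toy_distance(ci, cj, leak) for cj in conditions] for ci in conditions]


print(contrast_by_separation(toy_rdm(leak=0.0)))  # flat across separations
print(contrast_by_separation(toy_rdm(leak=0.3)))  # largest at separation 1
```

In the toy data, pure episode coding yields the same contrast at every separation, whereas leakage inflates the contrast most for adjacent steps. The absence of such a gradient in the real data is what argues against a leakage account.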
Item coding
Both MD (t(42) = 7.20, p < 0.001) and DMN (t(42) = 5.60, p < 0.001) networks showed significant coding for item. The MD network showed greater item coding compared to the DMN (t(42) = 3.77, p < 0.001). Furthermore, item and episode coding in the DMN did not significantly differ in strength (t(42) = −1.26, p = 0.22). In the searchlight analysis, we observed item coding in many MD regions (including bilateral regions of the LPFC, ACC, and bilateral IPS), some DMN regions (including TPJ, pIPL, Rsp, PHC, HF+, amPFC, and PCC), as well as strong coding in visual regions as expected.
Step coding
Both MD (t(42) = 9.91, p < 0.001) and DMN (t(42) = 11.56, p < 0.001) networks showed significant coding for step. The MD network showed greater step coding compared to the DMN (t(42) = 4.46, p < 0.001). Step coding was widespread in all regions of the brain. This was not surprising, as in our univariate analysis, we observed significant linear trends across the task episode for much of the brain (visual cortex showed decreasing activity, while most other regions showed increasing activity).
Finally, we asked whether activity patterns in the MD and DMN networks differentially carried information about distinct aspects of task episodes by comparing LDC contrasts of room, episode, item, and step representation across these two network ROIs. A 2 (network) × 4 (information type) ANOVA showed a significant interaction (F(3,120) = 14.25, p < 0.001), as well as main effects of network (F(1,40) = 18.55, p < 0.001) and information type (F(3,120) = 70.51, p < 0.001). When limiting the information type to focus on episode and item coding, the ANOVA continued to show a significant interaction (F(1,42) = 12.31, p < 0.001), as well as main effects of network (F(1,42) = 13.23, p < 0.001) and information type (F(1,42) = 4.61, p = 0.04).
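For the 2 × 2 case, the interaction term has a simple interpretation: in a fully within-subject design, the interaction F is the square of a one-sample t statistic computed on each participant's difference of differences, (MD item − MD episode) − (DMN item − DMN episode). The sketch below illustrates this equivalence; the function name `interaction_f` and the toy numbers are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_f(md_item, md_episode, dmn_item, dmn_episode):
    """Interaction F (with 1 and n-1 degrees of freedom) for a 2 x 2
    within-subject design, computed as the squared one-sample t statistic
    on the per-participant difference of differences."""
    dd = [
        (mi - me) - (di - de)
        for mi, me, di, de in zip(md_item, md_episode, dmn_item, dmn_episode)
    ]
    n = len(dd)
    t = mean(dd) / (stdev(dd) / sqrt(n))  # stdev uses the n-1 denominator
    return t * t, n - 1


# Toy contrast values for three hypothetical participants.
f_value, df_error = interaction_f(
    md_item=[3.0, 4.0, 5.0],
    md_episode=[1.0, 1.0, 1.0],
    dmn_item=[1.0, 1.0, 1.0],
    dmn_episode=[0.0, 0.0, 0.0],
)
print(f_value, df_error)
```

A large F here means that the item-minus-episode difference is reliably larger in one network than in the other, which is the form of dissociation reported above.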
Discussion
The present study used fMRI to examine how different cortical networks represent task episodes. Specifically, we focused on the MD and DMN networks. Using FIR analysis to capture the evolution of the BOLD response throughout a multistep episode, we found that MD regions exhibited clear phasic responses to each task step, suggesting that they are sensitive to the fine temporal structure within a task. In contrast, DMN regions did not show significant phasic responses to every step, but instead exhibited a peak at the onset of the entire task episode. Both networks showed a gradual increase in overall activity throughout the task, suggesting that they are both sensitive to task progression. Representational similarity analysis showed differential coding of task features in MD and DMN networks. MD regions showed strong coding of individual items but not of the entire task episode, whereas in the DMN, item coding was weaker than in MD, but episode coding was also significant. The RSA searchlight analysis confirmed differential representation of task features. The content of individual task steps was represented in visual cortex and the MD network. Task identity was represented in the PHC, with a hint of room identity in the mPFC. Step was widely represented across most of the brain, in line with strong changes in univariate activity as the task progressed.
The finding that MD regions are especially sensitive to the identity of a current task step and its specific item content is consistent with prior research. Many previous experiments have shown coding of task-relevant information in MD regions that can rapidly change according to task demands (e.g., Li et al. 2007; Woolgar et al. 2011; Freedman et al. 2001), including radical reorganization between successive task steps (Sigala et al. 2008). fMRI studies show strong MD activity when a subgoal is completed and in transitions from one event to another (Sridharan et al. 2007; Farooqui et al. 2012), with progressively increasing activity as a goal is approached (Farooqui et al. 2012; Desrochers et al. 2018). The pattern of MD activation in our study is consistent with these previous findings. The results suggest that, as a task episode progresses, MD representations are in constant flux, reorganizing to encode the detailed contents of each task step.
While MD regions did not show significant discrimination for episodes, the DMN exhibited both item and episode coding. Univariate results showed a significant peak of DMN activity at the beginning of each episode, but no significant onset responses to subsequent steps. These findings are consistent with prior reports of transient DMN activation at event boundaries (Ben-Yakov et al. 2013, 2014; Baldassano et al. 2018). It has been proposed that the mental programs required for carrying out a task are assembled at the beginning of task execution (Schneider and Logan 2006; Farooqui and Manly 2018b). It is possible that the DMN is involved in long-term memory retrieval for the entire task sequence prior to episode initiation. Our results match the observation that the DMN has long temporal receptive windows and can code for information accumulated over longer time scales (Hasson et al. 2008; Lerner et al. 2011; Manning et al. 2015).
Searchlight RSA revealed that episode representation was not spread equally throughout the DMN, but was focused near bilateral PHC. While DMN is involved in retrieval, it has been proposed to serve a broader function of mental construction of prospective episodes, through information from stored episodic, conceptual, and contextual representations (Buckner and Carroll 2007; Gilbert and Wilson 2007; Andrews-Hanna 2012). The PHC is a key component of the DMN, and has been shown to be involved in context representation (Diana et al. 2007; Ranganath 2010a; Ranganath and Ritchey 2012; Aminoff et al. 2013; Reagh and Ranganath 2018). It has been proposed that DMN regions such as the PHC encode a situation model, representing spatial, temporal, and broader causal relationships between different elements of an event (Ranganath and Ritchey 2012; Reagh and Ranganath 2018). Our results match prior indications that PHC codes for broad features of a current context (Ranganath 2010a; Kim and Maguire 2018).
Within the DMN, it has been proposed that the mPFC is involved in the representation of schemas which are more general than particular task episodes (Preston and Eichenbaum 2013; Spalding et al. 2015; Gilboa and Marlatte 2017; Robin and Moscovitch 2017). Evidence has suggested that the mPFC accumulates information about the context of interrelated episodes (Preston and Eichenbaum 2013; Robin and Moscovitch 2017). In our data, we found a weak suggestion of room coding in the mPFC. Although we trained participants to memorize task episodes one room at a time, and the episodes have clear semantic associations with these locations, our experiment did not require grouping of episodes from the same room, perhaps contributing to weak room representation. Future research could provide more insight into the involvement of mPFC in encoding more generic cognitive contexts.
We also found that the DMN showed significant coding for items. The searchlight analysis revealed that areas that coded for items included several regions in the DMN network (including TPJ, pIPL, Rsp, PHC, HF+, amPFC, and PCC), in addition to more prominent representation in visual and MD regions. These results show some DMN representation not just for full task episodes, but also for specific contents within the episode. It has been suggested that the hippocampus, a key region in the DMN, is involved in binding items to contextual episodes (O’Reilly and Rudy 2001; Diana et al. 2007; Manning et al. 2015; Hsieh et al. 2014). To play this binding role, it has been suggested that the hippocampus receives both item representations (e.g., from perirhinal cortex) and episode representations (e.g., from PHC and prefrontal cortex subregions) (Polyn and Kahana 2008; Ranganath 2010a, 2010b; Manning et al. 2015). Although we found both item and episode representations coexisting in PHC, consistent with a compositional code, this experiment cannot determine whether and where items and episodes might be bound into a conjunctive representation: because items were unique to each task, item-episode conjunctions are indistinguishable from item coding. Disentangling these different forms of co-representation requires designs where the same item appears in different contexts. As well as item-context conjunctions in the hippocampus (Hsieh et al. 2014), such designs have associated various frontal and temporal regions with item-order associations (e.g., Reverberi et al. 2012; Kalm and Norris 2014), and rule-rule compositionality (e.g., Cole et al. 2011).
Both MD and DMN, along with most regions of the brain, tracked progress through the task episode, shown by increasing linear trends in the univariate data and step coding in the RSA. These observations are consistent with previous studies that tracked activity and step representation throughout a task episode in MD (Farooqui et al. 2012; Desrochers et al. 2018) and DMN (Hsieh and Ranganath 2015) ROIs, but suggest that it might be a much more global property of brain function. While visual cortex showed a decrease in sustained activity over time, which may reflect adaptation to the sensory input (Grill-Spector and Malach 2001; Grill-Spector et al. 2006), most other cortical regions showed an increase in sustained activity over the episode. As this effect was so widespread, it is difficult to offer a precise interpretation, and different areas may increase for different reasons (Kalm and Norris 2017). For example, it is possible that increased activations in some regions reflect revision and reconfiguration of control representations that may increase in demand as larger portions of the task are complete (Farooqui et al. 2012; Desrochers et al. 2015, 2016). These activity changes could also reflect gradual assembly of an episode representation (Dumontheil et al. 2011) or accumulation of new information (Hasson et al. 2008; Lerner et al. 2011).
A hierarchical control structure is an organized representation of control elements (Rosenbaum et al. 1983; Schneider and Logan 2006) with task identity, local entities, and serial position codes. Our results describe how broad brain networks are involved in executing task sequences, with differential representation of individual task components and entire task episodes. The DMN, we suggest, may establish overall cognitive context, representing both individual cognitive operations and their broader context, and perhaps involved in binding them together. Within the DMN, there may be differentiation between a posterior “situation model” and a broader, more schematic representation in the mPFC. At the same time, the MD system, along with sensory regions, encodes the detailed content of individual cognitive operations. Acting together, these two brain networks manage the hierarchical structure of goal-directed behavior.
Acknowledgements
This work was supported by funding from the Medical Research Council (United Kingdom), program SUAG/002/RG91365. TW was supported by the Medical Research Council PhD Studentship, Taiwan Cambridge Scholarship from the Cambridge Commonwealth, European & International Trust, and the Percy Lander studentship from Downing College.