Abstract
The cortico-basal ganglia-thalamus (CBT) loop is important for behavior. However, the activity and learning-related modulation within the loop during behavior remain unclear. To tackle this problem, we trained mice to perform a delayed sensorimotor-transformation task and recorded single-unit activity during learning simultaneously from four regions in a CBT loop: prelimbic area (PrL), posterior premotor cortex (pM2), dorsomedial caudate/putamen (dmCP), and mediodorsal thalamus (MD). Sensory- and decision-related information was encoded by neurons within the loop, with weak interaction among neurons of different coding ability. The functional interaction among regions was dynamically routed within the loop across behavioral phases and helped explain decision-related neuronal activity. Neurons of PrL and dmCP exhibited learning-related reorganization in neuronal activity and more persistent coding of sensory- and decision-related information. Thus, both sensory- and decision-related information are processed in a functionally interacting CBT loop that is modulated by learning.
The cortex-basal ganglia-thalamus-cortex (CBT) loop plays a central role in adaptive behavioral control (1–8). Anatomically, cortex and thalamus send excitatory projections to the caudate/striatum of the basal ganglia, which recurrently and topographically projects back to cortex through intermediate regions including basal ganglia and thalamus (1, 8). Perturbation and recording studies have demonstrated that the regions in the loop play critical roles in many processes, including motor control (2–4, 6, 9, 10), reinforcement (11–13), perceptual decision making (14–16), inhibitory control (5, 17), and working memory (18–31). Furthermore, impaired function of the loop has been implicated in many psychiatric diseases (7). Therefore, it is important to understand how neurons within this CBT loop work together. However, the recurrent connectivity of this loop poses a notable challenge for understanding behavior-relevant coordination in its population activity.
We focused on a behavioral task requiring working memory (WM) and sensorimotor transformation (SMT). WM is the process by which the brain actively maintains and manipulates information for a brief delay period of several seconds to guide behavior (32, 33). SMT is the process of transforming sensory inputs into a motor output based on behavioral context. It is well known that prefrontal, parietal, and motor-related cortical areas, basal ganglia, and thalamus are involved in WM (19, 34) and SMT (35–38). However, recordings from different regions were typically obtained from different subjects in different behavioral tasks, making the results difficult to compare. A simple hypothesis explaining the relationship between WM- and SMT-related activity is that single neurons can code for both the sample and test odors, in a manner related to their pairing relationship (as in the delayed match-to-sample task) (41). However, it is unclear whether this simple hypothesis can explain the relationship between WM- and SMT-related activities in the CBT loop. Furthermore, functional interaction among brain regions may undergo distinct dynamics in different phases of learning, but only a few studies have recorded neuronal activity throughout the learning period of WM tasks (27, 39, 40).
We therefore trained head-fixed mice to perform an olfactory delayed paired association (ODPA) task. In the ODPA task, mice need to maintain sensory WM during a delay period and then decide whether to lick based on the specific pairing of sample and test odors. Therefore, the maintained WM needs to be integrated and compared with the upcoming sensory information, then transformed into a lick or no-lick decision. By simultaneously recording single-neuron activity from four regions in a CBT loop, we found a distributed representation of both sensory- and decision-related information within the loop, and we characterized the dynamics of its cross-region interaction through learning.
Behavioral paradigm
We trained head-fixed mice to perform a behavioral task using an automatic training system (Fig. 1A, SFig. 1) (42). Specifically, mice were trained on an olfactory delayed paired association (DPA) task, as in (43). Mice needed to maintain the sensory information of a sample odor (S1 or S2) during the delay period (5 s in duration), after which a test odor (T1 or T2) was delivered (Fig. 1B). Licking within a response window in the paired trials (S1-T1 or S2-T2) led to a water reward (scored as a hit if mice licked, as a miss otherwise), but not in the unpaired trials (S1-T2 or S2-T1, scored as a false alarm if mice licked, as a correct rejection otherwise). Mice learned the task well within several days (Fig. 1C). The learning was manifested as an increased correct rejection rate in the unpaired trials (SFig. 2A-B). Throughout the learning process, mice maintained a high lick probability in the paired trials (SFig. 2C-D). Mice did not lick during either the sample-odor delivery or the delay period (SFig. 2E).
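The trial scoring described above can be sketched in a few lines. This is only an illustration of the stated rule (paired trials: S1-T1 or S2-T2); the function name `score_trial` and the string labels are hypothetical and not taken from the training system used in the study.

```python
# Hypothetical helper illustrating the trial-outcome rule described in the text.
PAIRED = {("S1", "T1"), ("S2", "T2")}  # rewarded sample-test pairs

def score_trial(sample: str, test: str, licked: bool) -> str:
    """Classify one DPA trial as hit, miss, false_alarm or correct_rejection."""
    if (sample, test) in PAIRED:
        # Paired trial: licking in the response window is rewarded.
        return "hit" if licked else "miss"
    # Unpaired trial: licking is an error, withholding is correct.
    return "false_alarm" if licked else "correct_rejection"

def correct_rate(outcomes) -> float:
    """Fraction of trials scored as hit or correct_rejection."""
    outcomes = list(outcomes)
    correct = sum(o in ("hit", "correct_rejection") for o in outcomes)
    return correct / len(outcomes)
```

For example, `score_trial("S1", "T2", False)` yields a correct rejection, the outcome whose rate increased with learning.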
Distributed modulation in neuronal activity within the CBT loop by the task stimuli
We used custom-made twisted-wire tetrodes to perform single-unit recording simultaneously from four regions of a CBT loop: PrL, pM2, dmCP and MD (Fig. 1D, SFig. 3). The recording locations were verified by electrical lesions after recording (Fig. 1E).
In our multi-region recordings, typically about ten neurons were recorded simultaneously in each region on each day of recording (see materials and methods, SFig. 4). Overall, 1837 neurons were recorded. To visualize the activity modulation of each region in the task, we plotted the Z-score normalized firing rate of each neuron (Fig. 1F). The neuronal activity of all four regions was modulated during the sample-delivery, delay, and response periods (Fig. 1F, SFig. 5). Both activation and suppression of neuronal activity were observed following the sample- and test-odor delivery (cells in red and blue in Fig. 1F). On average, there was a transient increase in the population firing rate during the odor-delivery period (Fig. 1G), followed by a small ramping-up modulation during the delay period, as in (44). We then separated the neurons into activated and suppressed groups. For the activated group, the neurons of PrL and dmCP were more strongly modulated by the sample and test odors (Fig. 1H). For the suppressed group, the difference in modulation among the four regions was not statistically significant (Fig. 1H). Variance in neuronal firing increased during the odor-delivery period, as revealed by the Fano-factor analysis (Fig. 1I).
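For reference, the Fano factor is conventionally the across-trial variance of the spike count divided by its mean. A minimal sketch follows, assuming spike counts from one neuron in one time bin across trials; the bin width and any further corrections used in the study are not specified here, and the function name is hypothetical.

```python
from statistics import mean, variance

def fano_factor(spike_counts):
    """Variance-to-mean ratio of spike counts across trials (one time bin).

    spike_counts: sequence of per-trial spike counts (at least two trials).
    Returns NaN when the mean count is zero, where the ratio is undefined.
    """
    m = mean(spike_counts)
    if m == 0:
        return float("nan")
    # statistics.variance is the sample variance (n - 1 in the denominator).
    return variance(spike_counts) / m
```

A Poisson-like neuron gives a value near 1; values above 1 indicate extra trial-to-trial variability.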
Neurons of the CBT loop maintained task-related sensory information
A hallmark of WM-related activity is the ability to code the maintained information during the delay period (19, 24, 26, 27, 30, 34, 43–46). The behavioral design of the DPA task is sensory oriented, because a motor-related decision cannot be made during the delay period. Therefore, one might predict that an action-oriented CBT loop would not necessarily maintain the sensory information during the delay period. Instead, we observed many neurons in all four regions of this loop maintaining the sensory information (Fig. 2A-C). An example neuron with different firing rates following different sample odors is plotted in Fig. 2A. We defined the sensory selectivity of a neuron as the difference between the average firing rates following the two sample odors, divided by their sum. We then plotted the sample-odor selectivity from all four regions as heat maps to visualize the overall distribution of the sample selectivity (Fig. 2B). For each region, the average sample selectivity for the neurons preferring S1 or S2 was balanced (Fig. 2C). Both the selectivity of individual neurons (heat map, Fig. 2B) and of the averaged population (Fig. 2C) demonstrated that the four regions exhibited different levels of selectivity for the sample odor. The neurons in PrL and dmCP exhibited stronger sample selectivity in the early delay period, whereas the neurons in dmCP exhibited higher selectivity in the late delay period (SFig. 6). To verify the selectivity results, we performed a decoding analysis based on a maximum-correlation-coefficient (MCC) classifier (47). The sample information could be successfully decoded during the early delay period in all four regions and during the late delay period in PrL, pM2, and dmCP (Fig. 2D).
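The selectivity index defined above (difference of the average firing rates divided by their sum) can be written as a short function. This is a sketch of the definition only; the function name and the handling of the zero-rate case are assumptions, not details taken from the analysis code of the study.

```python
from statistics import mean

def selectivity_index(rates_a, rates_b):
    """(mean_a - mean_b) / (mean_a + mean_b) for one neuron in one time bin.

    rates_a, rates_b: per-trial firing rates (Hz) for the two conditions,
    e.g. S1 vs. S2 trials. The value lies in [-1, 1] for non-negative rates.
    """
    a, b = mean(rates_a), mean(rates_b)
    total = a + b
    if total == 0:
        # Assumption: a silent neuron is treated as non-selective.
        return 0.0
    return (a - b) / total
```

The same form applies to the test odors (T1 vs. T2 trials), with the sign indicating the preferred stimulus.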
We then calculated the selectivity for the test odor. The neurons in all four regions significantly coded for the test odor (test-odor selectivity in Fig. 2E; decoding in SFig. 6C). To test the relationship between the ability of the CBT loop in coding the sample and test odors, we plotted the test-odor selectivity against the sample-odor selectivity. We observed significant but small correlation between the sample- and test-odor selectivity in all four regions (Fig. 2F). We also observed higher-than-chance level overlap in the neurons with significant sample- and test-odor selectivity in PrL and dmCP (Fig. 2G). Therefore, the coding of odor information in the CBT loop is weakly correlated for sample and test odors.
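One common way to define the chance level of overlap is to assume that sample and test selectivity are assigned to neurons independently. The sketch below computes the expected overlap and a one-sided hypergeometric tail probability under that assumption; it illustrates the idea and is not necessarily the test used for Fig. 2G. All names are hypothetical.

```python
from math import comb

def chance_overlap(n_total, n_sample, n_test):
    """Expected number of dual-selective neurons if the two labels are independent."""
    return n_sample * n_test / n_total

def overlap_p_value(n_total, n_sample, n_test, n_both):
    """P(overlap >= n_both) under a hypergeometric null.

    n_total: all recorded neurons in the region
    n_sample, n_test: neurons significantly selective for sample / test odor
    n_both: neurons selective for both
    """
    upper = min(n_sample, n_test)
    denom = comb(n_total, n_test)
    tail = sum(
        comb(n_sample, k) * comb(n_total - n_sample, n_test - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom
```

With 200 neurons, 40 sample-selective and 50 test-selective, the expected overlap by chance is 10 neurons.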
Neurons of the CBT loop represented the pairing-related information
To accomplish this olfactory memory task, mice need to integrate the information of sample and test odors across the delay period to make a licking decision, according to paired/non-paired relationship. The presence of both sample- and test-odor information in the CBT loop (Fig. 2) suggested the possibility that the computation within the loop might be calculating the pairing relationship. We therefore analyzed the coding ability of the pairing relationship and the process of SMT in the CBT loop.
We first analyzed the coding ability for the pairing relationship in individual neurons of the four regions. An example neuron showing different firing rates following the paired and unpaired odor stimuli is plotted in Fig. 3A. The selectivity for pairing relationship was defined as the difference between the average firing rates in the trials with paired and unpaired odors, divided by their sum. The neurons of all four regions exhibited significant pairing selectivity (Fig. 3B-C). There was no significant pairing selectivity before the test-odor delivery (Fig. 3C), consistent with the lack of pairing information during this period. During the test-odor delivery and response periods, the neurons in PrL and dmCP exhibited stronger decoding power for pairing relationship (Fig. 3D).
We then tested the relationship between the sensory and pairing coding of the neurons within the CBT loop. First, the correlation between the sample selectivity (Fig. 3E) or test selectivity (Fig. 3F) and the pairing selectivity was very weak (R² ≤ 0.012), even though some of the correlations were statistically significant (P < 0.05, Spearman correlation, Fig. 3E-F). Second, the percentage of neurons coding both sensory stimuli and pairing relationship was mostly around chance level (except for the neurons of PrL and dmCP, Fig. 3G). Therefore, the neurons of the CBT loop tended to code for sensory information and pairing relationship independently. In other words, there were very few pairing-selective neurons integrating both sample and test information.
Further analyses supporting independent coding of sensory information and pairing relationship
To further corroborate the independent coding of sensory information and pairing relationship within the CBT loop, we performed two more independent analyses. First, we performed a decoding analysis while removing the neurons coding certain information. The logic is that if a given type of neuron is critical for population decoding, then removing it from the decoding analysis should impair the decoding performance. Indeed, when we removed the neurons selective for pairing relationship, there was a marked reduction in decoding performance for pairing relationship (comparing the green with the black curves, Fig. 4A; statistics in Fig. 4B). However, removing the sample-selective or test-selective neurons did not change the decoding performance (no difference among black, blue, and red curves, Fig. 4A; statistics in Fig. 4B). Note that the green curves in Fig. 4A still increased one second after the test-odor delivery. This period corresponded to the water-reward delivery period. The activity during this period was not used to identify the pairing-selective neurons. Because the water reward was correlated with the hit response in the paired trials, one can infer that there were neurons with reward selectivity during this period, which was indeed shown in SFig. 7. In the current design, the coding for water reward cannot be dissociated from the pairing relationship except by onset timing. Furthermore, the reward prediction cannot be dissociated from the pairing relationship in this paradigm.
Second, we fitted the neuronal firing with a general linear model (GLM) to quantify the relative contribution of different types of neurons in explaining the activity of recorded neurons. The task-related variables (including sample/test odors and pairing relationship) and the activity of the other simultaneously recorded neurons (from all regions) were incorporated to fit the neuronal activity of each recorded neuron (neural model, Fig. 4C). To measure the goodness of fit, deviance was defined as the sum of squared errors between the model prediction and the recorded neuronal activity. As a control, we also generated a model incorporating the task-related variables and randomly generated neuronal firing for the same number of neurons as were simultaneously recorded (control model). The deviance of the control model was significantly larger than that of the neural model at each time point in the task (Fig. 4D). Therefore, neuronal activity can explain the firing of other simultaneously recorded neurons, consistent with (48–50). To measure the contribution of a given group of neurons, fitting performance was defined as the difference between the deviance of the neural model and the deviance of the respective control model, normalized by the deviance of the control model. The fitting performance remained stable during the task for PrL, pM2, dmCP and MD (Fig. 4E). To identify the contribution of different types of neurons in explaining the activity of pairing-selective neurons, we separately calculated the fitting performance without sample-selective, test-selective, or pairing-selective neurons. For PrL, pM2 and dmCP, removing sample-selective or test-selective neurons did not significantly affect fitting performance compared with the model incorporating all neurons (Fig. 4F, blue and red). However, removing pairing-selective neurons significantly impaired the fitting performance (Fig. 4F, green).
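The two quantities defined above can be sketched as follows. The sketch reads deviance as a sum of squared errors and takes the sign of the difference so that a better neural model gives a positive fitting performance; both readings, and the function names, are assumptions made for illustration rather than details confirmed by the methods.

```python
def deviance(predicted, observed):
    """Sum of squared errors between model prediction and recorded activity."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def fitting_performance(dev_neural, dev_control):
    """Relative reduction in deviance of the neural model versus the control model.

    Assumed sign convention: (control - neural) / control, so 0 means no
    improvement over the control model and 1 means a perfect fit.
    """
    if dev_control == 0:
        raise ValueError("control deviance must be non-zero")
    return (dev_control - dev_neural) / dev_control
```

For example, with recorded activity `[1, 2, 3]`, a neural-model prediction `[1, 2, 4]` (deviance 1) and a control prediction `[0, 0, 0]` (deviance 14), the fitting performance is 13/14.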
For MD, removing sample-selective or test-selective neurons also significantly impaired the fitting performance. Removing pairing-selective neurons induced significantly stronger impairment. These results supported the dissociation between pairing-selective neurons and sensory-selective neurons.
Pairing-selective neurons in PrL, pM2 and dmCP are more tightly coupled
The previous GLM was calculated based on the neurons from all regions. To further examine the functional cross-region interaction, we calculated the fitting performance of each region using a model with neurons from only one other region in the CBT loop. For example, we fitted the PrL neural activity only with the firing of pM2 neurons to infer the functional interaction from pM2 neurons to PrL neurons. In other words, such fitting performance indicates the coupling from the neurons of the source region to the neurons in the fitted target region. During both the sample and the test periods, the fitting performance among the neurons in PrL, pM2 and dmCP was higher than that involving MD (Fig. 5A for sample period, Fig. 5B for test period). Therefore, PrL, pM2 and dmCP were coupled with each other more tightly than they were coupled with MD (Fig. 5C).
To dissect the functional interaction specific to the coded information, we calculated the fitting performance of regions in the CBT loop considering only the neurons with significant coding for sample, test, or pairing information. The coupling patterns calculated for sample-selective neurons (Fig. 5D), test-selective neurons (Fig. 5E), and pairing-selective neurons (Fig. 5F) were all similar to those calculated for all neurons. Specifically, the coupling strength calculated from pairing-selective neurons was stronger than that calculated from sample- or test-selective neurons in the PrL-pM2-dmCP loop (Fig. 5D-F). This was further demonstrated by arranging the fitting performance of pairing-selective neurons in all regions (Fig. 5G). The ranking of the fitting performance of pairing-selective neurons was quite similar within the PrL-pM2-dmCP loop, consistent with the patterns in the heat maps (comparing Fig. 5G with Fig. 5D-F). We further separated the PrL-pM2-dmCP pairs and MD-related pairs to compare their coupling patterns. Similarly, we observed higher fitting performance in PrL-pM2-dmCP pairs than in MD-related pairs, both for fitting the firing of test-selective and of pairing-selective neurons (Fig. 5H). Furthermore, within the PrL-pM2-dmCP loop, the fitting performance was higher for pairing-selective neurons than for test-selective neurons (Fig. 5H left, comparing the blue and red bars). This was consistent with the notion that the activity of pairing-selective neurons in this loop can be better explained by the firing of other neurons.
Learning-related dynamics in neuronal firing and functional interaction in the CBT loop
Mice learned to perform the DPA task well within five days (Fig. 1C), enabling us to record and examine the neural correlates during the entire learning process. The fitting performance of pairing-selective neurons of the PrL-pM2-dmCP pairs was significantly higher on the third day of learning, whereas the fitting performance of pairing-selective neurons of the MD-related pairs did not change significantly (Fig. 6A-B). We found that sample selectivity did not change significantly over the learning process in PrL, dmCP and MD (Fig. 6B-C). We detected a statistically significant change in sample selectivity in pM2 neurons (red curves in Fig. 6B), which might be due to the lower selectivity on Day 4 of learning (post hoc test). In contrast, the pairing selectivity in all recorded regions increased significantly during learning in the decision-making period (Fig. 6D).
To summarize the learning-related dynamics in firing of all the recorded regions, we calculated the percentage of modulated neurons for different time periods on days 1, 3 and 5 of the learning process (Fig. 6F), including: a) significantly modulated during the sample-delivery period; b) significantly modulated during the decision-making period; c) significantly selective to sample odors during the sample-delivery period; d) significantly selective to sample odors during the late delay period (4–6 s after sample odor); e) significantly selective to test odors during the decision-making period; f) significantly selective to pairing during the decision-making period; g) significantly selective to pairing after reward feedback (10–12 s after sample odor). Generally, the four recorded regions in the CBT loop showed similar trends of change in firing properties. The neural modulations decreased slightly over learning (a-b in Fig. 6F). The representations of sensory information were stable, whereas the representations of decision increased over learning (c-g in Fig. 6F). Among the four regions, PrL exhibited stronger sensory representation and dmCP exhibited stronger decision representation. Thus, in the recorded CBT loop for the ODPA task, sensory information was stably maintained and the pairing/decision-related information gradually improved during learning.
Discussion
In the present study we trained mice to perform a delayed paired association task, in which maintenance of sensory information during delay period and learning association between arbitrary odor pairs are required for a successful task performance. Consistent with the critical role of cortico-subcortical loop in working memory (18–31), all four regions exhibit task-related modulation in neuronal activity (Fig. 2). Interestingly, all four regions also exhibit similar degree of selectivity to sensory information during delay period (Fig. 3). Such a broad distribution in neuronal modulation and selectivity supports the notion of distributed nature of working memory processes (34).
Neuronal activity of all four regions was associated with the paired/non-paired relationship during the test-odor delivery and response periods (Fig. 4), with dmCP neurons exhibiting the largest difference between the paired and non-paired trials. The neuronal response during this period can be associated with many brain processes, such as decision making, motor planning, and reward expectation. The exact functional role of neuronal coding in this behavioral phase remains to be determined.
In the current DPA design, mice can use either a prospective or a retrospective strategy (46). The information maintained in a retrospective strategy is the already-presented sample odor, whereas that in a prospective strategy is the paired incoming test odor. Our recording study in anterior piriform cortex, a sensory region, suggested that mice were not using a prospective strategy to perform the task (43). There was no coding for pairing relationship before test-odor delivery (Fig. 3C), further supporting the idea that mice were not using a prospective-coding strategy to perform the task.
When we calculated the percentage of neurons significantly selective for task-related information, the percentages corresponding to a single type of information (sample, test, or pairing) were higher than chance level (Fig. 2C, 3C). However, the percentages of neurons coding multiple types of information were close to the chance level of overlap (Fig. 2G, 3G). Therefore, among the four regions, task-related information was distributed randomly across neurons, with a low percentage of neurons coding all task-related information. However, the task information must somehow be integrated for optimal SMT. Although the percentage of such integrating neurons was very low in the four regions, we did observe a small percentage of them (~1%, Fig. 3G). One hypothesis is that the neurons coding for both sample and test information have very strong projections to other neurons that make them selective for pairing, which needs to be tested by future studies.
We generated GLMs to examine the functional interactions among the four regions. The results of the GLMs should not be interpreted as mono-synaptic connections, because both direct and indirect innervation can contribute to the GLM fitting. However, we could still obtain information about how information functionally flows among the four regions (Citation needed).
In conclusion, we simultaneously recorded multiple neurons from mouse PrL, pM2, dmCP and MD. Neurons of the four regions showed different levels of modulation and selectivity for task-related information. The coding of WM and SMT information was independent. The neurons within the PrL-pM2-dmCP loop exhibited stronger coupling with each other than with MD, especially for pairing-selective neurons. During learning, firing patterns changed mostly in decision-related neurons and remained stable in sensory-related neurons. Our results provide important insights concerning the dynamic activation and cross-region interaction of a distributed cortico-basal ganglia-thalamus loop in learning a delayed sensorimotor task.
Acknowledgments
The work was supported by the National Science Foundation for Distinguished Young Scholars of China (31525010, to C.T.L.), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB32010100), the Shanghai Municipal Science and Technology Major Project (Grant No. 2018SHZDZX05), the Instrument Developing Project of the Chinese Academy of Sciences (Grant No. YZ201540), the Key Project of Shanghai Science and Technology Commission (No.15JC1400102, 16JC1400101), the General Program of Chinese National Science Foundation (31471049), the China–Netherlands CAS-NWO Programme: The Future of Brain and Cognition (153D31KYSB20160106).