Abstract
Our propensity to adapt to new surroundings and to choose goal-directed behavior that maximizes reward (i.e., yields the optimal outcome) comes naturally to us, as it affects our survival. Several studies have suggested that the anterior cingulate cortex (ACC) is a potential hub for regulating adaptive behavior. For instance, one experimental study reported that ACC contributes to selecting the motor response that yields maximal reward. It found 1) that ACC neurons were selectively activated when reward was reduced and 2) that suppressing ACC activity impaired monkeys’ ability to switch motor responses to obtain maximal reward. How, then, does ACC modulate motor responses? To address this question, we sought biologically plausible mechanisms that can account for these experimental findings. Here, we built a computational model that replicates the observed ACC activity patterns. Our simulation results raise the possibility that ACC corrects behavioral responses by reading out and updating motor plans (which guide future motor responses) stored in the prefrontal cortex (PFC).
Introduction
We are able to choose the best action to obtain the optimal outcome by evaluating the available choices in terms of (expected) rewards; see (Soltani and Wang, 2008; Sugrue et al., 2005) for reviews. The exact value of an action may be uncertain and may even change over time, so the strategy for gaining maximal reward must be updated constantly. How, then, does the brain accomplish adaptive decision-making? In principle, the brain needs a precise mapping among at least three distinct factors: sensory inputs, (expected) rewards and available choices. Notably, the anterior cingulate cortex (ACC) has access to the information necessary for generating such a mapping. First, dopamine, which is thought to mediate reward signals (Berridge and Robinson, 1998; Roesch et al., 2007; Sugrue et al., 2005), projects to ACC (Berger et al., 1991; Seamans, 2007). Second, ACC interacts with the prefrontal cortex (PFC) (Paus, 2001; Sallet et al., 2011), which is known to participate in executive functions such as working memory (Fuster, 2001; Lara and Wallis, 2015; Miller and Cohen, 2001; Tanji and Hoshi, 2008). Consistent with these findings, ACC has been suggested to detect errors (Brown and Braver, 2005; Carter and Cohen, 1998; Ito et al., 2003; Ullsperger and von Cramon, 2003; Yeung et al., 2004) and to select motor responses that maximize reward (Bush et al., 2002; Holroyd and Coles, 2002, 2008; Shima and Tanji, 1998). Here, we aim to better understand the contribution of ACC to adaptive decision-making.
A detailed account of ACC activity during reward-based decision-making was reported by Shima and Tanji (1998), who trained monkeys to ‘push’ or ‘turn’ a lever when a visual cue was given. Monkeys received a reward when they selected the rewarded choice, which was chosen at random. After a block of trials, the rewarded choice switched from one response to the other, and the monkeys were informed of this transition only by a reduction in reward. When reward was reduced, monkeys needed to switch from the previously selected motor response to the other one to obtain maximal reward. During this task, reward reduction increased the activity of ACC neurons, and suppressing ACC activity impaired the monkeys’ ability to switch motor responses accurately (Shima and Tanji, 1998). Shima and Tanji further found that most ACC neurons that responded to reward reduction were sensitive to the sequence of the prior response (before reward reduction) and the post response (after reward reduction). For instance, a subset of ACC neurons was activated when monkeys changed their response from push to turn but remained quiescent when the sequence was reversed. These results indicate that ACC neurons can access information about both the identity of the prior response and the reward. However, a more recent study (Walton et al., 2006) questioned the central role of ACC in error correction (Shima and Tanji, 1998) by reporting that ACC lesions impaired monkeys’ ability to maintain the correct choice rather than their ability to correct it.
This inconsistency is puzzling, as the same experimental protocol was employed in the two studies (Shima and Tanji, 1998; Walton et al., 2006). To gain insight into ACC function in adaptive decision-making, we sought a neural mechanism that can account for these seemingly conflicting observations. Specifically, using a biologically plausible computational model, we asked whether a single network can account for both experimental findings (Shima and Tanji, 1998; Walton et al., 2006). We were inspired by earlier studies suggesting that the prefrontal cortex (PFC) subserves working memory (Baddeley, 2003; Miller and Cohen, 2001; Wang et al., 2013; Wilson et al., 1993) and that PFC and ACC are reciprocally connected (Paus, 2001; Sallet et al., 2011). On this basis, we hypothesized 1) that PFC holds motor plans in working memory to guide the monkeys’ response, which is required at the onset of each visual cue, and 2) that ACC can read out and update the motor plans stored in PFC. To test these hypotheses, we built a computational model consisting of ACC, PFC and motor cortex (MC). MC was included to describe the full process by which reward reduction shifts the motor response (i.e., lever movement).
In this study, we mimicked the experimental protocol used to measure ACC activity (Shima and Tanji, 1998). To model the effects of dopamine in ACC, we simulated the effects of D2 receptors, which are known to induce inhibition in ACC (Darvish-Ghane et al., 2016), by providing inhibitory currents to ACC neurons. In our model, these inhibitory currents are removed when reward is reduced. Our simulation results suggest 1) that the interplay between ACC and PFC can account for the experimentally observed ACC activity patterns (Shima and Tanji, 1998), 2) that ACC can read out and update the motor plan in PFC and 3) that removing an ACC neuron population driven by internal excitation within ACC can destabilize the motor plan in PFC. These results support our hypotheses and further suggest that a single network architecture can account for the two seemingly inconsistent experimental observations (Shima and Tanji, 1998; Walton et al., 2006). This model can provide a starting point for developing a biologically plausible model of ACC that accounts for its diverse functions in a single framework (Ebitz and Hayden, 2016; Paus, 2001; Vassena et al., 2017; Yarkoni et al., 2011).
Results
Our model consists of three distinct cortical areas, ACC, PFC and MC; see Fig. 1A. Each area consists of multiple populations of leaky integrate-and-fire neurons (Methods). All neurons are identical in terms of internal dynamics (i.e., subthreshold dynamics and spike generation), but they receive distinct external background and recurrent synaptic inputs; see Methods. In ACC, three excitatory populations (PT, TP and NS) and two inhibitory populations (PTi and TPi) interact with one another via the connectivity listed in Table 1 (Fig. 1). The NS population does not receive any external background input and is driven internally by other ACC neurons. PTi and TPi provide feedforward inhibition onto PT and TP, respectively. PT and TP project to NS, which in turn excites both PTi and TPi; that is, NS provides indirect feedback inhibition onto PT and TP. Recurrent connections within populations are also established (Table 1) but are not illustrated in Fig. 1 for clarity. ACC is also connected with PFC and MC (Methods), each of which includes two excitatory (P and T) and two inhibitory (Pi and Ti) populations. In PFC, the recurrent connections within each excitatory population are strong enough to generate persistent activity (Table 1), a well-studied neural correlate of working memory. In both PFC and MC, P and T represent ‘push’ and ‘turn’, respectively. In PFC, they represent motor plans; in MC, they represent the actual motor responses.
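To make the wiring described above concrete, the sketch below encodes the named projections as a signed directed graph (+1 excitatory, −1 inhibitory) and lists the di-synaptic inhibitory paths it implies. It includes only the connections stated in the text, with one exception: the PT → Pi (PFC) link is an assumption that mirrors the stated TP → Ti link. The population labels, `EDGES` and `disynaptic_inhibition` are hypothetical names, and connection probabilities and weights (Table 1) are omitted.

```python
from collections import defaultdict

EXC, INH = +1, -1

# (source, target) -> sign. Hypothetical labels for the populations in Fig. 1A.
EDGES = {
    # ACC: feedforward inhibition onto PT and TP
    ("ACC.PTi", "ACC.PT"): INH,
    ("ACC.TPi", "ACC.TP"): INH,
    # ACC: PT and TP drive NS, which excites both inhibitory populations
    ("ACC.PT", "ACC.NS"): EXC,
    ("ACC.TP", "ACC.NS"): EXC,
    ("ACC.NS", "ACC.PTi"): EXC,
    ("ACC.NS", "ACC.TPi"): EXC,
    # ACC <-> PFC: TP reads out the 'turn' plan and excites Ti, which inhibits T
    ("PFC.T", "ACC.TP"): EXC,
    ("ACC.TP", "PFC.Ti"): EXC,
    ("PFC.Ti", "PFC.T"): INH,
    # Assumed mirror image for PT and the 'push' plan
    ("PFC.P", "ACC.PT"): EXC,
    ("ACC.PT", "PFC.Pi"): EXC,
    ("PFC.Pi", "PFC.P"): INH,
    # PFC -> MC: plan-specific drive to the motor populations
    ("PFC.T", "MC.T"): EXC,
    ("PFC.P", "MC.P"): EXC,
}


def disynaptic_inhibition(edges, source):
    """Return (source, interneuron, target) paths: excitation, then inhibition."""
    out = defaultdict(list)
    for (pre, post), sign in edges.items():
        out[pre].append((post, sign))
    paths = []
    for mid, first in out[source]:
        if first != EXC:
            continue
        for target, second in out[mid]:
            if second == INH:
                paths.append((source, mid, target))
    return paths


if __name__ == "__main__":
    for src in ("ACC.NS", "ACC.TP", "ACC.PT"):
        print(src, disynaptic_inhibition(EDGES, src))
```

Running it shows the two roles described in the text. NS reaches both PT and TP through their inhibitory partners, and TP reaches the ‘turn’ plan in PFC through Ti.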
We simulated the experimental protocol with 3200-ms-long simulations, taking into account the effects of dopamine D2 receptors, which are known to induce inhibition in ACC (Darvish-Ghane et al., 2016); Fig. 1B illustrates the simulation protocol. Reward reduction is simulated by removing the inhibitory current (−250 pA) applied to ACC neurons. Each simulation consists of 3 trials, each starting with a visual cue (Fig. 1B). Because the visual cues in the experiment only signaled monkeys to move the lever (Shima and Tanji, 1998), they should not carry any information about the rewarded response. Therefore, in our model, visual cues are simulated as afferent inputs projecting to both populations in MC. As shown in Fig. 1B, visual cues are presented to MC at 200, 1200 and 2200 ms after simulation onset and last 200 ms each. To test the effects of reward on network activity, we performed simulations in two conditions, continuous-reward (CW) and reduced-reward (RW). In the CW condition, the reward level remains constant, whereas in the RW condition, it is reduced at 2000 ms, i.e., after the second visual cue. The initial motor plan is always ‘turn’ unless stated otherwise. As we assumed that PFC stores the motor plan as persistent activity, we provided a 100-ms-long excitatory input to the T population in PFC before the first visual cue to initiate persistent activity within it.
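The timing of the protocol can be summarized in a few lines. The following minimal sketch uses times in ms. `cue_on`, `acc_bias_current` and the `restore_ms` parameter are hypothetical names. The text does not give the time at which inhibition is re-introduced after the switch, so `restore_ms` defaults to never restoring. The 100-ms pulse to T is left out because its exact timing is not stated.

```python
# Stimulus protocol as described in the text (times in ms).
SIM_MS = 3200
CUE_ONSETS_MS = (200, 1200, 2200)
CUE_DURATION_MS = 200
REWARD_REDUCTION_MS = 2000
ACC_BIAS_PA = -250.0  # tonic D2-like inhibitory current to every ACC neuron


def cue_on(t_ms):
    """True while a visual cue (delivered to both MC populations) is on."""
    return any(on <= t_ms < on + CUE_DURATION_MS for on in CUE_ONSETS_MS)


def acc_bias_current(t_ms, condition, restore_ms=None):
    """Inhibitory current (pA) to ACC neurons in the CW or RW condition.

    restore_ms is a hypothetical parameter: the text says inhibition is
    re-introduced after the switch but does not give a time.
    """
    if condition == "CW":
        return ACC_BIAS_PA
    if condition != "RW":
        raise ValueError("condition must be 'CW' or 'RW'")
    reduced = t_ms >= REWARD_REDUCTION_MS and (restore_ms is None or t_ms < restore_ms)
    return 0.0 if reduced else ACC_BIAS_PA


if __name__ == "__main__":
    for t in (100, 250, 1999, 2100, 2300):
        print(t, cue_on(t), acc_bias_current(t, "CW"), acc_bias_current(t, "RW"))
```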
PFC can hold motor plan and guide MC response accordingly
We first tested model responses in the CW condition (i.e., reward kept at a high level throughout the simulation). As shown in Fig. 2A, the T population in PFC, stimulated by external input before the first visual cue, fires persistently at a high rate owing to strong recurrent connections in PFC (Table 1). That is, ‘turn’ is retained as the motor plan during the entire simulation. With this motor plan in PFC, the T population in MC responds to all three visual cues (Fig. 2B), while ACC neurons remain quiescent. In the model, MC neurons fire only when they receive afferent inputs from both PFC and the visual cues. Consequently, MC neurons fire only during visual cues, even though PFC neurons continuously project to the MC T population. To examine whether this result is robust to random connectivity and background inputs, we instantiated 100 independent networks. Each network has random connections drawn from the same connectivity rule (Table 1), and each simulation uses an independent set of background Poisson spike trains (Table 2). In each simulation, we compared the average firing rates of the P and T populations in MC during the third visual cue (2200–2400 ms). As shown in Fig. 2C, the MC response to the third visual cue is ‘turn’ in all 100 simulations. This indicates that the network sustains the same response when reward is maintained at a constant level.
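The per-simulation readout, which compares the P and T population rates in MC during the third cue, can be sketched as below. The function names and the input format (pooled spike times per population) are assumptions. Ties are reported separately because the text does not say how they would be scored.

```python
from collections import Counter


def window_rate_hz(spike_times_ms, n_neurons, start_ms, stop_ms):
    """Average firing rate (Hz per neuron) of a population within [start, stop)."""
    count = sum(1 for t in spike_times_ms if start_ms <= t < stop_ms)
    return count / n_neurons / ((stop_ms - start_ms) / 1000.0)


def mc_response(p_spikes, t_spikes, n_neurons=400, window=(2200.0, 2400.0)):
    """Classify one simulation's MC response to the cue in `window`."""
    rate_p = window_rate_hz(p_spikes, n_neurons, *window)
    rate_t = window_rate_hz(t_spikes, n_neurons, *window)
    if rate_t > rate_p:
        return "turn"
    if rate_p > rate_t:
        return "push"
    return "tie"


if __name__ == "__main__":
    # Two toy 'simulations' with pooled spike times (ms) for MC P and MC T.
    runs = [([2250.0], [2210.0, 2300.0, 2350.0]), ([], [2205.0])]
    print(Counter(mc_response(p, t) for p, t in runs))  # Counter({'turn': 2})
```

Tallying the labels over the 100 instantiated networks gives a summary of the kind shown in Fig. 2C.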
ACC neurons update motor plan in PFC when they are depolarized by reward reduction
We next tested responses of the model in the RW condition, in which reward is reduced at 2000 ms (after the second visual cue). As in the CW condition, the T population in MC responds to the first two visual cues (200–400 and 1200–1400 ms), but the P population responds to the third visual cue (2200–2400 ms) (Fig. 3A). That is, the reward reduction successfully updates the motor response. How, then, does reward reduction shift the motor response? In the model, when reward is reduced, inhibition onto ACC neurons is lifted, making ACC neurons more responsive to afferent inputs. Of the TP and PT populations in ACC, only TP is activated (Fig. 3B), owing to excitation from the T population in PFC (Fig. 3C).
The activation of the TP population in ACC initiates a series of changes in the model. First, the persistent activity of the T population in PFC is turned off (Fig. 3C). Once TP is active, it increases excitatory input to Ti in PFC, which in turn inhibits the T population in PFC. Second, the P population in PFC is turned on (Fig. 3C). As soon as the T population in PFC is turned off, the P population is released from the lateral inhibition induced indirectly by T (Fig. 1A) and starts firing, driven by external background inputs (Table 2); background inputs alone can drive PFC neurons to fire. Third, the NS population in ACC becomes active owing to excitation from TP (Fig. 3B), which increases inhibition onto PT and TP via PTi and TPi. This delayed feedback inhibition suppresses TP after its initial burst of firing. That is, the activation of NS makes TP activity transient, as observed by Shima and Tanji (1998).
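The qualitative role of NS in this sequence, in which a TP burst is cut short by delayed, NS-driven feedback inhibition, can be illustrated with a threshold-linear firing-rate caricature. This is a swapped-in toy model, not the spiking network. The time constants, weights and drive below are hypothetical values, chosen only so that NS is slower than TP. Setting `w_ns = 0` mimics removing NS. In that case TP stays active, as in Fig. 7B.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def simulate_acc(w_ns=1.0, drive=1.0, t_max_ms=400.0, dt_ms=0.1):
    """Rate caricature of TP -> NS -> TPi -| TP after ACC is disinhibited.

    Returns a list of (time_ms, tp_rate, ns_rate). All parameters are
    hypothetical; rates are in arbitrary units; forward-Euler integration.
    """
    tau_tp, tau_ns, tau_i = 10.0, 50.0, 5.0  # NS is slowest, so its rise is delayed
    w_i_tp = 3.0                             # TPi -> TP inhibition
    tp = ns = tpi = 0.0
    trace = []
    for step in range(int(round(t_max_ms / dt_ms))):
        d_tp = (-tp + relu(drive - w_i_tp * tpi)) / tau_tp
        d_ns = (-ns + relu(w_ns * tp)) / tau_ns  # TP -> NS excitation
        d_i = (-tpi + relu(ns)) / tau_i          # NS -> TPi excitation
        tp += dt_ms * d_tp
        ns += dt_ms * d_ns
        tpi += dt_ms * d_i
        trace.append(((step + 1) * dt_ms, tp, ns))
    return trace


if __name__ == "__main__":
    with_ns = simulate_acc()
    t_tp, peak_tp, _ = max(with_ns, key=lambda r: r[1])
    t_ns = max(with_ns, key=lambda r: r[2])[0]
    print(f"TP peaks at {t_tp:.1f} ms ({peak_tp:.2f}), NS peaks at {t_ns:.1f} ms")
    print(f"late TP with NS: {with_ns[-1][1]:.2f}; without NS: {simulate_acc(w_ns=0.0)[-1][1]:.2f}")
```

With NS present, TP peaks early and then settles well below its peak once the slower NS population has recruited TPi. Without NS, TP simply rises to its drive level and stays there.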
With these changes, the P population in PFC is active when the third visual cue is given (2200–2400 ms), and the P population in MC consequently becomes active (Fig. 3A). Once the switch in response has been completed, we assumed that the reward level is restored and re-introduced the inhibitory inputs to ACC, making ACC neurons quiescent again (Fig. 3B). These results suggest that ACC can shift the motor response indirectly by updating the motor plan in PFC. As in the CW condition, we repeated the simulations using 100 independently instantiated networks. To examine when the motor plan is switched in these simulations, we calculated the selective index (SI) defined by Eq. 2 in each simulation. SI is positive when the T population produces stronger responses than the P population in a given 50-ms time bin, and it is normalized to lie between −1 and 1; see Methods. Fig. 4 summarizes the simulation results. TP neurons in ACC generate stronger spiking activity than PT neurons do (Fig. 4A), the persistent activity in PFC is successfully transferred from T to P after the reward reduction (Fig. 4B), and motor responses are properly switched in all 100 simulations (Fig. 4C).
We further tested the importance of ACC for shifting the motor response by removing the inter-areal connections from ACC to PFC. As shown in Figs. 5A and B, the T population in MC still responds to the third visual cue when these connections are removed. This confirms that, in the model, it is ACC that corrects the motor response via its interactions with PFC. Finally, we tested whether these results depend on the initial motor plan by changing it from ‘turn’ to ‘push’. As shown in Fig. 6, we observed equivalent results when the initial motor plan was switched.
NS population is necessary for keeping motor plan in PFC stable
The NS population in ACC shows delayed activity (Fig. 3B) and makes TP (PT) activity transient. These transient and delayed responses are consistent with the ACC activity patterns observed during reward-based decision-making (Shima and Tanji, 1998), suggesting that the NS population is necessary for reproducing the experimentally observed ACC activity. To better understand the function of the NS population, we removed NS from ACC and repeated the simulations in the RW condition. We found that the MC response to the third visual cue became erroneous (Fig. 7A). In 6 out of 100 independent simulations, the T population produces stronger responses than the P population; that is, the motor response is not properly switched in 6 simulations. Interestingly, both TP and PT populations become active when the NS population is removed (Fig. 7B). PT becomes active because the P population in PFC excites it once the motor plan has shifted to ‘push’ (Fig. 7C). With TP and PT both active, both P and T populations in PFC receive inhibition, disrupting the persistent activity in both (Fig. 7C). After a period of quiescence, persistent activity can re-emerge in either the P or the T population (Fig. 7D).
These observations led us to hypothesize that the NS population may be responsible for stabilizing the motor plan stored in PFC. To test this hypothesis, we prolonged each simulation to include 6 visual cues. Specifically, 3 more visual cues are added, and the simulations up to the fourth visual cue are identical to those with 3 visual cues discussed above. Figure 8A shows how PFC activity evolves over time in the RW condition without the NS population. As seen in the figure, the persistent activity switches back and forth between the two populations after reward reduction. This unstable PFC activity makes MC responses to the last four visual cues error-prone: the T population, instead of the correct P population, sometimes responds to the visual cues (Figs. 8B–E). We also note that incorrect responses to the first cue after the reward reduction are less frequent than those to the other three cues (Figs. 8B–E). These results are consistent with the observation that ACC lesions disrupted monkeys’ ability to maintain the correct choice (Walton et al., 2006) without strongly degrading the accuracy of the response to the first cue after the reward reduction.
Discussion
To gain insight into ACC’s function in regulating motor responses during adaptive decision-making, we sought neural circuits capable of 1) replicating the ACC activity patterns observed during a simple reward-based decision-making task (Shima and Tanji, 1998) and 2) explaining how ACC neurons change motor responses. Our model contains two types of excitatory ACC neurons, PT/TP and NS. Their transient and delayed responses, respectively, are consistent with experimentally observed ACC activity (Shima and Tanji, 1998). Our simulation results suggest that the two neuron types have distinct functions. TP/PT neurons can read out and update the motor plan when reward is reduced. This is consistent with the correlated activity of ACC and PFC (Oemisch et al., 2015; Paus, 2001) and with the suggested central role of PFC in error detection (Gehring and Knight, 2000). In contrast, NS neurons, driven internally by PT/TP neurons, do not participate directly in correcting responses. Instead, by suppressing TP/PT activity, they enable PFC and MC to maintain the correct choice. Thus, our model raises the possibility that ACC both updates and stabilizes motor responses. A single mechanism of this kind could explain the two seemingly conflicting experimental findings reported earlier (Shima and Tanji, 1998; Walton et al., 2006).
Below, we discuss the potential links to the earlier models of ACC; see (Vassena et al., 2017) for a comprehensive review.
ACC as a motor control filter
Our model predicts that ACC neurons can switch motor responses by interacting with PFC. That is, ACC in the model works as a type of ‘motor control filter’ proposed by Holroyd and Coles (2002, 2008). In the original motor control filter model, multiple motor controllers propose responses, and ACC selects the best controller for the given task. In our model, PFC is the only motor controller considered, and ACC can dictate the motor plan stored in PFC. However, our model does not exclude the possibility that ACC interacts with multiple motor controllers, as the motor control filter theory proposes. Rather, it suggests a potential mechanism by which ACC can select one motor controller over others. In our model, ACC suppresses the incorrect motor plan by providing di-synaptic inhibition to the PFC population representing it, and the same mechanism could be used to inhibit all other unnecessary controllers. Notably, some ACC neurons show sustained activity (Shima and Tanji, 1998), which could disable unselected motor controllers by inducing constant (tonic) inhibition in them.
ACC as a conflict monitor
In our model, populations in ACC have exclusive counterparts in PFC; for instance, the PT (TP) population in ACC does not interact with the T (P) population in PFC (Fig. 1A). These exclusive connections are well suited to implementing the strategy that ensures maximal reward in the task used to study ACC activity during reward-based decision-making (Shima and Tanji, 1998). In the brain, however, such exclusive connections may be too restrictive, and weak cross connections from PFC to ACC (e.g., from P in PFC to TP in ACC) likely exist. With these cross connections, our model could account for the conflict monitoring attributed to ACC (Botvinick et al., 2001; Yeung et al., 2004), provided that PFC encodes multiple aspects of sensory evidence (e.g., the color and the identity of a word). If PFC neurons respond to both the color and the identity of words, the word ‘red’ written in blue will activate more PFC neurons than the word ‘red’ written in red. That is, the incongruent word will enhance ACC activity more than the congruent one, as conflict-monitoring theories suggest (Botvinick et al., 2001; Yeung et al., 2004). This account is also consistent with the dominant role of PFC in the action monitoring carried out by ACC (Gehring and Knight, 2000).
Extension to unified framework of ACC function
It is well established that ACC is activated during various types of cognitive tasks (Ebitz and Hayden, 2016; Paus, 2001; Vassena et al., 2017; Yarkoni et al., 2011). For instance, ACC activity is reported to be correlated with the expected values of outcomes (Shenhav et al., 2013; Walton et al., 2006), and it can also encode the effort required by a task or choice difficulty (Aarts et al., 2008; Botvinick, 2007; Hauber and Sommer, 2009; Schweimer and Hauber, 2006; Vassena et al., 2014). The diverse hypothesized functions of ACC pose a great challenge to developing a unified theory. Nevertheless, recent theoretical studies provide potential mechanisms that account for diverse ACC functions in a single framework; see (Vassena et al., 2017) for a review. The predicted response-outcome (PRO) model and the reward value prediction model (RVPM) assume that ACC neurons predict the value or the likelihood of future outcomes via reinforcement learning (Alexander and Brown, 2011; Silvetti et al., 2011; Vassena et al., 2017).
These studies suggest that behavioral tasks with stochastic rewards (i.e., reward uncertainty) may be necessary to fully understand ACC function. Our goal was to elucidate biologically plausible mechanisms for a simple deterministic behavioral task, in which no component of the task is uncertain, so we did not consider ACC functions associated with stochastic rewards in this study. In the future, we will extend our model by considering 1) more complex tasks with stochastic rewards and 2) D1 receptors, which have been suggested to underlie ACC’s contribution to effort-based decision-making (Schweimer and Hauber, 2006). Such an extended study could bring us a step closer to a single biologically plausible theory that accounts for the diverse functions of ACC.
It should be noted that biologically plausible models can be used to simulate multimodal neural signals, including functional magnetic resonance imaging (fMRI) signals, electroencephalograms and single-unit activity. Such multimodal signals allow a theory to be tested against abundant sets of data, so we expect an extended version of our biologically plausible model to provide rigorous ways to test multiple theories of ACC function. For instance, it could generate multiscale neural responses derived from the PRO and RVPM models, and such predictions would help us determine which model describes ACC function more accurately.
Methods
Our model consists of three cortical areas interacting with one another via the inter-areal connections shown in Fig. 1 and Table 1. In each area, neurons are split into several populations with distinct connectivity. In addition, external background inputs (Poisson spike trains) are delivered to the neurons (except NS; see Results) to regulate their excitability and make their activity stochastic. The rates of the Poisson spike trains are specific to neuron types (Table 2). We used the NEST simulator to build our network model (Gewaltig and Diesmann, 2007).
Neuron and synapse models
All neurons in the model are leaky integrate-and-fire (LIF) neurons. The subthreshold membrane potential obeys Eq. 1, and a spike is emitted when the membrane potential crosses the spike threshold (−55 mV). After a spike is registered, the membrane potential is reset to the resting membrane potential (−70 mV), and no spike is allowed during the absolute refractory period (3 ms). Each presynaptic spike induces an instantaneous jump in the synaptic current of the target neuron, which then decays exponentially:

$$\frac{dV}{dt} = -\frac{V - V_{\mathrm{rest}}}{\tau_m} + \frac{I_{\mathrm{syn}}}{C}, \qquad \frac{dI_{\mathrm{syn}}}{dt} = -\frac{I_{\mathrm{syn}}}{\tau_{\mathrm{syn}}} + \sum_{j} w_{ij} \sum_{k} \delta\left(t - t_k^{j}\right) \qquad (1)$$

Here, V is the membrane potential and I_syn is the synaptic current of neuron i. V_rest = −70 mV is the resting membrane potential. τm = 10 ms and τsyn = 2 ms are the time constants of the membrane potential and the synaptic current, respectively. t_k^j is the time of the kth spike of presynaptic neuron j, w_ij are the synaptic strengths listed in Table 1, and C = 250 pF is the membrane capacitance. NEST’s native neuron model ‘iaf_psc_exp’ is used to implement this neuron (Gewaltig and Diesmann, 2007); see also (Potjans and Diesmann, 2014) for additional information. Synaptic weights are fixed throughout the simulation; that is, synaptic plasticity is not modeled. External background spike trains are mediated by synapses with a strength of 200 pA.
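Eq. 1 can be integrated directly to check the single-neuron behavior implied by these parameters. The sketch below uses forward-Euler steps, whereas NEST’s ‘iaf_psc_exp’ integrates the same equations exactly, so the numbers differ slightly. It holds V at reset during the refractory period and treats all weights as current jumps in pA. `simulate_lif` and its arguments are hypothetical names. With these parameters, the rheobase is C(V_th − V_rest)/τm = 375 pA, so a constant −250 pA input keeps a neuron silent.

```python
import math

# Parameters from the text; forward Euler is a simplification of NEST's exact scheme.
TAU_M, TAU_SYN = 10.0, 2.0   # ms
C_M = 250.0                  # pF (pA / pF = mV / ms)
V_REST = V_RESET = -70.0     # mV
V_TH = -55.0                 # mV
T_REF = 3.0                  # ms


def simulate_lif(t_max_ms, input_spikes_ms=(), w_pa=200.0, i_ext_pa=0.0, dt=0.01):
    """Integrate Eq. 1 for one neuron; return (output_spike_times, voltage_trace)."""
    v, i_syn = V_REST, 0.0
    refractory_until = -math.inf
    pending = sorted(input_spikes_ms)
    k = 0
    spikes, trace = [], []
    for step in range(int(round(t_max_ms / dt))):
        t = step * dt
        # each presynaptic spike causes an instantaneous jump in the synaptic current
        while k < len(pending) and pending[k] <= t:
            i_syn += w_pa
            k += 1
        i_syn += dt * (-i_syn / TAU_SYN)          # exponential decay of the current
        if t >= refractory_until:                 # V is clamped while refractory
            v += dt * (-(v - V_REST) / TAU_M + (i_syn + i_ext_pa) / C_M)
            if v >= V_TH:
                spikes.append(t)
                v = V_RESET
                refractory_until = t + T_REF
        trace.append(v)
    return spikes, trace


if __name__ == "__main__":
    _, trace = simulate_lif(50.0, input_spikes_ms=[5.0])
    print(f"PSP from one 200-pA input: {max(trace) - V_REST:.2f} mV")
    for i_ext in (-250.0, 300.0, 500.0):
        out, _ = simulate_lif(200.0, i_ext_pa=i_ext)
        print(f"I_ext = {i_ext:6.1f} pA -> {len(out)} spikes in 200 ms")
```

A single 200-pA input produces a postsynaptic potential of roughly 1 mV. A constant 300 pA leaves the neuron just below threshold (steady state −58 mV), while 500 pA makes it fire regularly.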
Model structure
Our model implements three cortical areas and their interactions to address our hypotheses. Each cortical area consists of identical LIF neurons and has multiple populations of excitatory and inhibitory neurons, following earlier cortical network models; see (Ardid et al., 2007; Compte et al., 2000) for instance. Here, we assign specific functions to the populations in each area. The three areas interact with one another via the connections illustrated in Fig. 1. Specifically, ACC and PFC are reciprocally connected (Paus, 2001), and MC projects nonspecific inputs to ACC. In all three areas, each excitatory population consists of 400 neurons and each inhibitory population of 100 neurons; for clarity, the raster plots in Results show only 10% of the neurons. All connections are drawn randomly using the connection probabilities and synaptic weights listed in Table 1.
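The random instantiation of connectivity can be sketched with a pairwise-Bernoulli rule, in which each pre/post pair is connected independently with probability p. The choice of a pairwise-Bernoulli rule, the value p = 0.1 and the name `draw_connections` are all assumptions for illustration; the actual probabilities are those in Table 1. Seeding a separate generator for each network is one simple way to obtain statistically equivalent but independent instantiations.

```python
import random

N_EXC, N_INH = 400, 100  # population sizes from the text


def draw_connections(n_pre, n_post, p, rng):
    """Pairwise-Bernoulli rule: each (pre, post) pair exists with probability p.

    Autapses are not excluded when pre and post are the same population.
    """
    return [(i, j) for i in range(n_pre) for j in range(n_post) if rng.random() < p]


if __name__ == "__main__":
    p_hypothetical = 0.1  # placeholder, not a value from Table 1
    for seed in (1, 2):
        conns = draw_connections(N_EXC, N_EXC, p_hypothetical, random.Random(seed))
        print(seed, len(conns))  # close to 0.1 * 400 * 400 = 16000
```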
The model ACC consists of three excitatory and two inhibitory populations. The first two excitatory populations, PT and TP, simulate the ACC neurons observed in the experiment that were sensitive to the sequence of motor responses before and after reward reduction (Shima and Tanji, 1998); TP and PT stand for ‘Turn to Push’ and ‘Push to Turn’, respectively. The two inhibitory populations, TPi and PTi, are connected exclusively to TP and PT, respectively, and provide feedforward inhibition onto their target population (Fig. 1). The third population, NS (standing for non-selective), receives excitatory inputs from both PT and TP and sends excitation to TPi and PTi; that is, NS provides indirect feedback inhibition to PT and TP. ACC is known to express dopamine receptors, and we considered the effects of D2 receptors in ACC. As D2 receptors in ACC induce inhibition (Darvish-Ghane et al., 2016), we simulated the effects of dopamine by providing an inhibitory current (−250 pA) to all ACC neurons. These inhibitory currents are removed when reward is reduced.
The model PFC consists of two excitatory and two inhibitory populations. The two excitatory populations, P and T, are assumed to represent the motor plans ‘Push’ and ‘Turn’, respectively. Neurons within P and within T are connected by strong recurrent connections (Table 1) so that each population can generate persistent activity, which is believed to underlie working memory (Baddeley, 2003; Miller and Cohen, 2001; Wang et al., 2013; Wilson et al., 1993). P and T also interact with each other and with the inhibitory populations (Fig. 1).
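A two-population firing-rate model can caricature two aspects of this mechanism. First, strong recurrence and mutual inhibition hold one plan while suppressing the other. Second, transient inhibition of T hands the persistent state over to P. This is a swapped-in rate model, not the spiking PFC. The sigmoid, weights, background drive and pulse amplitudes are hypothetical values chosen so that only one population can stay active. The di-synaptic mutual inhibition is collapsed into a direct negative coupling, and the 50-ms inhibitory pulse to T stands in for the TP → Ti → T pathway described in Results.

```python
import math


def sigmoid(x, theta=3.0, k=0.5):
    """Saturating rate function (hypothetical threshold theta and slope k)."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / k))


def simulate_pfc(t_max_ms=1500.0, dt=0.1, switch=True):
    """Two-population rate caricature of PFC working memory.

    T receives a 100-ms excitatory pulse at t = 0 (initial plan 'turn').
    If `switch`, T receives a 50-ms inhibitory pulse at 1000 ms.
    Returns {time_ms: (r_T, r_P)} sampled every 1 ms.
    """
    tau, w_self, w_mut, background = 10.0, 6.0, 6.0, 3.5
    r_t = r_p = 0.0
    steps_per_ms = int(round(1.0 / dt))
    samples = {}
    for step in range(int(round(t_max_ms / dt))):
        t = step * dt
        if step % steps_per_ms == 0:
            samples[int(round(t))] = (r_t, r_p)
        i_t = 3.0 if t < 100.0 else 0.0          # initial cue to the 'turn' plan
        if switch and 1000.0 <= t < 1050.0:
            i_t -= 10.0                          # stand-in for TP -> Ti -> T
        # recurrent self-excitation plus effective mutual inhibition
        in_t = w_self * r_t - w_mut * r_p + background + i_t
        in_p = w_self * r_p - w_mut * r_t + background
        r_t, r_p = (r_t + dt * (-r_t + sigmoid(in_t)) / tau,
                    r_p + dt * (-r_p + sigmoid(in_p)) / tau)
    return samples


if __name__ == "__main__":
    s = simulate_pfc()
    for t in (50, 900, 1020, 1499):
        print(t, "T=%.2f P=%.2f" % s[t])
```

Because the background drive alone would activate either population, removing T immediately lets P take over. This mirrors the release from lateral inhibition described in Results.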
The model MC has a structure similar to that of PFC. As seen in Fig. 1, the two excitatory populations (P and T) interact with two inhibitory populations. P and T receive afferent inputs from the P and T populations in PFC, respectively, reflecting the influence of PFC on motor areas (Tanji and Hoshi, 2008). P and T are assumed to represent the actual motor response, and activation of the P (T) population is interpreted as a Push (Turn) response. In addition, we assumed that both excitatory populations receive afferent inputs elicited by visual cues; see (Salinas and Romo, 1998) for the effects of sensory inputs on motor cortex.
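A mean-drive argument on the LIF parameters of Eq. 1 can illustrate the requirement that MC neurons fire only when PFC and cue inputs coincide. A constant current I moves the membrane potential to V_rest + Iτm/C, so the neuron fires only if this value exceeds −55 mV, i.e., if I > 375 pA. The input amplitudes below (250 pA each) are hypothetical. The argument also ignores the fluctuations of Poisson inputs, which in the real network can produce occasional spikes below this mean threshold.

```python
TAU_M = 10.0                 # ms (Methods)
C_M = 250.0                  # pF (Methods)
V_REST, V_TH = -70.0, -55.0  # mV (Methods)


def steady_state_v(i_pa):
    """Steady-state LIF membrane potential (mV) for a constant current (pA)."""
    return V_REST + i_pa * TAU_M / C_M


def fires(i_pa):
    """True if the mean drive alone would bring the neuron above threshold."""
    return steady_state_v(i_pa) > V_TH


# Hypothetical mean currents onto an MC 'turn' neuron (not values from the paper)
I_FROM_PFC = 250.0
I_FROM_CUE = 250.0

if __name__ == "__main__":
    for label, i in (("PFC only", I_FROM_PFC), ("cue only", I_FROM_CUE),
                     ("PFC + cue", I_FROM_PFC + I_FROM_CUE)):
        print(f"{label:9s} {steady_state_v(i):6.1f} mV  fires={fires(i)}")
```

Each input alone holds the neuron at −60 mV, below threshold. Together they push its steady state to −50 mV, so it fires only during the cue.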
Simulation protocol
We mimicked the protocol used in the experiment (Fig. 1B). Specifically, we simulated two trials before the onset of reward reduction and a single trial after it. We performed simulations in two conditions. In the reduced-reward (RW) condition, the three visual cues are presented at 200, 1200 and 2200 ms, and reward is assumed to be reduced at 2000 ms. To simulate this reduction, we removed the inhibitory inputs to ACC neurons. As a control, we also simulated the continuous-reward (CW) condition, in which the reward level is kept constant during the entire simulation.
To examine the robustness of our results to stochasticity, we instantiated 100 independent (but statistically equivalent) networks using the same connectivity rule (Table 1) and conducted 100 independent simulations. In each simulation, the background inputs to neurons are generated independently using the rates given in Table 2.
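Independent background inputs for each run can be sketched as homogeneous Poisson trains built from exponential inter-spike intervals, with one random seed per simulation. The 8-Hz rate used in the usage example is a placeholder, not a value from Table 2. `poisson_train` and `background_for_run` are hypothetical names, and in the actual model these trains are generated inside NEST.

```python
import random


def poisson_train(rate_hz, t_max_ms, rng):
    """Homogeneous Poisson spike train (ms) built from exponential intervals."""
    if rate_hz <= 0.0:
        return []
    rate_per_ms = rate_hz / 1000.0
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_ms)
        if t >= t_max_ms:
            return times
        times.append(t)


def background_for_run(run_id, n_neurons, rate_hz, t_max_ms=3200.0):
    """Independent background trains for one simulation, seeded by run_id."""
    rng = random.Random(run_id)
    return [poisson_train(rate_hz, t_max_ms, rng) for _ in range(n_neurons)]


if __name__ == "__main__":
    run0 = background_for_run(0, 400, 8.0)  # 8 Hz is a hypothetical rate
    print(sum(len(tr) for tr in run0))      # close to 8 * 3.2 * 400 = 10240
```

Seeding by run index makes each of the 100 simulations reproducible while keeping their inputs independent of one another.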
Time course of PFC activity
To track changes in the motor plan in PFC, we compared the outputs of the T and P populations in PFC. Specifically, we split the spikes of PFC neurons into non-overlapping 50-ms bins. We then calculated the selective index (SI) from the binned firing rates (P_n, T_n) of the P and T populations using Eq. 2:

$$\mathrm{SI}(t_n) = \frac{T_n - P_n}{T_n + P_n} \qquad (2)$$

where t_n is the center of the nth bin. When SI is positive, the T population produces stronger responses than the P population; when SI is negative, the P population produces stronger responses. In the simulations, the P and T populations are activated in a mutually exclusive manner due to di-synaptic mutual inhibition (Table 1); once one of the populations is activated, its activity persists.
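Eq. 2 can be applied to pooled spike times as in the sketch below. P and T contain the same number of neurons (400) and all bins have equal width, so the ratio of spike counts equals the ratio of binned firing rates, and counts are used directly. Assigning SI = 0 to bins in which neither population fires is our convention; the text does not specify it. The function name `selective_index` is hypothetical.

```python
def selective_index(p_spikes, t_spikes, t_max_ms, bin_ms=50.0):
    """SI(t_n) = (T_n - P_n) / (T_n + P_n) in non-overlapping bins.

    Returns a list of (bin_centre_ms, SI). Bins in which neither population
    fires are assigned SI = 0 (an assumed convention).
    """
    n_bins = int(t_max_ms // bin_ms)
    p_counts = [0] * n_bins
    t_counts = [0] * n_bins
    for times, counts in ((p_spikes, p_counts), (t_spikes, t_counts)):
        for s in times:
            b = int(s // bin_ms)
            if 0 <= b < n_bins:
                counts[b] += 1
    out = []
    for n in range(n_bins):
        total = t_counts[n] + p_counts[n]
        si = (t_counts[n] - p_counts[n]) / total if total else 0.0
        out.append(((n + 0.5) * bin_ms, si))
    return out


if __name__ == "__main__":
    # T fires in the first bin, mostly P in the second, nothing in the third.
    print(selective_index([60.0, 70.0], [10.0, 20.0, 30.0, 75.0], 150.0))
```

A sign change of SI from positive to negative marks the bin in which the persistent activity moves from T to P, as summarized in Fig. 4B.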
Acknowledgments
We wish to thank the Allen Institute founders, Paul G. Allen and Jody Allen, for their vision, encouragement and support.