Abstract
We flexibly control our cognition depending on the context or surrounding environment. The prefrontal cortex (PFC) controls this cognitive flexibility; however, the detailed underlying mechanisms remain unclear. Recent developments in machine learning techniques have allowed simple recurrent neural network models of the PFC to perform human- or animal-like behavioral tasks. These systems give us access to parameters, which we could not obtain in biological experiments, for performing the tasks. We compared four models performing a flexible cognition task, called the context-dependent integration task, and searched for common features. In all the models, we observed that highly plastic synapses were concentrated in a small neuronal population, and the units in which they were more concentrated contributed more to the performance. However, there were no common properties in the constructed structures. These results suggest that plastic changes can be more general and important for accomplishing cognitive tasks than features of the constructed structures.
Introduction
Humans can generate and flexibly switch cognitive sets depending on the situation, even with the same sensory-sensory or sensory-motor associations. The prefrontal cortex (PFC) controls cognitive flexibility, also called executive function (Miller & Cohen, 2001; Nobre & Kastner, 2014). Although the control mechanism of executive function has been studied using animal models, the detailed underlying mechanism remains unclear. One of the reasons is the limited number of observable and manipulatable biological variables of the brain. Recent developments in simple recurrent neural network (RNN) computational models have allowed these networks to perform behavioral tasks used to measure cognitive flexibility in animal models. The network activities resembled PFC activities during the performance of similar behavioral tasks (Mante, Sussillo, Shenoy, & Newsome, 2013; Miconi, 2017; Song, Yang, & Wang, 2016, 2017). With this novel computational method, we are now able to analyze nearly all the parameters involved in behavioral tasks.
Mante et al. developed a behavioral task for evaluating the cognitive flexibility of monkeys by modifying a random-dot motion task and adding color factors to the motion factors (Mante et al., 2013); the task is called the context-dependent integration task (Song et al., 2016). In the task, the monkey needed to select one of two choices depending on colored dots moving randomly on a screen. The contextual information of the task determined whether the monkey should select the correct answer based on color or on motion (Figure 1A). In addition, Mante et al. constructed an RNN model that performed the task and whose populational activities during the task were similar to those of the monkey's PFC neurons. However, the optimization method used for the model was Hessian-free (HF) optimization (HF model) (Martens, 2010; Martens & Sutskever, 2011), which is not sufficiently biologically plausible.
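The task rule described above can be summarized in a minimal sketch (the function name is hypothetical; the coding of choices as 1 and 2 follows the task description):

```python
def correct_choice(motion_coh, color_coh, context):
    """Rule of the context-dependent integration task: respond to the
    cued modality (motion or color) and ignore the other one.
    Returns 1 (choice 1) or 2 (choice 2)."""
    evidence = motion_coh if context == 'motion' else color_coh
    return 1 if evidence > 0 else 2

# The same stimulus yields different correct answers in the two contexts.
print(correct_choice(+0.5, -0.5, 'motion'))  # 1
print(correct_choice(+0.5, -0.5, 'color'))   # 2
```

The point of the task is exactly this context dependence: the stimulus-response mapping alone does not determine the correct answer.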
Aiming for more biologically plausible models, a few groups proposed other RNN models for performing the context-dependent integration task. Song et al. proposed an RNN model (pycog model) (Song et al., 2016), which comprised separate excitatory and inhibitory neuronal units and used a simpler optimization method than HF, namely, a modified stochastic gradient descent (SGD) (Pascanu, Mikolov, & Bengio, 2013). In addition to the context-dependent integration task, the model could perform several other PFC-dependent behavioral tasks.
Song et al. constructed another RNN model (pyrl model), which consisted of a policy network (choosing the next behaviors) and a baseline network (evaluating future rewards), in which learning was reinforced with reward signals rather than error signals between actual and optimal outputs (Song et al., 2017). This method was built with policy gradient reinforcement learning, particularly the REINFORCE algorithm for RNNs (Wierstra, Forster, Peters, & Schmidhuber, 2010; Williams, 1992), and the structure is known as an actor-critic structure (Sutton & Barto, 1998). The system optimized the baseline network to estimate the future rewards in each situation and the policy network to make optimal choices that maximize the future rewards. This model could perform value-evaluation tasks as well as cognitive tasks.
Miconi introduced the reward-modulated Hebbian rule as an optimization method (rHebb model) (Miconi, 2017). The model combines a reward-modulated Hebbian rule with a node-perturbation method for learning (Fiete, Fee, & Seung, 2007), under the assumption that the algorithm is more biologically plausible than HF or SGD. The system could also perform several cognitive tasks.
We compared the synaptic weight structures of these four RNN models (HF, pycog, pyrl, and rHebb) while they performed the context-dependent integration task (Figure 1B) and searched for common features among these models, assuming that such general features could be important for performing the task. Although we could not detect any common structures among the learned networks, we found interesting features in the plastic change from the initial network state to the final learned state, which were conserved across all models: (1) plastic synapses were concentrated in small populations of units, particularly postsynaptically, meaning that the unital distribution was highly skewed; (2) distributions of the synaptic weight changes had high kurtosis. Additionally, the highly plastic units contributed more to performing behaviors than the less plastic units in the HF, pycog, and pyrl models but not in the rHebb model. As far as we know, this study is the first to report a detailed structural analysis of RNN PFC models. The present results indicate that plastic changes secondary to task learning can be more important than the final established structure of the system and can give us new insights for biological and computational neuroscience research.
Results
Analysis of the constructed structures
First, we confirmed that each system could perform the context-dependent integration task (Figure 2A). All systems showed psychometric curves (the relationship between sensory inputs and behavioral responses) that changed depending on the context information. More than 85% of the responses were correct. Next, the synaptic weight values of each learned system were visualized (Figure 2B). We used excitatory-excitatory (E-E) connections in the pycog model for the analysis because they are the most abundant connection type in the model. Additionally, we used the policy network in the pyrl model because the baseline network was not related to the choice behavior, even though it is important for learning the task (see Materials and Methods). We detected line patterns in the HF model, meaning that high negative or positive weights seemed to be concentrated in a few portions of the neuronal units, particularly postsynaptically. However, we could not find any specific patterns in the other models.
Because both negative and positive high weights seemed concentrated in sparse unital populations postsynaptically in the HF model, we sorted the units by the mean absolute weight value per postsynaptic unit (defined as the post-unital mean weight) and confirmed that high weights were postsynaptically concentrated in a few units in the HF model (Figure 4A). To quantify the weight concentration in the sparse population, we analyzed the distribution of the post-unital mean weight values. The HF model showed a highly skewed distribution (Figure 4B and Table 2). The other models did not show that kind of clear weight concentration. In conclusion, with the learned network analysis, we could not find any common features in the static network structures across the four models, even though all of them learned the task successfully.
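The post-unital mean weight statistic can be sketched as follows (a minimal example with a toy weight matrix, not the actual model weights; we assume rows of W index postsynaptic units, as in x = W @ r):

```python
import numpy as np
from scipy import stats

def post_unital_mean(W):
    """Mean absolute synaptic weight per postsynaptic unit.
    Assumes W[post, pre] orientation, i.e., rows are postsynaptic."""
    return np.mean(np.abs(W), axis=1)

# Toy example: strong weights concentrated on a few postsynaptic rows.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(100, 100))
W[:5, :] += rng.normal(0.0, 1.0, size=(5, 100))  # 5 high-weight "hub" units

m = post_unital_mean(W)
print("skewness:", stats.skew(m))
print("skew test p-value:", stats.skewtest(m).pvalue)
```

When the high weights sit on a small subset of post-units, the distribution of m is strongly right-skewed, which is what the skewness test quantifies.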
Analysis of plastic changes with task learning
We then analyzed the synaptic weight changes caused by task learning, defined as the difference between the initial and final weight values. We found that the weight changes of all models showed line patterns like those of the learned weight structure of the HF model shown in Figure 2B. For a few units, the weight changes tended to be larger, either as increases or decreases, compared with other units. We sorted the neural units by post-unital mean weight changes (Figure 5A), analogously to the post-unital mean weight values of the learned structures (Figure 4A). All the models showed weight changes concentrated in sparse neural populations. By quantitatively analyzing the distribution of post-unital mean weight changes, we found that all models showed highly skewed distributions. This indicates that the networks consisted of a small number of high-plasticity units and a large number of low-plasticity units (Figure 5B and Table 3). Note that only 10% of the synapses in the pyrl model were plastic by default (the other synapses did not change through learning).
In addition, statistical tests showed that the weight change distributions of all the models had high kurtosis, although the shapes of the distributions of the pyrl and rHebb models were close to a Gaussian distribution (Figure 6 and Table 4). The pycog and pyrl models had other network connections, such as the inhibitory connections in pycog and the baseline network in pyrl. Most of these connections similarly showed large weight changes concentrated in restricted populations of neuronal units (Figure 5-Figure supplement 1).
We then tested whether units with higher plasticity contributed more to behavioral performance. A fixed number of units in each model was inactivated (n_inact) in ascending (starting from low-plasticity units), descending (starting from high-plasticity units), or shuffled order (sort_type), based on the sorted values shown in Figure 5A, while the behavioral task was being performed (Figure 7 and Table 5). The "n" in Table 5 indicates the number of systems used for the test, each with different initial settings from different random seeds. All the models showed significant differences in behavioral performance for the sort_type × n_inact interaction and/or sort_type in a two-way analysis of variance (ANOVA), but there was no significant difference among sort_type conditions in the rHebb model with post-hoc multiple comparisons. These results indicate that units with higher plasticity in the HF, pycog, and pyrl models made larger contributions to task performance, whereas the rHebb model was robust to the loss of high-plasticity units.
The features of the weight change distribution could depend on the initial network state or the behavioral task. We therefore checked the weight distributions for different behavioral tasks and initial conditions. Most of them showed comparable results: highly skewed post-unital mean weight change distributions and high-kurtosis weight change distributions (Figure 5-Figure supplement 2 and 3, Figure 5-Table supplement 1 and 2). The weight distribution of the pycog model was initially a uniform distribution; after learning, it became highly skewed. In contrast, the weight distribution of the pyrl model was initially Gaussian and remained Gaussian after learning (Figure 5-Figure supplement 3). We also tested the rHebb model with an initially uniform distribution; however, learning of the task did not sufficiently converge (data not shown).
Discussion
We analyzed the static network structures of four RNN models performing the context-dependent integration task. We found that the structure of synaptic changes has some common features, i.e., high skewness in the post-unital mean weight change distributions (Figure 5) and high kurtosis in the weight change distributions (Figure 6), in contrast to the lack of similarities in the finally learned structures (Figures 3 and 4). The high-plasticity units were large contributors to behavioral task performance in most models (Figure 7). Such detailed data acquisition and manipulation could not have been achieved without the computational models. Additionally, as far as we know, detailed structural analyses of RNN PFC models, such as the ones reported here, have not been performed previously. These results indicate that plastic changes may be more important for performing cognitive tasks than the final network structures, which may have an impact on both computational and biological viewpoints of future research.
We found long-tailed distributions of plastic changes at the neuronal unit and synapse levels as common features in all the models. In the biological brain, it is known that expression of genes that induce plastic changes in neurons is sparse in the cerebral cortex and hippocampus. In addition, it has been hypothesized that the plastic changes of a small neuronal population mainly represent learning and memory, namely, the engram hypothesis (Hebb, 1949; Tonegawa, Pignatelli, Roy, & Ryan, 2015). Additionally, at the synapse level, only a small population shows plastic changes with learning (Hayashi-Takagi et al., 2015; Yang, Pan, & Gan, 2009). In neural networks, protecting highly informative synapses while learning a new task can also prevent catastrophic forgetting of previously learned tasks (Kirkpatrick et al., 2017). Most of our results have properties comparable to those of biological studies at both the neuronal (unital) and synaptic levels (Figures 5 and 6) and support the engram hypothesis (Figure 7).
Regarding the learned network structure, we could not find any similarities among the RNN models. It is well known that in real brains the distribution of excitatory postsynaptic potentials is log-normal (reviewed in Buzsáki & Mizuseki, 2014). This corresponds to the weight distributions of the HF (Figure 3) and pycog models (Figure 5-Figure supplement 3). Both models showed a long-tailed distribution regardless of the initial distribution setting. A previous study reported a long-tailed post-unital mean weight distribution of networks trained with the HF method (Chaisangmongkon, Swaminathan, Freedman, & Wang, 2017). Furthermore, excitatory neurons in the biological cortex form subclusters with mutual connections (reviewed in Harris & Mrsic-Flogel, 2013). The weight changes of the pycog model showed high synaptic plasticity between high-plasticity units (Figure 5A), which could be the basis of a mechanism for constructing such neural subclusters. Comparing these observations with the biological brain, the HF and pycog models, followed by pyrl, were more similar to the animal brain, and rHebb was the most dissimilar. This result was opposite to our assumption that the rHebb and pyrl models are more biologically plausible than the pycog and HF models.
One potential reason why HF and pycog were found to be more similar to the real brain is the relationship between spike-timing-dependent plasticity (STDP) and SGD: the dynamics of the two can be similar during learning (Bengio, Mesnard, Fischer, Zhang, & Wu, 2015; Xie & Seung, 2000). STDP is a major plasticity mechanism in the biological brain (reviewed in Feldman et al., 2012) and is thought to be one of the reasons for the log-normal synaptic weight distribution (Buzsáki & Mizuseki, 2014). If this is true, the pycog model (adopting SGD) and the HF model (HF being an advanced variant of SGD) are more biologically plausible in their synaptic update mechanisms, and it is reasonable to think that their final structures are more similar to the biological brain than those of the other models. A spiking RNN model adopting STDP can also perform similar cognitive tasks (Klampfl & Maass, 2013).
The pyrl model also adopted SGD, and the basic assumption of the eligibility trace in the rHebb model (Equation 17 of this paper) seems similar to the assumption in the STDP model (Formula (1) or (2) in Bengio et al., 2015). However, the pyrl and rHebb models showed milder kurtosis in the weight change distributions (Figure 6) and relatively equal contributions of each unit to task performance (Figure 7) compared with the HF and pycog models. The major difference between the two groups is whether the learning method is reward-based reinforcement or supervision based on the error between the reference signal and the system output. We think that the difference in the learning processes of each method, particularly during the initial stages, could cause the differences in the distributions. In supervised learning, the error is largest and relatively constant during the initial stages of learning, so the synapses responsible for the error seem to be fixed. Conversely, rewards in reinforcement learning are supplied only occasionally, depending on the stochastic fluctuation of the network activities. The network updates based on these stochastic activities, which can randomly scatter the contributions of the synapses to task performance, particularly during the initial learning stages. From this viewpoint, the network might become sparse through supervised learning (and STDP), whereas there may be less cell or synapse loss through reinforcement learning.
Another possible explanation for the highly skewed synaptic weight distribution observed in the biological brain is that it results from multiple experiences through life. Based on our results, the synaptic change distributions of all models were long-tailed (Figure 6). Repeated learning processes, even with mild plastic changes such as those in the rHebb model, could thus result in a long-tailed distribution in the real brain.
Our results revealed that the features of the plastic changes in weight values were more general than the learned weight structure itself. We can easily control the weight distribution shapes of the learned structures of the computational models with additional terms in the objective function, such as a regularization term (Lee, Battle, Raina, & Ng, 2006), which can make the distribution sparse and long-tailed. This is a possible reason why the finally learned structure is less common among the models than the plastic changes, even though the initial distributions and objective terms are critical for learning efficiency. Other properties of neural plasticity have been reported in the biological brain, such as synaptic homeostasis (reviewed in Pozo & Goda, 2010) and stabilization (reviewed in Koleske, 2013), which should correspond to regularization (Lee et al., 2006) and elastic weight consolidation (Kirkpatrick et al., 2017), respectively, in neural network methods. Considering these factors, we can establish more biologically plausible computational models in future studies.
Common properties of the synaptic changes may be explained by the superlinearity of either the activation function or the learning rule in each model. In the rHebb model, superlinear functions in the learning rule (but not linear or sublinear ones) lead to sparse and precise synaptic changes (Miconi, 2017), which can establish the high-kurtosis synaptic change distribution. Moreover, the pycog and pyrl models use the so-called rectified linear activation function (ReLU) to calculate the firing rate. Such rectifier units act as a superlinear function and make neural activity sparse (Glorot, Bordes, & Bengio, 2011), resulting in only a few neurons being active and plastic.
We focused on the context-dependent integration task to identify the structures necessary for achieving flexible cognition. Our findings, however, seem to apply to learning tasks in general, beyond the cognitive flexibility task (Figure 5-Figure supplement 2). In this study, we only analyzed the synaptic weight structures of RNN models. As future work, we plan to also analyze the dynamics of unital activities during task performance and the underlying learning process. These analyses will give us further insights into how the networks encode and establish the task information. Furthermore, theoretical investigation is also needed to understand the meaning of our findings and to further develop the RNN models. Recent innovations in RNN optimization methods have allowed these systems to perform cognitive tasks similarly to humans and model animals, which allows us to compare the processes of the biological and computational brains (Cadieu et al., 2014; Carnevale, de Lafuente, Romo, Barak, & Parga, 2015; Mante et al., 2013; Sussillo, Churchland, Kaufman, & Shenoy, 2015; Yamins et al., 2014). Merging such knowledge, methods, and ways of thinking from both fields must improve our understanding of how our brain works.
Materials and methods
Model descriptions
The parameter settings were set to default values based on previous reports and scripts (Mante et al., 2013; Miconi, 2017; Song et al., 2016, 2017). All the models were constructed as follows:

τ dx(t)/dt = −x(t) + Wrec r(t) + Win u(t) + bx + ρx(t),   (1)

where τ ∈ ℝ is the time constant, x(t) ∈ ℝ^Nrec corresponds to the membrane potentials of the recurrent neuronal units at discrete time step t, Nrec is the number of recurrent units, and r(t) ∈ ℝ^Nrec represents the firing rates, calculated by applying either the rectified linear activation function (r(t) = x(t) for x(t) ≥ 0 and r(t) = 0 otherwise; pycog and pyrl models) or the hyperbolic tangent function (r(t) = tanh(x(t)); HF and rHebb models) to x(t). u(t) ∈ ℝ^Nin is an (external) task input consisting of sensory and contextual information, Nin is the number of input units (four channels in the HF and rHebb models and six channels in the pycog and pyrl models), and Wrec ∈ ℝ^(Nrec×Nrec) and Win ∈ ℝ^(Nrec×Nin) are the synaptic weight matrices from recurrent and task inputs to each recurrent unit, respectively. bx ∈ ℝ^Nrec is the offset constant of the recurrent units, and ρx(t) ∈ ℝ^Nrec is the noisy fluctuation of each unit, following a Gaussian distribution. For the pyrl model, we used gated recurrent units (Chung, Gulcehre, Cho, & Bengio, 2014) and Equation 1 was modified (see section pyrl model). Note that we used Nrec = 100 for HF, Nrec = 150 for pycog (120 excitatory and 30 inhibitory units), and Nrec = 200 for rHebb. For pyrl, we used Nrec = 100 for the policy network and Nrec = 100 for the baseline network.
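Equation 1 can be discretized with a simple Euler step; the following is a minimal sketch (hypothetical function and parameter names; τ and Δt values follow the HF settings described below):

```python
import numpy as np

def rnn_step(x, u, W_rec, W_in, b, tau=100.0, dt=25.0, noise_std=0.0,
             rng=None, activation=np.tanh):
    """One Euler step of the leaky RNN dynamics of Equation 1.
    activation is tanh (HF/rHebb models) or ReLU (pycog/pyrl models)."""
    rng = rng if rng is not None else np.random.default_rng()
    alpha = dt / tau
    r = activation(x)                      # firing rates r(t)
    drive = W_rec @ r + W_in @ u + b       # recurrent + task input + offset
    noise = noise_std * rng.standard_normal(x.shape)
    return (1.0 - alpha) * x + alpha * (drive + noise)

# With zero weights and no noise, the potential leaks toward rest (0).
N, Nin = 100, 4
x1 = rnn_step(np.ones(N), np.zeros(Nin), np.zeros((N, N)),
              np.zeros((N, Nin)), np.zeros(N))
print(x1[0])  # 0.75 = (1 - 25/100) * 1.0
```

The leak factor α = Δt/τ controls how quickly each unit integrates its inputs relative to the simulation time step.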
The readout units of HF, pycog, and pyrl were as follows:

z(t) = Wout r(t) + bz,   (2)

where z(t) ∈ ℝ^Nout is the output of the system, Nout is the number of output units (one channel in the HF model, two channels in the pycog model, and three channels in the pyrl model), Wout ∈ ℝ^(Nout×Nrec) is the synaptic weight matrix from recurrent units to readout units, and bz ∈ ℝ^Nout is the offset constant of the readout units. By contrast, the rHebb model used an arbitrary recurrent unit as an output. The choices of the system were represented as the sign of the output unit (z(t) in the HF model and an arbitrary unit of r(t) in the rHebb model; one channel in total) or as the highest channel of the output units (among two channels in the pycog model and among three channels in the pyrl model). Only the pyrl model had additional choices (like "stay") besides choice 1 and choice 2. The Nin, Nrec, and Nout of each model are summarized in Table 6.
Task descriptions
The task inputs, u(t) in Equation 1, consisted of two sets of sensory and two sets of contextual information. The sensory inputs were as follows:

um(t) = dm + ρm(t),   (3)
uc(t) = dc + ρc(t),   (4)

where um(t) ∈ ℝ^(1 or 2) and uc(t) ∈ ℝ^(1 or 2) are the motion and color sensory inputs, respectively, dm ∈ ℝ^(1 or 2) and dc ∈ ℝ^(1 or 2) are the offsets, and ρm(t) ∈ ℝ^(1 or 2) and ρc(t) ∈ ℝ^(1 or 2) are Gaussian noises with zero mean. The amplitudes of dm and dc represent the motion and color coherence. Choice 1 or choice 2 was represented as a plus or minus sign of dm and dc in the HF and rHebb models (two channels in total), and as independent channel inputs in the pycog and pyrl models (four channels in total). The contextual information was modeled with two inputs, ucm(t) ∈ {0,1} and ucc(t) ∈ {0,1}: ucm(t) = 1 and ucc(t) = 0 in the motion context, and ucm(t) = 0 and ucc(t) = 1 in the color context, at every time step t.
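Generating such trial inputs can be sketched as follows (a minimal example in the HF/rHebb-style signed-channel coding of Equations 3 and 4; the function name and noise level are illustrative):

```python
import numpy as np

def make_trial_inputs(T, motion_coh, color_coh, context, noise_std=0.04,
                      rng=None):
    """Sensory + contextual inputs for one trial, as a (T, 4) array
    with columns [u_m, u_c, u_cm, u_cc]. Signed coherences encode the
    correct choice; context is 'motion' or 'color'."""
    rng = rng if rng is not None else np.random.default_rng()
    u = np.zeros((T, 4))
    u[:, 0] = motion_coh + noise_std * rng.standard_normal(T)  # u_m(t)
    u[:, 1] = color_coh + noise_std * rng.standard_normal(T)   # u_c(t)
    u[:, 2] = 1.0 if context == 'motion' else 0.0              # u_cm(t)
    u[:, 3] = 1.0 if context == 'color' else 0.0               # u_cc(t)
    return u

trial = make_trial_inputs(200, motion_coh=+0.1, color_coh=-0.1,
                          context='motion', rng=np.random.default_rng(1))
print(trial.shape)  # (200, 4)
```

The network must average the noisy sensory channels over time and gate them by the context channels to produce the correct choice.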
HF model
The HF model was implemented based on a previous study (Mante et al., 2013). HF optimization was implemented by modifying scripts written by Nicolas Boulanger-Lewandowski on Github (https://github.com/boulanni/theano-hf) (Boulanger-Lewandowski, Bengio, & Vincent, 2012).
HF optimization (Martens, 2010; Martens & Sutskever, 2011; Shewchuk, 1994) was performed by minimizing the following objective function ε(θ):

ε(θ) = L(θ) + λR R(θ),

where

L(θ) = (1/Ntrial) Σn Σ_{t ∈ {0, T}} (zn(t, θ) − z*n(t))²,
R(θ) = (1/Ntrial) Σn D(rn(θ), rn(θ + Δθ)).

Note that θ is the vectorized parameter for the optimization, including Wrec, Win, Wout, bx, and bz. L(θ) indicates the error between the target output z*n(t) and the actual output zn(t, θ) of the system at the first and last time points (t = 0, T) of all trials (Ntrial). R(θ) is a structural damping term that prevents disruptive changes of the recurrent unital activities, because a small perturbation of the recurrent units can result in a very large output difference (Martens & Sutskever, 2011). λR > 0 determines the degree of the R(θ) penalty, and its value is determined using the Levenberg-Marquardt algorithm (Nocedal & Wright, 1999). D(r(θ), r(θ + Δθ)) is the distance (cross entropy in our script) between the outputs of the recurrent units with parameters θ and θ + Δθ. The target outputs at the last time point (t = T) were set to 1 and −1 to correspond with choice 1 and choice 2, respectively; the target outputs at the first time point (t = 0) were set to 0 regardless of the correct choice, to fix the initial points of the output transients. The minimum of the objective function was calculated with HF optimization (Martens, 2010; Martens & Sutskever, 2011).
The values Nin = 4, Nrec = 100, and Nout = 1 were used in the HF model. Because of memory capacity, the time step was 25 times larger than the original setting (Δt = 25 ms). The standard deviations of the noises reported previously (Mante et al., 2013) were decreased in the present study to the following values: Std[ρx] = 0.004 in Equation 1, and Std[ρm] = Std[ρc] = 0.04 in Equations 3 and 4, respectively.
pycog model
The pycog model was obtained from Github (https://github.com/xjwanglab/pycog) and was run in its original setting (Song et al., 2016). This system learns tasks with modified SGD (Pascanu et al., 2013) by minimizing the following objective function ε(θ):

ε(θ) = (1/Ntrials) Σn [Ln(θ) + λΩ Ωn(θ)],   (8)

where θ is the vectorized parameter set for the optimization, Ntrials is the number of trials, and Ln(θ) is the error between the actual and target outputs (z(t, θ) ∈ ℝ² and z*(t) ∈ ℝ², respectively), averaged over the trial duration (T) and the number of output units (Nout), given by

Ln(θ) = (1/(T Nout)) Σt Σk Mk(t) (zk(t, θ) − z*k(t))²,

where Mk(t) is the error mask consisting of 0 or 1, which determines whether the error at time point t should be taken into account (in the context-dependent integration task, only the last output is considered). Ωn(θ) in Equation 8 is a regularization term for preserving the size of the gradients as the error is propagated back through time, and λΩ determines the strength of the regularization.
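The masked error term can be sketched as follows (a minimal example; the function name is illustrative and the mask selects only the final time step, as in the context-dependent integration task):

```python
import numpy as np

def masked_mse(z, z_target, mask):
    """Squared output error averaged over the time points and output
    units selected by the 0/1 mask (the L_n term of the pycog loss)."""
    sq = mask * (z - z_target) ** 2
    return sq.sum() / mask.sum()

# Only the final time step is scored in the integration task.
T, Nout = 100, 2
z = np.zeros((T, Nout))
z_target = np.zeros((T, Nout))
z_target[-1] = [1.0, 0.0]       # choice 1 at the end of the trial
mask = np.zeros((T, Nout))
mask[-1] = 1.0                  # ignore all earlier time points
print(masked_mse(z, z_target, mask))  # 0.5: error only at t = T
```

Because the mask zeroes out the error everywhere except the decision time, gradients only flow from the final readout back through the trial.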
The values Nin = 6, Nrec = 150, and Nout = 2 were used in the pycog model. Of note, this system includes both excitatory and inhibitory units at an excitatory-to-inhibitory ratio of 4:1; that is, there are 120 excitatory units and 30 inhibitory units. For our network analyses, we mainly used the E-E connections. The initial weight distribution in the default setting is a gamma distribution, with signs multiplied depending on the input unit type (excitatory or inhibitory). When applying a uniform distribution as the initial weight distribution, the minimum and maximum were 0 and 1, respectively.
pyrl model
The pyrl model was also obtained from Github (https://github.com/xjwanglab/pyrl) and was run in its original setting (Song et al., 2017). The network consists of policy and baseline RNNs, in which the nodes are gated recurrent units (Chung et al., 2014). We modified Equation 1 as follows:

λ(t) = σ(Wλrec r(t) + Wλin u(t) + bλ),
γ(t) = σ(Wγrec r(t) + Wγin u(t) + bγ),
τ dx(t)/dt = λ(t) ⊙ [−x(t) + Wrec (γ(t) ⊙ r(t)) + Win u(t) + bx + ρx(t)],

where λ(t) and γ(t) are the update and reset gates, respectively, Wλrec and Wγrec are the weight matrices from the recurrent units, Wλin and Wγin are the weight matrices from the task inputs, bλ and bγ are the offset constants of the gates λ(t) and γ(t), respectively, and ⊙ denotes the element-wise (Hadamard) product. λ(t) and γ(t) are nonlinearized with the sigmoid function σ. The policy recurrent network takes sensory, context, and inner recurrent inputs and selects the next action at every time step. The outputs are normalized with the softmax function, πk(t) = exp(zk(t)) / Σj exp(zj(t)). The baseline network receives the recurrent unit activities (r(t)) and choice outputs (π(t)) of the policy network and predicts the sum of the reward values through the trial.
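One possible discretization of these gated dynamics can be sketched as follows (parameter names are hypothetical; the placement of the update gate in the integration step follows a common leaky-GRU convention and may differ from the pyrl source):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def leaky_gru_step(x, u, p, alpha=0.1, noise_std=0.0, rng=None):
    """One Euler step of a leaky gated recurrent unit (sketch).
    p holds the gate and recurrent weight matrices and biases."""
    rng = rng if rng is not None else np.random.default_rng()
    r = np.maximum(x, 0.0)  # ReLU firing rates in the pyrl model
    lam = sigmoid(p['W_lam_rec'] @ r + p['W_lam_in'] @ u + p['b_lam'])  # update
    gam = sigmoid(p['W_gam_rec'] @ r + p['W_gam_in'] @ u + p['b_gam'])  # reset
    rec = p['W_rec'] @ (gam * r) + p['W_in'] @ u + p['b']
    noise = noise_std * rng.standard_normal(x.shape)
    return (1.0 - alpha * lam) * x + alpha * lam * (rec + noise)

# With all-zero weights, both gates sit at sigmoid(0) = 0.5.
N, Nin = 100, 6
p = {k: np.zeros(s) for k, s in {
    'W_lam_rec': (N, N), 'W_lam_in': (N, Nin), 'b_lam': (N,),
    'W_gam_rec': (N, N), 'W_gam_in': (N, Nin), 'b_gam': (N,),
    'W_rec': (N, N), 'W_in': (N, Nin), 'b': (N,)}.items()}
x1 = leaky_gru_step(np.ones(N), np.zeros(Nin), p)
print(x1[0])  # 0.95 = 1 - alpha * 0.5
```

The update gate λ effectively rescales the integration rate of each unit, and the reset gate γ controls how much of the recurrent input each unit receives.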
The policy network aims to maximize the expected future rewards, which are optimized using the REINFORCE algorithm (Wierstra et al., 2010; Williams, 1992):

επ(θ) = −(1/Ntrials) Σn Jn(θ) + Ωπ(θ),
Jn(θ) = EH[Σt Rt],

where θ is the vectorized parameter set of the policy network for the optimization, επ(θ) is the objective function of the policy network, Ntrials is the number of trials, Jn(θ) is the expected reward prediction, Ωπ(θ) is a regularization term, and EH represents the expectation of the reward Rt over all possible trial histories H.
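The resulting policy-gradient estimate for a single trial can be sketched as follows (a minimal example with hypothetical array names; the baseline prediction plays the role of the pyrl baseline network):

```python
import numpy as np

def reinforce_grad(grad_logp, rewards, baseline):
    """REINFORCE gradient estimate for one trial.

    grad_logp: (T, P) gradients of log pi(a_t | s_t) w.r.t. P parameters.
    rewards:   (T,) rewards received at each time step.
    baseline:  (T,) baseline-network predictions of the return.

    Returns sum_t grad log pi(a_t) * (return-to-go_t - baseline_t),
    the ascent direction on the expected future reward.
    """
    returns = np.cumsum(rewards[::-1])[::-1]  # sum of rewards from t to T
    advantage = returns - baseline
    return (grad_logp * advantage[:, None]).sum(axis=0)

# Single reward at the end of a 5-step trial, no baseline.
T, P = 5, 3
g = reinforce_grad(np.ones((T, P)), np.array([0, 0, 0, 0, 1.0]),
                   np.zeros(T))
print(g)  # [5. 5. 5.]
```

Subtracting the baseline prediction does not bias the gradient but reduces its variance, which is why the pyrl model trains the two networks jointly.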
The baseline network minimizes the difference between the actual and estimated reward values through the trial:

εν(θ) = (1/Ntrials) Σn [En(θ) + Ων(θ)],
En(θ) = (1/T) Σt (Στ Rτ − vt(θ))²,

where θ is the vectorized parameter set of the baseline network for the optimization, εν(θ) is the objective function of the baseline network, and En(θ) is the error between the actual reward Rτ (a correct decision is rewarded with Rτ = 1; an incorrect one with Rτ = 0; breaking fixation before the decision is negatively rewarded with Rτ = −1) and vt(θ), the expected (readout) reward prediction of the baseline network given the recurrent unit activities (r1:t) and choices (π1:t) of the policy network through a trial (T). Ων(θ) is the regularization term of the baseline network. This system uses Adam SGD (Kingma & Ba, 2015) with gradient clipping (Pascanu et al., 2013) for optimization.
The values Nin = 6 (task inputs), Nrec = 100, and Nout = 3 (choices) were used in the policy network, and the values Nin = 103 (r(t) and π(t) of the policy network), Nrec = 100, and Nout = 1 (readout reward prediction) were used in the baseline network. We used the policy network for the main analysis of this study because the baseline network was not involved in performing the task (although it is critical for the optimization of the system). The default initial weight distribution was obtained from a gamma distribution with randomly multiplied signs in both the policy and baseline networks. The plastic synapses were set at 10% of all synapses by default, and the other synaptic weights were fixed through training. We only used the plastic synapses for the weight change distribution analysis. This system had three output choices (choice 1, choice 2, or stay), whereas the other models had only two choices (choice 1 or choice 2). When a normal distribution was used as the initial weight distribution, the mean and standard deviation were 0 and 1, respectively.
rHebb model
The rHebb model was obtained from Github (https://github.com/ThomasMiconi/BiologicallyPlausibleLearningRNN) and was basically run in its original setting (Miconi, 2017). The network pools Hebbian-like activity at every time step as follows:

ei,j(t) = ei,j(t − 1) + S((xi(t) − x̄i(t)) rj(t)),   (17)

where ei,j(t) is the accumulated eligibility trace of the synapse between units i (pre) and j (post), S is a monotonic superlinear function (in this case S(x) = x³), rj(t) is the output of unit j, xi(t) is the membrane potential of unit i at time t, and x̄i is the short-time running average of xi. The synaptic weights are modulated with the pooled value and the reward error at the end of every trial as follows:

ΔWi,j = η ei,j(T) (R − R̄),

where ΔWi,j is the change of the synaptic weight between i and j, η is the learning rate, R is the reward from the environment (the absolute difference from the optimal response is supplied as a negative reward), and R̄ is the average of previously received rewards. Three units are used as constant input units by default. The output and constant input units were excluded from the weight change distribution analysis.
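The two-step rule above can be sketched as follows (a minimal example with hypothetical function names; the variable roles follow the description in the text, and the superlinear function S(x) = x³ is the one used here):

```python
import numpy as np

def accumulate_trace(e, x, x_bar, r, S=lambda v: v ** 3):
    """One-step eligibility trace update (Equation 17 style):
    e_ij += S((x_i - x_bar_i) * r_j), i presynaptic, j postsynaptic."""
    return e + S(np.outer(x - x_bar, r))

def weight_update(e, R, R_bar, eta=0.1):
    """End-of-trial reward-modulated update: dW_ij = eta * e_ij * (R - R_bar)."""
    return eta * e * (R - R_bar)

# Toy case: only unit 0 fluctuates above its running average.
N = 4
e = np.zeros((N, N))
x = np.array([1.0, 0.0, 0.0, 0.0])
e = accumulate_trace(e, x, np.zeros(N), np.ones(N))
dW = weight_update(e, R=1.0, R_bar=0.5)
print(dW[0, 0])  # 0.05 = 0.1 * (1.0)**3 * 0.5
```

Because the trace is gated by the reward prediction error (R − R̄) only at the end of the trial, synapses are strengthened only when the trial's stochastic fluctuations happened to improve the outcome.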
The values Nin = 4, Nrec = 200, and Nout = 1 were used in the rHebb model. The output of this system is the activity of an arbitrarily chosen unit among the recurrent units.
Statistical analysis of distributions
The Python libraries numpy, scipy, statsmodels, matplotlib, seaborn, and Jupyter were used for statistical analysis. The Shapiro-Wilk normality test was applied to check the normality of the distributions, implemented as a scipy function. Kurtosis and skewness were tested with the scipy functions stats.kurtosistest and stats.skewtest (https://docs.scipy.org). One-way ANOVA, two-way ANOVA, and multiple comparisons (Tukey honestly significant difference) were performed with the Python library statsmodels (http://www.statsmodels.org).
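The distribution tests can be illustrated as follows (a minimal example on synthetic data: a Laplace sample, whose excess kurtosis is about 3, stands in for a long-tailed weight-change distribution; it is not the actual model data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
weight_changes = rng.laplace(size=5000)  # long-tailed stand-in sample

# The same scipy tests as used in this study.
print("excess kurtosis:", stats.kurtosis(weight_changes))
print("kurtosis test p:", stats.kurtosistest(weight_changes).pvalue)
print("skewness:", stats.skew(weight_changes))
print("Shapiro-Wilk p:", stats.shapiro(weight_changes[:500]).pvalue)
```

stats.kurtosistest and stats.skewtest test the null hypothesis that the sample comes from a normal distribution, so a small p-value here indicates significantly heavy tails or asymmetry, respectively.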
Neuronal unit inactivation
Scripts were modified as follows. Selected unit outputs were set to 0 (pycog, pyrl, and rHebb models) or to a constant value (HF model, bx) in every recurrent loop. The number of inactivated units was increased in increments of 10, and the units were sorted in ascending, descending, or shuffled order of the postsynaptic mean of absolute synaptic weight changes (Figure 5A). The trial settings, including the number of trials and the offset and noise settings of sensory inputs, followed the default conditions in each script.
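The inactivation procedure can be sketched as follows (hypothetical function names on a toy weight-change matrix; rows are assumed to index postsynaptic units):

```python
import numpy as np

def inactivation_orders(W_change, rng=None):
    """Unit orders for the lesion experiment: ascending / descending /
    shuffled by post-unital mean absolute weight change (Figure 5A)."""
    rng = rng if rng is not None else np.random.default_rng()
    plasticity = np.mean(np.abs(W_change), axis=1)  # rows = post-units
    ascending = np.argsort(plasticity)
    return {
        'ascending': ascending,         # low-plasticity units lesioned first
        'descending': ascending[::-1],  # high-plasticity units lesioned first
        'shuffled': rng.permutation(len(plasticity)),
    }

def inactivate(r, units):
    """Clamp selected unit outputs to 0 inside the recurrent loop."""
    r = r.copy()
    r[units] = 0.0
    return r

W_change = np.random.default_rng(3).normal(size=(100, 100))
orders = inactivation_orders(W_change)
r = inactivate(np.ones(100), orders['descending'][:10])  # n_inact = 10
print(r.sum())  # 90.0
```

Comparing task performance across the three orders as n_inact grows is what separates models whose high-plasticity units are critical (performance collapses fastest in descending order) from models that are robust to their loss.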
Source scripts
Codes used in this work are available at https://github.com/sakuroki/flexible_RNN.
Acknowledgment
We thank Naoki Hiratani for technical advice on RNN optimization, Taro Toyoizumi for valuable contributions to the discussion, and Shigeyoshi Itohara for allowing us to publish this study based on work done at his laboratory.
Author contributions
SK, Conceptualization, Software, and Writing; TI, Theoretical analysis and Writing