Abstract
Understanding visual processing requires a detailed description of the computations performed by neurons across the stages of the visual system. However, the diverse tunings of neurons beyond the primary visual cortex (V1) have yet to be fully characterized. Using two-photon calcium imaging and stochastic visual stimuli, we catalogued the response properties of a dense sample of 40,000 neurons in V1 and six secondary visual areas of awake mice. All areas encode unique sets of features, with distinct spatiotemporal preferences, motion speed selectivity, and differential responses to oriented and non-oriented stimuli. Central areas V1 and LM have the most diverse tunings, with distributed spatiotemporal preferences and a moderate bias for non-oriented stimuli. Preferences of V1 and LM neurons differ strikingly, with tuning to low and midrange spatiotemporal frequencies, respectively. Lateral areas PM and LI are highly biased towards high spatial and low temporal frequencies and show weak selectivity for motion speed. Anterior areas AL, RL and AM are highly biased towards high temporal frequencies and have the largest proportions of motion-tuned cells. Accordingly, activity patterns in these areas carry more information about motion speed than those of any other visual area. With regard to spatial preferences, LI differs strikingly from PM and the anterior areas in that it is heavily biased towards non-oriented stimuli. These data provide a detailed description of the segregation of spatiotemporal feature encoding in the rodent visual cortex and a stark demonstration of the high functional specialization of its visual areas.
Introduction
Recent studies on mice have yielded important insights into the neural basis of sensory computations. On the one hand, the genetic and experimental access to the mouse visual system has revealed fundamental organizations at multiple stages of visual processing (Maruoka et al., 2017; Wanner et al., 2017; Han et al., 2018; Liang et al., 2018) and specific circuitries underlying a range of computations (Ko et al., 2011, 2013; Lien and Scanziani, 2013, 2018; Reinhold et al., 2015). In parallel, studies of mouse visual behaviors have revealed a greater richness than previously recognized. Mice are able to identify arbitrary shapes and pictures (Brigman et al., 2005; Nithianantharajah et al., 2013), differentiate coherent motion directions (Stirman et al., 2016; Marques et al., 2018), use visual cues to navigate (Prusky et al., 2000; Harvey et al., 2009; Chen et al., 2013) or to guide accurate approaches to prey (Hoy et al., 2016), and associate stimulus identity with stimulus value (Poort et al., 2015; Burgess et al., 2016). Many of these complex visual behaviors rely on an elaborate neural network of higher-order visual cortical areas. Therefore, a necessary step towards understanding the neural basis of these behaviors is to understand how specific visual stimuli are encoded in the higher visual cortex.
The mouse visual cortex encompasses over ten retinotopic higher visual areas surrounding V1 (Wang and Burkhalter, 2011; Garrett et al., 2014; Zhuang et al., 2017). These areas are suggested to be specialized for specific visual features. Surveys in anesthetized mice found that neurons in higher visual areas respond to drifting gratings with distinct spatiotemporal frequencies and speeds, and distinguish pattern from component motion (Marshel et al., 2011; Roth et al., 2012; Tohmi et al., 2014; Juavinett and Callaway, 2015; Smith et al., 2017). However, these stimulus preferences might be very different in the awake brain, since anesthesia profoundly affects visual responses (Haider et al., 2013; Lien and Scanziani, 2013; Aasebø et al., 2017). Yet there have been far fewer studies of the response properties of neurons in higher visual areas of awake mice, and these have focused on only a small set of areas (Andermann et al., 2011; Glickfeld et al., 2013).
To understand how specific visual features are encoded in the visual cortex, we undertook a comprehensive functional characterization of layer 2/3 neurons in V1 and six higher visual areas in awake mice using two-photon calcium imaging. Using a library of stochastic visual stimuli (spectral noise stimuli), we found that neurons in distinct higher visual areas exhibit distinct selectivities for spatiotemporal frequency, visual motion speed and spatial stimulus anisotropy (oriented versus non-oriented stimuli). Population coding analysis further revealed greater speed discriminability in higher visual areas than in V1. Overall, higher visual areas present a segregated encoding of spatiotemporal features that might underpin distinct computations, such as the processing of visual motion and shape.
Results
Cortical neurons are highly selective for spectral noise stimuli
Using two-photon calcium imaging, we recorded cellular responses to spectral noise stimuli in layer 2/3 of V1 and six higher visual areas (LM: lateromedial; AL: anterolateral; RL: rostrolateral; AM: anteromedial; PM: posteromedial; LI: laterointermediate; Figure 1A - figure supplement 1) in awake Thy1-GCaMP6s mice (Figure 1B; Dana et al., 2014).
The spectral noise stimuli consisted of 4-s epochs of spatiotemporally filtered bandpass noise interleaved with 4-s epochs of an equiluminant gray screen (Figure supplement 2; Materials and Methods). The stimuli spanned a broad range of spatial (0.02 to 0.32 cycles per degree) and temporal frequencies (0.5 to 16 Hz), as well as various degrees of orientation bandwidth (5, 10 and 40 degrees, and no filter for non-oriented stimuli). These stimuli were designed to drive diverse orientation-tuned neurons. At one extreme, ISO stimuli had isotropic spatial frequency spectra with uniform power in all orientations, resembling a cloud of moving dots (Figure 1C - figure supplement 2A). At the other, ANISO stimuli (Figure 1C - figure supplement 2A) had energy within a narrow orientation band (5 degrees), resembling sinusoidal gratings. A global rotation was implemented in ANISO stimuli to sweep the entire orientation space within each stimulus epoch.
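The spatial side of this filtering can be sketched in the Fourier domain. The snippet below is a minimal illustration, not the authors' stimulus code: the frame size, filter shapes and parameter names are assumptions, and the temporal filtering and global rotation of ANISO stimuli are omitted.

```python
# Sketch of one spatially filtered noise frame, assuming a 64x64-pixel frame
# spanning 60 degrees of visual angle; all parameter names are illustrative.
import numpy as np

def noise_frame(size=64, deg=60.0, sf_peak=0.08, sf_bw_oct=1.0,
                ori_deg=None, ori_bw_deg=5.0, seed=0):
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((size, size))
    spec = np.fft.fftshift(np.fft.fft2(white))

    # Spatial-frequency coordinates in cycles per degree
    f = np.fft.fftshift(np.fft.fftfreq(size, d=deg / size))
    fx, fy = np.meshgrid(f, f)
    sf = np.hypot(fx, fy)
    sf[sf == 0] = 1e-9                     # avoid log(0) at the DC component

    # Log-Gaussian bandpass around the peak spatial frequency
    gain = np.exp(-0.5 * (np.log2(sf / sf_peak) / sf_bw_oct) ** 2)

    # Optional orientation band (ANISO); omit it for isotropic (ISO) stimuli
    if ori_deg is not None:
        theta = np.degrees(np.arctan2(fy, fx)) % 180.0
        d_ori = np.minimum(np.abs(theta - ori_deg),
                           180.0 - np.abs(theta - ori_deg))
        gain *= np.exp(-0.5 * (d_ori / ori_bw_deg) ** 2)

    frame = np.real(np.fft.ifft2(np.fft.ifftshift(spec * gain)))
    return frame / np.abs(frame).max()     # normalize contrast

iso = noise_frame()                        # non-oriented, cloud-like texture
aniso = noise_frame(ori_deg=45.0)          # narrow 5-degree orientation band
```

With no orientation filter the frame looks like bandpass "cloud" noise; adding the narrow orientation band yields a quasi-grating texture, matching the ISO/ANISO distinction described above.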
In V1, around 80% of fluorescent neurons were responsive to at least one stimulus condition; this fraction was lower in higher visual areas (40–65%, Figure supplement 3B). In addition, V1 neurons responded more reliably than those in higher visual areas (trial-to-trial correlation: V1 vs others, KS tests with Bonferroni correction, all p values < 0.01; Figure supplement 3C). These data demonstrate that V1 neurons respond robustly to spectral noise stimuli and encode reliable visual representations, whereas neurons in higher visual areas are selective for particular stimulus dimensions. Neurons that did not respond might require stimulus dimensions, visual or non-visual, that were not explored in this study.
We further investigated whether ISO and ANISO stimuli activated distinct populations of neurons, and found distinct stimulus preferences across neurons. A subset of neurons (Figure 1C, example cell 1) selectively responded to ISO stimuli (Figure 1D, ISO), showing a strong preference for non-oriented stimuli, whereas another (Figure 1C, cell 2) responded solely to ANISO stimuli (Figure 1D, ANISO), preferring oriented stimuli. Each subset comprised a substantial fraction of the responsive neurons in all areas (Figure 1E, ~20 to 40% per area). The remaining responsive cells (Figure 1C, cell 3) responded to both ISO and ANISO stimuli with comparable response profiles, albeit with differences in response strength (Figure 1D, group ‘Both’). This demonstrates that distinct subsets of cortical neurons encode oriented or non-oriented features of the visual scene.
By characterizing the responses to spectral noise stimuli of populations of thousands of neurons in V1 and higher visual areas, we could quantitatively determine the degree to which areas in the mouse visual cortex are specialized for distinct spatiotemporal features.
Distinct population response profiles across areas
To determine whether cortical areas contain distinct functional populations, we categorized the reliably responsive neurons using an unsupervised clustering approach (spectral clustering; see Materials and Methods). For each area, we randomly subsampled 2,000 neurons; each neuron yielded a vector of peak-normalized responses to ISO and ANISO stimuli (Figure 2B). We first tested whether discrete functional types exist. Response profiles of cortical neurons were highly diverse and covered the response space in a continuous manner; we therefore did not observe discrete functional types based on the response profiles to spectral noise stimuli (Figure supplement 4A). Nevertheless, we used the clustering approach to investigate population response profiles across areas. Neurons were categorized into 12 broad groups, with similar response profiles within each group and remarkable differences across groups (Figure 2A - Figure supplement 4B). These groups presented distinct response properties for spatiotemporal frequencies and stimulus anisotropy (Figure supplement 5). Each area contained a unique composition of functional groups (Figure 2B, C). The interareal difference was estimated as the Euclidean distance between the population compositions of pairs of areas, shown as a hierarchical tree (Figure 2D). V1 was distinct from the higher visual areas but most similar to LM, which showed a uniform distribution of functional groups. The anterior areas AL, RL and AM showed similar abundances of a set of clusters and thus formed a branch separate from the other areas. On the other side of the tree, PM and LI contained functional clusters nearly non-overlapping with those of the anterior areas. These results reveal distinct population response profiles across cortical areas, suggesting functional segregation in the processing of distinct spatiotemporal features.
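The clustering and composition steps can be sketched as follows. This uses scikit-learn's SpectralClustering as a stand-in for the authors' implementation; the data are random placeholders, and the affinity settings and stimulus count are assumptions.

```python
# Minimal sketch of the area-wise clustering, assuming `responses` is an
# (n_neurons x n_stimuli) array of peak-normalized responses; 12 groups
# as in the text. Parameters are illustrative, not the authors' values.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(1)
responses = rng.random((2000, 48))                    # placeholder for real data
responses /= responses.max(axis=1, keepdims=True)     # peak-normalize each neuron

labels = SpectralClustering(
    n_clusters=12, affinity="nearest_neighbors",
    n_neighbors=20, random_state=0,
).fit_predict(responses)

# Population composition: fraction of neurons in each functional group
composition = np.bincount(labels, minlength=12) / labels.size

# The interareal difference would then be the Euclidean distance between
# two areas' composition vectors: np.linalg.norm(comp_a - comp_b)
```

Repeating this per area and feeding the pairwise composition distances into a hierarchical linkage would produce a tree like the one in Figure 2D.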
To investigate how spatiotemporal information is encoded in the visual cortex, we performed quantitative analyses of individual spatiotemporal features within and across areas.
Distinct preferences for stimulus anisotropy across cortical areas
Cortical neurons showed diverse preferences for ISO and ANISO stimuli (Figure 1, 2B), which differed in their orientation characteristics. We then asked whether neurons were tuned for the degree of orientedness, which we refer to as stimulus anisotropy. Tuning curves were measured as the responses to four degrees of anisotropy at the preferred spatial frequencies (Figure 3A, B). The stimulus anisotropy index (SAI) is the difference/sum ratio of the responses to the most anisotropic and the isotropic stimuli. As the SAI increases from −1 to 1, neurons shift their preference from isotropic to anisotropic stimuli. The tuning of individual cells matched their response preferences for ISO or ANISO stimuli (Figure 3B; same cells as in Figure 1C). Neurons selectively responding to ISO stimuli responded strongly to the non-oriented stimuli, and their responses declined quickly as the stimuli became anisotropic (Cell 1; SAI = −0.91). ANISO neurons selectively responded to oriented stimuli, preferring higher degrees of anisotropy (Cell 2; SAI = 0.95). The remaining populations were broadly tuned, responding invariantly to different degrees of anisotropy (Cell 3; SAI = −0.1).
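The SAI is a plain difference/sum ratio. A minimal sketch, with hypothetical response values chosen only to mimic the three example cells:

```python
# Stimulus anisotropy index as defined in the text: the difference/sum
# ratio of the responses to the most anisotropic and the isotropic stimuli.
# r_aniso and r_iso are trial-averaged responses at the preferred SF;
# the numeric values below are illustrative, not data from the study.
def anisotropy_index(r_aniso, r_iso):
    return (r_aniso - r_iso) / (r_aniso + r_iso)

anisotropy_index(0.05, 1.0)   # ISO-preferring, like cell 1:  ~ -0.90
anisotropy_index(1.0, 0.03)   # ANISO-preferring, like cell 2: ~  0.94
anisotropy_index(0.9, 1.1)    # broadly tuned, like cell 3:    ~ -0.10
```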
To investigate whether neurons in different areas have distinct preferences for stimulus anisotropy, we compared average tuning curves (Figure 3C) and distributions of the stimulus anisotropy index across areas (Figure 3D). The average responses of V1 and LM neurons peaked at non-oriented stimuli and decreased for increasingly oriented stimuli (Figure 3C). Likewise, the distributions of the tuning index were skewed towards isotropic stimuli (Figure 3D). Areas AL, RL, AM and PM showed heterogeneous preferences, encoding a uniform representation of the stimulus anisotropy space. In contrast, area LI showed a pronounced bias for non-oriented stimuli (Figure 3C, D), distinct from all other areas. These results demonstrate that mouse cortical areas differentially encode the non-oriented and oriented components of the visual scene.
LM neurons encode a uniform representation of spatiotemporal frequency space
Besides their selectivity for stimulus anisotropy, cortical neurons also showed diverse responses to spatiotemporal frequencies (Figure 2B). Each neuron's spatiotemporal frequency tuning was estimated by fitting a two-dimensional Gaussian model to the trial-averaged responses to the preferred stimuli (ISO or ANISO; Figure 4A, B; Priebe et al., 2003; Andermann et al., 2011).
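In the Priebe et al. (2003) formulation, responses are Gaussian in log2 spatial and temporal frequency, with an exponent ξ that lets the preferred temporal frequency shift with spatial frequency (ξ = 1 for perfect speed tuning, ξ = 0 for fully separable tuning). A minimal sketch of the model; the parameter values are illustrative, and the fitting procedure itself is omitted:

```python
# Two-dimensional Gaussian model of spatiotemporal frequency tuning.
# sf in cycles per degree (cpd), tf in Hz; xi couples the preferred TF
# to SF. Parameter values below are hypothetical, not fitted data.
import numpy as np

def st_gaussian(sf, tf, amp, sf0, tf0, sig_sf, sig_tf, xi):
    """Response of the model cell at spatial frequency sf and temporal frequency tf."""
    tf_pref = 2 ** (xi * (np.log2(sf) - np.log2(sf0)) + np.log2(tf0))
    return amp * np.exp(-(np.log2(sf / sf0) ** 2) / (2 * sig_sf ** 2)) \
               * np.exp(-(np.log2(tf / tf_pref) ** 2) / (2 * sig_tf ** 2))

params = dict(amp=1.0, sf0=0.04, tf0=2.0, sig_sf=1.5, sig_tf=1.5, xi=1.0)

# With xi = 1, doubling SF doubles the preferred TF: the cell prefers a
# constant speed of tf0 / sf0 = 50 deg/s.
r_a = st_gaussian(0.04, 2.0, **params)   # at the preferred point -> 1.0
r_b = st_gaussian(0.08, 4.0, **params)   # same 50 deg/s, one octave up in SF
```

At the same speed but higher SF, `r_b` is attenuated only by the spatial-frequency envelope, which is the signature of speed tuning picked up later by the ξ analysis.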
We first compared the spatiotemporal selectivities of LM neurons with those in V1, as LM neurons have been shown to have functional properties similar to V1 neurons (Van Den Bergh et al., 2010; Marshel et al., 2011). Both areas contained neurons with highly diverse preferences for spatial and temporal frequencies, spanning the entire frequency spectra (Figure 4E, F). The V1 population, however, preferred lower frequencies, whereas the LM population formed a uniform representation of the frequency space with a slight bias for intermediate frequencies (Figure 4C: 20 random response fits; 4E: fraction of cells responsive to given frequencies). These differences were also evident in the distribution of preferred frequencies and in the proportions of neurons with different passband properties (Figure 4D, F - figure supplement 6).
In the spatial frequency domain, V1 preferred lower frequencies than LM (Figure 5A; median values, V1 vs LM: 0.055 vs 0.074 cpd; KS test with Bonferroni correction, p < 0.01) and had more lowpass cells (Figure supplement 6C; V1 vs LM: 33% vs 27%), indicating a tendency to respond to even lower spatial frequencies. Cutoff frequencies (Figure 5B) were measured to estimate the range of frequencies represented by each neuronal population. V1 neurons preferentially represented low spatial frequencies (median low and high cutoffs: 0.03 and 0.137 cpd), in contrast to LM neurons, which preferred intermediate frequencies (median cutoffs: 0.046 and 0.163 cpd; V1 vs LM, KS test: p < 0.01 for each pair). In addition, the spatial frequency tuning widths of V1 bandpass cells were wider than those in LM, suggesting that LM neurons were more selective for spatial frequency (Figure supplement 6D; KS test, p < 0.01).
In the temporal frequency domain (Figure 5C, D - figure supplement 6E, F), LM neurons preferred higher frequencies than V1 (higher preferred and cutoff frequencies, and fewer lowpass cells in LM). The bandwidths were comparable between V1 and LM neurons (V1 vs LM: 2.09 vs 2.02 octaves; KS test, p > 0.05). Altogether, these results demonstrate that spatiotemporal selectivity differs between LM and V1: LM neurons encode a uniform representation of the spatiotemporal frequency space, whereas V1 overrepresents the lower frequency domain.
Anterior areas prefer low spatial and high temporal frequencies
Anterior areas, including AL, RL and AM, preferentially responded to low spatial and high temporal frequencies (Figure 4, 5 - figure supplement 6). Among all areas, RL showed the strongest preference for low spatial and high temporal frequencies (median preferred frequencies: 0.062 cpd and 4.59 Hz; KS test: p < 0.01 for all area pairs). It contained the largest fraction of spatially lowpass / temporally highpass cells of all areas (25% vs 1–15%). In comparison, AM neurons were biased towards intermediate frequencies (median preferred frequencies: 0.085 cpd and 3.34 Hz) and included a substantial group of spatially bandpass neurons (64%). AL showed median preferred frequencies comparable to AM but higher diversity in its population (Figure 4F), resulting in a more uniform representation of low to intermediate spatial frequencies (Figure 4E). These results demonstrate overall similar preferences for low spatial and high temporal frequencies in the anterior areas, suggesting specialization for encoding fast-changing, large-scale stimuli.
Lateral areas prefer high spatial and low temporal frequencies
In contrast to the anterior areas, areas LI and PM (Figure 4, 5 - figure supplement 6), situated at the lateral sides of the visual cortex, preferred high spatial and low temporal frequencies (median preferred, PM vs LI: 0.158 vs 0.142 cpd; 1.29 vs 1.36 Hz; KS test, for each pair, p < 0.05). PM and LI contained different subpopulations of neurons (Figure 5A). The PM population presented a bimodal distribution in the spatial frequency domain, with a small and a large fraction of neurons responding to low and high frequencies, respectively, whereas LI contained a more diverse population that responded to a broader range of the spectrum, skewed towards higher spatial frequencies. Nevertheless, PM and LI presented overall similar preferences for slow-moving, finely detailed stimuli, in opposition to the anterior areas.
Preferences shifted to higher frequencies in awake mice
We compared our results to those of Marshel et al. (2011), in which the same set of areas was investigated in anesthetized mice. The preferred frequencies of individual areas in the current study are remarkably higher (up to 2-fold) than in Marshel's study (Figure 5E, F). By contrast, our results are largely comparable to observations in awake mice (Andermann et al., 2011; except TF tuning for V1). These comparisons might reflect the influence of brain state on the neural representation of visual information in the visual cortex, and suggest that awake animals have a greater capacity to respond to fast-changing and/or finely detailed visual stimuli.
Higher visual areas encode complementary representation of visual motion speed
The complete mapping of the spatiotemporal frequency space allowed us to determine individual neurons' tuning for visual motion speed. Speed is given as the ratio of temporal to spatial frequency. A speed-tuned neuron has similar tuning curves for speed across spatial frequencies, and thus its temporal frequency tuning varies as a function of its spatial frequency tuning. By contrast, a non-speed-tuned cell has separable, independent tuning for spatial and temporal frequency (Figure 6A - figure supplement 7A). The speed tuning index ξ is the correlation between spatial and temporal frequency, extracted from the Gaussian fits; neurons with ξ ≥ 0.5 were classified as speed tuned, and otherwise as untuned. The speed tuning analysis focused on the responses to isotropic noise stimuli, which contained local motion components rather than global, coherent motion (as in drifting gratings). The responses to anisotropic noise stimuli were not used, given the potential confound introduced by the embedded rotatory motion.
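Given the fitted parameters, the classification rule above amounts to the following sketch; the numeric values are hypothetical fits, not data from the study.

```python
# Speed-tuning classification as defined in the text: xi is the SF-TF
# correlation from the 2D Gaussian fit; cells with xi >= 0.5 are speed
# tuned, and the preferred speed is the ratio of the preferred temporal
# and spatial frequencies (deg/s).
def classify_speed_tuning(xi, sf_pref_cpd, tf_pref_hz, threshold=0.5):
    speed_tuned = xi >= threshold
    pref_speed = tf_pref_hz / sf_pref_cpd       # degrees per second
    return speed_tuned, pref_speed

# Hypothetical fits: a fast-motion cell and an untuned cell
classify_speed_tuning(0.8, sf_pref_cpd=0.04, tf_pref_hz=4.0)   # (True, 100.0)
classify_speed_tuning(0.1, sf_pref_cpd=0.08, tf_pref_hz=2.0)   # (False, 25.0)
```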
The speed tuning index ξ of the V1 population was centered around 0, indicating that most V1 cells were not tuned for speed (Figure 6B, C). Neurons in higher visual areas were significantly more tuned than V1 neurons (Figure 6B, C; KS tests with Bonferroni correction, p < 0.01). Among the higher visual areas, AM contained the largest fraction of speed-tuned cells, followed by RL and AL; LM, PM and LI contained relatively smaller fractions. Moreover, speed-tuned neurons encoded distinct ranges of speed across areas (Figure 6B, D). V1 and LM were broadly tuned for the intermediate range of speeds (peaks at 12.5 and 25 deg/s, respectively). AL and AM primarily responded to fast-moving stimuli (peak at 100 deg/s), and RL neurons selectively responded to extremely fast stimuli (peak at 400 deg/s). By contrast, PM and LI mainly responded to slow motion (peak at 6.25 deg/s). These results suggest that mouse visual cortical areas encode distinct ranges of visual motion speed: V1 is largely untuned for speed; LM is broadly tuned; and the anterior and lateral areas selectively encode fast and slow motion, respectively.
Increased speed discriminability in higher visual areas
As higher visual areas show distinct selectivities for spatiotemporal features, an interesting question arises: do these areas computationally benefit from such functional specialization? To investigate differences in spatiotemporal information processing between areas, we compared how well information about the stimulus category (spatiotemporal frequencies) could be decoded from neural population activity. Decoding accuracy increased as a function of population size, with distinct frequency-specific performance across areas (Figure supplement 8A, B). In V1, decoding accuracies for all frequencies increased rapidly with population size (Figure supplement 8B), reaching 90% average accuracy with ~50 cells (Figure supplement 8C). This high decoding capacity might be attributable to the heterogeneous nature of the V1 population. With similarly high heterogeneity, the LM population also showed great decoding capacity, especially for intermediate spatial frequencies (Figure supplement 8C). By contrast, highly specialized areas showed decreased decoding performance: accuracies rose slowly with increasing population size, and some frequencies could not be successfully discriminated even with very large populations (e.g. RL, accuracy < 80% with 1000 neurons; figure supplement 8B). RL showed deteriorated performance across frequencies, with a slight recovery at intermediate frequencies. AM was better at intermediate frequencies and worse at high spatial frequencies. AL populations, showing relatively high diversity (Figure 2F), presented elevated performance, especially for intermediate spatial frequencies. PM and LI were better at discriminating high spatial frequencies, and LI outperformed PM in decoding intermediate temporal frequencies. The decreased decoding performance for non-preferred frequencies might simply be a consequence of the absence of responses: no response, no information. The decrease for preferred frequencies, in turn, might be due to the absence of diversity, as stimulus information is low if all neurons respond alike. These results suggest that V1 and LM encode holistic information about visual stimuli, whereas distinct spatiotemporal information is distributed across the higher visual cortical areas.
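The population-size analysis can be illustrated on synthetic data with a simple nearest-centroid decoder, a stand-in for whatever classifier was actually used; all numbers below are made up, but the qualitative result, accuracy rising with the number of pooled cells, is the point.

```python
# Sketch of decoding stimulus identity from population responses: 8
# "frequency" categories, 200 synthetic cells, 40 trials each (20 train,
# 20 test). Nearest-centroid decoding; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_stim, n_trials = 200, 8, 40
tuning = rng.random((n_cells, n_stim))                  # mean response per stimulus
trials = tuning[:, :, None] + 0.5 * rng.standard_normal((n_cells, n_stim, n_trials))

def decode_accuracy(cells):
    train, test = trials[cells, :, :20], trials[cells, :, 20:]
    centroids = train.mean(axis=2)                      # (len(cells), n_stim)
    correct = 0
    for s in range(n_stim):
        for t in range(test.shape[2]):
            # Assign each test trial to the nearest training centroid
            d = ((test[:, s, t][:, None] - centroids) ** 2).sum(axis=0)
            correct += d.argmin() == s
    return correct / (n_stim * test.shape[2])

accuracy = {n: decode_accuracy(rng.choice(n_cells, n, replace=False))
            for n in (5, 25, 100, 200)}   # accuracy grows with population size
```

In a specialized area, zeroing out the synthetic responses to non-preferred stimuli would reproduce the "no response, no information" failure described above.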
Higher visual areas contained larger fractions of speed-tuned neurons than V1; are they also better at encoding speed? We measured decoding performance between stimulus pairs corresponding to distinct local motion speeds (e.g., stimulus S1T2 has half the spatial frequency and twice the temporal frequency of stimulus S2T1, and is thus four times faster; Figure 7B, speed pairs are orthogonal to the iso-speed lines). V1 showed relatively homogeneous decoding capacities across speed pairs, with a slight increase at lower spatial frequencies. LM showed greater decoding capacities for slow to intermediate speeds (6.25–200 deg/s). The AL population clearly separated slow stimuli (12.5–25 deg/s) from intermediate ones (50–100 deg/s). AM and RL were specialized for speed discrimination at intermediate frequencies, although RL exhibited a global decrease in decoding performance. LI showed elevated discrimination capacity for the lower range of speeds (3.1–100 deg/s). PM showed a similar tendency towards better discrimination of slow stimuli, albeit less pronounced than in LI. Moreover, the frequency-specific enhancement of speed decoding performance in higher visual areas did not merely reflect individual areas' frequency preferences, since speed decoding remarkably outperformed decoding of iso-speed stimulus pairs in the same frequency space (Figure 7B; speed pairs: S1T2 vs S2T1; compare to Figure 8E; iso-speed pairs: S1T1 vs S2T2). Altogether, these results demonstrate greater speed discrimination capacities in higher visual areas, at the cost of spatiotemporal discriminability. This finding suggests that the information lost through functional specialization is traded for better encoding of novel features, which might arise from integrating non-correlated features by pooling large populations of neurons with specific tuning properties.
Discussion
Mouse higher visual areas are thought to be engaged in a wide range of behaviorally relevant visual computations (Harvey et al., 2012; Olcese et al., 2013; Burgess et al., 2016; Morcos and Harvey, 2017). However, the neural basis of these computations remains poorly understood, largely due to the lack of knowledge of the functional properties of neurons in higher visual areas. Using rich spectral noise stimuli and two-photon calcium imaging, we provide a detailed characterization of the stimulus preferences of layer 2/3 neurons in V1 and six higher visual areas in awake mice. We found that cortical areas showed distinct preferences for stimulus anisotropy, with area LI showing the most pronounced preference for non-oriented stimuli. Moreover, higher visual areas, in contrast to V1, contained substantial fractions of neurons sensitive to visual motion speed. Area LM, encompassing diverse spatiotemporally tuned neurons, responded to a broad range of speeds. Anterior areas (AL, RL and AM) preferentially responded to low spatial and high temporal frequencies, and thus fast motion, whereas lateral areas (PM and LI) preferred high spatial and low temporal frequencies, and thus slow motion. Population coding analysis further revealed greater speed discriminability in higher visual areas. These findings provide novel insights into a highly specialized network of cortical areas that might underpin distinct visual computations, such as motion and shape processing.
Comparison of spatiotemporal selectivity with previous studies
The preferred spatiotemporal frequencies observed in the current study are remarkably higher (up to 2-fold) than in an earlier study on anesthetized mice (Marshel et al., 2011), but comparable to a report on awake mice (Andermann et al., 2011). These results hint at the impact of anesthesia on neuronal responses. The lower ranges of preferred temporal frequencies observed in anesthetized mice (Van Den Bergh et al., 2010; Marshel et al., 2011; Roth et al., 2012; Tohmi et al., 2014) might result from suppressed thalamocortical synaptic transmission (Reinhold et al., 2015). The difference in spatial frequency tuning is, however, unlikely to be caused by different anesthetic levels (Zhuang et al., 2014; Durand et al., 2016). One explanation could be the different neuronal populations sampled in the two studies. In Marshel's study (Marshel et al., 2011), excitatory and inhibitory neurons were ubiquitously labeled with a synthetic calcium indicator (Oregon Green BAPTA-1 AM), whereas in this study we sampled from a subset of excitatory neurons in Thy1-GCaMP6s transgenic mice (Dana et al., 2014). As V1 interneurons prefer lower spatial frequencies than layer 2/3 excitatory neurons (Niell and Stryker, 2008), the population preference would shift towards lower frequencies with the inclusion of interneurons. Nevertheless, it is unclear whether such a difference between excitatory and inhibitory neurons also exists in higher visual areas and consequently lowers the preferred spatial frequencies at the population level. Further comparisons of neuronal response properties in wakefulness and anesthesia (Greenberg et al., 2008; Haider et al., 2013; Durand et al., 2016; Adesnik, 2017) will shed light on the influence of brain state on the neural representation and transformation of visual information among cortical areas. In any case, studying neuronal physiology in the awake brain will be of great value for understanding the neural computations underlying perception and behavior.
Selectivity for stimulus anisotropy
We found that cortical neurons are highly selective for stimulus anisotropy (Figure 3). As increasing stimulus anisotropy increases the length of oriented bars, this selectivity might reflect a form of length tuning. Many neurons prefer oriented stimuli, with responses increasing for longer lengths, resembling the phenomenon known as ‘length summation’ (Schumer and Movshon, 1984). Meanwhile, many neurons prefer short stimuli, reminiscent of ‘end-stopping’ cells, which preferentially respond to stimuli of limited length (Hubel and Wiesel, 1965; Gilbert, 1977). This length tuning property may be attributed to ‘surround suppression’, in which responses to stimulation of the classical receptive field are inhibited by stimulation of the surround. Diverse length tuning curves may emerge from differential ratios of the excitation of the classical receptive field and the inhibitory effect of the receptive field surround (Adesnik et al., 2012; Vaiceliunaite et al., 2013; Adesnik, 2017).
We found diverse length tuning in all tested areas. V1 and LM presented relatively strong biases for non-oriented stimuli, suggesting strong surround suppression, as has also been found in primates (Hubel and Livingstone, 1987; Shushruth et al., 2009; El-Shamayleh et al., 2013), cats (DeAngelis et al., 1994) and mice (Van Den Bergh et al., 2010; Adesnik et al., 2012; Nienborg et al., 2013; Vaiceliunaite et al., 2013; Adesnik, 2017). Surround suppression has also been reported in the primate middle temporal visual area (MT/V5) (Born and Bradley, 2005) and suggested to help solve the aperture problem (Tsui et al., 2010). Mouse RL has been proposed as the mouse analogue of MT (Juavinett and Callaway, 2015); hence, surround suppression in RL and the other dorsal areas (AL, AM and PM) might facilitate unambiguous encoding of motion directions. In the primate, end-stopping behavior increases along the ventral processing stream and is suggested to benefit the coding of curvature (Ponce et al., 2017). Area LI, with its pronounced preference for non-oriented stimuli, shares an interesting similarity with the primate ventral areas, suggesting a potential role in the processing of visual shapes.
Functional organization of mouse higher visual areas
Our data demonstrate a complementary representation of visual motion speed in the mouse higher visual areas (Figure 6). Anterior areas (AL, RL and AM) contain abundant speed-tuned cells and encode fast local motion (Tohmi et al., 2009, 2014; this study). As the anterior areas mainly represent the lower visual field (Zhuang et al., 2017), their high-speed tuning might benefit the encoding of the fast optic flow near the ground during navigation. Indeed, area RL, which covers the lower nasal field, where optic flow is fastest, showed the highest preferred speed. Consistently, anterior area A has been reported to respond preferentially to high temporal and low spatial frequencies (Murakami et al., 2017), suggesting a preference for fast speeds. In contrast, area PM mainly represents the visual periphery, where objects are usually distant and optic flow is slow. Consistent with this ethological view, PM neurons prefer slow speeds (this study; Andermann et al., 2011; Roth et al., 2012). Interestingly, PM neurons have been reported to show strong speed-tuned responses to drifting gratings (Andermann et al., 2011), but such cells were less prevalent in this study. One explanation is that speed-tuned cells in PM are more selective for global, coherent motion than for local, non-coherent motion (e.g. isotropic noise stimuli). In fact, using noise stimuli embedded in a global directional motion flow, we found that PM neurons responded more robustly to coherent motion (data not shown), suggesting that area PM is specialized for encoding slow optic flow in the visual periphery during navigation. PM might convey information about optic flow via its strong direct inputs to the retrosplenial cortex (Wang et al., 2012), which preferentially responds to slow motion (Murakami et al., 2015) and is involved in spatial navigation (Mao et al., 2017).
Besides PM, the anterior and medial areas mainly target parietal, motor and limbic cortices (Wang et al., 2012), coinciding with the representation of visuospatial functions in the dorsal stream of rats (Kolb and Walkey, 1987). Altogether, the distinct preferences for visual motion speed in the dorsal areas (AL, RL, AM and PM) suggest an ethological coding of optic flow during navigation, reminiscent of the dorsal stream of the primate visual system (Van Essen and Maunsell, 1983).
Lateral area LI exhibits preferences for fine spatial detail and non-oriented stimuli (Figures 3-5). These properties are suitable for encoding spatial details and curvatures, which are critical for object recognition in the primate ventral stream (Ponce et al., 2017; Lu et al., 2018). In addition, anatomical studies suggest that area LI is a node in the ventral stream: most of its projections terminate in temporal and parahippocampal regions (Wang et al., 2012). These pieces of evidence suggest that area LI belongs to a ventral stream for shape processing (analogous areas in the rat: Vermaercke et al., 2014; Tafazoli et al., 2017; in the primate: Van Essen and Maunsell, 1983). Rat lateral areas, including LI, present increasingly transformation-tolerant representations of visual objects (Vermaercke et al., 2014; Tafazoli et al., 2017). It will be of interest for future studies to determine whether a similar representation exists in the mouse lateral areas, and how it builds on the specialized tuning for simple features to form more complex representations, such as visual objects.
Area LM was proposed to be the gateway to the ventral stream (with AL as the gateway to the dorsal stream), given its relatively denser projections to ventral areas (Wang et al., 2011, 2012) and population responses relatively similar to those of ventral areas (wide-field imaging in Murakami et al., 2017; Smith et al., 2017). Our data showed, however, that LM contains highly diverse neurons with both dorsal and ventral properties, resembling primate V2 (Van Den Bergh et al., 2010). In addition, LM neurons send strong projections to all other higher visual areas (Wang et al., 2011, 2012), conveying target-specific information (Glickfeld et al., 2013). The functional properties of LM neurons suggest that it serves as a shared gateway to both the dorsal and ventral streams.
Conclusion
The current study provides a comprehensive characterization of the stimulus preferences of layer 2/3 neurons in V1 and higher visual areas of awake mice. The results reveal a segregation of spatiotemporal features in the visual cortex that might underpin the processing of visual motion and shape. Given the accumulating evidence of higher-order computations in the mouse higher visual cortex (Olcese et al., 2013; Burgess et al., 2016; Morcos and Harvey, 2017), it is essential to understand how area-specific representations of visual features arise along the visual hierarchy, and how basic features of visual and other sensory information are integrated in higher-order cortex for complex computations. The results and implications of this study provide a necessary basis for future studies investigating the circuit mechanisms of visual perception and behavior.
Materials and Methods
Animals and Surgery
All experiments were conducted with the approval of the Animal Ethics Committee of KU Leuven. Standard craniotomy surgeries were performed to gain optical access to the visual cortex through a set of cover glasses (Goldey et al., 2014). Thy1-GCaMP6s-WPRE mice (Dana et al., 2014; n = 10, 5 male and 5 female), 2 to 3 months old, were anesthetized with isoflurane (2.5%-3% induction, 1%-1.25% surgery). A custom-made titanium frame was mounted to the skull, and a craniotomy over the visual cortex was made for calcium imaging. The cranial window was covered with a 5 mm cover glass. Buprenex and Cefazolin were administered postoperatively (2 mg/kg and 5 mg/kg, respectively) once the animal had recovered from anesthesia.
Widefield Calcium Imaging
Widefield fluorescent images were acquired through a 2× objective (NA = 0.055, Edmund Optics). Illumination was provided by a blue LED (479 nm, ThorLabs), and the green fluorescence was collected with an EMCCD camera (EM-C2, QImaging) through a bandpass filter (510/84 nm, Semrock). Image acquisition was controlled with custom software.
Two-photon Calcium Imaging
A customized two-photon microscope (Neurolabware) was used. GCaMP6s was excited at 920 nm with a Ti:Sapphire laser (MaiTai DeepSee, Spectra-Physics), and its green fluorescence was collected with a photomultiplier tube (PMT, Hamamatsu) through a bandpass filter (510/84 nm, Semrock). Images (720×512 pixels per frame) were collected at 31 Hz with a 16× objective (Nikon). Volume imaging was performed using a focus-tunable lens (EL-10-30-TC, Optotune; staircase mode). We simultaneously recorded neuronal activity in large volumes (0.8 × 0.8 × 0.15 mm3) of layer 2/3 of the targeted visual cortical areas. During imaging, mice were head-fixed on a platform while awake and viewing the visual stimuli on the display. Eye movements were monitored using a camera under infrared illumination (720-900 nm bandpass filter, Edmund).
Visual Stimulation
Visual stimuli were displayed on a gamma-corrected LCD display (22", Samsung 2233RZ). The screen was oriented parallel to the eye and placed 18 cm from the animal, covering 80 degrees in elevation by 105 degrees in azimuth. Spherical correction was applied to the stimuli to define eccentricity in spherical coordinates.
Spectral noise stimuli (Figure supplement 2) were created by applying a set of parametrized filters to random pink-noise movies. Bandpass filters (bandwidth: 1 octave) with different center spatial frequencies (0.02, 0.04, 0.08, 0.16 and 0.32 cpd) and temporal frequencies (0.5, 1, 2, 4, 8, 16 Hz) gave 30 spatiotemporal combinations; cutoff frequencies were set 0.5 octave below and above the center frequencies. A von Mises filter controlled the orientation bandwidth (no filter, 40, 10, 5 degrees) in the spatial frequency domain, and thus the stimulus anisotropy in the space domain. Isotropic (ISO) and anisotropic (ANISO) noise stimuli were used for the spatiotemporal frequency assay. ISO stimuli contained non-oriented patterns, resembling clouds of dots with alternating contrast. In contrast, ANISO stimuli, with a narrow orientation bandwidth (5 degrees), presented oriented patterns resembling sinusoidal gratings. Each stimulus set comprised the full array of 30 spatial and temporal frequency combinations, spanning a broad frequency spectrum. In addition, ANISO stimuli smoothly rotated 180 degrees to sweep the orientation space within each stimulus epoch. For the stimulus anisotropy assay, the stimulus set comprised pairs of four orientation bandwidths (infinite, 40, 10, 5 degrees) and four center spatial frequencies (0.04, 0.08, 0.16, 0.32 cpd), with a fixed center temporal frequency (2 Hz). A global rotation was also applied to sweep the orientation space. Each stimulus condition was presented for 4 seconds, interleaved with a 4-second equiluminant gray screen. In each of the four pseudorandomized trials, a different seed was used to generate unique random noise, resulting in different phases but constant frequency spectra across trials.
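The stimulus synthesis described above (bandpass spatiotemporal filtering of noise, plus an orientation filter) can be sketched in a few lines. The original stimuli were generated in MATLAB; this is an illustrative Python/NumPy reimplementation in arbitrary units (cycles per image and cycles per movie rather than cpd and Hz), with a von Mises-like orientation weighting whose concentration is a rough stand-in for the bandwidths used in the paper.

```python
import numpy as np

def make_spectral_noise(n_frames=64, size=64, sf_c=4.0, tf_c=2.0,
                        ori_bw_deg=None, seed=0):
    """Sketch of one bandpass spectral noise movie (frames x H x W).

    sf_c / tf_c are center frequencies in cycles per image and cycles
    per movie (illustrative units, not cpd / Hz). ori_bw_deg sets a
    von Mises-like orientation bandwidth; None gives isotropic noise.
    """
    rng = np.random.default_rng(seed)       # a new seed per trial
    noise = rng.standard_normal((n_frames, size, size))
    F = np.fft.fftn(noise)

    # Frequency grids (temporal, vertical, horizontal).
    ft = np.fft.fftfreq(n_frames)[:, None, None] * n_frames
    fy = np.fft.fftfreq(size)[None, :, None] * size
    fx = np.fft.fftfreq(size)[None, None, :] * size
    sf = np.sqrt(fx ** 2 + fy ** 2)

    # 1-octave bandpass: keep energy within +/- 0.5 octave of center.
    sf_pass = (sf > sf_c / np.sqrt(2)) & (sf < sf_c * np.sqrt(2))
    tf_pass = (np.abs(ft) > tf_c / np.sqrt(2)) & (np.abs(ft) < tf_c * np.sqrt(2))
    mask = (sf_pass & tf_pass).astype(float)

    # 1/f spectral weighting gives the pink-noise source spectrum.
    pink = np.where(sf > 0, 1.0 / np.maximum(sf, 1e-9), 0.0)

    if ori_bw_deg is not None:
        # Orientation-selective weighting in the SF plane (period pi).
        theta = np.arctan2(fy, fx)
        kappa = 1.0 / np.radians(ori_bw_deg) ** 2   # rough bandwidth mapping
        mask = mask * np.exp(kappa * (np.cos(2 * theta) - 1.0))

    return np.real(np.fft.ifftn(F * mask * pink))
```

Because the filters act multiplicatively in the Fourier domain, reusing the same filter with different noise seeds yields movies with different phases but identical frequency spectra, as in the four pseudorandomized trials.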
For retinotopic mapping, we presented two sets of stimuli. Circling patch stimuli had a small patch (20 degrees in diameter) circling along an elliptic trajectory (azimuth: −40 to 40 deg; elevation: −30 to 30 deg) on the display. Traveling bar stimuli comprised a narrow bar (13 deg wide) sweeping across the screen in the 4 cardinal directions. An isotropic noise background (0.08 cpd, 2 Hz) was embedded in the patch or bar. Each stimulus condition lasted 10 seconds and was repeated 20 times.
Data Analysis
All subsequent data analysis was performed in MATLAB (The Mathworks, Natick, MA).
Retinotopy analysis
Phase maps were derived from the fluorescent responses to the phase/position of the circling patch stimuli (Figure supplement 1). Each area has a representation of the elliptic trajectory of the patch, resulting in pinwheel-like retinotopic maps. Sign maps were obtained with the methods described previously (Garrett et al., 2014). Azimuth and elevation position maps were measured as the temporal phase of the peak fluorescent response to the traveling bar for each pixel. They were used to generate a visual field sign map, in which each patch represented one cortical area.
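The per-pixel phase extraction underlying such maps amounts to taking the Fourier component of each pixel's time course at the stimulus repetition frequency. A minimal Python sketch (the original analysis was in MATLAB; function name and arguments are illustrative):

```python
import numpy as np

def phase_map(movie, stim_freq_hz, fps):
    """Per-pixel response phase at the stimulus repetition frequency.

    movie: (T, H, W) fluorescence; the stimulus (patch or bar) repeats
    at stim_freq_hz. The phase of the Fourier component at that
    frequency (e^{-i w t} convention) maps each pixel to a position
    along the stimulus trajectory; the magnitude gives response power.
    """
    T = movie.shape[0]
    t = np.arange(T) / fps
    carrier = np.exp(-2j * np.pi * stim_freq_hz * t)
    comp = np.tensordot(carrier, movie, axes=(0, 0))  # (H, W) complex
    return np.angle(comp), np.abs(comp) / T
```

With the e^{-iwt} convention used here, a pixel responding as cos(2*pi*f*t - phi) is assigned phase -phi; only the relative phase across pixels matters for the retinotopic map.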
Selection for visual cells
For cellular imaging, raw images were reconstructed and corrected for brain motion artefacts using custom MATLAB routines (Bonin et al., 2011). Regions of interest (ROIs) were selected with custom semi-automated segmentation algorithms. Cellular fluorescence time courses were generated by averaging all pixels in a cell mask and subtracting the neuropil signal from the surrounding shell. Responses were defined as the average dF/F during the stimulus epoch, where dF is the change in fluorescence and F is the baseline fluorescence.
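A common form of this trace extraction is sketched below in Python (the paper's routines were custom MATLAB code; the neuropil correction factor r = 0.7 and the 20th-percentile baseline are illustrative conventions, not values stated in the text):

```python
import numpy as np

def dff_trace(cell_f, neuropil_f, r=0.7, f0_percentile=20):
    """Neuropil-corrected dF/F for one neuron.

    cell_f, neuropil_f: (T,) mean fluorescence over the cell mask and
    its surrounding shell. r scales the neuropil subtraction; F0 is a
    low percentile of the corrected trace, serving as baseline F.
    """
    f = np.asarray(cell_f, float) - r * np.asarray(neuropil_f, float)
    f0 = np.percentile(f, f0_percentile)
    return (f - f0) / f0          # dF/F
```

The stimulus-epoch response is then simply the mean of this trace over the 4-second stimulus window.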
Neurons were considered responsive to a given stimulus set if they responded to at least one stimulus condition (median response exceeding 3× the standard deviation of baseline fluctuations for over 1 second). Response reliability was measured in two ways: (1) the coefficient of variation of the responses to the peak frequencies across trials, and (2) the average trial-to-trial correlation of the fluorescent time courses. Neurons above threshold (trial-to-trial correlation ≥ 0.4) were deemed reliable.
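The second reliability measure, average trial-to-trial correlation, can be computed as the mean pairwise Pearson correlation across trials. A minimal Python sketch (illustrative; the original analysis was in MATLAB):

```python
import numpy as np
from itertools import combinations

def trial_reliability(trials):
    """Average pairwise Pearson correlation of single-trial traces.

    trials: (n_trials, T) fluorescence time courses of one neuron for
    repeated presentations of the same stimulus. Neurons with a value
    >= 0.4 would pass the reliability threshold described above.
    """
    corrs = [np.corrcoef(trials[i], trials[j])[0, 1]
             for i, j in combinations(range(len(trials)), 2)]
    return float(np.mean(corrs))
```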
Clustering
Population response matrices for each area were generated by randomly selecting 2000 neurons, each represented as a vector of responses to ISO and ANISO stimuli (normalized to range from 0 to 1). The combined response matrix across all areas was used in a spectral clustering algorithm (MATLAB Central). To determine whether there are discrete functional types, we plotted the overall variance (within-cluster sum of squares) as a function of the number of clusters. We did not observe an abrupt decrease of variance with increasing cluster number (Figure supplement 4A), suggesting that there are no discrete functional response types but rather a continuum. Nevertheless, we used the clustering approach to study the population response profiles across areas. We settled on 12 clusters, which gave low heterogeneity within clusters without excessive splitting (Figure supplement 4B).
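The elbow analysis described here, plotting within-cluster sum of squares against the number of clusters, can be illustrated with any clustering algorithm. The sketch below uses plain k-means in NumPy as a stand-in for the spectral clustering used in the paper; the elbow logic is the same, and an abrupt drop in the curve would indicate discrete types:

```python
import numpy as np

def kmeans_wcss(X, ks, n_iter=50, seed=0):
    """Within-cluster sum of squares for each cluster count in ks.

    X: (n_samples, n_features) response matrix, rows = neurons.
    Plain Lloyd's k-means is used here as an illustrative stand-in
    for the paper's spectral clustering.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    wcss = []
    for k in ks:
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(1)
            for c in range(k):
                pts = X[labels == c]
                if len(pts):                  # skip empty clusters
                    centers[c] = pts.mean(0)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        wcss.append(d.min(1).sum())
    return np.array(wcss)
```

A smoothly decaying WCSS curve, as reported in Figure supplement 4A, is what motivates interpreting the tuning diversity as a continuum rather than discrete cell classes.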
Tuning curves
For the spatial and temporal frequency analysis, responses were fit with a two-dimensional elliptical Gaussian model (Priebe et al., 2003; Andermann et al., 2011):

R(sf, tf) = A exp(−(log2 sf − log2 sf0)^2 / (2σsf^2)) exp(−(log2 tf − log2 tfp(sf))^2 / (2σtf^2)),

where log2 tfp(sf) = ξ(log2 sf − log2 sf0) + log2 tf0, A is the peak response amplitude, sf0 and tf0 are the preferred spatial and temporal frequencies, and σsf and σtf are the spatial and temporal frequency tuning widths. The dependence of temporal frequency preference on spatial frequency is captured by the power-law exponent ξ. Estimates of cutoff values for spatial and temporal frequency were obtained from the half maxima of the cross-sections R(sf, tf0) and R(sf0, tf), respectively. Neurons responding to the lowest tested frequencies with over 50% of the peak response were categorized as lowpass, and their low cutoff values were set to the lowest tested frequencies. In the same manner, the high cutoff values of highpass cells were set to the highest tested frequencies. Half-width bandwidths were estimated from bandpass cells.
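The tuning model can be written as a short function, useful both for evaluating fitted parameters and as the objective for a curve fitter. A Python sketch of the elliptical Gaussian in log-frequency coordinates (the original fits were done in MATLAB; function and argument names are illustrative):

```python
import numpy as np

def st_tuning(sf, tf, A, sf0, tf0, sigma_sf, sigma_tf, xi):
    """Elliptical 2-D Gaussian in log2 frequency space (after Priebe
    et al., 2003; Andermann et al., 2011). The preferred temporal
    frequency tfp(sf) shifts with spatial frequency via exponent xi;
    xi = 0 gives separable SF/TF tuning, xi = 1 gives speed tuning.
    """
    lsf, ltf = np.log2(sf), np.log2(tf)
    ltf_p = xi * (lsf - np.log2(sf0)) + np.log2(tf0)   # log2 tfp(sf)
    return A * np.exp(-(lsf - np.log2(sf0)) ** 2 / (2 * sigma_sf ** 2)
                      - (ltf - ltf_p) ** 2 / (2 * sigma_tf ** 2))
```

Evaluating the model along R(sf, tf0) and R(sf0, tf) and finding the half-maximum points yields the cutoff estimates described above.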
Tuning curves for stimulus anisotropy were measured as the average responses to different degrees of stimulus anisotropy at the preferred spatial frequencies. The stimulus anisotropy index (SAI) was the difference/sum ratio of response amplitudes to the most anisotropic stimuli and the isotropic stimuli.
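The SAI defined above is a standard difference/sum contrast index; for concreteness (illustrative Python, matching the definition in the text):

```python
def sai(r_aniso, r_iso):
    """Stimulus anisotropy index: difference/sum ratio of responses
    to the most anisotropic vs. the isotropic stimulus at the
    preferred spatial frequency. Ranges from -1 (prefers isotropic)
    to +1 (prefers anisotropic); 0 means no preference.
    """
    return (r_aniso - r_iso) / (r_aniso + r_iso)
```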
Population coding analysis
We used linear classifiers to decode stimulus categories from neuronal population activity (Vermaercke et al., 2014). A support vector machine (SVM) was trained and tested on pairwise classification for each possible pair of the 30 stimulus conditions (435 unique pairs). Populations used for frequency classification were composed of neurons responding to isotropic noise stimuli. Data were split into training and testing halves, and performance was measured as the proportion of correct classification decisions on the testing half (standard cross-validation). To test how decoding performance scales with population size, we measured decoding performance for subsamples of different numbers of neurons (logarithmically spaced from 1 to 1000; sampled without replacement) across areas. For each iteration, we resampled a neuronal population of the given size; we averaged over 100 iterations to obtain confidence intervals for performance. To compare decoding capacity between stimulus pairs and across areas, we measured the number of neurons required for classification accuracy over 90% by interpolating the growth curves of performance as a function of population size.
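The pairwise decoding scheme for one stimulus pair can be sketched compactly. The snippet below uses a least-squares linear readout in NumPy as a stand-in for the SVM used in the paper (to keep the sketch dependency-free), with the half/half train-test split described above:

```python
import numpy as np

def pairwise_decode(resp_a, resp_b, train_frac=0.5, seed=0):
    """Linear decoding accuracy for one stimulus pair.

    resp_a, resp_b: (n_trials, n_neurons) population responses to two
    stimulus conditions. A least-squares linear readout stands in for
    the SVM; labels are +1 / -1 and accuracy is measured on the
    held-out half of the trials.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([resp_a, resp_b])
    y = np.r_[np.ones(len(resp_a)), -np.ones(len(resp_b))]
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    tr, te = idx[:n_train], idx[n_train:]
    Xb = np.c_[X, np.ones(len(X))]                # append bias term
    w, *_ = np.linalg.lstsq(Xb[tr], y[tr], rcond=None)
    pred = np.sign(Xb[te] @ w)
    return float(np.mean(pred == y[te]))
```

Running this over all 435 condition pairs and over neuron subsamples of increasing size yields the performance growth curves from which the neurons-needed-for-90%-accuracy measure is interpolated.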
Author Contributions
XH and VB designed the research; XH acquired the data; XH and BV performed the data analysis and wrote the paper with input from VB. The authors declare no competing interests.
Acknowledgements
We thank Steffen Kandler for surgery training and Joao Couto for help with the imaging setup and data analysis. XH was supported by a PhD scholarship from the Chinese Scholarship Council. BV was supported by a postdoctoral scholarship from FWO (12E4314N). VB acknowledges support from FWO (Grant G0D0516N), KU Leuven Research Council (Grant C14/16/048) and NERF Institutional Funding. NERF is funded by Imec, VIB and KU Leuven.