Abstract
Observers perceive their visual environment in unique ways. How ventral visual stream (VVS) regions represent subjectively perceived object characteristics remains poorly understood. We hypothesized that the visual similarity between objects that observers perceive is reflected with highest fidelity in neural activity patterns in perirhinal and anterolateral entorhinal cortex at the apex of the VVS object-processing hierarchy. To address this issue with fMRI, we administered a task that required discrimination between images of exemplars from real-world categories. Further, we obtained ratings of perceived visual similarities. We found that perceived visual similarities predicted discrimination performance in an observer-specific manner. As anticipated, activity patterns in perirhinal and anterolateral entorhinal cortex predicted perceived similarity structure, including those aspects that are observer-specific, with higher fidelity than any other region examined. Our findings provide new evidence that representations of the visual world at the apex of the VVS differ across observers in ways that influence behaviour.
Introduction
The ability to perceive similarities and differences between objects plays an integral role in cognition and behaviour. Perceived similarities are important, for example, for categorizing a fruit at the grocery store as an apple rather than a pear. The appreciation of more fine-grained similarities between exemplars of a category also shapes behaviour, such as when deciding which apple among many to select for purchase. Indeed, experimental work in psychology has confirmed that the similarity of objects influences performance in numerous behavioural contexts, including but not limited to categorization, object discrimination, recognition memory, and prediction (see Medin et al., 1993; Goldstone & Son, 2012; Hebart et al., 2020, for review). Yet, despite the well-established links to behaviour, how the brain represents these similarities between objects is only beginning to be understood. A central question that has received limited investigation so far is how similarities between objects that observers subjectively perceive and report map onto neural object representations. Answering this question can provide insight as to what brain regions provide the ‘read-out’ for such subjective reports. Moreover, it holds promise for understanding how differences in the way in which observers perceive their visual environment are reflected in variations in functional brain organization (Charest & Kriegeskorte, 2015).
Functional neuroimaging, combined with pattern analysis techniques, provides a powerful tool for examining the mapping between similarity relationships in visual perception of objects and similarities in corresponding neural representations (Kriegeskorte & Kievit, 2013). A significant body of research addressing this issue has focused on the characterization of category structure and other coarse object distinctions, such as animacy. Findings from this research indicate that activity patterns in a large expanse of the ventral visual stream (VVS), often referred to as inferotemporal cortex (IT) or ventral temporal cortex, capture much of this structure in the environment (Kriegeskorte et al., 2008; Connolly et al., 2012; Mur et al., 2013; Proklova et al., 2016; Cichy et al., 2019). For example, numerous studies have revealed a similarity-based clustering of response patterns for objects in IT that is tied to category membership (e.g., Kriegeskorte et al., 2008; Proklova et al., 2016; see Grill-Spector & Weiner, 2014 for review). In these studies, and in most related work, the primary focus has been on similarity in relation to object distinctions that are defined in objective terms, and on the characterization of neural representations that are shared by observers. As such, they do not address whether activity patterns in IT also capture similarity relationships among objects that characterize subjective aspects of visual perception that differ across individuals’ reports. When neural activity that corresponds to subjectively perceived visual similarities has been examined, extant research has mostly focused on specific object features, such as shape or size (Op de Beeck et al., 2008; Haushofer et al., 2008; Schwarzkopf et al., 2011; Moutsiana et al., 2016) rather than on similarities between complex real-world objects that differ from each other on multiple dimensions.
A notable exception to this research trend is an fMRI study that focused on perceived similarities among select real-world objects that are personally meaningful (e.g., images of observers’ own car, their own bicycle; Charest et al., 2014), which demonstrated links between observer-specific perceived similarity and the similarity structure embedded in activity patterns in IT.
It is unlikely that observers perceive similarities in a unique manner only between those exemplars that are highly familiar and have special personal meaning attached to them. Recent behavioural research suggests, for example, that even for novel computer-generated object categories, reports of perceived similarities among exemplars change with just a few training sessions (Collins & Behrmann, 2020). In order to reveal the mapping between visual similarities that observers subjectively perceive and similarities in neural activity among category exemplars, it is important to also consider brain regions in the VVS that are situated downstream from IT on the medial surface of the temporal lobe, namely perirhinal cortex (PrC), and the primary region to which it projects, i.e., lateral entorhinal cortex. According to an increasingly influential view in neuroscience, PrC constitutes the apex of the VVS processing hierarchy. Visual objects are thought to be represented in PrC in their most highly integrated form based on complex feature conjunctions (Murray & Bussey, 1999; Buckley & Gaffan, 2006; Bussey & Saksida, 2007; Graham, Barense, & Lee, 2010; Kent et al., 2016), with lateral entorhinal cortex (or its human homologue anterolateral entorhinal cortex; alErC) complementing this role through integration with additional spatial information (Connor & Knierim, 2017; Yeung et al., 2017). These coding properties make PrC and alErC ideally suited for providing the read-out for subjective reports of perceived similarity among exemplars, and for capturing even those aspects of perceived similarity structure that are observer-specific. At present, however, there is no evidence available that directly addresses whether these regions play such a functional role.
In the current fMRI study, we tested the idea that the visual similarity structure among exemplars of real-world categories that can be derived from observers’ subjective reports, including its observer-specific characteristics, is behaviourally relevant, and is predicted by the similarity structure of neural activity in PrC and alErC. During scanning, we administered a novel experimental task that required visual discrimination between images of exemplars from multiple real-world categories across consecutive trials (see Fig. 1). In addition, we obtained ratings of subjectively perceived visual similarities between these stimuli from each observer offline (see Fig. 2). At the behavioural level, we found that reported perceived similarities between exemplars influenced discrimination performance in an observer-specific manner. Representational similarity analyses (RSA) of ultra-high resolution fMRI data revealed, in line with our hypotheses, that activation patterns in PrC and alErC do indeed predict these subjectively perceived visual similarities among exemplars with high precision, and that they do so in a way that also captures those aspects of similarity structure that are unique to individual observers.
Results
Perceived Visual Similarity Structure Among Exemplars Varies across Observers
We used inverse multidimensional scaling (iMDS, Kriegeskorte & Mur, 2012) to create participant-specific models of perceived visual similarity for 4 exemplars from 10 different categories (see Fig. 1). Specifically, participants were instructed to arrange images of objects in a circular arena by placing those they perceived to be more visually similar closer together, and those they perceived to be less visually similar farther apart.
Participants completed these arrangements offline, i.e., outside of the scanner, in two phases, with the first phase involving sorting of the full set of 40 objects in a single arrangement (Supplementary Fig. 1A). The second phase required sorting of exemplars within categories in 10 separate arrangements (one per category; Fig. 1A). Given our interest in representations that capture fine-grained object similarities within categories, our primary fMRI analyses relied on the similarity structures computed based on sorting in this second phase. The distances between all pairwise exemplars within each category were used to create a behaviour-based (i.e., subjective-report) representational dissimilarity matrix (RDM; Fig. 1B), which included a split of the range of similarities into three levels for sensitivity analyses in behaviour and neural activation patterns (see Methods for further detail). Examination of intersubject correlations of each participant’s RDM and the mean of all other participants’ RDMs (excluding their own) revealed a mean value of r = 0.69, with noticeable variability across observers as reflected in their range (r = 0.54-0.81; see Fig. 1C, and Supplementary Fig. 1 for data on individual categories). We leveraged this variability across observers in subsequent analyses in order to determine whether behavioural performance on the modified 1-Back task during scanning, and neural representations in PrC and alErC, are predicted by the perceived similarity structure of individual observers.
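The leave-one-out intersubject correlation described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each participant's within-category arrangement is available as a list of two-dimensional arena coordinates and that Euclidean distance is used. The function names (`rdm_vector`, `leave_one_out_isc`) and the example coordinates are hypothetical and are not taken from the study's analysis code.

```python
from itertools import combinations
from math import dist, sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rdm_vector(positions):
    """Vectorized RDM: Euclidean distance for every pair of exemplars.

    `positions` holds one (x, y) arena coordinate per exemplar
    (an assumed input format); 4 exemplars yield 6 pairwise distances.
    """
    return [dist(p, q) for p, q in combinations(positions, 2)]


def leave_one_out_isc(rdms):
    """Correlate each participant's RDM with the mean RDM of all others."""
    correlations = []
    for i, own in enumerate(rdms):
        others = [r for j, r in enumerate(rdms) if j != i]
        mean_others = [mean(values) for values in zip(*others)]
        correlations.append(pearson(own, mean_others))
    return correlations


# Hypothetical arrangements of 4 exemplars by three participants.
arrangements = [
    [(0.0, 0.0), (1.0, 0.2), (0.1, 1.0), (1.5, 1.4)],
    [(0.0, 0.0), (0.8, 0.1), (0.3, 1.2), (1.2, 1.6)],
    [(0.0, 0.0), (1.4, 0.0), (0.0, 0.6), (1.1, 1.0)],
]
isc = leave_one_out_isc([rdm_vector(a) for a in arrangements])
```

In this sketch, lower values in `isc` would indicate participants whose perceived similarity structure departs more from that of the rest of the group.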
Behavioural Performance is Sensitive to Observers’ Own Perceived Visual Similarity Structure
Participants underwent ultra-high resolution fMRI scanning while completing a novel Category-Exemplar 1-Back Task designed to tax fine-grained visual object discrimination (see Fig. 2A). This task required responding to two different types of repetition, namely repetition of Same Exemplars or of different exemplars from the Same Category, across consecutive trials. Participants were asked to provide a button-press response when they noticed repetitions, with different buttons for each type of repetition. On all other trials, participants were not required to provide a response. Importantly, this task was designed to ensure that participants engaged in categorization, while also discriminating between exemplars within categories. Performance on the Category-Exemplar 1-Back Task was sensitive to perceived visual similarity between exemplars as reflected in observers’ offline ratings and formalized in the behavioural RDMs with 3 different levels of similarity (Fig. 2B-C; see also Supplementary Table 1). Specifically, response errors on Same Category trials increased with increasing visual similarity (significant linear slope; t(22) = 18.35, p < .0001). Moreover, response times for correct responses on Same Category trials were positively correlated with perceived visual similarity level (t(22) = 13.47, p < .0001). Critically, task performance was also sensitive to the unique perceived similarity structure within categories expressed by observers. When we compared the influence of participants’ own similarity ratings with that of others on behaviour (Fig. 2B-C), we found a significantly larger positive slope in error rate (t(22) = 8.30; p<.0001) and in response times (t(22) = 9.68; p<.0001) for participants’ own ratings. This pattern of behaviour suggests that perceived visual similarity between exemplars influenced participants’ discrimination performance during scanning, and, critically, that it did so in an observer-specific manner.
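One way to picture the slope analyses reported above is sketched below. It assumes that a least-squares slope of error rate across the three similarity levels is computed for each participant and that the slopes are then tested against zero with a one-sample t-test. The exact statistical procedure is not spelled out in this section, so the function names and the example error rates are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def linear_slope(levels, values):
    """Least-squares slope of `values` regressed on `levels`."""
    ml, mv = mean(levels), mean(values)
    numerator = sum((l - ml) * (v - mv) for l, v in zip(levels, values))
    denominator = sum((l - ml) ** 2 for l in levels)
    return numerator / denominator


def one_sample_t(samples):
    """t statistic and degrees of freedom for a test of the mean against zero."""
    n = len(samples)
    t = mean(samples) / (stdev(samples) / sqrt(n))
    return t, n - 1


# Hypothetical error rates at low, medium, and high perceived similarity,
# one row per participant (made-up numbers, not data from the study).
levels = [1, 2, 3]
error_rates = [
    [0.05, 0.12, 0.30],
    [0.02, 0.10, 0.22],
    [0.08, 0.15, 0.35],
    [0.04, 0.09, 0.27],
]
slopes = [linear_slope(levels, rates) for rates in error_rates]
t_value, df = one_sample_t(slopes)
```

With 23 participants, as implied by the reported t(22), the same computation would yield 22 degrees of freedom.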
Patterns in Multiple VVS Regions Predict Perceived Visual Similarity Structure Among Exemplars
To investigate whether the similarities between activation patterns in PrC and downstream alErC predict perceived similarities between exemplars in observers’ reports, we employed anatomically defined ROIs and created participant-specific models of neural similarity between all 40 object exemplars on the no-response trials. In order to examine the anatomical specificity of our findings, we also created such models for ROIs in other VVS and medial temporal-lobe regions. Specifically, these ROIs included early visual cortex (EVC), lateral occipital cortex (LOC), posteromedial entorhinal cortex (pmErC), parahippocampal cortex (PhC), and temporal pole (TP) for comparison (see Fig. 5 for visualization; note that LOC and PhC have typically been included in ROI definitions of IT in prior work; e.g., Charest et al., 2014). Pairwise dissimilarities of neural patterns were employed to compute the brain-based RDMs (Fig. 3A-B); Pearson’s correlations were calculated so as to examine whether these RDMs predicted participants’ own behaviour-based RDMs that were derived from their offline reports of perceived similarity (Fig. 3C). Our analyses revealed that neural activation patterns in PrC and alErC did indeed correlate with participants’ perceived visual similarity RDMs (Bonferroni-corrected p<.01). Patterns in other regions of the VVS (EVC: p<.003; LOC: p<.002) were also significantly correlated with these behaviour-based RDMs. Critically, patterns in regions previously implicated in scene processing, specifically PhC and pmErC (Schultz et al., 2015; Maass et al., 2015; Schröder et al., 2015), did not predict the perceived similarity structure for objects (all p > 0.5). Having established that activity patterns in multiple VVS regions predict perceived visual similarities between exemplars, we followed up on this finding by asking whether PrC and alErC capture these similarities with higher fidelity than earlier regions (EVC and LOC).
Towards this end, we next examined whether these regions predict similarity structure even when exemplars only differ from each other in subtle ways.
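The brain-behaviour comparison described in this section can be sketched as follows. The sketch assumes that neural dissimilarity between two exemplars is measured as 1 minus the Pearson correlation of their voxel patterns, which is a common choice in RSA but is not specified in this section. It further assumes that each pattern is a plain list of voxel responses within one ROI. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def brain_rdm(patterns):
    """Vectorized brain-based RDM for one ROI.

    Assumption: dissimilarity is correlation distance (1 - Pearson r)
    between the voxel patterns evoked by each pair of exemplars.
    """
    return [1.0 - pearson(p, q) for p, q in combinations(patterns, 2)]


def brain_behaviour_correlation(patterns, behaviour_rdm):
    """Pearson correlation between brain-based and behaviour-based RDMs."""
    return pearson(brain_rdm(patterns), behaviour_rdm)


# Hypothetical voxel patterns for the 4 exemplars of one category.
patterns = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 5.0],
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 3.0, 2.0, 1.0],
]
# Hypothetical perceived dissimilarities for the same 6 exemplar pairs.
behaviour = [0.2, 0.9, 1.0, 0.8, 0.9, 0.1]
r = brain_behaviour_correlation(patterns, behaviour)
```

In an analysis of this kind, one such correlation would be obtained per participant and ROI and then evaluated at the group level, with correction for the number of ROIs tested.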
PrC and alErC are the Only Regions Whose Patterns Predict Perceived Visual Similarity Structure Among Exemplars When Similarity Is High
In this set of analyses, we examined the mapping between perceived similarity structure and neural activity patterns at a more fine-grained level within categories. Given prior suggestions that PrC allows for the disambiguation of highly similar objects (Murray & Bussey, 1999; Buckley & Gaffan, 2006; Bussey & Saksida, 2007; Graham, Barense, & Lee, 2010; Kent, Hvoslef-Eide, Saksida, & Bussey, 2016), we anticipated that patterns in PrC, and possibly downstream alErC, would represent even the most subtle visual differences that observers perceive between exemplars, whereas earlier VVS regions would not. To address this issue with RSA in our stimulus set, participants’ behaviour-RDMs, which were based on 6 pairwise distances between exemplars in each category, were divided into the three levels of similarity (low, medium, high; see Figure 1B and Supplementary Figure 1C for range comparison). Recall that our behavioural analyses revealed, as previously described, that discrimination performance on the Category-Exemplar 1-Back task was highly sensitive to these different levels of similarity. Figure 4 displays the results of our level-specific fMRI analyses, which were conducted for those regions whose activity patterns showed significant correlations with participants’ full perceived visual similarity space between exemplars (as shown in Fig. 3D). Correlations between perceived similarity structure and activity patterns in PrC and alErC were significant at all three levels of similarity (Bonferroni-corrected p < .01; Fig. 4A). In contrast, correlations for activity patterns in posterior VVS regions were significant only at the lowest level of perceived similarity (Bonferroni-corrected p < .01). 
This pattern of results cannot be attributed to differences in stability of activity patterns across levels of similarity for different regions; supplementary analyses revealed that stability was significant for all regions and did not interact with level of similarity (see Supplementary Figure 2). Direct comparison between regions also revealed that correlations in PrC and alErC were higher than in LOC and EVC at medium and high levels of perceived similarity (Bonferroni-corrected p<.05; Fig. 4A).
The described results suggest that object representations in posterior VVS regions may not have sufficient fidelity to allow for differentiation of exemplars that are perceived to be highly similar by observers, and that are most difficult to discriminate on our Category-Exemplar 1-Back task. We followed up on this idea with complementary classification analyses of our fMRI data using a linear support vector machine (Mur, Bandettini, & Kriegeskorte, 2009). These analyses were conducted so as to examine in which VVS regions activity patterns associated with exemplars of high perceived similarity would be sufficiently separable so as to allow for classification as distinct items. They confirmed that activity patterns associated with specific exemplars can indeed be successfully classified in PrC and alErC at higher levels of perceived similarity than in posterior VVS regions (see Figure 4B for further detail).
The analyses presented so far only focused on specific regions of interest. To answer the question of whether PrC and alErC are the only regions whose activity patterns predict perceived visual similarity between exemplars within categories, we also conducted whole-volume searchlight-based RSA. As expected based on our ROI analyses, patterns in PrC and alErC, as well as in earlier VVS regions, showed a predictive relationship in these searchlight analyses when the full range of perceived visual similarity values between exemplars was considered; no regions in the scanned brain volume outside of the VVS exhibited this predictive relationship (Threshold-Free Cluster Enhancement corrected p<.05; Fig. 5C-D). Critically, our searchlight analyses revealed that PrC and alErC were indeed the only regions in the entire scanned brain volume whose patterns correlated with observers’ reports when similarity between exemplars was perceived to be high, and objects were most difficult to discriminate on the Category-Exemplar 1-Back task (TFCE-corrected p<.05; Fig. 5E-F).
Patterns in PrC and alErC Predict Perceived Visual Similarity Structure Among Exemplars in an Observer-Specific Manner
In order to determine whether the described brain-behaviour relationships in PrC and alErC were observer-specific, we leveraged the variability across participants’ similarity RDMs as reflected in their offline reports of perceived similarities (Fig. 1C). We reasoned that if neural patterns in a region represent the observer’s unique perceived similarity structure, brain-behaviour correlations should be higher when calculated within rather than between participants (Fig. 6A black vs grey arrows; see Charest et al. (2014) for further discussion of rationale). In other words, if there are observer-specific relationships, activity patterns should predict participants’ own perceived similarity structure better than someone else’s. Such analyses would reveal that interindividual differences are not only present in the perceptual reports of observers and in their discrimination performance, as demonstrated in our behavioural analyses, but also in corresponding neural representations in PrC and alErC. Indeed, initial analyses of our fMRI data demonstrated the presence of stable observer-specific activity patterns for the object exemplars probed in our study in all regions of interest (see Supplementary Materials Figure 3). The i-index introduced by Charest et al. (2014), which directly measures differences in correlations for the same (i.e., own) versus other observers, allowed us to examine which of these observer-specific activation patterns predict observer-specific structure in reports of perceived similarities. Critically, these analyses confirmed our expectation that the neural activation patterns in PrC and alErC predict observer-specific perceived visual similarity structure (Bonferroni-corrected P[within-participant r > between-participant r]<.01).
In contrast, activation patterns in posterior VVS regions, EVC and LOC (Bonferroni-corrected P[within-participant r > between-participant r]>.05), did not uniquely predict participants’ own perceived similarity structure (Figure 6B). Not surprisingly, regions that did not predict participants’ perceived similarity structure at all (pmErC, PhC, and TP; Figure 3D) also did not have i-indices significantly above zero (Bonferroni-corrected P[within-participant r > between-participant r]>.05). Therefore, although patterns in multiple VVS regions predicted perceived similarity structure within categories, these results suggest that not all regions represent this structure in a manner that honors those aspects of similarity perception that are unique to observers.
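The logic of the within- versus between-participant comparison can be sketched in simplified form. The sketch assumes that each participant contributes one vectorized brain-based RDM for an ROI and one vectorized behaviour-based RDM over the same exemplar pairs. The index is computed here as the within-participant correlation minus the mean between-participant correlation. This follows the description above rather than the exact implementation of Charest et al. (2014), and the significance testing is not shown. All names and values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def brain_behaviour_i_index(brain_rdms, behaviour_rdms):
    """Per-participant difference: own correlation minus mean other correlation.

    Positive values mean that a participant's brain RDM predicts their own
    behaviour RDM better than the behaviour RDMs of other participants.
    """
    n = len(brain_rdms)
    indices = []
    for i in range(n):
        within = pearson(brain_rdms[i], behaviour_rdms[i])
        between = mean(
            pearson(brain_rdms[i], behaviour_rdms[j]) for j in range(n) if j != i
        )
        indices.append(within - between)
    return indices


# Hypothetical vectorized RDMs (6 exemplar pairs) for three participants.
behaviour_rdms = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
]
# For illustration, each brain RDM matches that participant's own reports.
brain_rdms = [list(rdm) for rdm in behaviour_rdms]
i_indices = brain_behaviour_i_index(brain_rdms, behaviour_rdms)
```

If all participants reported identical similarity structure, the within and between correlations would coincide and the index would be zero, which is why variability across observers' reports is a precondition for this analysis.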
In our final set of analyses, we asked whether the relationships we found between object representations in PrC, alErC and observer-specific reports were uniquely tied to the fine-grained similarity structure that is present among exemplars within categories. Towards this end, we modeled perceived visual similarities at the category (rather than exemplar) level, generating perceived similarity RDMs based on pairwise distances between category-centroids. These centroids were calculated using the participants’ similarity ratings obtained from the initial sorting of all objects in the same circular arena (see Supplementary Materials 4A for further methodological detail). Because we also noted variability in perceived similarities between category-centroids across observers (Supplementary Materials 4B), we determined whether activity patterns in these regions predict the observers’ unique perceived similarity structure at the category level. Results obtained with the brain-behaviour i-index revealed that only patterns in LOC showed a significant correlation that reflected an observer-specific relationship (Bonferroni-corrected P[within-participant r > between-participant r]<.01) (Supplementary Materials 4D). Critically, this correlation for LOC was larger than the (non-significant) correlation in PrC (Bonferroni-corrected p<.05). These results suggest that object representations in PrC and alErC only capture the observer-specific structure of visual similarity at the level of exemplars, and that the larger-scale structure in object representations that characterizes perceived similarity relationships between categories is reflected in earlier, i.e., more posterior VVS regions.
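The category-level model can be illustrated with a brief sketch. It assumes that the arrangement of all objects in the first sorting phase is available as two-dimensional arena coordinates grouped by category, that a centroid is the mean coordinate of a category's exemplars, and that Euclidean distance is used between centroids. The procedure actually used is described in Supplementary Materials 4A and may differ. The category names and coordinates are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def category_centroid_rdm(positions_by_category):
    """Pairwise distances between category centroids.

    `positions_by_category` maps a category name to the (x, y) arena
    coordinates of its exemplars (an assumed input format).
    Returns a dict keyed by alphabetically ordered category pairs.
    """
    centroids = {
        category: tuple(mean(axis) for axis in zip(*points))
        for category, points in positions_by_category.items()
    }
    names = sorted(centroids)
    return {
        (a, b): dist(centroids[a], centroids[b]) for a, b in combinations(names, 2)
    }


# Hypothetical placements of 4 exemplars each from three categories.
arena = {
    "apple": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    "chair": [(3.0, 4.0), (5.0, 4.0), (3.0, 6.0), (5.0, 6.0)],
    "shoe": [(1.0, 5.0), (1.0, 7.0), (3.0, 5.0), (3.0, 7.0)],
}
category_rdm = category_centroid_rdm(arena)
```

With 10 categories, this procedure would yield 45 pairwise centroid distances per participant, which could then be related to brain-based RDMs in the same way as the exemplar-level RDMs.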
Discussion
Vision neuroscience has made great strides in understanding how objects are represented in the VVS. How these VVS representations relate to qualities of objects that observers subjectively perceive and report has received limited examination so far. In the current study, we addressed this question by focusing on perceived similarities among exemplars of real-world categories. Using a novel Category-Exemplar 1-Back Task, we found that visual discrimination performance is sensitive to the unique visual similarity structure that is reflected in the observers’ own reports. Combining this task with fMRI scanning at ultra-high resolution allowed us to show, in line with our general hypotheses, that activity patterns in PrC and alErC at the apex of the VVS hierarchy predict perceived visual similarities among exemplars with higher fidelity than any other VVS region, including prediction of those aspects of similarity structure that are unique to individual observers.
Perception of similarities is the outcome of an active comparison process between multiple objects. For complex real-world objects, it relies on the integration of information across multiple feature dimensions, and this integration is characterized by a high degree of flexibility (Medin et al., 1993; Hebart et al., 2020). As such, it is perhaps not surprising that reported perceived similarities between objects vary across observers (see also Goldstone, 1994; Goldstone & Son, 2012), and have behavioural relevance in performance of object discrimination, as shown in the current study. Prior research revealing neural representations associated with observer-specific differences in object perception utilized select, highly familiar objects with unique personal significance, such as participants’ own shoes or their own vehicles (Charest et al., 2014). In the present study, by contrast, we focused on generic category exemplars that were selected from a normative database without any reference to participants’ autobiographical context.
Thus, our findings extend this prior work by showing that even for category exemplars that lack the high familiarity of personally meaningful objects, perceived similarities vary across observers in ways that impact behaviour, and that can be predicted based on the structure of neural representations in the VVS. Critically, the VVS regions that predicted observer-specific perceived similarity structure among exemplars in our data were situated downstream from those regions in IT in which Charest et al. (2014) observed a predictive relationship. That we found this observer-specific relationship in PrC and alErC is consistent with the notion that object representations at the apex of the VVS are particularly important for fine-grained visual discrimination when observers have limited familiarity with specific exemplars (Erez et al., 2016; Liang et al., 2020).
While activity in posterior VVS regions, including LOC as part of the large swath of cortex that is often referred to as IT, predicted some aspects of perceived visual similarity structure in the present study as well, this relationship was observed primarily at the category level. For exemplars, it only held in our data when perceived visual similarity was low, and when behavioural performance revealed that exemplars were easily distinguishable. Indeed, our searchlight analyses revealed that PrC and alErC were the only regions in the cortical volume we scanned in which activity patterns were related to perceived visual similarity among highly similar exemplars. Furthermore, complementary pattern classification analyses revealed that activity patterns associated with exemplars of high perceived similarity were not sufficiently separable in more posterior VVS regions so as to allow for classification as distinct items. Overall, this pattern of findings across regions suggests that different regions along the VVS hierarchy may provide the readout for reports of perceived similarity depending on the level of precision that is required by the task at hand.
Work on representations in the VVS focusing on object shape has also shown that the distinction between perceived and objectively defined (physical) visual similarity is an important one to consider when aiming to elucidate the organization in this pathway.
Evidence from these studies suggests that coding of perceived shape similarity emerges gradually in the transformation of representations from posterior to more anterior occipital regions (Op de Beeck et al., 2008; Haushofer et al., 2008; see Schwarzkopf et al., 2011, Moutsiana et al., 2016 for findings on object size). In the current study, the exemplars employed for each category were not selected to differ on one specific feature dimension, such as shape. Instead, they were images of complex real-world objects that differed naturally on multiple dimensions, including their shape envelope, texture, and spatial configuration of internal shape features. That we observed a relationship between perceived visual similarity and activation patterns in PrC, as well as in alErC, suggests that VVS representations that capture subjective aspects of visual object perception are clearly not limited to occipital cortex. Rather, when complex real-world objects need to be discriminated, they extend to regions in the temporal lobe, including regions at the apex of the VVS.
Prior fMRI research that has aimed to characterize the nature of representations in PrC with pattern analysis techniques has shown that the degree of feature overlap between objects is captured by activation patterns in this region. Such a relationship has been revealed in multiple task contexts, with images of real-world objects and with words denoting such objects; moreover, it has been observed for feature overlap at the perceptual as well as the semantic level (Clarke and Tyler, 2014; Erez et al., 2016; Bruffaerts et al., 2013; Martin et al., 2018). In a study by Martin et al. (2018), PrC was found to be the only region that captured this organization for both perceptual and semantic features of objects in an integrative manner. By contrast, activity in an anterior VVS region that covered the medial and lateral extent of the temporal pole (TP) was selectively related to degree of overlap in semantic features.
Notably, the iMDS task employed to probe perceived similarities in the current study specifically required object comparisons based on their visual (i.e. perceptual) similarity.
The absence of a predictive relationship between similarities in activity patterns in TP and perceived visual similarities in our data may therefore not be surprising. Although this absence, as a negative finding, demands caution in interpretation, our results are consistent with the idea that it reflects the emphasis placed on visual rather than on semantic object characteristics in our study.
The links summarized above between feature overlap and activity patterns in PrC, in combination with findings from neurophysiology and computational modeling, have been interpreted to suggest that PrC integrates features of objects with complex conjunctive coding into representations of whole objects, and that the resulting conjunctive representations allow for disambiguation of objects that are highly similar (Bussey & Saksida, 2002; Murray et al., 2007; Cowell et al., 2010) based on their degree of feature overlap. Indeed, it is this type of conjunctive coding that has motivated the idea that PrC, together with alErC (Connor & Knierim, 2017; Yeung et al., 2017), can be considered the pinnacle of the VVS object-processing hierarchy. It is worth noting that in prior fMRI studies on the role of feature overlap in PrC, similarity of objects has typically been measured based on feature statistics derived from normative data in feature generation tasks, which can be considered an objective marker of similarity. Yet, prior behavioural research has revealed that subjectively perceived similarity tends to show only modest correlations with objective measures of similarity defined by feature statistics (Iordan et al., 2018). The current findings show, in line with such evidence, that reported perceived visual similarity between exemplars varies across observers, and that this variability has behavioural relevance for performance in visual object discrimination. Our functional neuroimaging findings indicate that the structure of representations in PrC, and in downstream alErC, captures subjective aspects of perception that are reflected in this variability, even when objective similarity, as defined by normative feature statistics, is the same. Critically, this is not to say, however, that subjectively perceived aspects of objects may not be expressed in a conjunctive code.
Rather, we propose that the coding of whole objects based on complex feature conjunctions in PrC and alErC could afford the flexibility that is required to capture these differences in perception across individual observers.
Findings from lesion studies conducted with oddity tasks also speak to a critical role of PrC in processes required for the visual comparison of complex real-world objects (including faces; Buckley et al., 2001; Barense et al., 2007; Bartko et al., 2007; Inhoff et al., 2019). Numerous studies conducted in humans and in other species have shown that performance on such tasks relies on the integrity of PrC specifically when objects with high visual feature overlap must be compared (cf. Stark and Squire, 2000; Levy et al., 2005; Hales et al., 2015). Behaviour on visual oddity tasks is relevant for understanding perceived similarity, because such tasks require that participants judge which item in a set of simultaneously presented objects is perceived to be most different from the other ones (i.e., is the odd-one out). For example, Barense et al. (2007) compared performance on multiple visual oddity tasks between individuals with lesions in the medial temporal lobe that largely spared PrC and ErC, versus individuals with more widespread damage in the medial temporal lobes that included PrC and ErC. Most notably, individuals in the latter but not in the former group showed impairments in identifying the odd-one out item in a set of images of real-world objects that shared a high number of overlapping visual features. Performance was intact in these individuals when the images of objects in a set did not have significant feature overlap, as well as when oddity judgements were required for sets of simple shapes that differed in size or colour. While the results of such lesion studies are compatible with the conclusions we draw in the current study, it is important to note that they do not allow for characterization of similarity structure of neural representations in PrC and alErC, nor do they address whether any such representations capture the perceived similarity structure that is unique to individual observers.
Prior fMRI studies conducted with visual oddity tasks for objects have also not addressed similarity structure of representations, nor have they probed observer-specific effects (Lee, Scahill, & Graham, 2007; Barense et al., 2010; Lee, Brodersen, & Rudebeck, 2013). Indeed, it would be difficult, if not impossible, to characterize neural representations of individual items with the simultaneous presentation of multiple objects in a typical oddity-task setup.
The current study was not designed to address the origin of the observer-specific effects we revealed in behaviour and in the neural representational structure in PrC and alErC. One possibility is that they pertain to differences in expertise across observers with the categories examined. Extant evidence that supports such an interpretation comes from recent behavioural research revealing that the representational structure that is reflected in similarity judgments for exemplars changes even after just a few days of repeated exposure, and that the precise nature of this change depends on the familiarity of the category in question (Collins & Behrmann, 2020). An important direction for future research therefore is to elucidate the role of expertise in observer-specific similarity structure among exemplars in PrC, alErC, and in other VVS regions. It is likely that expertise induces changes not only in how information along a specific feature dimension is processed, but also in how observers weigh different feature dimensions, such as shape and texture, in judgments of visual similarities among complex objects. A promising idea that deserves targeted investigation is that the conjunctive codes in regions at the apex of the VVS reflect this unique weighing of multiple object dimensions by different observers.
A final point worthy of discussion is the anatomical specificity of our findings in the medial temporal lobe. While activity patterns that reflected the similarity structure among exemplars were present in PrC and alErC, they were absent in medial-temporal regions that have previously been implicated in visual discrimination of scenes, specifically pmErC and PhC (see Schultz et al., 2015, for review). This specificity is striking because there are also documented differences in functional connectivity between these regions that have been linked to object versus scene processing, with PrC being connected to alErC and PhC being connected to pmErC, respectively (Maass et al., 2015; see Schröder et al., 2015 for other differences in functional connectivity between these ErC subregions). Another promising avenue for future research will be to determine whether pmErC and PhC capture the visual similarity structure among exemplars of scene categories (e.g., forest scenes) that observers perceive and report.
In sum, the present findings show that perceived visual similarities among members of real-world object categories influence discrimination performance and can be predicted based on the similarity structure of neural activity in cortical regions at the apex of the VVS object processing hierarchy. As such, our findings provide new evidence that representations of the visual world differ across observers in ways that are behaviourally relevant, and that this observer-specific organization can be captured at the neural level with current neuroimaging techniques.
Materials and Methods
Participants
A total of 29 participants completed the perceived similarity inverse multi-dimensional scaling arrangement task and fMRI experiment (12 females; age range = 18 to 35 years old; mean age = 24.2 years). All participants were right-handed, fluent in English, and had no known history of psychiatric or neurological disorders. Three participants were removed due to excessive head motion above the cut-off of 0.8 mm of framewise displacement, one participant was removed due to behavioural performance accuracy 2 SD below the average on the fMRI task, and two participants were removed due to temporal signal-to-noise ratio 2 SD below average. Therefore, 23 participants were included in the final analyses. All participants gave informed consent, were debriefed, and received monetary compensation upon completion of the experiment. This study was conducted with Western’s Human Research Ethics Board approval.
Stimuli
In order to investigate high-level object representations, we selected stimuli with varying levels of perceived visual similarity from the Migo Normative Database (Migo et al., 2013). We used 40 greyscale images of objects from 10 categories (Supplementary Figure 1A). Each category was made up of four exemplars that all shared the same name (e.g., apple, lipstick, stapler). Based on findings from a pilot study (n = 40), we selected stimuli whose perceived similarity ratings were comparable to the normative ratings in the database.
Modeling of Perceived Visual Similarity Structure
In order to obtain observer-specific models of perceived visual similarity structure for our stimuli, participants provided reports of perceived similarity between all stimuli on a computer outside of the scanner prior to scanning. Participants were seated in front of a monitor and completed a modified version of the inverse multi-dimensional scaling task (iMDS; Kriegeskorte & Mur, 2012). Specifically, participants were asked to drag-and-drop images into a white circle (i.e., arena), and arrange them according to perceived visual similarity (Kriegeskorte & Mur, 2012; see Figure 2A). Objects perceived to be more visually similar were placed closer together and objects perceived to be less visually similar were placed further apart. The iMDS task consisted of two phases. In the first phase, participants arranged all 40 stimuli according to perceived visual similarity (Supplementary Figure 3A). All unique pairwise distances were converted to dissimilarity percentiles and used to compute observer-specific models of the between-category similarity space. In the second phase, participants completed 10 category-specific trials in which they sorted four exemplars from the same category according to their perceived visual similarity. They were instructed to use the entire space within the circle, and to make sure they compared each stimulus to every other stimulus. A MATLAB-based toolbox was then used to calculate distances between each pair of exemplars, and to convert these distances to dissimilarity percentiles (Kriegeskorte & Mur, 2012). These dissimilarity percentiles were then used to create observer-specific behaviour-based RDMs that represented each observer’s perceived similarity space at the exemplar level.
The behaviour-based RDMs for the entire range included 6 dissimilarity percentiles for each of the 10 categories (i.e., 6×10=60 dissimilarity percentiles).
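The conversion from an arena arrangement to dissimilarity percentiles can be illustrated with a minimal sketch. It is not the MATLAB toolbox used in the study: the function names, the arena coordinates, and the mid-rank definition of a percentile are assumptions made for illustration only. The sketch shows how four exemplars yield six unique pairwise distances, each of which is then expressed as a percentile.

```python
from itertools import combinations
from math import dist


def pairwise_distances(positions):
    """Euclidean distance for every unique pair of items placed in the arena."""
    return {
        (a, b): dist(positions[a], positions[b])
        for a, b in combinations(sorted(positions), 2)
    }


def to_percentiles(distances):
    """Express each distance as a percentile (0-100) of all distances.

    Assumption: a mid-rank percentile, i.e. the share of values below the
    distance plus half of the values tied with it. The toolbox used in the
    study may define percentiles differently.
    """
    values = list(distances.values())
    n = len(values)
    result = {}
    for pair, d in distances.items():
        below = sum(v < d for v in values)
        equal = sum(v == d for v in values)
        result[pair] = 100.0 * (below + 0.5 * equal) / n
    return result


# Hypothetical arena coordinates for the four exemplars of one category.
arena = {
    "apple_1": (0.0, 0.0),
    "apple_2": (1.0, 0.0),
    "apple_3": (0.0, 3.0),
    "apple_4": (6.0, 8.0),
}
distances = pairwise_distances(arena)   # 6 unique pairs for 4 exemplars
percentiles = to_percentiles(distances)
```

Repeating this for each of the 10 categories would give the 60 values that make up one observer's behaviour-based RDM at the exemplar level.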
These behaviour-based RDMs, which captured the full range of perceived similarity, were used to create 3 behaviour-based RDMs to reflect 3 levels of perceived similarity (low, medium, high). The 6 pairwise distances (expressed as dissimilarity percentiles) per category were sorted into the two largest, two medium, and two smallest dissimilarities to create behaviour-based RDMs for low, medium, and high perceived visual similarity, respectively. These behaviour-based RDMs for each of the levels included 2 dissimilarity percentiles for each of the 10 categories (i.e., 2×10=20 dissimilarity percentiles). In order to ensure that the different levels of perceived similarity were non-overlapping, any values that did not allow for at least 0.1 dissimilarity percentile between each of the successive levels (i.e., high-medium, medium-low) were excluded. The range of dissimilarity percentiles did not differ significantly between the different levels of perceived similarity (p>.05; Supplementary Figure 1C).
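The sorting of each category's six dissimilarities into three levels can be sketched as follows. The function name and the example values are hypothetical, and the sketch deliberately omits the exclusion of values that fall within 0.1 dissimilarity percentile of the neighbouring level, because the exact implementation of that criterion is not specified here.

```python
def split_by_similarity_level(category_dissimilarities):
    """Split one category's six pairwise dissimilarities into three levels.

    The two smallest dissimilarities form the high-similarity level, the two
    middle ones the medium level, and the two largest the low level.
    """
    if len(category_dissimilarities) != 6:
        raise ValueError("expected 6 pairwise dissimilarities (4 exemplars)")
    ordered = sorted(category_dissimilarities.items(), key=lambda kv: kv[1])
    return {
        "high": dict(ordered[0:2]),    # most similar pairs
        "medium": dict(ordered[2:4]),
        "low": dict(ordered[4:6]),     # least similar pairs
    }


# Hypothetical dissimilarity percentiles for the exemplar pairs of one category.
example = {
    ("e1", "e2"): 12.0,
    ("e1", "e3"): 55.0,
    ("e1", "e4"): 90.0,
    ("e2", "e3"): 40.0,
    ("e2", "e4"): 78.0,
    ("e3", "e4"): 25.0,
}
levels = split_by_similarity_level(example)
```

Applying the split to all 10 categories gives 2×10=20 values per level, matching the counts stated above.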
Category-Exemplar 1-Back Task
For the main experiment, participants completed a variation of a 1-back task, which we term the “category-exemplar 1-back” task, in the 3T scanner. We created this new 1-back task to ensure that participants were attending carefully to each individual object, given our interest in fine-grained object discrimination. As in a classic 1-back task, participants were shown a stream of individual objects and asked to indicate with a button press when the object was an exact repeat of the preceding object; no response was required when the object was from a different category than the preceding one (Figure 2A). Our novel twist was the addition of a second response option, whereby participants were asked to indicate with a different button when the object was from the same category as the previous one, but a different exemplar. The two response trial types served as catch trials to ensure that participants’ attention was focused on differences between objects across consecutive trials, and to assess behavioural performance. These modifications of the classic 1-back task were introduced to ensure that participants attended closely to each individual object and engaged in object processing at the exemplar level. Specifically, successful identification of the repetition of different exemplars from the same category could not be based on local low-level features, such as changes in luminance, texture, or shape across consecutive trials. Participants responded with their right index and middle fingers, with the assignment of fingers to response types counterbalanced across participants. Of the three trial types (exemplar repeat, category repeat, different category), only the no-response trials (i.e., different category) were used in the fMRI analysis to avoid motor confounds associated with button presses. By extension, none of the trials considered for assessment of similarity in activation patterns were immediate neighbours.
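The three trial types can be made concrete with a short sketch that labels each trial in a stimulus stream relative to the trial before it. The function name, the label strings, and the example stream are hypothetical, and treating the first trial of a stream as a no-response trial is an assumption.

```python
def label_trials(stream):
    """Label each (category, exemplar) trial relative to the preceding trial."""
    labels = []
    previous = None
    for category, exemplar in stream:
        if previous is None or category != previous[0]:
            # No button press required; only these trials enter the fMRI analysis.
            labels.append("different_category")
        elif exemplar == previous[1]:
            # Same category and same exemplar: exact repeat (catch trial).
            labels.append("exemplar_repeat")
        else:
            # Same category, different exemplar (catch trial).
            labels.append("category_repeat")
        previous = (category, exemplar)
    return labels


# Hypothetical stream of (category, exemplar number) pairs.
stream = [
    ("apple", 1),
    ("stapler", 2),
    ("stapler", 2),
    ("lipstick", 4),
    ("lipstick", 1),
    ("apple", 3),
]
labels = label_trials(stream)
no_response = [i for i, label in enumerate(labels) if label == "different_category"]
```

Because the second of any two consecutive same-category trials is always a response trial, two no-response trials that are adjacent in the stream never come from the same category.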
Participants completed a total of eight functional runs that each lasted 4 min (stimulus duration = 1.2 s, inter-trial interval = 1 s). Run order was counter-balanced across participants. Within each run, each of the 40 exemplars was presented three times as no-response trials, and once as a catch trial, for a total of 24 presentations on no-response trials and 8 catch trials (excluded from fMRI analyses) per exemplar across the entire experiment. Prior to scanning, each participant completed a 5 min practice task with images from categories not included in the functional scanning experiment.
fMRI Data Acquisition
MRI data were acquired using a 3T MR system (Siemens) with a 32-channel head coil. Before the fMRI session, a whole-head MP-RAGE volume (TE = 2.28 ms, TR = 2400 ms, TI = 1060 ms, resolution = 0.8 × 0.8 × 0.8 mm isotropic) was acquired. This was followed by four fMRI runs, each with 300 volumes, which consisted of 42 T2*-weighted slices with an in-plane resolution of 1.7 × 1.7 mm (TE = 30 ms, TR = 1000 ms, slice thickness 1.7 mm, FOV 200 mm, parallel imaging with GRAPPA factor 2). T2*-weighted data were collected at this ultra-high resolution so as to optimize differentiation of BOLD signal in anterolateral versus posteromedial entorhinal cortex. The T2* slices were acquired in odd-even interleaved fashion in the anterior-to-posterior direction.
Subsequently, a T2-weighted image (TE = 564 ms, TR = 3200 ms, resolution 0.8 × 0.8 × 0.8 mm isotropic) was acquired. Finally, participants completed four more fMRI runs. Total duration of MRI acquisition was approximately 60 min.
Preprocessing and Modeling
MRI data were converted to the Brain Imaging Data Structure (BIDS; Gorgolewski et al., 2016) and run through fmriprep-v1.1.8 (Esteban et al., 2018). This preprocessing included motion correction, slice-time correction, susceptibility distortion correction, registration from the EPI to the T1w image, and estimation of confounds (e.g., tCompCor, aCompCor, framewise displacement). Component-based noise correction was performed using anatomical and temporal CompCor (aCompCor and tCompCor) by adding these confound estimates as regressors in SPM12 during the first-level GLM (Behzadi, Restom, Liau, & Liu, 2007). Each participant’s functional data were co-registered to the participant-specific T1w image by fmriprep. First-level analyses were conducted in native space for each participant with no spatial smoothing to preserve ultra-high resolution patterns of activity for MVPA. Exemplar-specific multi-voxel activity patterns were estimated in 40 separate general linear models using the mean activity of the no-response trials across runs.
Region of Interest (ROI) Definitions for fMRI Analyses
Anatomical regions of interest were defined using multiple techniques. Automated segmentation was employed to delineate PrC, ErC, and PhC (ASHS; Wisse et al., 2016). We manually segmented each ErC obtained from ASHS into anterolateral and posteromedial entorhinal cortex following a protocol developed by Olsen and colleagues (2017), which is derived from a functional connectivity study (Maass et al., 2015). A probabilistic atlas was used to define EVC (Wang et al., 2015) and TP (Fischl, 2012). A functional localizer was used to define LOC as the contiguous voxels located along the lateral extent of the occipital lobe that responded more strongly to intact objects than scrambled objects (p<.01, uncorrected; Proklova et al., 2016).
In the VVS, we focused on lateral occipital complex and the temporal pole as they have previously been linked to object processing (e.g., Grill-Spector, Kourtzi, & Kanwisher, 2001; Martin et al., 2018), as well as early visual cortex. In the MTL, we included ROIs for the posteromedial ErC and parahippocampal cortex, both of which have been linked to scene processing (e.g., Maass et al., 2015; Schröder et al., 2015; Schultz et al., 2015; Epstein & Baker, 2020).
Representational Similarity Analyses of fMRI data
For each ROI, linear correlations (Pearson’s r) were calculated between all pairs of exemplar-specific multi-voxel patterns across all voxels using the CoSMoMVPA toolbox in MATLAB (Oosterhof, Connolly, & Haxby, 2016). These correlations were used to create participant-specific brain-based RDMs (1 − Pearson’s r), which capture the unique neural pattern dissimilarities between all exemplars within each category (n=10), within each region (n=8).
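The construction of a brain-based RDM from correlation distances can be sketched in a few lines. This is not the CoSMoMVPA code used in the study: the function names and the toy four-voxel patterns are hypothetical, and the sketch only illustrates the 1 − Pearson’s r measure for the exemplars of a single category within a single region.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length activity patterns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_distance_rdm(patterns):
    """Correlation distance (1 - r) for every unique pair of exemplars."""
    names = sorted(patterns)
    return {
        (a, b): 1.0 - pearson_r(patterns[a], patterns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical voxel patterns for the four exemplars of one category.
patterns = {
    "e1": [1.0, 2.0, 3.0, 4.0],
    "e2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with e1
    "e3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with e1
    "e4": [1.0, 3.0, 2.0, 4.0],
}
rdm = correlation_distance_rdm(patterns)
```

The measure ranges from 0 for perfectly correlated patterns to 2 for perfectly anti-correlated patterns, and it is insensitive to differences in mean activation or overall scaling between patterns.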
Whole-volume RSA was conducted using surface-based searchlight analysis (Kriegeskorte et al., 2006; Oosterhof et al., 2011; Martin et al., 2018). Specifically, we defined a 100-voxel neighborhood around each surface voxel, and computed a brain-based RDM within this region, analogous to the ROI-based RSA. This searchlight was swept across the entire cortical surface (Kriegeskorte et al., 2006; Oosterhof et al., 2011). First, the entire perceived similarity RDM for all within category ratings was compared to each searchlight. These brain-behaviour correlations were Fisher transformed and mapped to the centre of each searchlight for each participant separately. Participant-specific similarity maps were then standardized and group-level statistical analysis was performed. Threshold-free cluster enhancement (TFCE) was used to correct for multiple comparisons with a cluster threshold of p<.05 (Smith and Nichols, 2009).
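The brain-behaviour comparison computed for each searchlight (and, analogously, for each ROI) can be sketched as below. The type of correlation is not specified in this section, so Pearson’s r is used here purely as an assumption; the function names and the example values are likewise hypothetical.

```python
from math import atanh, sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def brain_behaviour_z(brain_rdm, behaviour_rdm):
    """Fisher-transformed correlation between two RDMs with the same pairs."""
    pairs = sorted(brain_rdm)
    if pairs != sorted(behaviour_rdm):
        raise ValueError("both RDMs must contain the same exemplar pairs")
    r = pearson_r(
        [brain_rdm[p] for p in pairs],
        [behaviour_rdm[p] for p in pairs],
    )
    # Fisher z-transform; atanh raises ValueError when r is exactly 1 or -1.
    return atanh(r)


# Hypothetical values for four exemplar pairs: neural correlation distances
# and behavioural dissimilarities (arbitrary units).
brain = {("e1", "e2"): 1.0, ("e1", "e3"): 3.0, ("e1", "e4"): 2.0, ("e2", "e3"): 4.0}
behaviour = {("e1", "e2"): 1.0, ("e1", "e3"): 2.0, ("e1", "e4"): 3.0, ("e2", "e3"): 4.0}
z = brain_behaviour_z(brain, behaviour)
```

The Fisher transform makes the sampling distribution of the correlation approximately normal, which suits the group-level statistics described above.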
Author Contributions
Kayla M. Ferko, Conceptualization, Software, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing; Anna Blumenthal, Conceptualization, Investigation, Data curation, Methodology, Writing – original draft, Writing – review and editing; Chris B. Martin, Formal analysis, Writing – review and editing; Daria Proklova, Methodology, Software, Data curation, Formal analysis; Lisa M. Saksida, Conceptualization, Writing – review and editing; Timothy J. Bussey, Conceptualization, Writing – review and editing; Ali Khan, Supervision, Funding acquisition, Software, Data curation, Methodology; Stefan Köhler, Conceptualization, Supervision, Funding acquisition, Methodology, Formal analysis, Project administration, Writing – original draft, Writing – review and editing.
Competing Interests
The authors declare no competing interests.
Acknowledgments
This work was supported by a Canadian Institutes of Health Research Project Grant (CIHR Grant # 366062) to A.K. and S.K. K.F. was funded through a Natural Sciences and Engineering Research Council doctoral Canada Graduate Scholarship (NSERC CGS-D) and an Ontario Graduate Scholarship (OGS). A.B. was funded through an Ontario Trillium Scholarship for doctoral study in Canada.
Footnotes
1 Indicates shared first-authorship