Abstract
Mapping the origin of human cognitive traits typically relies on comparing behavioral or neuroanatomical features in human phylogeny. However, such studies rely on inferences from comparative relationships and do not incorporate neurogenetic information, as these approaches are restricted to experimentally accessible species. Here, we fused evolutionary genomics with human functional neuroanatomy to reconstruct the neurogenetic evolution of human brain functions more directly and comprehensively. Projecting genome-wide selection pressure (dN/dS ratios) in sets of chronologically ordered mammalian species onto the human brain reference space unmasked spatial patterns of cumulative neurogenetic selection and co-evolving brain networks from task-evoked functional MRI and functional neuroanatomy. Importantly, this evolutionary atlas allowed imputing functional features to archaic brains from extinct hominin genomes. These data suggest accelerated neurogenetic selection for language and verbal communication across all hominin lineages. In addition, the predictions identified strategic thought and decision making as the dominant traits that may have separated anatomically modern humans (AMH) from archaic hominins.
Introduction
Exploring the origins of the human mind holds the anthropocentric promise of uncovering the traits that may set us apart from our closest relatives, alive or extinct. Various initiatives have provided valuable insight into the evolution of the extraordinary size, shape (1–5) and function (6, 7) of the AMH brain. Comparative neuroanatomy traced cognitive evolution from incremental neuroanatomical changes along the mammalian lineage (8–13). Typically, descriptive quantitative measures, such as increases in regional volumes, gene expression boundaries or cell numbers, serve as proxies for evolutionary selection (Methods Table 1). However, these changes primarily reflect structural organization of a single parameter at gross resolution and omit a functional network context. Therefore, these approaches rather indirectly trace the functional evolution of the brain, which occurs often with similarly sized brains or brain areas that show different functional organization and/or have undergone different evolutionary pressures. Moreover, comparative neuroanatomy does not trace the underlying neurogenetic evolution per se and is limited to experimentally accessible species, mapped at deep anatomical resolution. The latter is a problem for archaic hominins as only endocasts of skulls remain, which restricts insights to inferences from the brain surface. Another line of research in evolutionary genetics, which can analyze archaic material, infers differences in cognitive traits from genetic mutations (14, 15) in human phylogeny. While extremely informative, these studies typically interpret these mutations in isolation, omitting the compound effect of multi-genic co-evolution of functional brain networks.
Here, we propose that the selection pressure on the mammalian brain over evolutionary history left genomic signatures that can be mined for direct and deeper insight into the neurogenetic functional evolution across the human lineage. Genetic information has the key advantage that it not only contains genome-wide signatures of evolutionary events but also correlates with the mesoscale functional organization of brain networks (16, 17). Newly developed, highly parallel sequencing techniques have generated whole genomes of both extant (18) and extinct species (19– 21) along the mouse-AMH evolutionary tree. Using genetic (22), connectomic (23) and behavioral or psychiatric (24) brain-data initiatives, we can now relate genetic features directly to functional brain networks and behavioral traits (16, 17, 25, 26). Importantly, these approaches can trace genetic and functional co-evolution directly within the AMH brain framework and do not rely on interpolating associations between analogous structures as in classical comparative strategies.
Results
Here, we adapted such computational workflows (17) to systematically integrate genome-wide phylogenetic selection pressure into a brain-wide evolutionary atlas of the AMH brain. These resources allow reconstruction of the neurogenetic and functional evolution from the common ancestor with rodents, along the primate lineage, to bipedal hominins and humans.
Initially, this strategy required tracking genome-wide patterns of selection pressure on the mammalian phylogeny linking the mouse with AMH (Fig. 1A). This evolutionary framework covered eight major diversification episodes along the primate lineage leading to AMH - from common ancestry with rodents (mouse) through prosimians (bush baby), New World monkeys (marmoset), Old World monkeys (macaque), lesser apes (gibbon), great apes (chimpanzee) and finally the known extinct hominins Denisovan and Neanderthal. We inferred gene-wise selection pressure between successive pairwise species comparisons (PSCs, Fig. 1A bottom) by calculating the ratio of nonsynonymous to synonymous nucleotide substitutions (dN/dS) from pairwise alignments of protein-coding gene orthologs (data retrieved from Ensembl (18) and JBrowse (27), see Methods). The dN/dS ratio is a robust, gene-wise measure of selection pressure that is sensitive to evolutionary adaptation (28, 29) and does not rely on population data, which are mostly unavailable. dN/dS values are also a suitable proxy for overall functional selection and should also reflect the overall phenotypic contributions from co-evolving non-coding regulatory sequences, indels, and species-specific genes/duplications. Using brain-expressed gene sets built on PSCs homologs provides sufficient details to model multi-genic neurocognitive traits of AMH (16). Thus, in our PSC approach, dN/dS values should reliably indicate the most dominant genetic evolution that occurred after the split from each Last Common Ancestor (LCA). Therefore, the combination of phylogenetically ordered, pair-wise comparisons is a straightforward means of tracing selection pressure along evolutionary lines, in this case, along the human lineage.
We then focused our analysis on neurocognitive functions by constraining the homolog pool to brain-expressed genes of the Allen Brain Atlas (ABA), a high resolution database of brain gene expression (Methods)(22). We thus obtained the continuous neuro-genetic evolution along the mammalian evolutionary sequence leading to AMHs using multiple species as reference. As expected, we observed diverse trends in selection pressure, as represented by several clusters of co-evolving genes (Fig. 1B). To trace selective forces, our further analyses utilized the rank-ordered dN/dS genes found in the ABA within each PSC to capture synergistic, multi-genic effects of coevolving genes in brain networks.
Fusing these genetic data with the AMH brain gene expression atlas approximates synergies in molecular evolution directly onto AMH neuroanatomy. So, we first generated cumulative neuroanatomical dN/dS weighted maps for every PSC (Fig. 2) in the AMH reference brain at subregional mesoscale resolution (23) using the ABA gene expression database (22). Then, we combined these data (Methods, Computational neuroanatomy, Generating evolutionary maps) into a temporo-spatial map of hot spots of high selection pressure (Fig. 2). Consequently, the resulting evolutionary landscape traced the phylogenetic history of the AMH brain through successive LCAs. We note this map is purely genetically driven and largely independent of establishing phylogenetic homology among brain areas. We found that the load of the most highly selected genes accumulated in different neuroanatomical areas for different PSCs (see the Methods Table 2 for anatomical-neurocognitive annotation). There was a gradual shift in selection pressure, acting more on subcortical regions in PSCs 1-5, then increasing over cortical regions (PSCs 6-8). Basal forebrain, basal ganglia, brain stem and cerebellum were under selection pressure in all comparisons. Different thalamic nuclei, hippocampal, amygdalar and cortical parts selection varied between PSCs (Fig. 2).
To highlight such changes lineage-wide, we computed a combined dN/dS atlas, weighted by the temporal profile of selection pressure amongst the PSCs. In general, subcortical regions (e.g. the striatum, basal forebrain and brain stem) showed signatures of selection from rodent to primates (e.g. Fig. 3, PSC1-5, dark blue-yellow). Notably, one structure evolving at the highest genetic rate early in the evolutionary transition to primates was the claustrum (Fig. 3, PSC1, dark blue), which is a central element in a framework for consciousness (30).
In addition, our approach can track neuroanatomical selection across evolutionary transitions from genomes where brain-wide (non-surface) neuroanatomical data is out of reach. In contrast to early PSCs, traces of late selection pressure in hominid evolution (great apes and hominins) predominantly accumulated in the cortex (Fig. 3, PSC6-8, orange-dark red). Accelerated selections within archaic hominins (Fig. 3, PSC7, light red) and AMHs (Fig. 3, PSC8, dark red) localized mainly within sub-regions of prefrontal, temporal or somatosensory associative cortex, but also in the thalamus and amygdala. It is striking that all these structures subserve higher cognitive functions, spiritual beliefs (31), language (speech comprehension (Wernicke’s area) and production (Broca’s area), reading (angular gyrus (32)) or facial recognition (33). Brain structures with the highest selection pressure marking divergence of Neanderthal and AMH involved cortical (auditory, parietal – especially precuneus, occipito-temporal), subiculum, thalamic and amygdalar areas (Fig. 3, PSC8, dark red). Collectively, these evolutionary maps are in line with comparative data from primate cognitive evolution and provide a neuroanatomical framework for many previous inferences about cognitive evolution in hominins (Methods Table 1).
Since these evolutionary brain maps suggested that neurogenetic selection follows specific functional needs, we utilized data from functional brain networks initiatives, in conjunction with evolutionary neuroanatomy data, to directly trace this functional evolution. To link temporal evolutionary patterns with spatially registered brain function, we performed a temporo-spatial analysis by GABi biclustering (34). Specifically, we clustered genes based on their ranked dN/dS values with their spatial expression correlated to multiple AMH functional brain networks. These functional networks consisted of task-evoked functional MRI (TNs) (Fig. S1, Methods Table 3, note TNs in left and associated traits in right columns) from the Human Connectome Project (HCP) (24) and networks for specific brain functions from literature (FNs) (Fig. S1, Methods Table 4, note FNs in left and associated traits in right columns). We tuned GABi biclustering with custom criteria to identify the largest diverse biclusters of co-evolving genes (i.e. genes that have >=0.90 ranked dN/dS over multiple PSCs) with high specificity for functional networks (i.e. genes with high network correlation for the same networks) from Methods Table 3 & 4 (Fig. 4A). This strategy yielded statistically stable biclusters (Methods), which could be further organized into modules (M1-7). Importantly, each module containing the same PSCs could then be chronologically ordered (Fig. 4A). To visualize these neurogenetic co-evolution networks in the brain, we generated a 3D representation of the most prominent networks within each bicluster module (Fig. 4B). These data illustrate the evolutionary selection of networks for attention, consciousness and emotion during early primate evolution and for language, working memory, strategic thought, and motor control in hominin and AMH lineages. In clusters representing the earliest LCAs in the mammalian lineage, most selection pressure accumulated in networks for attention (visuospatial and fronto-parietal FNs) (Fig. 4B, Module 1). In addition to attention (salience FN), selection pressure then extended to networks for consciousness (default mode FN (DMN)) and emotion (EM-FACES TN) (Fig. 4B, Module 2). Subsequent evolution selected for motor control (sensorimotor FN), higher cognition (central executive (CEN) and RE-AVG networks) and attention (ventral attention FN) (Fig. 4B, Module 3, 4 and 5). In contrast, great ape-hominin diversification selected for theory of mind (SO-TOM TN), cognitive emotional control (prefrontal-amygdala FN), motivation (prefrontal-accumbens FN) and awareness (dorsal attention FN) (Fig. 4B, Module 6). Most strikingly, in hominins language (LA-AVG TN), working memory (WM-FACE, WM-REST TNs), strategic thought (GA-REWARD, GA-AVG TNs) and motor control (MO-RH-LH, MO-RF-LF TNs) emerged as the traits under highest evolutionary pressure (Fig. 4B, Module 7).
The biclustering approach directly related neurocognitive evolution to functional co-evolution at the genetic level, identifying gene sets potentially driving the observed functional evolution within a given brain network. Among the genes correlating with language task, we identified a key gene FOXP1, which has been linked to speech impairment and intellectual disability (35, 36), and is the closest homolog of FOXP2, which shows similar expression patterns in the brain. While FOXP2 itself differs between chimpanzees and hominins (14), it is too similar among hominins for detection here. Several other genes from this cluster are associated with cognitive disabilities and neuroanatomical abnormalities like macrocephaly (DVL1(37)), microcephaly (MCPH1 (38)), autism (SHANK2 (39)), bipolar disorder-and circadian rhythm-linked gene CLOCK (40), which was also identified as a hub and enriched gene in AMH-specific transcriptional networks of the frontal pole (41). In addition, we performed Ingenuity Pathway Analysis (IPA) (42) on all clusters to gain a deeper insight into the genetic composition of the clustered networks. We focused our IPA analysis on Diseases & Functions in Nervous System, to find the functional associations within molecular-to-system-level neurobiology. IPA results show multiple behavioral, psychiatric, neurophysiological, structural and molecular functions which coincide with the diversification periods analyzed here. Many associations point to memory, cognition, brain size, and organization of synapses. The highlights of this study are shown in Table 1. Interestingly, genes in modules M1-4 are linked to neurodegenerative diseases. Module 6 covering chimp-Denisovan selection pressure shows associations with increasing brain size. The genetic component of Module 7 (PSCs 7&8), which correlates with language, working memory and strategic thought (Fig. 4B, Module 7), was associated with memory and learning. Complete analysis results can be found in Supplementary Tables 1-7.
To explore archaic-to-AMH brain evolution more deeply, we next ran the biclustering solely on the hominid data, which includes PSCs 6-8 (Fig. 5). With this, we achieved a better segregation of functional features under selection pressure among the closest relatives of AMH. The method delineated clusters of functional networks selected between great apes and hominins or among anatomically modern and archaic humans, which pointed to the similar findings as in the mammalian lineage approach (e.g. language, working memory and motor functions in hominins). Importantly, we obtained a PSC8 specific cluster highly correlating with the GA-REWARD TN, indicating strategic thought as the fastest evolving trait between Neanderthals and AMHs. Paralleling these patterns of functional neuroanatomical evolution, Ingenuity analysis (Table 2) functionally associated the genetic clusters with cognitive traits (working memory, learning, cognition) and psychiatric disorders (schizophrenia, impulsivity), among others (Supplementary Tables 8-10).
Together, these findings provide a critical neurogenetic framework explaining evolutionary ethology, archeological records and genetic data (Methods Table 1). Importantly, they identify neurocognitive traits critical in recent hominin evolution and the underlying gene sets driving their evolution.
Discussion
This study integrates rates of genetic evolution along the mammalian/primate lineage into an evolutionary atlas of the AMH brain. We exploited recent big data initiatives from genomics and brain functions to explore evolutionary events that ultimately shaped the mind and behavior of AMHs. Our findings utilize an innovative in silico approach, fusing genomic information with brain data to associate multi-genic behavioral traits with brain functional networks (43). We could map the cumulative load of evolutionary-weighted genes into AMH functional brain networks, genome-and brain-wide, which reconstructed relative selection pressures on brain functions at each evolutionary ‘transition’ from mouse to AMH.
This computational strategy uses and explores compound neurogenetic effects on brain networks, which are untraceable when studied in isolation. From the genetic perspective, the evolution of functional traits is inherently multi-genic. Therefore, single gene studies in mouse or organoid models, while mechanistically insightful (44), cannot easily assess such genetic synergies on cognitive traits emerging from brain-wide functional networks. This makes it difficult to unmask the evolution of complex cognitive traits and to identify potential candidate gene sets driving evolution in this way. Our study will therefore complement such functional evolutionary genetic models with an atlas of coevolving genes and functional neuroanatomical networks. Future functional exploration of these gene sets and networks in suitable experimental systems should reveal potential mechanisms driving the neurocognitive divergence between the respective species.
We showcase an approach for reconstructing the evolutionary history of functional selection using the genetic remains of species long extinct by projecting compound phylogenetic evolutionary weights onto a functional reference framework. This strategy may be useful to functionally explore biological systems not available for traditional experimental studies. Of note, the straightforward evolutionary genetic analysis could be refined by including an extended phylogeny into the computation of dN/dS values and/or other evolutionary measures, possibly leading to even deeper understanding of neurogenetic evolution. This notwithstanding, our workflow unraveled neurogenetic selection for complex neurocognitive traits in archaic hominin brains, like social interaction and communication, hand motor control, working memory for faces along with symbolic thought and abstract thinking (33). These findings delineate a critical neurogenetic framework for archaic cognitive abilities inferred from ancient art (45, 46). Strikingly, our computational neuro-archaeological data provide initial neurogenetic evidence that all ancient hominin and AMH lineages evolved networks for language. The active selection for language networks strongly suggests that all these archaic hominins could speak. Furthermore, this indicates that verbal communication evolved in the LCA to all three hominin species (i.e. Homo erectus) several hundred thousand years ago, adding critical insights to a longstanding debate about the timing and origin of human language (47). Surprisingly, the transition to AMHs further accelerated evolution of reward-related decision making and strategic thought as those features most prominently separating us from archaic hominins. It is tempting to speculate that these abilities may have contributed to the selective advantage for evolutionary success of anatomically modern humans.
Funding
FG and KB were supported by VRVis, funded by BMVIT, BMDW, Styria, SFG and Vienna Business Agency in the scope of the FFG COMET program (854174), the Research Institute of Molecular Pathology (IMP), Boehringer Ingelheim, and the Austrian Research Promotion Agency (FFG). W. H. was supported by the Research Institute of Molecular Pathology (IMP), Boehringer Ingelheim, the Austrian Research Promotion Agency (FFG), and a grant from the European Community’s Seventh Framework Programme (FP/2007-2013) / ERC grant agreement no. 311701.
Author contributions
JK designed and interpreted functional genetic analysis and computational neuroanatomy and wrote the manuscript. FG designed and performed computational neuroanatomy and wrote the manuscript. BG and YM designed and performed evolutionary genetic analyses. AH designed computational neuroanatomy. KB designed and supervised computational neuroanatomy. WH initiated the study, designed and interpreted the computational neuroanatomy and wrote the manuscript. All authors contributed to writing and commented on the manuscript.;
Competing interests
The authors declare no competing interests.; and
Data and materials availability
All data were analyzed with either published or custom code. Sources of published code are identified in the Method section.
Methods
Genetic analysis
Mammalian lineage setup
For our evolutionary analysis of the primate lineage, we selected nine species including the mouse (Mus musculus), bush baby (Otolemur garnetti), marmoset (Callithrix jacchus), macaque (Macaca mulatta), gibbon (Nomascus leucogenys), chimpanzee (Pan troglodytes), extinct hominins Denisovan (Homo Denisovan) and Neanderthal (Homo neanderthalensis) and anatomically modern human (AMH, Homo sapiens) (Methods Table 1).
To determine the general evolutionary relationship among the selected species, we reconstructed a phylogenetic tree from mitochondrial genomes using a Bayesian approach in BEAST 2.5 (48). First, the best nucleotide substitution model was determined using JMODELTEST (49). We then implemented this model in BEAST. To account for variable rates of evolution among different primate lineages, we used a relaxed lognormal prior on the clock rate. We used three independent normally-distributed and soft-bounded calibration priors (after Perez et al. 2013) (50) to place a timeframe onto our phylogeny. We used a AMH-chimpanzee mean divergence of 7.8 Myr (SD 1.2 Myr), and Old World monkey-ape divergence of 28 Myr (SD 3 Myr) and lastly we placed a 60 Myr mean (SD 2.8 Myr) time for the coalescence of all primate lineages. This fully parameterized model was run five times, each time for 200 million simulations, logging parameters every 20,000 steps, and discarding the first 20% as burn-in. MCMC convergence was assessed by viewing MCMC traces directly and by ESS values in TRACER 1.6(51). A maximum clade credibility tree was calculated and annotated in Figtree 1.4 (52).
The results of the mt-derived phylogeny, LCA and PSC order is given in Fig. 1A. Note that, formally, Denisovans and Neanderthals split after the split from the human ancestor. However, mtDNA derived phylogeny (Fig. 1A) places Neanderthal closer to AMH (53) and support a Chimp-to-Denisovan-Neanderthal-AMH species order.
Data collection
Pre-calculated non-synonymous (dN) and synonymous (dS) nucleotide substitutions for all homologous genes for the selected species were downloaded from Ensembl (version=78) (89) using the Bioconductor BiomaRt package (version 2.18.0) (90) except for Neanderthal and Denisovan genes. The pairwise values, based on the assembled lineage, were also obtained from Ensembl. For instance, bush baby vs. marmoset and marmoset vs. macaque. Other available entries like Ensembl gene IDs, Entrez IDs and HGNC symbols were downloaded for all genes for all species.
To obtain the dN/dS values for ancient hominins vs. chimpanzee and AMH, all protein coding genes for these species were downloaded from Ensembl (18). High coverage Denisovan genome alignment against the AMH (hg19/GRCh37) reference genome was downloaded from the http://cdna.eva.mpg.de/denisova/alignments/ repository. The data necessary for Neanderthal-AMH dN/dS calculation was gathered form the high coverage variant list from the Vindija Neanderthal genome (91).
Ancient hominin gene generation
Two different approaches were used to generate the AMH homolog, Denisovan and Neanderthal genes without any missing values.
The approach applied to the Denisovan dataset was based on mapping reads in BAM format to the AMH genome, and the data was segregated by chromosomes. Consensus sequences for Denisovan chromosomes were created using SAMtools mpileup (version 1.3.1) (92) and BCFtools (version 1.3) (93) utilities. Genes were extracted based on the hg19/GRCh37 Ensembl annotation (ftp://ftp.ensembl.org/pub/, version=78) using an in-house R script. The hominin genes were renamed using the AMH homolog Ensembl gene IDs for easier identification during the later processes.
To retrieve gene homologs from Vindija dataset, high coverage Neanderthal variants were downloaded in VCF format, and only SNPs predicted in protein coding genes were processed. Then, the Neanderthal genes were generated by exchanging AMH nucleotides to the high coverage SNPs in the corresponding AMH reference genes using the SNPs in the coding regions and its coordinates.
dN/dS calculation
For Ensembl downloaded dN and dS data, dN/dS values were calculated using a simple division. For ancient hominin vs. chimpanzee and AMH values, we downloaded and pairwise aligned codons from previously generated gene sequences using PRANK (v.140603) (94). Next, additional ratios were calculated using codeml tool from PAML package (version 4.9) (95) applying basic model.
Table organization
The combined dN/dS table was built from all dN/dS values and the corresponding Ensembl genes and Entrez IDs. The results from the mouse to AMH lineage including hominins were organized in the before mentioned order (Fig. 1A, Methods Table 1). The first mouse vs. bush baby step was used as the basis of the table. The values for the second step (bush baby vs. marmoset) were collected based on the gene IDs from the previous step. In this case, the bush baby gene IDs were used to organize the following column containing the results for marmoset genes. The same approach was applied in the following steps until the Neanderthal vs. AMH values.
Computational neuroanatomy
Table preparation
To compute the evolutionary signatures in the mammalian brain, we used the dN/dS lineage table from above (Methods, Genetic data, Table organization) with 37173 genes/rows along 8 PSCs/columns (Pairwise Species Comparisons). We next sought to generate a brain gene set compatible across the Allen Brain Atlas platform (mouse and AMH) to be most versatile for this and future applications. To this end, we restricted genes to those present in both Allen Mouse Brain Atlas (AMBA) and Allen Human Brain Atlas (AHBA). We merged this table with spatial gene expression data from the AHBA via gene Entrez ID. We avoided merging duplicate entries with AMBA/AHBA by conflating the table rows for unique mouse and AMH Entrez ID combinations (=mouse/AMH homologues).
For homologue genes dN/dS were averaged. We omitted infinite dN/dS ratios (dS=0) for this purpose (96). We further removed rows with all dN/dS ratios=0, since static absent selection pressure was not relevant for our analysis. After filtering the table for genes with spatial gene expression available in AMBA/AHBA, we ended up with 10445 rows.
Wolf et al. (97) states that mean dN/dS between two species decreases with evolutionary distance. The 8 PSCs are not equidistant in evolutionary time, so we normalized them individually (=column-wise normalization). We used a rank-normalization (=rank / amount of rows), since their dN/dS ratios are not equally distributed. This brings the dN/dS of each column to a uniform distribution between 0 and 1. The rank-normalized dN/dS can then be interpreted as their percentile (e.g. value of 0.9 means the dN/dS represents the 90th percentile) and a rank-normalized dN/dS of 0 is still a dN/dS of 0. The resulting table of dN/dS ranks is visualized in Fig. 1B.
Generating evolutionary maps
We visualized the evolutionary landscape throughout phylogenetic history of the brain by creating brain-region level evolutionary maps that color-code each region by its evolutionary time point. Time points were encoded by associating each brain region with its most specific PSC (pairwise species comparison), i.e. the LCA (last common ancestor) with the strongest structural association of genes under high selection pressure.
Therefore, we thought to predict the association of highly selective genes with functional neuroanatomical maps according to Ganglberger et al. (17) for each of the eight PSCs. To our better knowledge, there is no general threshold for highly selected genes based on (ranked) dN/dS ratio, so it is not possible to choose gene sets to predict these maps. Hence, we modeled selection pressure by weighting spatial gene expression from AHBA/AMBA by their dN/dS rank for each PSC. This creates 8 sets of 10445 genes for which genes with low selection pressure have low expression values, and genes with high selection pressure are similar to their original expression values (because dN/dS ranks are between 0 and 1). The resulting eight functional maps for mouse and AMH were all significantly different (FDR=0.1) from functional maps with randomly associated dN/dS ranks (i.e. we shuffled the dN/dS ranks before weighting). We computed the prediction of functional maps purely on expression site level (biopsy site), without applying higher order network measures. Since the 10445 genes do not represent single multigenic traits as used in Ganglberger et al. (17), the gene expression synergy was not region specific, which did not lead to region specific changes in gene expression synergy weighted networks. This indicated that the genes with the highest selection pressure for each PCS modulate multiple brain functions/networks.
The AMH data, assembled from gene expression of 3702 biopsy-sites from microarray data of the Allen Human Brain Atlas (22), was visualized on a region level (most significant p-value of all biopsy sites within a brain region) in Fig. 3.
Finally, we wanted to combine the maps for each PSC to a single evolutionary map. To generate these, the highest specificity for each biopsy site was represented by the PSC with the most significant p-value (Fig. 2). Please note that the individual PSC maps showed already accumulated biopsy site values at (sub)region level, thus it was not possible to derive the most specific PSC directly from Fig. 3. Theses maps showed the most significant p-value of the biopsy-sites within a brain region (and therefore only the most significant biopsy-site), while the evolutionary map was computed first on biopsy-level and then visualized the most frequent PSC of all biopsy sites.
Task-evoked functional brain activity
Task-specific brain activity maps were downloaded from Human Connectome Project (HCP) website (www.humanconnectome.org). We used data available for seven major domains, detailed description can be found in Barch et al. (98). Contrasts selected for comparison with dN/dS functional maps were collected in Methods Table 3. The contrasts labels and behavioral signatures descriptions correspond to Tavor et al. (99).
Functional network meta-comparison
Literature-based comparison of regions involved in several functional networks extracted from fMRI scans was correlated to functional maps (Methods, Computational neuroanatomy, Generating evolutionary maps). Networks used in the study are collected in Methods Table 4.
Subspace pattern mining for network evolution via biclustering
To identify genes linked to specific tasks or functional networks, we mined co-evolving genes with high spatial correlations to those networks. Therefore, we retrieved AMH spatial gene expression data from the AHBA for each of the 10445 genes in the dN/dS ranked table. Spatial gene expression data was available for ∼3700 biopsy sites in the brain. We mapped both, the task fMRI data of 11 networks (Methods, Computational neuroanatomy, Correlation against task-evoked functional brain activity) and the literature-based data on another 11 networks (Methods, Computational neuroanatomy, Functional network meta-comparison) to the biopsy sites and computed the Spearman rank correlation coefficient between every gene and network.
To make both types of functional data comparable, we normalized each block (104445×11 fMRI-to-gene-expression correlations and 104445×11 literature-to-gene-expression correlations) with z-score standardization. Functional network specificity for each gene was then computed by rank-normalization (rank / amount of networks) for each gene over all networks, so they were mapped to a range between 0 and 1. As a result, 0 was assigned to the network with the lowest correlation. The network with the highest correlation to a gene gets the value of 1. We concatenated this data with the dN/dS ranked table to receive a 10445 × 30 spatio-temporal network table (8 PSCs with dN/dS ranks, 11 fMRI-to-gene-expression and 11 literature-to-gene-expression correlation ranks). From this table, we removed all genes with an overall low correlation to all networks (genes for which all network correlations were smaller than 0.1), which reduced the table to 6239 rows.
We performed data mining by using GABi (34), a framework that facilitates a genetic algorithm for biclustering (simultaneous clustering of the rows and columns of a matrix), available as R package. Compared to other biclustering algorithms, the advantage of this framework allows the definition of a customized criteria. This enables the user to specify the properties a bicluster should have, such as coherence, consistency, size etc. We used these custom criteria, in terms of genetic algorithms also called “fitness function” to find biclusters of highly selected genes (genes that have high dN/dS ranks over multiple PSCs) with high specificity for similar functional networks (genes with high network correlation ranks for the same networks). Therefore, GABi creates a set of candidate solutions (a candidate solution is a set of genes) and applies the fitness function to see which PSCs and networks fit the custom criteria. It iteratively optimizes the candidate solutions to find the largest bicluster fitting these criteria by means of evolution inspired operators such as mutation, crossover and selection (34).
We defined the custom criteria for fitness function to find the largest bicluster:
-with at least one PSC and one network, since biclusters without one of them do not represent genes with high dN/dS ranks and high specificity for similar functional networks
- with PSCs having a mean dN/dS rank >=0.9 for the genes in the bicluster. This selects only genes that have PSCs with an average dN/dS rank above equal 0.9, which puts them in the top 10 percentile of dN/dS ranks. Therefore, they can be considered as the top 10 percent genes regarding their selection pressure, therefore highly selective.
- with networks having a mean network correlation specificity rank >=0.75 for the genes in the bicluster. Therefore, only genes with an average network specificity in the top 25% genes (the networks with the 1-8th highest correlation to each gene) were in the bicluster.
- where the gene’s dN/dS and network correlation ranks are above 0. The genetic algorithm optimizes the fitness function of the candidate solutions and therefore finds the largest bicluster. If a bicluster has a mean network correlation rank >0.75 for a specific network, the algorithm will add as many genes with a network correlation rank <0.75 until its size is maximized. We limited this by omitting genes that have zero network correlation.
If a bicluster has this property, then we define its fitness as where r is the number of rows, cdN/dS the amount of dN/dS rank columns and cnetwork the amount of gene-to-network correlation ranks columns. wdN/dS and wnetwork are weighting factors to account for the different amount of dN/dS rank columns (8) and network correlation ranks columns (22). Otherwise it would be more likely to find biclusters with network correlation ranks columns than dN/dS rank columns. wdN/dS = 1.875 and wnetwork =0.6818, to sum up to 30 again (similar to not weighted).
Stability tests were performed on random subsets of the data to empirically estimate parameters for the genetic algorithm utilized by GABi. We found that a population size of at least 10-20 times the ‘chromosome’ length (amount of rows of the data set (34)) with ∼2000 demes (separate subpopulations (34)) led to stable results over multiple runs and therefore reproducible results. This amount of demes was necessary to minimize the chances of locally-optima solutions for GABi(34), since it must converge to solutions that are relatively small compared to the search space (identified biclusters had about 1-2% of the total amount of genes). Each deme needed at least 32 individual solutions (34) (=chromosomes), therefore the 2000 demes constrained the population size to be at least ∼10 times the chromosome length (2000*32/6239 = 10.26).
We applied GABi with a population size of 124780 (20x the chromosome length of 6239) chromosomes divided into 2000 demes. We ran the biclustering with these parameters several times with similar results. The algorithm found for each bicluster up to 100 variations (i.e. biclusters with the same fitness/size, but differ at several genes), so we grouped biclusters with at least 75% overlapping genes to form one representative. A representative bicluster did not fulfil the criteria of the fitness function (otherwise it would have been found by GABi directly) but summarized all genes of all of its variations. Statistical evaluation was performed by permutation tests to verify that the representative biclusters had significantly higher mean dN/dS ranks, and gene-to-network correlation ranks than random sets of similar size. P-values were highly significant (<0.0001), which was expected since the fitness function was specifically designed to find highly consistent biclusters.
Biclusters, respectively bicluster modules, shown in Fig. 4, 5 and Fig. S1 were visualized with newly developed tool (100) (VRVis, Vienna). Nodes were selected from the bicluster’s networks (Methods Table 3 & 4, Fig. S1), the edges represented the strongest spatial gene expression correlation of the bicluster’s genes. The networks were created similarly to Methods, Computational neuroanatomy and Tracing the evolution of spatial gene expression correlation networks. Originally the edges showed a region-bias, i.e. regions that had higher correlation between them than others over all network, resulting from the amount of genes in a bicluster (the correlation of gene sets converges to the genome-wide spatial gene expression correlation with increasing size). We targeted this by generating an empirical distribution for each individual edge by 1000 random drawn gene sets from the genome (of same size as the bicluster). These distributions (i.e. their mean and standard distribution) were then used for z-score normalization of the bicluster edges.
Functional genetics
For functional profiling of genes clustered with brain networks we applied the knowledgebase from Ingenuity Pathway Analysis (IPA) (QIAGEN Inc., https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis). Each cluster from biclustering of full lineage or hominid branch was analyzed separately. We applied Nervous System filter to avoid non-specific functional associations. All results can be found in the Supplementary Materials.
Methodological remarks
Taken together this workflow maps dN/dS data onto spatial brain gene expression and correlates it with task-evoked fMRI and known functional networks (see Main text and Methods, Methods Tables 3 and 4). Both, cumulative correlation and biclustering, build on dN/dS-driven spatial correlations of highly selected genes per PSC with fMRI networks. Cumulative correlation focuses on functional correlations for each PSC top-selected genes, while biclustering relates ranked dN/dS of all PSCs and fMRI networks simultaneously, building clusters of highly selected genes (bound to any PSC) which highly correlate with fMRI networks. Jointly, these approaches are in support of each other and reconstruct a congruent ancestral history of the brain’s functional evolution.
Of note, overall, results obtained with PSCs along the AMH lineage, shown here, are similar to predictions from PSCs rooted in AMH (AMH vs all species).
Statistics
Statics are described in the corresponding paragraphs of the Methods section.
Acknowledgments
We thank Shiva Alemzadeh for bicluster visualization. We thank Attila Gyenesei for supervising the computation of evolutionary genetic data. We thank Life Science Editors for editing assistance.;
References
- 1.↵
- 2.
- 3.
- 4.
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.
- 10.
- 11.
- 12.
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.