Abstract
Throughout the past decade, studying ancient genomes provided unique insights into human prehistory, and differences between modern humans and other branches like Neanderthals can enrich our understanding of the molecular basis of the human condition. Modern human variation and the interactions between different hominin lineages are now well studied, making it reasonable to explore changes that are observed at high frequency in present-day humans, but do not reach fixation. Here, we put forward an interpretation of putative single nucleotide changes in recent modern human history, focusing on 571 genes with non-synonymous changes at high frequency. We suggest that molecular mechanisms in cell division and networks affecting cellular features of neurons were prominently modified by these changes. Complex phenotypes in brain growth trajectory and cognitive traits are likely influenced by these networks and other changes presented here. We propose that at least some of these changes contributed to uniquely human traits.
Homo sapiens appears to be a “very special primate”1. Our position among animal species stands out largely thanks to the composite complexity of our cultures, social structures and communication systems. It seems very reasonable that this “human condition” is rooted, at least in part, in the properties of our brain, and that these can be traced to changes in the genome on the modern human lineage. This phenotype in the population called “anatomically modern humans” emerged in Africa likely before the deepest divergence less than 100,000-200,000 years ago2,3, although complex population structure may reach back up to 300,000 years ago4–6. Except for some early dispersals7, humans most likely settled parts of the world other than Africa and the Middle East permanently only after around 65,000 years ago. It has been claimed that the brain of modern humans adopted a specific, apomorphic growth trajectory early in life that gave rise to the skull shape difference between modern humans and extinct branches of the genus Homo8, although the timing of this change is debated9. This ontogenic trajectory, termed the “globularization phase”, might have contributed to our singular cognitive abilities8,10,11.
We are now in a favorable position to examine the evolution of human biology with the help of the fossil record, in particular thanks to breakthroughs in paleogenomics: The recent reconstruction of the genomes of members of archaic Homo populations12–14 has opened the door to new comparative genomic approaches and molecular analyses. The split of the lineages leading to modern humans and other archaic forms (Neanderthals and Denisovans) is estimated at around 600,000 years ago2, setting the timeframe for truly modern human-specific changes after this split, but before the divergence of modern human populations (Fig. 1). Together with efforts to explore present-day human diversity15, this progress has allowed us to narrow down the number of candidate point mutations from ~35 million differences since the split from chimpanzee when comparing only reference genomes16 to 31,389 fixed human-specific changes in a previous seminal study1.
Some of these changes have been linked to putative functional consequences1,13,17, and evidence is mounting that several molecular changes affecting gene expression in the brain were subject to selective pressures18–22. Furthermore, the genomic impact of interbreeding events is not evenly distributed across the genome. Genes expressed in regions of the brain regarded as critical for certain cognitive functions are depleted in introgressed archaic genetic material23–26, and introgressed alleles are downregulated in some brain regions, suggesting natural selection acting on tissue-specific gene regulation27. Thus, it seems reasonable to conclude that there were differences between anatomically modern human and Neanderthal brains, and that these underlie at least some of the characteristics of our lineage28. We want to emphasize that such recent differences are likely to be subtle when compared to those after the split from our closest living relatives on a scale of 6-10 million years 29, where fundamental changes arose since the divergence from chimpanzees and bonobos30. The observation of recurrent gene flow between modern human and archaic populations also implies a broad overall similarity, yet, such subtle differences may still have contributed to the evolutionary outcome31. Obviously, not all human-specific changes are beneficial: While most mutations may be rather neutral and have little effect on the phenotype, some may have had deleterious effects or side-effects, possibly increasing the risks for neurodevelopmental or neurodegenerative disorders in humans32–34.
The goal of this paper is to provide a set of recent single nucleotide changes in humans since their split from Neanderthals that could enrich our understanding of the molecular basis of the recent human condition. The previous focus on fixed alleles was reasonable given limited data1, but having a better grasp of the magnitude of modern human variation and the interaction between different hominin lineages seems a good reason to cast a wider net, and take into account not only fixed differences but also high-frequency changes shared by more than 90% of present-day individuals. Here, we present a revised list of 36 genes that carry missense substitutions which are fixed across thousands of human individuals and for which all archaic hominin individuals sequenced so far carry the ancestral state. In total, 647 protein-altering changes in 571 genes reached a frequency of at least 90% in the present-day human population. We attempt to interpret this list, as well as some regulatory changes, since it seems very likely that some of these genes would have contributed to the human condition.
We will discuss some of their known functions, and how these relate to pathways that might have been modified during human evolution (Fig. 1), in a bottom-up fashion. Beginning at the molecular level, changes found in genes associated with the mitotic spindle complex might be relevant, as has been suggested in previous studies1,13. The cellular features of neurons (axons and synapses) have been considered important in light of their role in traits such as vocal learning35–37, and possibly behavioral phenotypes38. Pathways influencing brain organization were modified during hominin evolution, and we suggest that this might have been extended further since the split from Neanderthals and other archaics39. Finally, we discuss implications for other complex phenotypic traits, with a focus on cognition and life history trajectory. We restrict our attention to genes where the literature may allow firm conclusions and predictions about functional effects, since many genes might have multiple different functions40. Obviously, experimental validation will ultimately be needed to confirm our hypotheses concerning alterations in specific functions.
Results
Genetic differences between present-day humans and archaic hominins
Using publicly available data on one Denisovan and two Neanderthal individuals and present-day human variation (Methods), we calculated the numbers of single nucleotide changes (SNCs) which most likely arose recently on the respective lineages after their split from each other, and functional consequences as predicted by VEP (Table 1). Previously, 31,389 sites have been reported as recently fixed derived in present-day humans, while being ancestral in archaics1,13.
We find a smaller number of only 12,027 positions in the genome, in part caused by including another archaic individual and different filters, but mainly by a richer picture of present-day human variation. The 1,000 Genomes Project as well as other sources contributing to the dbSNP database now provide data for thousands of individuals, which results in very high allele frequencies for many loci instead of fixation. Indeed, 29,358 positions show allele frequencies larger than 0.995, demonstrating that the level of near-fixation is similar to the level of previously presented fixation. The number of loci with high frequency (HF) changes of more than 90% in present-day humans is an order of magnitude larger than the number of fixed differences. The three archaic individuals carry more than twice as many changes as present-day humans; however, we emphasize that much of this difference is not due to more mutations in archaics, but rather the fact that data are available for only three individuals, compared to thousands of humans. The variation across the archaic population is not represented equally well, which makes these numbers not directly comparable.
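The distinction between fixed and high-frequency changes described above can be sketched as a simple classification rule. This is a minimal illustration and not the pipeline used in this study: the function names, the input layout (a derived allele frequency plus a flag stating whether all archaic individuals carry the ancestral allele) and the strict "greater than 0.90" cutoff are assumptions made for the example.

```python
from collections import Counter

def classify_site(derived_freq, archaics_all_ancestral, hf_threshold=0.90):
    """Classify one site on the modern human lineage (hypothetical helper).

    derived_freq: derived allele frequency in present-day humans (0..1).
    archaics_all_ancestral: True if every archaic genome is ancestral here.
    Returns "fixed", "high_frequency", or None if the site does not qualify.
    """
    if not archaics_all_ancestral:
        return None
    if derived_freq == 1.0:
        return "fixed"
    # Assumption: "more than 90%" is read as a strict inequality.
    if derived_freq > hf_threshold:
        return "high_frequency"
    return None

def tally(sites):
    """Count site classes for an iterable of (frequency, flag) pairs."""
    return Counter(classify_site(freq, flag) for freq, flag in sites)
```

Under this reading, the total number of HF changes on a lineage is the sum of the "fixed" and "high_frequency" classes, which is why a site that drops from fixation to a frequency such as 0.997 remains in the HF set.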
Present-day humans carry 42 fixed amino acid-changes in 36 genes (Table 2, Fig. 2), while Neanderthals carry 159 such changes. Additionally, modern humans carry 605 amino acid-changes at high frequency (human-lineage high-frequency missense changes, referred to as HHMCs), amounting to a total of 647 such changes in 571 genes (Table S1). Together with 323 SNCs on the human lineage with low confidence (Methods, Table S2), almost 1,000 putative protein-altering changes were found across most present-day humans. Generally, synonymous changes are found at a similar magnitude as missense changes, whereas only a few SNCs alter start and stop codons, and thousands of changes fall in putative regulatory and untranslated regions. We admit that some of the loci presented here are variable across the phylogenetic tree, or less reliable due to low coverage in the archaics, but we accept this since our intention is to retrieve an inclusive picture of possibly functional recent changes. The 42 protein-altering changes for which the ancestral allele has not been observed in any present-day human, most of which have been presented before1, constitute without doubt the strongest candidates for understanding the human condition. Only one gene, SPAG5, carries three such SNCs, and four genes (ADAM18, CASC5, SSH2 and ZNHIT2) carry two fixed protein-coding changes in all modern humans. We identified 15 SNCs (in AHR, BOD1L1, C1orf159, C3, DNHD1, DNMT3L, FRMD8, OTUD5, PROM2, SHROOM4, SIX5, SSH2, TBC1D3, ZNF106, ZNHIT2) that have not been previously described as fixed differences between humans and archaics. We note that another 12 previously described1 protein-altering substitutions were not found among the genotypes analyzed here (in C21orf62, DHX29, FAM149B1, FRRS1L, GPT, GSR, HERC5, IFI44L, KLF14, PLAC1L, PTCD2, SCAF11).
These genotype calls are absent from the files provided for the three archaic genomes due to different genotype calling and filtering procedures compared to the original publication of the Altai Neanderthal genome13,14. Hence, some potentially relevant candidate changes were not included here, and future research is necessary to evaluate these as well. Despite attempting an extended interpretation, our data is thus still not fully exhaustive.
It is noteworthy that the number of fixed SNCs decreased substantially, and it is possible that single individuals will be found to carry some of the ancestral alleles. Hence, it is important to focus not only on fixed differences, but also consider variants at high frequency. When analyzing the 647 HHMCs, 68 genes carry more than one amino acid-altering change. Among these, TSGA10IP (Testis Specific 10 Interacting Protein) and ABCC12 (ATP Binding Cassette Subfamily C Member 12) carry four such changes, and seven more genes (MUC5B, NPAP1, OR10AG1, OR5M9, PIGZ, SLX4, VCAN) carry three HHMCs. 1,542 genes carry at least one HF missense change on the archaic lineage (archaic-lineage high-frequency missense change, referred to as AHMC, Tables S3, S4). We find an overlap of 122 genes with HHMCs and AHMCs, which is more than expected considering that among 1,000 sets of random genes of a similar length distribution, no overlap of this extent was observed. The same genes seem to have acquired missense changes on both lineages since their divergence more often than expected. We find a high ratio of HHMCs over synonymous changes for chromosome 21 (1.75-fold), and a very small ratio (0.18-fold) for chromosome 13. We do not find such extreme ratios for AHMCs and corresponding synonymous changes, suggesting differences in the distribution of amino acid changes between both lineages (Fig. S1).
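The comparison of the observed gene overlap against random gene sets of a similar length distribution can be illustrated with a permutation sketch. This is one possible way to implement such a test under stated assumptions, not the procedure used here: matching by length-quantile bins, the number of bins, and all function names are choices made for the example, and both input sets are assumed to contain only genes present in `gene_lengths`.

```python
import random
from collections import defaultdict

def make_length_bins(gene_lengths, n_bins=10):
    """Map each gene to a length-quantile bin index (0 = shortest genes)."""
    ranked = sorted(gene_lengths, key=gene_lengths.get)
    return {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}

def sample_matched(genes, bin_of, members, rng):
    """Draw a random gene set with the same number of genes per length bin."""
    per_bin = defaultdict(int)
    for g in genes:
        per_bin[bin_of[g]] += 1
    drawn = set()
    for b, k in per_bin.items():
        drawn.update(rng.sample(members[b], k))
    return drawn

def overlap_test(set_a, set_b, gene_lengths, n_perm=1000, n_bins=10, seed=0):
    """Return the observed overlap and the fraction of random,
    length-matched set pairs whose overlap is at least as large."""
    rng = random.Random(seed)
    bin_of = make_length_bins(gene_lengths, n_bins)
    members = defaultdict(list)
    for g, b in bin_of.items():
        members[b].append(g)
    observed = len(set_a & set_b)
    hits = 0
    for _ in range(n_perm):
        rand_a = sample_matched(set_a, bin_of, members, rng)
        rand_b = sample_matched(set_b, bin_of, members, rng)
        if len(rand_a & rand_b) >= observed:
            hits += 1
    return observed, hits / n_perm
```

A returned fraction of zero would correspond to the situation reported above, in which none of the 1,000 random sets reached the observed overlap. Matching on length matters because long genes accumulate more changes on both lineages by chance alone.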
Ranking and enrichment
We assessed the impact of mutations for different deleteriousness scores (Table 2), finding 12 genes with deleterious HHMCs according to SIFT, three according to PolyPhen, and 16 when using the Grantham score (>180), measuring the physical properties of amino acid changes. The C-score and GWAVA can be used to rank all mutation classes, and we present the top candidates.
Then, we attempted a ranking of genes by the density of lineage-specific changes in the dataset. As expected, the total number of segregating sites is correlated with gene length (Pearson’s R = 0.93). This correlation is weaker for HF human SNCs (R = 0.73) and fixed human-specific SNCs (R = 0.25), as well as for fixed (R = 0.37) and HF (R = 0.82) SNCs in archaics. We conclude that some genes with a large number of human-specific changes might carry these large numbers by chance, while others are depleted. Indeed, 17,453 (88.9%) of these genes do not carry any fixed human-specific change, and 80.5% do not carry fixed archaic-specific changes. Of note, genes that have attracted attention in the context of traits related to the “human condition” like CNTNAP2 and AUTS2 are among the longest genes in the genome, hence changes in these genes should be interpreted with caution as they are not unexpected. We ranked the genes by the number of HF changes in either modern humans or archaics, divided by their genomic lengths, and categorized the top 5% of this distribution as putatively enriched for changes on each lineage (Table S5). We note that 191 genes (30.9%) fall within this category for both human HF changes and archaic HF changes, as a result of differences in mutation density. In order to distinguish a truly lineage-specific enrichment, we calculated the ratios of HF changes for humans and archaics, defining the top 10% of genes in this distribution as putatively enriched (Table S5). Among the genes enriched for changes on the modern human lineage, 18 carry no HF changes on the archaic lineage, and ten of these also fall within the 5% of genes carrying many changes considering their length (ARSJ, CLUAP1, COL20A1, EPPIN, KLHL31, MKNK1, PALMD, RIC3, TDRD7, UBE2H). These might be candidates for an accumulation of changes, even though this is not identical to selective sweep signals.
Among these, the collagen COL20A1 and the Epididymal Peptidase Inhibitor EPPIN carry HHMCs. ACAD10, DST and TTC40, which carry two HHMCs, might be other notable genes with a human-specific enrichment.
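The two rankings described in this section (HF changes per unit of gene length, and the ratio of human to archaic HF changes) can be sketched as follows. This is an illustrative reconstruction rather than the code used for Table S5: the function names are hypothetical, and the pseudocount added to both sides of the ratio is an assumption introduced only to avoid division by zero for genes without archaic HF changes.

```python
def top_fraction(scores, fraction):
    """Return the genes whose score lies in the top `fraction` of the ranking."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return set(ranked[:n_top])

def rank_genes(human_hf, archaic_hf, gene_lengths, pseudocount=1):
    """Return (density-enriched genes, ratio-enriched genes) for the human lineage.

    human_hf / archaic_hf: gene -> number of HF changes on that lineage.
    gene_lengths: gene -> genomic length in base pairs.
    """
    # Ranking 1: HF changes divided by genomic length, top 5%.
    density = {g: human_hf.get(g, 0) / gene_lengths[g] for g in gene_lengths}
    # Ranking 2: human over archaic HF changes, top 10%.
    # Assumption: a pseudocount keeps the ratio defined when a count is zero.
    ratio = {
        g: (human_hf.get(g, 0) + pseudocount) / (archaic_hf.get(g, 0) + pseudocount)
        for g in gene_lengths
    }
    return top_fraction(density, 0.05), top_fraction(ratio, 0.10)
```

Genes appearing in both returned sets correspond to the candidates listed above that are enriched relative to their length and also relative to the archaic lineage. Ties at the cutoff are broken by sort order here, so a real analysis would need an explicit rule for them.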
No Gene Ontology (GO) categories are enriched for HHMCs on the human lineage when using HF synonymous changes as background in a hypergeometric test. This is also true for genes carrying AHMCs, or HF changes in UTRs or transcription factor binding sites on either lineage. We applied a test for the ratio of the number of gene-wise HF changes on one lineage over the other lineage. For changes on the modern human lineage, this yields an enrichment for 12 GO categories (Table S6), with “soft palate development”, “negative regulation of adenylate cyclase activity”, “collagen catabolic process” and “cell adhesion” in the biological process category. Among the cellular components category, the “postsynaptic membrane”, “spermatoproteasome complex”, “collagen trimer”, “dendrite” and “cell junction” show enrichment, as well as the molecular functions “calcium ion binding”, “histone methyltransferase activity (H3-K27 specific)” and “metallopeptidase activity”. We find no GO enrichment for genes with an excess of changes on the archaic lineage. In order to approach a deeper exploration of genes with associated complex traits in humans, we explored the NHGRI-EBI GWAS Catalog41, containing 2,385 traits. We performed a systematic enrichment screen, finding 17 unique traits enriched for genes with HHMCs, and 11 for genes with AHMCs (Table S7). Changes in genes associated to “Cognitive decline (age-related)”, “Rheumatoid arthritis” or “Major depressive disorder” might point to pathways that could have been influenced by protein-coding changes on the human lineage. In archaics, genes are enriched, among others, for associations to traits related to body mass index or cholesterol levels, which might reflect differences in their physiology. We also find an enrichment of genes associated to behavioral disorders on the archaic lineage.
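The hypergeometric test mentioned at the start of this paragraph asks whether a category contains more genes of interest than expected when drawing from a background set. The following sketch shows the calculation for a single category using only the standard library. The function name and the way the background is counted are assumptions for illustration, and the sketch omits the multiple-testing correction that a screen across many categories would require.

```python
from math import comb

def hypergeom_enrichment_p(k, background_size, annotated, drawn):
    """One-sided hypergeometric P(X >= k).

    k: genes of interest (e.g. with HHMCs) that fall in the category.
    background_size: number of genes in the background set.
    annotated: background genes annotated with the category.
    drawn: number of genes of interest.
    """
    upper = min(annotated, drawn)
    total = comb(background_size, drawn)
    favorable = sum(
        comb(annotated, i) * comb(background_size - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favorable / total
```

For example, with a background of 10 genes of which 3 are annotated, drawing 4 genes and observing all 3 annotated genes among them gives a probability of 7/210, that is 1/30.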
We find a significant enrichment of protein-protein interactions (P = 0.006) among the gene products of HHMC genes (Fig. S2), meaning that these proteins interact with each other more than expected. Functional enrichment is found for the biological process “cellular component assembly involved in morphogenesis”, most strongly for the cellular components cytoskeleton and microtubule, as well as the molecular function “cytoskeletal protein binding”. Three proteins have at least 20 interactions in this network and might be considered important nodes: TOP2A, PRDM10 and AVPR2 (Table S8). However, proteins encoded by genes with synonymous changes on the modern human lineage seem to be enriched for interactions as well (P = 0.003), as are proteins encoded by genes with AHMCs (P = 1.68 × 10^-14), with an enrichment in GO categories related to the extracellular matrix and the cytoskeleton, and the most interacting proteins with more than 40 interactions being GART, LRGUK, ARRB1, SPTAN1 and ATM (Table S8). We caution that these networks might be biased due to more mutations and possibly more interactions in longer, multi-domain genes.
Regulatory changes might have been important during our evolution42, hence we tested for an overrepresentation of transcription factors (TFs). We find 78 known or putative TFs among the HHMC genes (Table S9) on the modern human lineage43, which is not overrepresented among genes with HHMCs (with 49.2% of random gene sets containing fewer HHMCs). Despite lack of enrichment, single TFs on the modern human lineage might have been important, particularly those with an excess of modern human over archaic HF changes (AHR, MACC1, PRDM2, TCF3, ZNF420, ZNF516). Other TFs, like RB1CC113 or PRDM10 and NCOA619 have been found in selective sweep screens, suggesting contributions of individual TFs, rather than TFs as a class. We also tested for an enrichment of gene expression in different brain regions and developmental stages44,45, using the HF synonymous changes on each lineage as background sets. We find an enrichment of gene expression in the orbital frontal cortex at infant age (0-2 years) for genes with HHMCs, but no enrichment for genes with AHMCs. Furthermore, when testing the genes with HHMCs and using the set of genes with AHMCs as background, “gray matter of forebrain” at adolescent age (12-19 years) is enriched, while no enrichment was found for genes with AHMCs.
Discussion
The kinetochore and spindle complex
It has been proposed previously that protein-coding changes in cell cycle-related genes are highly relevant candidates for human-specific traits1,13. Indeed, three genes (CASC5, SPAG5, and KIF18A) have been singled out as involved in spindle pole assembly during mitosis1. Other genes with protein-coding SNCs (NEK6 and STARD9/KIF16A) turn out to be implicated in the regulation of spindle pole assembly as well46,47. Furthermore, it has been claimed13 that genes with fixed non-synonymous changes in humans are also more often expressed in the ventricular zone of the developing neocortex, compared to fixed synonymous changes. Since the kinetochore-associated genes CASC5, KIF18A and SPAG5 are among these genes, it has been emphasized that this “may be relevant phenotypically as the orientation of the mitotic cleavage plane in neural precursor cells during cortex development is thought to influence the fate of the daughter cells and the number of neurons generated48”13. Several fixed SNCs on the modern human lineage are observed for CASC5 (two changes) and SPAG5 (three changes), which is also among genes with a relatively high proportion of HF changes (Table S5). The changes in KIF18A, KIF16A and NEK6 can no longer be considered as fixed, but occur at very high frequencies (>99.9%) in present-day humans.
We attempted to determine whether an enrichment of genes with HHMCs on the human lineage can be observed in the ventricular zone for the same data45, but instead find an enrichment in the intermediate zone, where less than 5% of random gene sets of the same size are expressed. However, synonymous HF changes also show an enrichment in this layer, as well as genes with AHMCs (Table S10), suggesting an overrepresentation of genes that carry mutations in the coding regions rather than lineage-specific effects. However, we were able to broadly recapitulate the observation of an enrichment of expression in the ventricular zone if restricting the test to genes with non-synonymous changes at a frequency greater than 99.9% in present-day humans, which is not observed for corresponding synonymous and archaic non-synonymous changes (Table S10). Expression of genes with AHMCs is enriched in the intermediate zone. Among the 28 genes expressed in the ventricular zone that carry almost fixed HHMCs, four might be enriched for HF changes in humans (HERC5, LMNB2, SPAG5, VCAM1), and one shows an excess of HF changes on the human compared to the archaic lineage (AMKMY1). Other notable genes discussed in this study include ADSL, FAM178A, KIF26B, SLC38A10, and SPAG17. We find 126 genes (Table S9) with 143 HHMCs that putatively interact with proteins at the centrosome-cilium interface49, which is more than expected using 1,000 random gene sets of a similar length distribution, for which 98.9% contain fewer genes with HHMCs. However, 99.9% of random sets also contain fewer genes with AHMCs, suggesting that differences between humans and archaics might lie in the particular genes rather than their numbers. The centrosome-cilium interface is known to be critical for early brain development, and centrosome-related proteins are overrepresented in studies on the microcephaly phenotype in humans50, which we will discuss below. 
Some of the genes listed here and discussed elsewhere in this study, such as FMR1, KIF15, LMNB2, NCOA6, RB1CC1, SPAG5 and TEX2, harbor not only HHMCs, but an overall high proportion of HF changes on the human lineage.
Among the 15 fixed protein-coding changes identified here but absent from previous analyses1,13, some might also contribute to complex modifications of pathways in cell division: The AHR protein is involved in cell cycle regulation51 and shows an excess of HF changes on the human lineage, the dynein DNHD1 might be recruited to the kinetochore52 and is overexpressed in fetal brain53, and the SSH2 protein (two fixed changes, one of which is first described here; and one on the archaic lineage) might interact with spindle assembly checkpoint proteins54. SHROOM4, which is associated to a mental retardation syndrome with delayed speech and aggressive behavior55, may also be relevant56. Other proteins that carry two HHMCs are involved in mitosis, for example the spindle checkpoint regulator CHEK157, the Dynein Axonemal Heavy Chain 1 (encoded by DNAH1), the mitotic regulator AZI1 (CEP131)58, the Cyclin D2 (CCND2) and the Protein Tyrosine Phosphatase Receptor Type C (PTPRC)59. Other genes with HHMCs that could be part of the same functional network are FOXM160 and FMR161, which carry a putative enrichment of HF changes, and TOP2A62. The TOP2A protein shows the largest number of interactions (53) with other HHMC-carrying proteins, while CHEK1, KIF18A, KIF15 and PTPRC are among highly-interacting proteins with more than ten interactions, which suggests that these proteins might function as interaction hubs in modifications of the cell division complex. Furthermore, enrichment in cell-cycle related GO categories has been found for candidate regions for ancient positive selection20, and ANAPC10 has been highlighted, containing two potentially disruptive intronic changes that are fixed derived in modern humans and ancestral in both Neanderthals and Denisovans. This gene carries a total of 39 HF changes (11 of them fixed) specific to modern humans, but none for archaics.
All of this suggests that the cell cycle machinery might have been modified in a specific way in humans compared to other hominins. One particular example of specific consequences for a relevant SNC on the human lineage is SPAG5: One of the three fixed non-synonymous changes in the SPAG5 protein is a Proline-to-Serine substitution at position 43. This position is phosphorylated in humans63 during the mitotic phase of the cell cycle, directly through the protein phosphatase 6 (PPP6C) at the Serine at this position64, with the effect of a modification of the duration of the metaphase. PPP6C regulates the mitotic spindle formation65, and the PPP6C gene itself carries five HF SNCs on the modern human lineage, one of which is a TF binding site (for HNF4A/HNF4G), and only one SNC on the archaic lineage. This specific substitution in SPAG5 seems likely to influence the duration of the metaphase through phosphorylation, as a molecular consequence of this HHMC.
On the archaic lineage, we find an AHMC in the ASPM gene, along with 24 other HF changes, but none in modern humans, resulting in an excess of archaic SNCs. The proteins ASPM and CIT, which carries an AHMC that is listed among the most disruptive non-synonymous derived SNCs in archaics17 (Table S31), are known to co-localize to the midbody ring during cytokinesis and regulate spindle orientation by affecting the dynamics of astral microtubules66. These proteins regulate astral microtubules and thus the orientation of cell division in archaics, whereas in modern humans we find proteins regulating kinetochore microtubules, thus the timing of cell division. This difference could indicate two alternative ways of modulating cell division on the different lineages.
Cellular features of neurons: Axons, the myelin sheath and synapses
Moving on from differences on the molecular level to cellular features of neurons, it is clear that wiring is key to cognitive ability, and in this context, axon guidance is relevant: To form critical networks during the early development of the brain, axonal extensions of the neurons in the cortical region must be sent and guided to eventually reach their synaptic targets. Studies conducted on avian vocal learners37,67 have shown a convergent differential regulation of axon guidance genes of the SLIT-ROBO families in the pallial motor nucleus of the learning species, allowing for the formation of connections virtually absent in the brains of vocal non-learners. In modern humans, genes with axon-guidance-related functions such as FOXP2, SLIT2 and ROBO2 have been found to lie within deserts of archaic introgression25,26,68, suggesting incompatibilities between modern humans and archaics for these regions. SLITRK1 might have been under positive selection19, and carries one HHMC, and SLIT3 carries an excess of HF changes in modern humans. Several genes involved in wiring carry HHMCs, which we want to delineate here.
Some of the aforementioned microtubule-related genes, specifically those associated with axonal transport and known to play a role in post-mitotic neural wiring and plasticity69, are associated with signals of positive selection, such as KIF18A70 or KATNA171,72. Furthermore, an interactor of KIF18A, KIF1573, might have been under positive selection in modern humans19, and contains two HHMCs (but also five AHMCs). Versican (VCAN), which promotes neurite outgrowth74, carries three HHMCs, and SSH2 (two HHMCs) might be involved in neurite outgrowth75. PIEZO1, which carries a non-synonymous change that is almost fixed in modern humans, is another factor in axon guidance76, as well as NOVA177, which is an interactor of ELAVL478, a gene that codes for a neuronal-specific RNA-binding protein and might have been under positive selection in humans19,22. Furthermore, among genes with the most deleterious regulatory SNCs, we find the Netrin receptor UNC5D, which is critical for axon guidance79.
The establishment of new connections requires protection, particularly as some of these connections span long distances and are associated with enhanced activity following rewiring events, as for vocal motor neurons in songbirds67. The gene MAL, which is implicated in myelin biogenesis and function, shows up in selective sweep regions19,20 and is enriched for HF changes on the human lineage, while its orthologue MAL2 carries a HHMC. A gene with HHMCs that is associated with the organization of the axon initial segment and nodes of Ranvier during early development is NFASC80. The protein encoded by this gene is an L1 family immunoglobulin cell adhesion molecule, and we find that the L1CAM gene also carries an AHMC81. NFASC is also an interactor of DCX82, which might have been under positive selection in humans19 and is enriched for HF SNCs on the human lineage, but carries an AHMC as well. At least two genes associated with the process and timing of myelination, PTEN83 and NCMAP84, are among genes with an excess of HF SNCs in modern humans. Other genes carrying HHMCs in our dataset associated with myelination include SCAP85, RB1CC186, TENM487, CDKL188 and ADSL89, and genes with an excess of changes on the human lineage with similar functions include FBXW790, KIFAP391, and AMPH92. The AMPH protein interacts closely with the huntingtin protein HTT (which also carries a HHMC) and is involved in myelination processes93.
Another interesting class that emerges from the set of genes is related to synaptic vesicle endocytosis, critical to sustain a high rate of synaptic transmission. We find a formal enrichment of genes with an excess of HF changes on the human compared to the archaic lineage with gene products located in the postsynaptic membrane and dendrites. PACSIN194 carries a HHMC, is among genes with an excess of HF changes, and has been highlighted as putatively under positive selection on the human lineage, along with other synaptic plasticity related genes such as SIPA1L119,20,22, SH3GL295 and STX1A96. Among genes harboring HHMCs and related to synaptic vesicle endocytosis, we find LMNB297 and SV2C98. Finally, SYT1, which is critical for synaptic vesicle formation99, carries a deleterious HHMC (Table 2). Synaptic properties have been mentioned before in the context of human specific traits, for instance in postnatal brain development in humans, chimpanzees and macaques100, with a focus on synaptogenesis and synaptic elimination in the prefrontal cortex. A period of high synaptic plasticity in humans has been related to a cluster of genes around a transcription factor encoded by the MEF2A gene. Even though this gene neither carries a protein-altering change nor shows a particular pattern in our analysis, any of the 26 HF SNCs it harbors on the modern human lineage could have had a functional impact not captured here. Apart from that, several of the genes with an excess of HF changes in modern humans do belong to this cluster: CLSTN1, FBXW7, GABBR2, NRXN3, PTPRJ, PTPRN2, SLIT3, and STX1A, three of which (CLSTN1, FBXW7 and STX1A) are associated with signals of positive selection19. In addition, the above-mentioned AMPH interacts via CDKL5101 with HDAC4102. The latter exhibits an excess of HF changes in modern humans, and is known to repress the transcriptional activation of MEF2A103. 
A putative signature of positive selection upstream of MEF2A104 suggests that this may be part of a broader network which might be supported by our analysis. Finally, ENTHD1/CACNA1I, which contains a HHMC that can no longer be considered as fixed, but occurs at a very high frequency (>99.9%), lies in a selective sweep region19. The protein encoded by this gene is involved in synaptic vesicle endocytosis at nerve terminals105 and is regulated by the MEF2 gene family106.
The brain growth trajectory
The number of neurons in the brain might be influenced by some of the changes in kinetochore-associated genes13, and their organization and neuronal wiring clearly impose structural demands on the organization of the brain. We have presented candidate genes and networks for these features above, but brain organization might involve broader networks. Brain growth factors identified by disease phenotypes in modern humans such as micro- and macrocephaly have been highlighted previously as potentially relevant for physiological differences between humans and archaics18. Although an early analysis suggested several candidate genes associated to microcephaly, not all of these could be confirmed by high-coverage data. Among eleven candidate genes18, only two (PCNT, UCP1) are among the HHMC gene list presented here, while most of the other changes are not human-specific, and only PCNT has been related to microcephaly107. Nevertheless, more such changes are found on both lineages: For example, our data reveals that in archaics there are AHMCs in the microcephaly candidate genes ASPM108 and CIT109. The ASPM-katanin complex controls microtubule disassembly at spindle poles and misregulation of this process can lead to microcephaly110, which is of interest given the presence of a HHMC in KATNA1 and a fixed non-coding change in KATNB1, while no such changes were observed in archaics111. Other genes associated with microcephaly that harbor non-synonymous SNCs are CASC5 (two in humans, one in archaics)112, CDK5RAP2 (in humans), MCPH1 (in archaics)113, ATRX (one in humans and archaics each)114, and NHEJ1115 (a deleterious one in humans, and one in archaics). The SPAG5 protein, which carries three fixed HHMCs, has been claimed to interact with CDK5RAP2116, is a direct target of PAX6117, via which it affects cell division orientation, and therefore is critical in the course of brain development. 
Disease mutations in SCAP or ADSL have also been associated with microcephaly phenotypes118,119, and Formin-2 (FMN2), which carries a deleterious regulatory change in modern humans, influences brain development and has been linked to microcephaly in mice120.
Changes in genes associated with the brain growth trajectory do not necessarily lead to a decrease but can also lead to an increase of brain size121, suggesting that the disease phenotype of macrocephaly might point to genes relevant in the context of brain growth as well. One of the few genes with several HHMCs, CASC5, has been found to be associated with gray matter volume differences122. It has been claimed that mutations in PTEN alter the brain growth trajectory and allocation of cell types through elevated Beta-Catenin signaling123. This gene is also present among differentially expressed genes in human neurons compared to chimpanzee neural progenitor cells during cerebral cortex development, which may relate to a lengthening of the prometaphase-metaphase in humans compared to chimpanzees that is specific to proliferating progenitors and not observed in non-neural cells124. We find that PTEN falls among the genes with the highest number of HF SNCs on the human lineage per length, and also among the genes with an excess on the modern human over the archaic lineage, suggesting that regulatory changes in this gene might have contributed to human-specific traits. This is also the case for the HHMC-carrying transcription factor TCF3, which is known to repress Wnt-Beta-Catenin signaling and maintain the neural stem cell population during neocortical development125. Among other macrocephaly-related genes with HHMCs (RNF135126, CUL4B127 and CCND2128), the latter also shows a large number of HF changes on the human lineage, and the HHMC in CUL4B is inferred to be deleterious (Table 2). Other macrocephaly candidates such as NFIX129, NSD1130 and GLI3131 have been claimed to have played an important role in shaping the distinctly modern human head132 and show numerous SNCs in non-coding regions. GLI3 might have been under positive selection19 and carries 20 HF SNCs on the human, but only one on the archaic lineage.
Two of the very few genes hypothesized to regulate expansion and folding of the mammalian cerebral cortex by controlling radial glial cell number and fate, TRNP1133 and TMEM14B134, exhibit HF 3’-UTR changes in modern humans, and TRNP1 shows an excess of changes on the modern human lineage. The expression of these two genes in the outer subventricular zone might be important135, since this is an important region for complexification of neocortical growth in primates136, and for which an enriched activation of mTOR signaling has been reported137. In addition to other genes in the mTOR-pathway, such as PTEN138 or CCND2, two possibly interacting modulators139 of the mTOR signaling pathway stand out in our dataset: ZNHIT2 with one deleterious SNC (Table 2) might have been under positive selection19, and CCT6B carries a deleterious change according to both SIFT and C-score. The TF encoded by RB1CC1 is essential for maintaining adult neuronal stem cells in the subventricular zone of the cerebral cortex140. This gene carries a HHMC, a regulatory SNC that has been suggested to modify transcriptional activity141, and a signature of positive selection13.
Changes in the genes mentioned here could have contributed to the brain growth trajectory changes hypothesized to give rise to the modern human-specific globular braincase shape during the past several hundred thousand years8,10,39. On the archaic side, an enrichment of genes with AHMCs associated with “Corneal structure” may relate to archaic-specific changes in brain growth trajectories, since the size and position of the frontal and temporal lobes might affect eye and orbit morphology142, and the macrocephaly-associated gene RIN2143 carries an AHMC. Finally, changes that might have affected the size of the cerebellum can be found in our dataset, such as the HF regulatory SNCs found in ZIC1 and ZIC4144, and the deleterious HHMC in ABHD14A, which is a target of ZIC1145.
The craniofacial phenotype
Differences other than brain-related properties are likely to have emerged after the split from archaic humans, some of which may have had an impact on cognition more indirectly146,147. Among the genes harboring HHMCs and found in selective sweep regions, the gene encoding the TF PRDM10 stands out, since this is the second-most interacting protein within the HHMC dataset. Although little is known about PRDM10, it may be related to dendrite growth148 and to neural crest related changes that contributed to the formation of our distinct modern face149. Changes in genes related to craniofacial morphology would complement previous observations17,132, and we find an enrichment of genes with an excess of HF SNCs on the modern human lineage for the GO term “soft palate development”, which might relate to craniofacial properties that are relevant for language150. Among genes harboring an excess of HF SNCs associated with specific facial features, we find RUNX2, EDAR, and GLI3151, NFATC1152, SPOP153, DDR2154 and NELL1155, possibly due to changes in regulatory regions, while mutations in the HHMC-carrying gene encoding the TF ATRX cause facial dysmorphism156. In addition, genes with HHMCs such as PLXNA2157, EVC2158, MEPE159 and SPAG17160 are known to affect craniofacial bone and tooth morphologies. These genes appear to be important in determining bone density, mineralization and remodeling, hence they may underlie differences between archaic and modern human facial growth161. Some of these facial properties may have been present in the earliest fossils attributed to H. sapiens, like the Jebel Irhoud fossils4, deviating from craniofacial features which emerged in earlier forms of Homo162, and may have become established before some brain-related changes discussed here39,163.
Other craniofacial morphology-related genes, such as DCHS2151, HIVEP2164, HIVEP3165, FREM1166, and FRAS1167 harbor AHMCs, while another bone-related gene, MEF2C168, shows an excess of HF changes on the archaic lineage. These changes may underlie some of the derived facial traits of Neanderthals169.
The impact on cognition and language
It has long been hypothesized that language and its neurological foundation were important for the evolution of humans and uniquely human traits, closely related to hypotheses on the evolution of cognition and behavior. It is noteworthy that among traits associated with cognitive functions such as language or theory of mind, the timing of myelination appears to be a good predictor of computational abilities170,171. Computational processing might have been facilitated by some of the changes presented here, at least in some of the circuits that have expanded in our lineage172, since subtle maturational differences early in development173 may have had a considerable impact on the phenotype. This might be linked to the specific brain growth trajectory in modern humans11, and reflected in the morphology of the parietal and temporal lobes174,175, as well as in the size of the cerebellum8. Archaic hominins likely had certain language-like abilities176,177, and hybrids of modern and archaic humans must have survived in their communities178. However, another important hint for human-specific features is that genes associated with axon guidance functions, which are important for the refinement of neural circuits including those relevant for speech and language, are found in introgression deserts36,179, and especially the FOXP2 region is depleted for archaic introgression, which seems to be a unidirectional and human-specific pattern68. This, together with putative positive selection after the split from Neanderthals180 and regulatory changes affecting FOXP2 expression181, could indicate modifications of a complex network in cognition or learning, possibly related to other brain-related, vocal tract132 or neural changes182. 
We suggest that some other genes with changes on the human lineage might have contributed more specifically to cognition-related changes, although we admit that the contribution of single SNCs to these functions is less straightforward than their contribution to molecular mechanisms, since disease mutations in many genes may have disruptive effects on cognitive abilities.
The basal ganglia are a brain region where FOXP2 expression is critical for the establishment and maintenance of language-related functions183,184. Several genes carrying HHMCs have been described previously as important for basal ganglia functions related to language and cognition. The HTT protein has long been implicated in the development of Huntington’s disease, which is associated with corticostriatal dysfunction, and is known to interact with FOXP2185. Mutations in SLITRK1 have been linked to Tourette’s syndrome, a disorder characterized by vocal and motor tics, resulting from a dysfunction in the corticostriatal-thalamocortical circuits186. NOVA1 regulates RNA splicing and metabolism in a specific subset of developing neurons, particularly in the striatum187. As pointed out above, NOVA1 is an interactor of ELAVL4, which belongs to a family of genes known to promote the production of deep layer FOXP2-expressing neurons188–190, and is part of a neural network-related cluster that has been highlighted as putatively under positive selection in humans22. Within this network, α-synuclein (encoded by SNCA) might serve as a hub and is specifically expressed in brain regions important for vocal learning in songbirds67. SNCA and SV2C, which carries a HHMC, are involved in the regulation of dopamine release, with SV2C expression being disrupted in SNCA-deficient mice and in humans with Parkinson’s disease191. Genes in the cluster of selection signals22 are implicated in the pathogenesis of Alzheimer’s disease, which (together with Huntington’s and Parkinson’s diseases) is linked to a FOXP2-driven network192. Some introgressed archaic alleles are downregulated in specific brain regions27, an effect that is especially pronounced in the cerebellum and basal ganglia.
One notable example is NTRK2, which shows an excess of HF changes on the human lineage and a signature of positive selection19, and is also a FOXP2 target35, a connection which has been highlighted for the vocal learning circuit in birds193. Other genes harboring HHMCs such as ENTHD1106 and STARD9194, as well as genes in introgression deserts25, have been associated with language deficits. It may indeed have taken a complex composite of changes to make our brain fully language-ready195, where not all changes needed to reach fixation.
In the broader context of cognition, we find an enrichment of HHMCs in genes associated with “Alzheimer’s disease (cognitive decline)” and “Cognitive decline (age-related)”, with seven associated genes (COX7B2, BCAS3, DMXL1, LIPC, PLEKHG1, TTLL2 and VIT). Two other genes linked to Alzheimer’s are PTEN196 and RB1CC1197. Among genes with deleterious HHMCs, SLC6A15 has been associated with emotional processing in the brain198, and may be part of modifications in glutamatergic transmission199, a category found in selective sweep regions147. GPR153, which carries one HHMC and two AHMCs, influences behavioral traits like decision making in rats, and is associated with various neuropsychiatric disorders in humans200. Another interesting candidate change in the context of cognitive abilities might affect the Adenylosuccinate Lyase (ADSL), for which the ancestral Neanderthal-like allele has not been observed in thousands of modern human genomes. This gene has been associated with autism201, belongs to behavioral trait categories like “aggressive behavior”, which have been found to be enriched on the human lineage17, and several studies detected a signal of positive selection in modern humans19,20,202. These observations make ADSL a strong candidate for human-specific features, particularly in light of the fact that the relevant HHMC is located in a region that is highly conserved and lies close to the most common disease mutation leading to severe adenylosuccinase deficiency20. Other relevant genes, similar to ADSL in carrying a fixed HHMC and being frequently found in selective sweep screens, are NCOA6, which might be related to autism as well203, and SCAP. Downregulation of the cholesterol sensor encoded by SCAP has been shown to cause microcephaly, impaired synaptic transmission and altered cognitive function in mice119.
We want to emphasize that the networks presented in the previous sections influencing brain growth and neural wiring are likely to impact cognitive functions, since disruptions in these networks would impair the healthy human brain. Furthermore, we find an enrichment of AHMCs in genes associated to Parkinson’s disease and “Attention deficit hyperactivity disorder and conduct disorder”, suggesting that changes may have taken place in related networks on the archaic lineage as well.
Life history and other phenotypic traits
Apart from their consequences for cognitive functions, it has been suggested that changes involved in synaptic plasticity might be interpreted in a context of neoteny19,100,204,205, with the implication of delayed maturation in humans206 and a longer timeframe for brain development. However, given their similar brain sizes207, humans and Neanderthals might both have needed a long overall maturation time208,209. Accordingly, notions like neoteny and heterochrony are unlikely to be fine-grained enough to capture differences between these populations, but early differences in infant brain growth between humans and Neanderthals8,210 could have rendered our maturational profile distinct during limited developmental periods and within specific brain regions, imposing different metabolic requirements211. One of the brain regions where such differences are found is the orbitofrontal cortex (OFC)174, and we find that the OFC at infant age (0-2 years) is enriched for the expression of genes that carry HHMCs compared to synonymous SNCs. We suggest that the development of the OFC in infants might have been subject to subtle changes since the split from Neanderthals rather than a general developmental delay, which is particularly interesting given that this brain region has been implicated in social cognition212 and learning213.
Genes carrying HHMCs are enriched for expression in the gray matter of the forebrain at adolescent age compared to AHMC-carrying genes, hence additional human-specific modifications during this period might have taken place, possibly linked to changes in myelination described above. It has been suggested that differences in the timing of childhood and adolescence existed between humans and Neanderthals, after a general developmental delay in the hominin lineage214,215. Dental evidence suggests an earlier maturation in Neanderthals than in modern humans216, and it has been claimed that Neanderthals might have reached adulthood earlier217. Furthermore, an introgressed indel from Neanderthals causes an earlier onset of menarche in present-day humans218, supporting at least the existence of alleles for earlier maturation in the Neanderthal population. Among the genes carrying fixed HHMCs, NCOA6 has also been linked to age at menarche and onset of puberty219, as well as placental function220. This putative TF is enriched in HF changes and has been suggested to have been under positive selection on the modern human lineage19,202. The HHMC is located near, and three 5’-UTR variants lie within, a putatively selected region22, with an estimated time of selection at around 150 kya (assuming a slow mutation rate). Even though this gene carries an AHMC as well, it remains possible that modern humans acquired subtle differences in their reproductive system through lineage-specific changes in this gene. A delay in reproductive age may influence overall longevity, another trait for which our data set yields an enrichment of genes with HHMCs (SLC38A10, TBC1D22A and ZNF516).
The male reproductive system might have been subject to changes as well, since we find that several proteins in spermatogenesis seem to carry two HHMCs: Sperm Specific Antigen 2 (SSFA2), Sperm Associated Antigen 17 (SPAG17), ADAM18221 and WDR52222, out of which ADAM18 and SPAG17 also carry AHMCs. Lineage-specific differences in genes related to sperm function or spermatogenesis might have been relevant for the genetic compatibility between humans and Neanderthals. Another gene harboring a HHMC with similar functions is EPPIN223, which shows no HF changes on the archaic, but 27 such SNCs on the modern human lineage. The gene encoding for the Testis Expressed 2 protein (TEX2) is enriched for HF changes in both humans and archaics, with one HHMC and five AHMCs, but its function is not yet known. Another possible SNC that might be relevant in this context is a splice site change in IZUMO4, since proteins encoded by the IZUMO family form complexes on mammalian sperm224. The adjacent exon is not present in all transcripts of this gene, suggesting a functional role of this splice site SNC. Finally, genes in the GO category “spermatoproteasome complex” are enriched for an excess of HF changes on the human compared to the archaic lineage.
It has been found that Neanderthal alleles contribute to addiction and, possibly, pain sensitivity in modern humans225,226. In this context, an interesting protein-truncating SNC at high frequency in humans is the loss of a stop codon in the opioid receptor OPRM1 (6:154360569), potentially changing the structure of the protein encoded by this gene in some transcripts. Other mutations in this gene are associated with heroin addiction227 and pain perception228, but also with sociality traits229. Interestingly, a recent study found a pain insensitivity disorder caused by a mutation in ZFHX2230, which carries an AHMC, and three HHMCs are observed in NPAP1, which might be associated with the Prader-Willi syndrome, involving behavioral problems and a high pain threshold231. Such changes may point to differences in levels of resilience to pain between Neanderthals and modern humans.
Conclusion
The long-term evolutionary processes that led to the human condition1 are still subject to debate and investigation, and the high-quality genomes from archaic humans provide opportunities to explore the recent evolution of our species. We want to contribute to an attempt to unveil the genetic basis of specific molecular events in the time-window after the split from these archaic populations and before the emergence of most of the present-day diversity. We sought to combine different sources of information, from genome-wide enrichment analyses to functional information available for specific genes, to identify threads linking molecular needles in this expanded haystack. In doing so, we have mainly built on existing proposals concerning brain-related changes, but we have divided the observations into different biological levels, from cellular changes through brain organization differences to complex phenotypic traits. Only future experimental work will determine which of the changes highlighted here contributed significantly to making us “fully human”. We hope that our characterization and presentation of some new candidate genes will help prioritize inquiry in this area.
Author contributions
M.K. and C.B. analyzed data and wrote the manuscript.
Competing interests statement
The authors declare no competing interests.
Supplementary Table & Figure legends
Table S1: List of HHMCs and genomic features
Table S2: List of low-confidence HHMCs and genomic features
Table S3: List of AHMCs and genomic features
Table S4: List of low-confidence AHMCs and genomic features
Table S5: Top 5% of genes by HF SNC density on the modern human and archaic lineages, and top 10% of genes by relative excess of HF SNCs on one lineage over the other.
Table S6: GO enrichment for genes with relative excess of HF SNCs on the human over the archaic lineage
Table S7: GWAS enrichment for genes with HHMCs or AHMCs
Table S8: Number of interactions among genes with HHMCs or AHMCs
Table S9: Genes with HHMCs that are TFs, or at the centrosome interface
Table S10: Enrichment in developing brain zones for genes with HHMCs or AHMCs, proportion of random gene sets with larger overlap (Methods).
Figure S1: Distribution of missense and non-synonymous HF SNCs across chromosomes.
Figure S2: STRING graph of interactions among genes with HHMCs.
Methods
We used the publicly available high-coverage genotypes for three archaic individuals: One Denisovan12, one Neanderthal from the Denisova cave in Altai mountains13, and another Neanderthal from Vindija cave, Croatia14. The data is publicly available under http://cdna.eva.mpg.de/neandertal/Vindija/VCF/, with the human genome version hg19 as reference. We applied further filtering to remove sites with less than 5-fold coverage and more than 105-fold coverage in the Altai Neanderthal or 75-fold coverage in the other archaic individuals, if such cases occurred. We also removed sites with genotype quality smaller than 20, and heterozygous sites with strong allele imbalance (<0.2 minor allele frequency). Although these permissive filters increase power compared to previous studies, we caution that in some cases genotypes might be incorrect. We added the genotype and coverage for the exome and chromosome 21 sequences of the Vindija and El Sidrón Neanderthals from previous studies2,17, with 75-fold and 50-fold coverage cutoffs, respectively. These studies provided data for the same Vindija individual14.
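The site-level filters described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the analysis: the `Call` record, its field names, and the `passes_filters` function are hypothetical, and only the thresholds (5-fold minimum coverage, 105-fold or 75-fold maximum coverage, genotype quality of 20, minor allele fraction of 0.2 at heterozygous sites) are taken from the text.

```python
from dataclasses import dataclass

# Thresholds as stated in the Methods; the names are illustrative.
MIN_DEPTH = 5
MIN_GQ = 20
MIN_MINOR_FRACTION = 0.2
MAX_DEPTH = {"Altai": 105, "Vindija": 75, "Denisovan": 75}


@dataclass
class Call:
    """Hypothetical per-individual genotype call at one site."""
    depth: int       # total read depth
    gq: int          # genotype quality
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the alternative allele
    is_het: bool     # True if the genotype is heterozygous


def passes_filters(call: Call, individual: str) -> bool:
    """Return True if the call survives the coverage, quality and balance filters."""
    if call.depth < MIN_DEPTH or call.depth > MAX_DEPTH[individual]:
        return False
    if call.gq < MIN_GQ:
        return False
    if call.is_het:
        total = call.ref_reads + call.alt_reads
        if total == 0:
            return False
        # Remove heterozygous calls with strong allele imbalance.
        if min(call.ref_reads, call.alt_reads) / total < MIN_MINOR_FRACTION:
            return False
    return True
```

In this sketch, a call at 90-fold coverage is kept for the Altai Neanderthal but removed for the other two individuals, which reflects the different upper coverage cutoffs.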
We applied the Ensembl Variant Effect Predictor232 in order to obtain inferences for protein-coding and regulatory mutations, scores for SIFT233, PolyPhen234, CADD235 and GWAVA236, and allele frequencies in the 1000 Genomes and ExAC human variation databases15,237. We used the inferred ancestral allele from238, and at positions where this information was not available, the macaque reference allele, rheMac3239. We determined the allele frequencies in present-day humans using the dbSNP database build 147240. We retrieved the counts for each allele type, and summarized the counts of non-reference alleles at each position. Grantham scores241 were calculated for missense mutations.
Data processing and database retrieval was performed using bcftools/samtools v1.0242, bedtools v2.16.2243, and R/Bioconductor244, rtracklayer245 and biomaRt246, and plotting with RCircos247. We analyzed all positions where at least two alleles (human reference and alternative allele) were observed among the human reference and at least one out of three of the high-coverage archaic individuals in at least one chromosome. The 22 autosomal chromosomes and the X chromosome were analyzed, in the absence of Y chromosome data for the three female archaic individuals. The data for 4,409,518 segregating sites is available under [http:tbd.database]. The following subsets were created:
Fixed differences: Positions where all present-day humans carry a derived allele, while at least two out of three archaics carry the ancestral allele, accounting for potential human gene flow into Neanderthals.
High-frequency (HF) differences: Positions where more than 90% of present-day humans carry a derived allele, while at least the Denisovan and one Neanderthal carry the ancestral allele, accounting for different types of errors and bi-directional gene flow.
Extended high-frequency differences: Positions where more than 90% of present-day humans carry a derived allele, while one of the following conditions is true: a) Not all archaics have reliable genotypes, but those that have carry the ancestral allele. b) Some archaics carry an alternative genotype that is not identical to either the human or the ancestral allele. c) The Denisovan carries the ancestral allele, while one Neanderthal carries a derived allele, which allows for gene flow from humans into Neanderthals. d) The ancestral allele is missing in the EPO alignment, but the macaque reference sequence is identical to the allele in all three archaics.
We also created corresponding lists of archaic-specific changes. Fixed changes were defined as sites where the three archaics carry the derived allele, while humans carry the ancestral allele at more than 99.999%. High-frequency changes occur to less than 1% in present-day humans, while at least two archaic individuals carry the derived allele. An extended list presents high-frequency changes where the ancestral allele is unknown, but the macaque allele is identical to the present-day human allele.
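To make the definitions of fixed and high-frequency differences on the modern human lineage concrete, the following sketch encodes the two main categories. The function names, the dictionary keys for the archaic individuals and the state labels are assumptions made for illustration. The extended list and the archaic-specific lists are not covered here.

```python
ANC, DER = "ancestral", "derived"


def is_fixed_difference(derived_count, total_count, archaics):
    """All present-day humans derived, at least two of three archaics ancestral."""
    n_ancestral = sum(1 for state in archaics.values() if state == ANC)
    return total_count > 0 and derived_count == total_count and n_ancestral >= 2


def is_hf_difference(derived_count, total_count, archaics):
    """More than 90% of humans derived; Denisovan and one Neanderthal ancestral."""
    # Integer arithmetic avoids floating-point issues at the 90% boundary.
    above_90_percent = 10 * derived_count > 9 * total_count
    neanderthal_ancestral = any(
        archaics.get(name) == ANC for name in ("Altai", "Vindija")
    )
    return (
        above_90_percent
        and archaics.get("Denisovan") == ANC
        and neanderthal_ancestral
    )
```

Note that under these definitions a site can be fixed without being a high-frequency difference, for example when both Neanderthals carry the ancestral allele but the Denisovan carries the derived allele.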
A ranking of mutation density was performed for genes with protein-coding sequences and their genomic regions as retrieved from Ensembl. For each gene, unique associated changes as predicted by VEP were counted. A ranking on the number of HF changes per gene length was performed for all genes that span at least 5,000 bp in the genome and carry at least 25 segregating sites in the dataset (at any frequency in humans or in archaics), in order to remove genes which are very short or poor in mutations. The top 5% of the empirical distribution was defined as putatively enriched for changes on each lineage. The ratio of lineage-specific HF changes was calculated for the subset of genes where at least 20 lineage-specific HF changes were observed on the human and the archaic lineages combined. The top 10% of the empirical distribution was defined as putatively enriched for lineage-specific changes.
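The density ranking can be illustrated with a short sketch. The gene records are hypothetical dictionaries with the keys `name`, `length`, `n_sites` (segregating sites) and `n_hf` (HF changes), and taking the top 5% as the first `ceil(0.05 * n)` genes of the sorted list is one possible reading of "top 5% of the empirical distribution", assumed here for simplicity.

```python
import math


def top_density_genes(genes, min_length=5000, min_sites=25, top_fraction=0.05):
    """Rank eligible genes by HF changes per bp and return the top fraction."""
    # Drop genes that are very short or poor in segregating sites.
    eligible = [
        g for g in genes
        if g["length"] >= min_length and g["n_sites"] >= min_sites
    ]
    ranked = sorted(eligible, key=lambda g: g["n_hf"] / g["length"], reverse=True)
    n_top = math.ceil(top_fraction * len(ranked))
    return [g["name"] for g in ranked[:n_top]]
```

The same pattern would apply to the ratio of lineage-specific HF changes, with the eligibility criterion replaced by the minimum of 20 lineage-specific changes and the cutoff set to the top 10%.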
We performed enrichment tests using the R packages ABAEnrichment44 and DescTools248. We used the NHGRI-EBI GWAS Catalog41, and overlapped the associated genes with protein-coding changes on the human and archaic lineages, respectively. We counted the number of HF missense changes on each lineage and the subset of those associated with each trait (“Disease trait”), and performed a significance test (G-test) against the number of genes associated with each trait, and all genes in the genome, with a P-value cutoff of 0.1. A significant result suggests a genome-wide enrichment of changes for the respective trait. We then performed a G-test between the numbers of HF missense changes on each lineage, and the subset of each associated with each trait (P-value cutoff of 0.1), to determine a difference between the two lineages. We then performed an empirical test by creating 1,000 random sets of genes with similar length as the genes associated with each trait, and counting the overlap with the lineage-specific missense changes. At least 90% of these 1,000 random sets were required to contain fewer missense changes than the real set of associated genes. Only traits for which at least 10 associated loci were annotated were considered.
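As an illustration of the G-test step, the sketch below computes the likelihood-ratio statistic and its P-value for a 2x2 table using only the Python standard library. The analysis itself used the R package DescTools. The table layout in the docstring is an assumption about how the counts could be arranged, and no continuity correction is applied.

```python
import math


def g_test_2x2(a, b, c, d):
    """G-test of independence for the table [[a, b], [c, d]].

    Assumed layout (illustrative): a = HF missense changes in trait genes,
    b = HF missense changes in other genes, c = trait genes, d = other genes.
    Returns the G statistic and the P-value for one degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = rows[i] * cols[j] / n
                g += 2.0 * o * math.log(o / expected)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value
```

With the cutoff used in the text, a trait would pass this step if the returned P-value is below 0.1.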
Gene Ontology (GO) enrichment was performed using the software FUNC249, with a significance cutoff of the adjusted p-value < 0.05 and a family-wise error rate < 0.05. When testing missense changes, a background set of synonymous changes on the same lineage was used for the hypergeometric test. When testing genes with relative mutation enrichment, the Wilcoxon rank test was applied. Enrichment for sequence-specific DNA-binding RNA polymerase II transcription factors (TFs) and TF candidate genes from43, and genes interacting at the centrosome-cilium interface49 was tested with an empirical test in which 1,000 random sets of genes were created that matched the length distributions of the genes in the test list. The same strategy was applied for genes expressed in the developing brain (Table S10)45. Protein-protein interactions were analyzed using the STRING online interface v10.5250 with standard settings (medium confidence, all sources, query proteins only) as of January 2018. The overlap with selective sweep screens considers HHMCs within 50,000 bp of the selected regions13,19,22.
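The empirical test with length-matched random gene sets can be sketched as follows. The matching rule (a random gene within 10% of the length of each test gene, drawn without replacement within a set) and all function and parameter names are assumptions, since the text only states that the random sets matched the length distribution of the test list.

```python
import random


def length_matched_set(test_genes, gene_length, rng, tolerance):
    """Draw one random gene of similar length for each gene in the test list."""
    chosen = set()
    for gene in test_genes:
        low = gene_length[gene] * (1 - tolerance)
        high = gene_length[gene] * (1 + tolerance)
        candidates = [
            other for other, length in gene_length.items()
            if low <= length <= high and other not in chosen
        ]
        if candidates:
            chosen.add(rng.choice(candidates))
    return chosen


def empirical_enrichment(test_genes, gene_length, genes_with_change,
                         n_sets=1000, tolerance=0.1, seed=0):
    """Return the observed overlap and the fraction of random sets with fewer hits."""
    rng = random.Random(seed)
    changed = set(genes_with_change)
    observed = len(set(test_genes) & changed)
    n_fewer = 0
    for _ in range(n_sets):
        random_set = length_matched_set(test_genes, gene_length, rng, tolerance)
        if len(random_set & changed) < observed:
            n_fewer += 1
    return observed, n_fewer / n_sets
```

Under the criterion stated for the GWAS analysis, a test list would be considered enriched if the returned fraction is at least 0.9.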
Acknowledgments
We thank S. Han and T. Marques-Bonet for helpful discussions, and A. G. Andirkó and P. T. Martins for help with figures. M.K. is supported by a Deutsche Forschungsgemeinschaft (DFG) fellowship (KU 3467/1-1). C.B. acknowledges research funds from the Spanish Ministry of Economy and Competitiveness (grant FFI2016-78034-C2-1-P), Marie Curie International Reintegration Grant from the European Union (PIRG-GA-2009-256413), research funds from the Fundació Bosch i Gimpera, MEXT/JSPS Grant-in-Aid for Scientific Research on Innovative Areas 4903 (Evolinguistics: JP17H06379), and Generalitat de Catalunya (Government of Catalonia) – 2017-SGR-341.