Abstract
Changes in gene regulation are broadly accepted as key drivers of phenotypic differences between closely related species. However, identifying regulatory changes that shaped human-specific traits is a very challenging task. Here, we use >60 DNA methylation maps of ancient and present-day human groups, as well as six chimpanzee maps, to detect regulatory changes that emerged in modern humans after the split from Neanderthals and Denisovans. We show that genes affecting vocalization and facial features went through particularly extensive methylation changes. Specifically, we identify silencing patterns in a network of genes (SOX9, ACAN, COL2A1 and NFIX), and propose that they might have played a role in the reshaping of the human face, and in forming the 1:1 vocal tract configuration that is considered optimal for speech. Our results provide insights into the molecular mechanisms that may have shaped the modern human face and voice, and suggest that they arose after the split from Neanderthals and Denisovans.
Introduction
The advent of high-quality ancient genomes of archaic humans (Neanderthal and Denisovan) opened up the possibility to identify the genetic basis of some unique modern human traits (Meyer et al., 2012; Prüfer et al., 2014). A common approach is to carry out sequence comparisons and detect non-neutral sequence changes. However, out of ∼30,000 substitutions and indels that reached fixation on the lineage of present-day humans after their separation from archaic humans, only ∼100 directly alter amino acid sequence (Prüfer et al., 2014), and as of today our ability to estimate the biological effects of the remaining ∼30,000 changes is very restricted. Although most of these noncoding changes are probably nearly neutral, many others may affect gene function, especially those in regulatory regions like promoters and enhancers. Such regulatory changes may have sizeable impact on human evolution, as alterations in gene regulation are thought to underlie much of the phenotypic variation between closely related groups (Fraser, 2013; King and Wilson, 1975). Thus, directly examining DNA regulatory layers such as DNA methylation could enhance our understanding of the development of human-specific traits far beyond what can be achieved using sequence comparison alone (Hernando-Herraez et al., 2015a).
To gain insight into the regulatory changes that underlie human evolution, we have previously developed a method to reconstruct pre-mortem DNA methylation maps of ancient genomes (Gokhman et al., 2014) based on analysis of patterns of damage to ancient DNA (Briggs et al., 2010; Gokhman et al., 2014; Pedersen et al., 2014). We used this method to reconstruct the methylomes of a Neanderthal and a Denisovan, and compared them to a present-day osteoblast methylation map. However, the ability to identify differentially methylated regions (DMRs) between the human groups was limited by the incomplete osteoblast reference map (providing methylation information for ∼10% of CpG sites), differences in sequencing technologies, lack of an outgroup and a restricted set of skeletal samples (see Methods). Here, we used a comprehensive assembly of skeletal DNA methylation maps from modern humans, archaic humans and chimpanzees to identify DMRs that separate hominin groups. By testing the regulatory changes we identified against known anatomical effects of genes (Gokhman et al., 2017a; Köhler et al., 2014), we found that genes that affect vocal, facial and pelvic anatomy have gone through extensive DNA methylation changes that are unique to modern humans.
Results
To gain insight into the evolutionary dynamics of methylation along the hominin tree, we reconstructed ancient methylation maps of eight individuals: in addition to the previously published Denisovan and Altai Neanderthal methylation maps (Gokhman et al., 2014), we reconstructed the methylome of the ∼40,000-year-old (yo) Vindija Neanderthal, and three methylomes of anatomically modern humans: the ∼45,000 yo Ust’-Ishim individual (Fu et al., 2014), the ∼8,000 yo Loschbour individual (Lazaridis et al., 2014), and the ∼7,000 yo Stuttgart individual (Lazaridis et al., 2014). We also sequenced to high coverage and reconstructed the methylomes of the ∼7,000 yo La Braña 1 individual (22x), which was previously sequenced to low coverage (Olalde et al., 2014), and of a 7,500 yo individual from Anatolia, Turkey (I1583, 24x), which was previously sequenced using a capture array (Mathieson et al., 2015).
To this we added 52 publicly available partial bone methylation maps from present-day individuals, produced using 450K methylation arrays (Horvath et al., 2015; Lokk et al., 2014). To obtain full present-day bone maps, we produced whole-genome bisulfite sequencing (WGBS) methylomes from the femur bones of two present-day individuals (Bone1 and Bone2). Hereinafter, ancient and present-day modern humans are collectively referred to as modern humans (MHs), while the Neanderthal and Denisovan are referred to as archaic humans. As an outgroup, we produced methylomes of five chimpanzees (one WGBS and four 850K methylation arrays). Together, these data establish a unique and comprehensive platform to study DNA methylation dynamics in recent human evolution (Table S1).
Identification of DMRs
Methylation maps may differ due to factors such as sex, age, health state, environment, and tissue type. In addition, the comparison of DNA methylation maps that were produced using different technologies could potentially introduce artifacts in DMR-detection. In order to account for these confounding factors and to identify DMRs that reflect evolutionary differences between human groups, we took a series of steps. To minimize false positives that could arise from the comparison of maps produced using various technologies, we set the reconstructed Ust’-Ishim methylome as the MH reference, to which we compared the Altai Neanderthal and the Denisovan. We developed a DMR-detection method for ancient methylomes, which accounts for potential noise introduced during reconstruction, as well as differences in coverage and deamination rates (Figure 1 and Methods). To minimize the number of false positives and to identify DMRs that are most likely to have a regulatory effect, we applied a strict threshold of >50% difference in methylation across a minimum of 50 CpGs. This also filters out environmentally-induced DMRs which typically show low methylation differences and limited spatial scope (Gokhman et al., 2017b). Using this method, we identified 9,679 regions that showed methylation differences between these individuals. These regions do not necessarily represent evolutionary differences between the human groups. Rather, many of them could be attributed to factors separating the three individuals (e.g., Ust’-Ishim is a male whereas the archaic humans are females), or to variability within populations. To minimize such effects, we used the 59 additional human maps to filter out regions where variability in methylation is detected. We adopted a conservative approach, whereby we take only loci where methylation in one hominin group is found completely outside the range of methylation in the other groups. 
Importantly, our samples come from both sexes, from individuals of various ages and ancestries, from sick and healthy individuals, and from a variety of skeletal parts (femur, skull, phalanx, tooth, and rib; Table S1). Hence, the use of these samples to filter out DMRs is expected to cover much of the variation that stems from the above factors (Figure 1, Figure 2A-C). This step resulted in a set of 7,649 DMRs that discriminate between the human groups, which we ranked according to their significance level.
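The two filtering criteria described above can be illustrated with a minimal Python sketch. It is not the DMR-detection method itself (see Methods). It assumes that each region has already been summarized as one mean methylation value per sample on a 0-1 scale, and the function names and data layout are hypothetical.

```python
from typing import Sequence


def is_candidate_dmr(ref_mh: float, ref_archaic: float, n_cpgs: int,
                     min_delta: float = 0.5, min_cpgs: int = 50) -> bool:
    """Strict threshold: >50% methylation difference across at least 50 CpGs.

    ref_mh / ref_archaic are assumed to be mean methylation levels (0-1)
    of the region in the two reference individuals being compared.
    """
    return n_cpgs >= min_cpgs and abs(ref_mh - ref_archaic) > min_delta


def is_outside_range(group_a: Sequence[float], group_b: Sequence[float]) -> bool:
    """Conservative filter: keep a region only if methylation in one group
    lies completely outside the range observed in the other group."""
    return max(group_a) < min(group_b) or max(group_b) < min(group_a)


# Hypothetical region: hypermethylated in all MH samples, low in archaic samples.
mh_samples = [0.82, 0.90, 0.88, 0.95]
archaic_samples = [0.10, 0.25, 0.18]
keep = (is_candidate_dmr(0.90, 0.10, n_cpgs=64)
        and is_outside_range(mh_samples, archaic_samples))
```

A single MH sample falling within the archaic range is enough to discard the region under this filter, which is what makes the approach conservative.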
Next, using the chimpanzee samples, we were able to determine for 2,825 of these DMRs the lineage where the methylation change occurred (Figures 2D and 3A). Of these DMRs 873 are MH-derived, 939 are archaic-derived, 443 are Denisovan-derived, and 570 are Neanderthal-derived (Figure 3A, Table S2). The extensive set of MH maps used to filter out within-population variability led us to focus in this work on MH-derived DMRs.
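The logic of polarizing a DMR with an outgroup can be sketched as follows. This is a simplified parsimony rule, assuming each group has been reduced to a discrete methylation state for the region (for example "hyper" or "hypo"); the actual assignment procedure is described in the Methods and may differ. The function name and the state labels are hypothetical.

```python
from typing import Optional


def derived_lineage(mh: str, neanderthal: str, denisovan: str,
                    chimp: str) -> Optional[str]:
    """Assign the lineage on which a methylation change most likely occurred,
    using the chimpanzee state as a proxy for the ancestral state."""
    if neanderthal == denisovan and mh != neanderthal:
        if chimp == neanderthal:
            return "MH-derived"          # archaic humans retain the ancestral state
        if chimp == mh:
            return "archaic-derived"     # MHs retain the ancestral state
        return None
    if mh == neanderthal and denisovan != mh and chimp == mh:
        return "Denisovan-derived"
    if mh == denisovan and neanderthal != mh and chimp == mh:
        return "Neanderthal-derived"
    return None                          # cannot be resolved by simple parsimony


# The four reported categories add up to the 2,825 polarized DMRs.
counts = {"MH-derived": 873, "archaic-derived": 939,
          "Denisovan-derived": 443, "Neanderthal-derived": 570}
total_polarized = sum(counts.values())
```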
Face and voice-affecting genes are derived in MHs
We defined differentially methylated genes (DMGs) as genes that overlap at least one DMR along their body or up to a distance of 5 kb upstream. The 873 MH-derived DMRs are linked to 588 MH-derived DMGs (Table S2). To gain insight into the function of these DMGs, we first analyzed their gene ontology (GO). As expected from a comparison between skeletal tissues, MH-derived DMGs are enriched with terms associated with the skeleton (e.g., endochondral bone morphogenesis, trabecula morphogenesis, palate development, regulation of cartilage development, chondrocyte differentiation and bone morphogenesis). Also notable are terms associated with the skeletal muscle, cardiovascular and nervous system (Table S3).
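The DMG definition above amounts to an interval-overlap test. The sketch below is one possible reading of it, assuming closed genomic intervals on a single chromosome and strand-aware handling of "upstream"; the function name and coordinate conventions are illustrative assumptions, not taken from the Methods.

```python
from typing import Iterable, Tuple


def is_dmg(gene_start: int, gene_end: int, strand: str,
           dmrs: Iterable[Tuple[int, int]], upstream: int = 5_000) -> bool:
    """Return True if any DMR overlaps the gene body or the region up to
    `upstream` bp upstream of it (5 kb in the definition above)."""
    if strand == "+":
        lo, hi = gene_start - upstream, gene_end
    else:  # on the minus strand, upstream lies beyond the larger coordinate
        lo, hi = gene_start, gene_end + upstream
    return any(d_start <= hi and d_end >= lo for d_start, d_end in dmrs)


# Hypothetical gene at 100,000-120,000 on the plus strand.
dmrs = [(96_000, 97_500), (300_000, 302_000)]
overlaps_plus = is_dmg(100_000, 120_000, "+", dmrs)   # DMR within 5 kb upstream
overlaps_minus = is_dmg(100_000, 120_000, "-", dmrs)  # same DMR is now downstream
```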
To acquire a more precise understanding of the possible functional consequences of these DMGs, we used Gene ORGANizer, which links human genes to the organs they phenotypically affect (Gokhman et al., 2017a). Unlike tools that use GO terms or RNA expression data, Gene ORGANizer is based entirely on curated gene-disease and gene-phenotype associations from monogenic diseases. Therefore, it relies on direct phenotypic observations in human patients whose condition results from known gene perturbations. Using Gene ORGANizer we found 11 organs that are over-represented within the 588 MH-derived DMGs, eight of which are skeletal parts that can be divided into three regions: the voice box (larynx), face, and pelvis (Figure 3B, Table S4). The strongest enrichment was observed in the laryngeal region (x2.11 and x1.68, FDR = 0.017 and 0.048, for the vocal cords and larynx, respectively), followed by facial and pelvic structures, including the teeth, forehead, jaws, and pelvis. Interestingly, the face and pelvis are considered the most morphologically divergent regions between Neanderthals and MHs (Weaver, 2009) and our results reflect this divergence through gene regulation changes. The enrichment of the vocal tract (the pharyngeal, oral and nasal cavities, where sound is filtered to specific frequencies) (Fitch, 2000; Lieberman, 2007) is also apparent when examining patterns of gene expression. This analysis shows that the pharynx and larynx are the most enriched organs within MH-derived DMGs (1.7x and 1.6x, FDR = 5.6 × 10−6 and FDR = 7.3 × 10−7, respectively, Table S3). We also found that 29 of the MH-derived DMRs overlap previously reported craniofacial development enhancers (4.97-fold compared to expected, P < 10−6, randomization test) (Prescott et al., 2015).
To test whether this enrichment remains if we take only the most confident DMRs, we limited the analysis only to DMGs where the most significant DMRs are found (top quartile). Here, the over-representation of voice-affecting genes became more pronounced, with the vocal cords enriched almost 3-fold (FDR = 0.028), and the larynx over 2-fold (FDR = 0.028, Figure 3C, Table S4).
Next, we reasoned that skeletal-associated genes are likely to be enriched when comparing DNA methylation maps originating from bones, hence introducing potential biases. To test whether the over-representation of the larynx, face and pelvis is a consequence of this, we compared the fraction of genes affecting the face, larynx and pelvis among all skeletal genes to their fraction within the skeletal genes in the MH-derived DMGs. We found that genes affecting the vocal cords, larynx, face and pelvis are significantly over-represented within skeletal MH-derived DMGs (P = 1.0 × 10−5, P = 1.3 × 10−3, P = 2.1 × 10−3, P = 0.03, for vocal cords, larynx, face, and pelvis, respectively, hypergeometric test). Additionally, we conducted a permutation test on the list of 129 MH-derived DMGs that are linked to organs on Gene ORGANizer: DMGs that do not affect the skeleton were left unsubstituted, whereas DMGs linked to the skeleton were replaced with randomly selected skeleton-affecting genes. We ran Gene ORGANizer on 100,000 such randomized lists and found that the enrichment levels we observed within MH-derived DMGs are significantly higher (and the corresponding significance values lower) than expected by chance for the face, larynx and vocal cords, but not for the pelvis (Figure S1A,B). Thus, the fact that the DMGs were detected in a comparison of bone methylomes is unlikely to underlie the observed enrichment of the larynx, vocal cords and face, but it could potentially drive the enrichment of genes related to the pelvis. We therefore focus hereinafter on genes affecting the facial and laryngeal regions.
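For readers less familiar with this kind of over-representation analysis, the following sketch shows how a fold enrichment and an upper-tail hypergeometric P-value can be computed from four counts. It uses only the Python standard library; the function names are hypothetical and the counts in the example are made up for illustration, not taken from our data.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed fraction (k of n sampled genes) over the background
    fraction (K of N genes) carrying the annotation."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which carry the annotation."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


# Made-up example: 100 skeletal genes, 10 of them larynx-affecting;
# 20 skeletal DMGs, 4 of them larynx-affecting.
fold = fold_enrichment(k=4, n=20, K=10, N=100)
p_value = hypergeom_upper_tail(k=4, n=20, K=10, N=100)
```

Restricting the background to skeletal genes, as in the test above, is what controls for the fact that all maps originate from bone.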
We next analyzed whether pleiotropy could underlie the observed enrichments. To some extent, Gene ORGANizer negates pleiotropic effects (Gokhman et al., 2017a). Despite the fact that the DMGs belong to different pathways, and some have pleiotropic functions (Gokhman et al., 2017a; Kanehisa et al., 2016; Köhler et al., 2014), their most significantly shared properties are still in shaping the vocal and facial anatomy. Nevertheless, we tested this possibility more directly, estimating the pleiotropy of each gene by counting the number of different Human Phenotype Ontology (HPO) terms that are associated with it across the entire body (Köhler et al., 2014). We found that DMGs do not tend to be more pleiotropic than the rest of the genome (P = 0.17, t-test), nor do voice- and face-affecting DMGs tend to be more pleiotropic than other DMGs (P = 0.19 and P = 0.27, respectively).
Potentially, longer genes have a higher probability of overlapping DMRs. To test whether variability in gene length might have contributed to the patterns we report, we took only DMGs with DMRs in their promoter region (−5 kb to +1 kb around the TSS). We observe very similar levels of enrichment (2.02x, 1.67x, and 1.24x, for vocal cords, larynx and face, respectively, albeit with FDR values > 0.05 due to low statistical power), suggesting that gene length does not affect the observed enrichment in genes affecting the face and larynx.
Additionally, to test whether cellular composition or differentiation state could bias the results, we ran Gene ORGANizer on the list of DMGs, following the removal of 20 DMRs that are found <10 kb from loci where methylation was shown to change during osteogenic differentiation (Håkelien et al., 2014). We found that genes affecting the voice and face are still the most over-represented (2.13x, 1.71x and 1.27x, FDR = 0.032, FDR = 0.049, and FDR = 0.040, for vocal cords, larynx and face, respectively, Table S4).
We also investigated the possibility that (for an unknown reason) the DMR-detection algorithm introduces positional biases that preferentially identify DMRs within genes affecting the voice or face. To this end, we simulated stochastic deamination processes along the Ust’-Ishim, Altai Neanderthal and Denisovan genomes, reconstructed methylation maps and ran the DMR-detection algorithm on these maps. We repeated the process 100 times for each hominin and found no enrichment of any body part, including the vocal cords, larynx or face (1.07x, 1.07x, and 1.04x, respectively, FDR = 0.88 for vocal cords, larynx and face). Perhaps most importantly, none of the other archaic branches shows enrichment of the larynx or the vocal cords. However, archaic-derived DMGs show over-representation of the jaws, as well as the lips, limbs, scapulae, and spinal column (Figure S1B, Table S4). In addition, DMRs that separate chimpanzees from all humans (archaic and modern, Table S2) do not show over-representation of genes that affect the voice, larynx or face, compatible with the notion that this trend emerged along the MH lineage. Lastly, we added a human bone reduced representation bisulfite sequencing (RRBS) map (Wang et al., 2012), and produced an RRBS map from an unspecified long bone of a chimpanzee infant (Table S1, see Methods). RRBS methylation maps include information on only ∼10% of CpG sites, and are biased towards unmethylated sites. Therefore, they were not included in the previous analyses. However, we added them in this part as they originate from a chimpanzee infant and a present-day human that is of similar age to the Denisovan (Table S1), allowing sampling from individuals that are younger than the rest.
Repeating the Gene ORGANizer analysis after including these samples in the filtering process, we found that the face and larynx are the only significantly enriched skeletal regions, and the enrichment within voice-affecting genes becomes even more pronounced (2.33x, FDR = 7.9 × 10−3, Table S4). Overall, we observe that MH-derived DMGs across all 60 MH samples are found outside archaic human variability, regardless of bone type, disease state, age or sex, and that chimpanzee methylation levels in these DMGs cluster closer to archaic humans than to MHs, suggesting that these factors are unlikely to underlie the observed trends.
Taken together, we conclude that DMGs that emerged along the MH lineage are uniquely enriched in genes affecting the voice and face, and that this is unlikely to be an artifact of (a) inter-individual variability resulting from age, sex, disease or bone type; (b) significance level of DMRs; (c) the reconstruction or DMR-detection processes; (d) pleiotropic effects of the genes; (e) the types of maps used in these processes; (f) the comparison of bone methylomes; or (g) gene length distribution.
Overall, we report 32 voice- and larynx-affecting DMGs. Disease-causing mutations in these genes have been shown to underlie various phenotypes, ranging from slight changes to the pitch and hoarseness of the voice, to a complete loss of speech ability (Table 1) (Gokhman et al., 2017a). These phenotypes were shown to be driven primarily by alterations to the laryngeal skeleton and vocal tract. Importantly, the laryngeal skeleton, and particularly the cricoid and arytenoid cartilages to which the vocal cords are anchored, are closest developmentally to limb bones, as these are the skeletal tissues that derive from the somatic layer of the lateral plate mesoderm. Methylation patterns in differentiated cells are often established during earlier stages of development, and the closer two tissues are developmentally, the higher the similarity between their methylation maps (Hernando-Herraez et al., 2015a, 2015b; Hon et al., 2013; Schultz et al., 2015; Ziller et al., 2013). Indeed, DMRs identified between species in one tissue often exist in other tissues as well (Hernando-Herraez et al., 2015b). Thus, it is likely that many of the DMRs identified here between limb samples also exist between laryngeal tissues. This is further supported by the observation that the methylation patterns in these DMGs appear in all examined skeletal samples, including femur, skull, rib, tibia and tooth.
Extensive methylation changes within face and voice-affecting genes
Our results suggest that methylation levels in many face- and voice-affecting genes have changed since the split from archaic humans, but they do not provide information on the extent of changes within each gene. To do so, we scanned the genome in windows of 100 kb and computed the fraction of CpGs which are differentially methylated in MHs (hereinafter, MH-derived CpGs). We found that the extent of changes within voice-affecting DMGs is most profound, more than 2-fold compared to other DMGs (0.132 vs. 0.055, FDR = 2.3 × 10−3, t-test, Table S5). Face-affecting DMGs also present high density of MH-derived CpGs (0.079 vs. 0.055, FDR = 2.8 × 10−3). In archaic-derived DMGs, on the other hand, the extent of changes within voice- and face-affecting genes is not different than expected (FDR = 0.99, Table S5). To control for possible biases, we repeated the analysis using only the subset of DMRs in genes affecting the skeleton. Here too, we found that voice-affecting MH-derived DMGs present the highest density of changes (+154% for vocal cords, +140% for larynx, FDR = 1.4 × 10−3, Table S5), and face-affecting DMGs also exhibit significantly elevated density of changes (+42% for face, FDR = 0.04).
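The density measure used here can be sketched in a few lines of Python. The sketch assumes fixed, non-overlapping 100 kb windows along a single chromosome and CpG positions given as integer coordinates; whether the windows in the actual analysis overlap is specified in the Methods, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def derived_cpg_fraction(cpg_positions: Iterable[int],
                         derived_positions: Iterable[int],
                         window: int = 100_000) -> Dict[int, float]:
    """Fraction of CpGs in each window that are MH-derived.

    Keys of the returned dict are window start coordinates."""
    derived = set(derived_positions)
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for pos in cpg_positions:
        start = (pos // window) * window
        totals[start] += 1
        if pos in derived:
            hits[start] += 1
    return {start: hits[start] / totals[start] for start in totals}


# Toy example: five CpGs, three of them differentially methylated in MHs.
fractions = derived_cpg_fraction(
    cpg_positions=[10, 50_000, 99_999, 100_000, 150_000],
    derived_positions=[10, 50_000, 150_000],
)
```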
Interestingly, when ranking DMGs according to the fraction of MH-derived CpGs, three of the top five, and all top five skeleton-related DMGs (ACAN, SOX9, COL2A1, XYLT1 and NFIX) affect lower and midfacial protrusion, as well as the voice (Frenzel et al., 1998; Lee and Saint-Jeannet, 2011; Meyer et al., 1997; Tompson et al., 2009) (Figure 4A,B). This is particularly surprising considering that genome-wide, less than 2% of genes (345) are known to affect the voice, ∼3% of genes (726) are known to affect lower and midfacial protrusion, and less than 1% (182) are known to affect both. We also found that DMRs in voice- and face-affecting genes tend to be located ∼20x closer than expected to MH-specific candidate positively selected loci (Peyrégne et al., 2017) (P < 10−5, permutation test), and 50% closer compared to other MH-derived DMRs (P < 10−5, Figure 4C). This is consistent with the possibility that some of these observations could have been driven by positive selection.
The extra-cellular matrix genes ACAN and COL2A1, and their key regulator SOX9, form a network of genes that regulate skeletal growth, the transition from cartilage to bone, and spatio-temporal patterning of skeletal development, including facial and laryngeal skeleton in human (Lee and Saint-Jeannet, 2011; Meyer et al., 1997) and mouse (Ng et al., 1997). SOX9 is regulated by a series of upstream enhancers identified in mouse and human (Bagheri-Fam et al., 2006; Sekido and Lovell-Badge, 2008; Yao et al., 2015). In human skeletal samples, hypermethylation of the SOX9 promoter was shown to down-regulate its activity, and consequently its targets (Kim et al., 2013). This was also demonstrated repeatedly in non-skeletal human (Aleman et al., 2008; Cheng et al., 2015; Wagner et al., 2014) and mouse tissues (Huang et al., 2017; Pamnani et al., 2016). We found substantial hypermethylation in MHs in the following regions: (a) the SOX9 promoter; (b) three of its proximal enhancers, including one that is active in mesenchymal cells (Yao et al., 2015); (c) four of its skeletal enhancers; (d) the targets of SOX9 – ACAN (DMR #80) and COL2A1 (DMR #1, the most significant MH-derived DMR); and (e) an upstream lincRNA (LINC02097). Notably, regions (a), (b), (c) and (e) are covered by the longest DMR on the MH-derived DMR list, spanning 35,910 bp (DMR #11, Figure 5). Additionally, a more distant putative enhancer, located 345 kb upstream of SOX9, was shown to bear strong active histone modification marks in chimpanzee craniofacial progenitor cells, whereas in humans these marks are almost absent (∼10x stronger in chimpanzee, suggesting down-regulation, Figure 5B) (Prescott et al., 2015).
Importantly, human and chimpanzee non-skeletal tissues (i.e., brain and blood) exhibit very similar methylation patterns in these genes, suggesting they are bone-specific. Also, the amino acid sequence coded by each of these genes is identical across the hominin groups (Prüfer et al., 2014), suggesting that the changes along the MH lineage are purely regulatory. Together, these observations put forward the notion that SOX9 became down-regulated in MH skeletal tissues, likely followed by down-regulation of its targets, ACAN and COL2A1.
XYLT1, the 4th highest-ranked skeleton-related DMG, encodes an enzyme involved in the synthesis of glycosaminoglycan. Loss-of-function mutations and reduced expression of the gene were shown to underlie the Desbuquois dysplasia skeletal syndrome, which was observed to affect the cartilaginous structure of the larynx, and drive a retraction of the face (Hall, 2001). Very little is known about XYLT1 regulation, but interestingly, in zebrafish it was shown to be bound by SOX9 (Ohba et al., 2015).
NFIX methylation is inversely correlated with its expression
To further explore expression changes that are associated with changes in methylation, we scanned the DMRs to identify those whose methylation levels are strongly correlated with expression across 21 human tissues. We found 59 such MH-derived DMRs (FDR < 0.05). DMRs in voice-affecting genes are significantly more likely to be associated with expression compared to other DMRs (x2.05, P = 6.65 × 10−4, hypergeometric test). Particularly noteworthy is NFIX, one of the most derived genes in MHs (ranked 5th among DMGs affecting the skeleton, Figure 4A,B). NFIX has two DMRs (#24 and #167), and in both, methylation levels are tightly linked with expression, explaining 81.7% and 73.9% of its expression variation, respectively (FDR = 6.2 × 10−3 and 7.5 × 10−4, Figure 6A-C). In fact, NFIX is one of the top ten DMGs with the most significant correlation between methylation and expression in humans. The association between NFIX methylation and expression was also shown previously across several mouse tissues (Carrió et al., 2015; Maunakea et al., 2010), and suggests that the observed hypermethylation reflects down-regulation that emerged along the MH lineage. Indeed, we found that NFIX, as well as SOX9, ACAN, COL2A1, and XYLT1, show significantly reduced expression levels in humans compared to mice (Figure 6D). Most of the disease phenotypes that result from NFIX dysfunction are in the craniofacial region, as NFIX influences the balance between lower and upper projection of the face (Malan et al., 2010). In addition, mutations in NFIX were shown to impair speech capabilities (Shaw et al., 2010). Taken together, these observations suggest that DNA methylation is a primary mechanism in the regulation of NFIX, and serves as a good proxy for its expression. Interestingly, NFI proteins were shown to bind the upstream enhancers of SOX9 (Pjanic et al., 2013), hence suggesting a possible mechanism for the simultaneous changes in these genes.
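The statement that methylation "explains" a given percentage of expression variation refers to a coefficient of determination. As a minimal illustration, the sketch below computes the squared Pearson correlation between methylation and expression across tissues. The choice of Pearson correlation is an assumption made for this example (the correlation measure actually used is given in the Methods), and the values are invented.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented per-tissue values: DMR methylation (0-1) and gene expression.
methylation = [0.10, 0.30, 0.55, 0.80, 0.95]
expression = [9.1, 7.4, 5.2, 3.3, 1.0]

r = pearson_r(methylation, expression)
variance_explained = r ** 2   # fraction of expression variation explained
```

A negative `r` with a high `variance_explained` corresponds to the inverse relationship described above, in which hypermethylation accompanies lower expression.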
Discussion
Humans are distinguished from other apes in their unique capability to communicate through speech. This capacity is attributed not only to neural changes, but also to structural alterations to the vocal tract. The relative roles of anatomy and cognition in our speech skills are still debated, but it is nevertheless widely accepted that even with a human brain, other apes could not reach the human level of articulation (Fitch, 2000; Fitch et al., 2017; Lieberman, 2007, 2017). Nonhuman apes are restricted not only in their linguistic capacity (e.g., they can hardly learn grammar (Fitch, 2000)), but also in their ability to produce the phonetic range that humans can. Indeed, chimpanzees and bonobos communicate through sign language and symbols much better than they do vocally, even after being raised in an entirely human environment (Fitch, 2000). Phonetic range is determined by the different conformations that the vocal tract can produce. These conformations are largely shaped by the relative position of the larynx, tongue, lips and mandible. Modern humans have a 1:1 proportion between the horizontal and vertical dimensions of the vocal tract, which is unique among primates (Figure 6E) (Lieberman, 2007; Lieberman et al., 2001). It is still debated whether this configuration is a prerequisite for speech, but it was nonetheless shown to be optimal for speech (De Boer, 2010; Fitch, 2000; Lieberman, 2007; Lieberman et al., 2001). The 1:1 proportion was reached through retraction of the human face, together with the descent of the larynx (Lieberman, 2011).
A longstanding question is whether Neanderthals and modern humans share similar vocal anatomy (Boë et al., 2002; Fitch, 2000; Lieberman and McCarthy, 2014; Steele et al., 2013). Attempts to answer this question using anthropological remains proved hard, as the larynx is mostly composed of soft tissues (e.g., cartilage), which do not survive long after death. The only remnant from the Neanderthal laryngeal region is the hyoid bone (Fitch, 2000; Steele et al., 2013). Based on this single bone, or on computer simulations and tentative vocal tract reconstructions, it is difficult to characterize the full anatomy of the Neanderthal vocal apparatus, and opinions remain split as to whether it was similar to modern humans (Boë et al., 2002; Fitch, 2000; Lieberman and McCarthy, 2014; Steele et al., 2013). The results we report, which are based on reconstructions of ancient DNA methylation patterns, provide a novel means to analyze the mechanisms that underlie the evolution of the human face and vocal tract.
We have shown here that genes affecting vocal and facial anatomy went through extensive methylation changes in recent MH evolution, following the split from Neanderthals and Denisovans. These alterations are manifested both in the number of divergent genes and in the extent of changes within each gene. The DMRs we report capture substantial methylation changes (over 50% between at least one pair of human groups), span thousands or tens of thousands of bases, and cover regulatory regions such as promoters and enhancers. Many of these methylation changes were shown here and in previous works to be tightly linked with changes in expression levels. We particularly focused on changes in the regulation of the five most diverged genes on the MH lineage: SOX9, ACAN, COL2A1, XYLT1 and NFIX, which are all associated with a range of skeletal phenotypes, and whose downregulation was shown to underlie a retracted face, as well as changes to the structure of the larynx.
In this paper, we argue for a possible interplay between methylation changes and phenotypic effects. Such connections are not straightforward, because almost all studies linking genes to diseases and phenotypes seek sequence mutations, and particularly those that affect protein sequence. Nevertheless, many disease-causing genetic variants are loss-of-function mutations, especially those that cause haploinsufficiency, and their effect could be roughly paralleled to partial silencing of a gene. Therefore, phenotypes associated with such loss-of-function genetic variants could be regarded as consequences of reduced gene activity in humans. To support our inference on the facial and laryngeal phenotypic impacts of methylation changes in SOX9, ACAN, COL2A1, XYLT1 and NFIX, we verified that these phenotypes indeed result from loss-of-function mutations, as detailed below.
NFIX poses a particularly interesting example, as the methylation levels in its two DMRs strongly predict its expression level (Figure 6B,C). To investigate whether changes in NFIX expression could explain some specific morphological changes in MH face and larynx, we examined its skeletal phenotypes. Mutations in NFIX were shown to be behind the Marshall-Smith and Malan syndromes, whose phenotypes include various skeletal alterations such as hypoplasia of the midface, retracted lower jaw, and depressed nasal bridge (Malan et al., 2010), as well as limited speech capabilities (Shaw et al., 2010). In many of the patients, the phenotypic alterations are driven by heterozygous loss-of-function mutations causing haploinsufficiency, showing that changes in NFIX dosage affect skeletal morphology (Malan et al., 2010). Given that reduced activity of NFIX drives these symptoms, a possible hypothesis is that increased NFIX activity in the Neanderthal would result in phenotypic changes in the opposite direction. Such opposite phenotypic effects of under- and over-expression of genes have been demonstrated previously in hundreds of genes, and especially within transcription factors (Dang et al., 2008; Hamosh et al., 2005; Strande et al., 2017). Indeed, we found this to be the case in 18 out of 22 Marshall-Smith syndrome skeletal phenotypes, and in 8 out of 9 Malan syndrome skeletal phenotypes. In other words, from the syndromes driven by NFIX haploinsufficiency, through healthy MHs, to the Neanderthal, the level of phenotype manifestation corresponds to the level of NFIX activity (Figure 6F, Table S6). Interestingly, many cases of laryngeal malformations in the Marshall-Smith syndrome have been reported (Cullen et al., 1997). Some of the patients exhibit positional changes of the larynx, changes in its width, and structural alterations to the arytenoid cartilage – the anchor point of the vocal cords, which controls their movement (Cullen et al., 1997).
In fact, these laryngeal and facial anatomical changes are thought to underlie the limited speech capabilities observed in some patients (Shaw et al., 2010).
In light of the role of facial flattening in determining speech capabilities, it is illuminating that flattening of the face is the most common phenotype associated with reduced activity of SOX9, ACAN and COL2A1 (Gokhman et al., 2017a). Heterozygous loss-of-function mutations in SOX9, which result in a reduction of ∼50% in its activity, were shown to cause a retracted lower face, and to affect the pitch of the voice (Lee and Saint-Jeannet, 2011; Meyer et al., 1997). ACAN was shown to affect facial prognathism and the hoarseness of the voice (Tompson et al., 2009). COL2A1 is key for proper laryngeal skeletal development (Frenzel et al., 1998), and its decreased activity results in a retracted face (Hoornaert et al., 2010). The lower face and midface of MHs are markedly retracted not only compared to apes, but also compared to Australopithecines and other Homo groups, including the Neanderthal (Lieberman, 2011). The developmental alterations that underlie the ontogeny of the human face, however, are still under investigation. Cranial base length and flexion were shown to play a role in the retracted face, as well as in vocal tract length (Aiello and Dean, 2002; Lieberman, 1998, 2011), but reduced growth rate and heterochrony of spatio-temporal switches are thought to be involved as well (Bastir et al., 2007). Importantly, SOX9 and COL2A1 were implicated in the elongation and ossification of the cranial base (Horton et al., 1979; Yan et al., 2005), and the methylation patterns we report are all present in the cranial base sample (I1583). Additionally, SOX9 is a key regulator of skeletal growth rate and of the developmental switch to ossification (Lee and Saint-Jeannet, 2011; Meyer et al., 1997). Importantly, facial retraction also occurred before the split of archaic and modern humans, and the faces of hominins are substantially shorter than those of chimpanzees and bonobos (Lieberman, 2011).
Therefore, the DMGs we report could potentially be associated with recent facial retraction in MHs, but not with morphological changes that precede the split.
We identified DMRs in SOX9, ACAN, COL2A1, XYLT1 and NFIX as some of the most derived loci in MHs. These genes are active mainly in early stages of osteochondrogenesis, making the observation of differential methylation in mature bones puzzling at first glance. This could potentially be explained by two factors: (i) The DMRs might reflect early methylation changes in the mesenchymal progenitors of these cells that are carried on to later stages of osteogenesis. This possibility is supported by previous observations of many regulatory regions that are active during early development and maintain their active methylation marks in adult tissues, despite becoming inactive. In such regions, adult methylation states reflect earlier development, and DMRs in adult stages could reflect heterochrony or earlier alterations in activity levels (Hernando-Herraez et al., 2015a; Hon et al., 2013; Schultz et al., 2015). It is also supported by the observation that the methylation patterns of NFIX, SOX9, ACAN and COL2A1 are established in early stages of development and remain stable throughout differentiation from mesenchymal stem cells to osteocytes (Håkelien et al., 2014). Additionally, we show that the upstream mesenchymal enhancer of SOX9 (Yao et al., 2015) is differentially methylated in MHs (Figure 5B). (ii) Although expression levels of SOX9, ACAN and COL2A1 gradually decrease with the progress towards skeletal maturation, these genes were shown to be still expressed in later skeletal developmental stages in the larynx, vertebrae, limbs, and jaws, including in their osteoblasts (Moriarity et al., 2015; Ng et al., 1997; Rojas-Peña et al., 2014). Interestingly, these are also the organs that are most affected by mutations in these genes, implying that late stages of activity of these genes might still play important roles in morphological patterning (Frenzel et al., 1998; Hoornaert et al., 2010; Lee and Saint-Jeannet, 2011; Meyer et al., 1997; Tompson et al., 2009). 
It was also shown that facial growth patterns, which shape facial prognathism, differ between archaic and modern humans not only during early development, but also as late as adolescence (Lacruz et al., 2015).
To further investigate potential phenotypic consequences of the DMGs we report, we probed the HPO database (Köhler et al., 2014). For each skeleton-affecting phenotype, we determined whether it matches a known morphology separating Neanderthals and MHs. For example, FGFR3 was shown to affect the size of the iliac bones (HPO ID: HP:0000946) and in the Neanderthal, these bones are considerably hyperplastic compared to MHs (Weaver, 2009). We then counted for each gene (whether DMG or not) the fraction of its associated HPO phenotypes that are divergent between Neanderthals and MHs. We found that four out of the top five most differentially methylated genes (XYLT1, NFIX, ACAN and COL2A1) are found within the top 100 genes with the highest fraction of traits where Neanderthals and MHs differ (out of a total of 1,789 skeleton-related genes). In fact, COL2A1, which is the most differentially methylated gene, is also the gene associated with the most derived traits (63) compared to all genes throughout the genome (Table S7).
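The per-gene scoring described above can be illustrated with a minimal sketch. It assumes a mapping from each gene to its skeleton-affecting HPO phenotype IDs and a set of phenotype IDs judged divergent between Neanderthals and MHs. The function names, gene labels, and all phenotype IDs other than HP:0000946 are hypothetical placeholders, and the toy values are not data from this study.

```python
from typing import Dict, List, Set, Tuple


def divergent_fraction(
    gene_to_phenotypes: Dict[str, List[str]],
    divergent: Set[str],
) -> Dict[str, float]:
    """For each gene, return the fraction of its HPO phenotypes that are
    in the set of traits divergent between Neanderthals and MHs.
    Genes with no annotated phenotypes are skipped."""
    fractions: Dict[str, float] = {}
    for gene, phenotypes in gene_to_phenotypes.items():
        unique = set(phenotypes)  # ignore duplicate annotations
        if not unique:
            continue
        fractions[gene] = len(unique & divergent) / len(unique)
    return fractions


def rank_genes(fractions: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort genes by descending fraction; ties are broken by gene name."""
    return sorted(fractions.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy input (hypothetical; only HP:0000946 appears in the text above).
gene_to_phenotypes = {
    "GENE_A": ["HP:0000946", "HP:TOY_1"],
    "GENE_B": ["HP:TOY_1", "HP:TOY_2", "HP:TOY_3"],
    "GENE_C": ["HP:0000946"],
    "GENE_D": [],
}
divergent = {"HP:0000946", "HP:TOY_2"}

fractions = divergent_fraction(gene_to_phenotypes, divergent)
ranked = rank_genes(fractions)
```

In this sketch a gene's rank depends only on the fraction of divergent traits, so a gene with a single matching phenotype can outrank a gene with many. Whether and how the original analysis accounted for the number of annotations per gene is not stated in this section.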
DNA methylation in some loci differs between cell types and sexes, changes with age, and might be affected by factors such as environment and diet (Gokhman et al., 2017b). In this work, we took measures to exclude such DMRs, and to retain only DMRs that likely represent evolutionary differences between the human groups. This was done by combining information from diverse methylation maps. In MH-derived DMRs, for example, we use only DMRs in which chimpanzees and archaic humans form a cluster that is distinct from the cluster of MHs (Figure 2A). Each of the two clusters contains samples from females and males, and from a variety of ages and bones (Table S1). Additionally, we show that these DMRs hold even when comparing methylation maps produced using the same technology, and from the same bone type, sex and age group (Figure S2A,B). Therefore, the observed differences are unlikely to be driven by these factors, and more likely reflect MH-specific evolutionary shifts. This is further supported by the phenotypic observations that facial prognathism in general, and facial growth rates in particular, are derived and reduced in MHs (Lacruz et al., 2015).
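As an illustration of the filtering logic only, the sketch below uses one simple way to operationalise "distinct clusters": the methylation values of MH samples must not overlap the range spanned by archaic and chimpanzee samples. This criterion is an assumption made for the example and is not necessarily the one used to produce Figure 2A. The function name, group keys, and toy values are hypothetical.

```python
from typing import Dict, List


def is_mh_derived(methylation: Dict[str, List[float]]) -> bool:
    """Return True if MH samples are separated from the pooled
    archaic and chimpanzee samples, i.e. the two value ranges
    do not overlap (assumed stand-in for 'distinct clusters')."""
    mh = methylation["MH"]
    others = methylation["archaic"] + methylation["chimp"]
    if not mh or not others:
        return False  # cannot assess separation without both groups
    return max(mh) < min(others) or min(mh) > max(others)


# Toy methylation fractions per sample (hypothetical values).
dmr_separated = {
    "MH": [0.80, 0.90, 0.85],
    "archaic": [0.20, 0.30],
    "chimp": [0.10, 0.25],
}
dmr_overlapping = {
    "MH": [0.40, 0.60],
    "archaic": [0.50],
    "chimp": [0.20],
}

keep_first = is_mh_derived(dmr_separated)
keep_second = is_mh_derived(dmr_overlapping)
```

A range-based test is deliberately strict: a single outlying sample is enough to reject a DMR. A clustering-based criterion would tolerate more within-group variation.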
The results presented here open a window to study the evolution of the MH face and vocal tract from a genetic perspective. Our data suggest shared genetic mechanisms that shaped these anatomical regions, and point to evolutionary events that separate MHs from the Neanderthal and Denisovan. The mechanisms leading to such extensive regulatory shifts, as well as whether and to what extent these evolutionary changes affected speech capabilities, are still to be determined.
Acknowledgements
We would like to thank Sagiv Shifman, Yoel Rak, Philip Lieberman, Rodrigo Lacruz, Erella Hovers, Anna Belfer-Cohen, Achinoam Blau, and Daniel Lieberman for their useful advice, Janet Kelso for providing data, and Maayan Harel for illustrations. L.C. and E.M. are supported by the Israel Science Foundation FIRST individual grant (ISF 1430/13). D.G. is supported by the Clore Israel Foundation. S.P. and K.P. were supported by ERC grant (No 694707) and the Max Planck Society. C.L.-F. is supported by FEDER and BFU2015-64699-P grant from the Spanish government. Funding for the collection and processing of the 850K chimp data was provided by the Leakey Foundation Research Grant for Doctoral Students, Wenner-Gren Foundation Dissertation Fieldwork Grant (Gr. 9310), James F. Nacey Fellowship from the Nacey Maggioncalda Foundation, International Primatological Society Research Grant, Sigma Xi Grant-in-Aid of Research, Center for Evolution and Medicine Venture Fund (ASU), Graduate Research and Support Program Grant (GPSA, ASU), and Graduate Student Research Grant (SHESC, ASU) to G.H. Collection of the chimpanzee bone from Tanzania was funded by the Jane Goodall Institute, and grants from the US National Institutes of Health (AI 058715) and National Science Foundation (IOS-1052693), and facilitated by Elizabeth Lonsdorf and Beatrice Hahn.