Molecular evolution of DNMT1 in vertebrates: duplications in marsupials followed by positive selection
======================================================================================================

* David Alvarez-Ponce
* María Torres-Sánchez
* Felix Feyertag
* Asmita Kulkarni
* Taylen Nappi

## Abstract

DNA methylation is mediated by a conserved family of DNA methyltransferases (Dnmts). The human genome encodes five Dnmts: Dnmt1, Dnmt2, Dnmt3a, Dnmt3b and Dnmt3L. Despite their high degree of conservation among different species, genes encoding Dnmts have been duplicated and/or lost in multiple lineages throughout evolution, indicating that the DNA methylation machinery has some potential to undergo evolutionary change. However, little is known about the extent to which this machinery, or the methylome, varies among vertebrates. Here, we study the molecular evolution of Dnmt1, the enzyme responsible for maintenance of DNA methylation patterns after replication, in 79 vertebrate species. Our analyses show that all studied species exhibit a single copy of *DNMT1*, with the exception of tilapia and marsupials (tammar wallaby, koala, Tasmanian devil and opossum), each of which exhibits two apparently functional *DNMT1* copies. Our phylogenetic analyses indicate that *DNMT1* duplicated before the divergence of marsupials (i.e., at least ~75 million years ago), thus giving rise to two *DNMT1* copies in marsupials (copy 1 and copy 2). In the opossum lineage, copy 2 was lost, and copy 1 recently duplicated again, generating three *DNMT1* copies: two putatively functional genes (copy 1a and 1b) and one pseudogene (copy 1ψ). Both marsupial copies (*DNMT1* copies 1 and 2) are under purifying selection, and copy 2 exhibits elevated rates of evolution and signatures of positive selection, suggesting a scenario of neofunctionalization. This gene duplication might have resulted in modifications in marsupial methylomes and their dynamics.
Keywords

* *DNMT1*
* gene duplication
* marsupials
* wallaby
* koala
* Tasmanian devil
* opossum

## INTRODUCTION

In vertebrate genomes, cytosine methylation is widespread (e.g., 60–90% of CpGs are methylated in mammals [1, 2]) and plays pivotal roles in the silencing of gene expression and transposable elements, gene imprinting, and X-chromosome inactivation [3]. DNA methylation is mediated by a conserved family of DNA methyltransferases (Dnmts). The human genome encodes five members of this family: Dnmt1, Dnmt2, Dnmt3a, Dnmt3b and Dnmt3L. Dnmt3a and Dnmt3b are responsible for *de novo* DNA methylation in germ cells and early embryos [4, 5]. An additional member of the Dnmt3 group, Dnmt3L, does not exhibit catalytic activity, but acts as a regulator of Dnmt3a and Dnmt3b. Once established by Dnmt3a and Dnmt3b, methylation patterns are maintained by Dnmt1, which copies them to the daughter DNA strand after replication [6]. Despite its sequence and structural similarity to Dnmt1 and the Dnmt3s, Dnmt2 methylates the anticodon loop of aspartic acid transfer RNA, rather than DNA [7, 8].

Prior comparative analyses of distantly related organisms have revealed a number of gene duplications and losses in the evolutionary history of the genes encoding Dnmts. A number of organisms lack such genes (and DNA methylation), including the yeast *Saccharomyces cerevisiae* and the nematode *Caenorhabditis elegans*, and the number of Dnmts of each kind varies among lineages [2, 9, 10, 11, 12, 13, 14]. For instance, *DNMT3C*, a mouse retrogene that evolved by duplication of *DNMT3B*, has been recently shown to be responsible for silencing young retrotransposons in the male germ line [15]. All three *DNMT* classes present in animals (classes 1, 2 and 3) are duplicated in some insect groups and completely absent from others [13].
Some insects, including Diptera, have lost DNA methylation, and insects with a methylome include some lacking Dnmt1s or Dnmt3s, indicating that neither enzyme is individually essential for DNA methylation [13].

Little is known about the extent to which the DNA methylation machinery, or the methylome, may vary among vertebrates, and particularly among mammals. Molecular evolution studies of the DNA methylation machinery in vertebrates include some comparative analyses of members of the Dnmt3 group [16, 17], but less is known about the evolution of *DNMT1*.

The human *DNMT1* gene has 40 exons and encodes a full, 1616-amino acid somatic isoform (Dnmt1s) and a truncated isoform expressed in oocytes (Dnmt1o), which lacks the first 118 amino acids. Dnmt1 proteins contain an N-terminal regulatory region and a C-terminal catalytic domain, separated by a KG repeat. In the Dnmt1s isoform, the regulatory region comprises a DNA methyltransferase associated protein (DMAP) binding domain, a nuclear localization signal (NLS), a replication foci targeting sequence (RFTS), a cysteine-rich DNA binding domain (CXXC), an autoinhibitory linker that prevents *de novo* methylation, and two bromo-adjacent homology domains (BAH1 and BAH2), among other protein-interaction domains (Fig. 1; for a comprehensive review, see ref. 18). A direct interaction between the N-terminal and the C-terminal domains seems to be necessary for enzyme activation [19]. Activated Dnmt1 shows high affinity for hemimethylated CG sites. *DNMT1*-null mouse embryos die soon after implantation and exhibit delayed development and structural abnormalities [20], and overexpression of *DNMT1* has been observed in multiple cancer tissues [21, 22, 23, 24].

![Fig. 1.](https://www.biorxiv.org/content/biorxiv/early/2018/01/13/247643/F1.medium.gif)

Fig. 1. Structure of Dnmt1 proteins in human and in marsupials.
The human Dnmt1s isoform is represented. Sites under positive selection specific to one of the sequences are represented in black. Sites under positive selection shared across multiple sequences (due to positive selection in an internal branch) are represented in green, and their coordinates are only indicated for the last sequence. Amino acid coordinates refer to the human protein. Dashed lines represent missing parts. DMAP, DNA methyltransferase associated protein-binding domain; PCNA, proliferating cell nuclear antigen-binding domain; NLS, nuclear localization signal; RFTS, replication foci targeting sequence; CXXC, cysteine-rich DNA binding domain; BAH, bromo-adjacent homology domains.

Here, with the aim of identifying potential differences among the methylation machineries of vertebrates, we study the molecular evolution of *DNMT1* in 79 vertebrate species. Our analyses reveal that all studied species exhibit a single *DNMT1*, with the exception of tilapia and marsupials (tammar wallaby, koala, Tasmanian devil and opossum), each of which exhibits two putatively functional *DNMT1* copies. Our phylogenetic analyses indicate that *DNMT1* duplicated before the divergence of marsupials (at least ~75 million years ago), thus giving rise to two *DNMT1* copies (copies 1 and 2) in marsupials. Copy 2 was subsequently lost in the opossum lineage, whereas copy 1 recently duplicated twice in the opossum lineage, generating three genes in this species: two putatively functional ones (copies 1a and 1b) and one pseudogene (copy 1ψ). Both marsupial copies (*DNMT1* copies 1 and 2) are under purifying selection, and copy 2 exhibits signatures of positive selection, suggesting a scenario of neofunctionalization. We discuss how the presence of two DNMT1s in marsupials might have affected their methylome.
## RESULTS

### *DNMT1* duplicated in a marsupial ancestor and one of the resulting copies further duplicated in an opossum ancestor

We searched the complete genomes of 58 mammals, 5 birds, 2 reptiles, one amphibian and 13 fish (Table S1) for orthologs of the human *DNMT1* gene. The studied mammalian species included 53 placentarians, four marsupials (tammar wallaby [25], koala [26], Tasmanian devil [27] and opossum [28]) and one monotreme (platypus [29]) (Table S1). All studied genomes exhibit a single *DNMT1* copy, with the exception of tilapia and the four marsupials, each of which exhibits two putatively functional copies. In addition, 8 of the studied genomes (including opossum) exhibit pseudogenes maintaining homology to a substantial length of *DNMT1*.

According to the annotations of the Ensembl database [30], the tilapia genome contains two *DNMT1* copies (Ensembl gene IDs: ENSONIG00000001574 and ENSONIG00000007221). The first copy encodes a full Dnmt1 protein (1505 amino acids). The second copy is located in a very small scaffold (AERX01074151.1, 3084 nucleotides), which only covers exons 36–40 (184 amino acids). These exons are identical between both copies, but many differences (single-point mutations and indels) are observed in the introns. These observations indicate a very recent duplication of *DNMT1* in tilapia, but the fact that only a small portion of one of the copies is available prevents further analysis. Thus, it cannot be ruled out that one of the copies is a pseudogene, or in the process of pseudogenization.

Some of the *DNMT1* copies identified were unannotated, or their exon/intron structure was incorrectly annotated in the Ensembl [30] and nr databases. Where necessary, marsupial and platypus sequences were re-annotated manually using the human *DNMT1* as reference (see Methods), and incomplete sequences (due to their location in partially sequenced genomic regions) were completed using available RNA-seq data [31, 32, 33].
The resulting protein sequences are shown in Figs. 2 and 3.

![Fig. 2.](https://www.biorxiv.org/content/biorxiv/early/2018/01/13/247643/F2.medium.gif)

Fig. 2. Alignment for the N-terminal part of Dnmt1 in human, marsupials and platypus. The human sequence corresponds to the Dnmt1s isoform. Dashes represent alignment gaps or missing regions. Stretches of “X” symbols represent unsequenced regions. Single “X” symbols represent incomplete codons (e.g., due to frameshift mutations).

![Fig. 3.](https://www.biorxiv.org/content/biorxiv/early/2018/01/13/247643/F3.medium.gif)

Fig. 3. Alignment for the C-terminal part of Dnmt1 in human, marsupials and platypus. The human sequence corresponds to the Dnmt1s isoform. Dashes represent alignment gaps or missing regions. Stretches of “X” symbols represent unsequenced regions. Single “X” symbols represent incomplete codons (e.g., due to frameshift mutations).

In the case of opossum, the three *DNMT1* copies (two putatively functional genes and one pseudogene) are located in tandem in chromosome 3, suggesting two recent duplication events. The two koala sequences are also located in the same scaffold (NW_018344010.1, 26.8 Kb apart). Tasmanian devil’s scaffold GL841404.1 contains one of the copies and part (exons 37–39) of the other copy, 4.6 Kb apart; the other exons of the second copy are located in another two contigs, most likely due to assembly errors (see Methods). The wallaby copies are located in different scaffolds (GeneScaffold_10206 and GeneScaffold_8347); however, these scaffolds are small (45.9 and 90.7 Kb, respectively), and therefore we cannot rule out the possibility that both wallaby copies are also closely linked (Table S1).

The three opossum copies exhibit high sequence similarity (copy 1a vs. copy 1b: *d*N = 0.022; *d*S = 0.047; copy 1a vs.
copy 1ψ: *d*N = 0.081; *d*S = 0.201; copy 1b vs. copy 1ψ: *d*N = 0.091; *d*S = 0.205; measures of divergence calculated using the Nei-Gojobori method [34] and the Jukes-Cantor correction [35] as implemented in DnaSP version 5.10.01 [36]), whereas the wallaby, koala and Tasmanian devil copies are much more divergent (wallaby’s copy 1 vs. copy 2: *d*N = 0.114; *d*S = 0.440; koala’s copy 1 vs. copy 2: *d*N = 0.120; *d*S = 0.395; Tasmanian devil’s copy 1 vs. copy 2: *d*N = 0.146; *d*S = 0.521) (Figs. 2 and 3). These observations, combined with the results of our phylogenetic analysis (Fig. 4), and the known marsupial phylogeny (among the studied species, wallaby and koala are the most closely related, followed by Tasmanian devil and opossum [37, 38]), suggest a scenario in which: (a) *DNMT1* duplicated in a common ancestor of marsupials, giving rise to copies 1 and 2; (b) copy 2 was lost from the opossum lineage; and (c) copy 1 was recently duplicated twice in the opossum lineage, giving rise to two putatively functional copies and one pseudogene. The relative order of the latter two events is unclear. Based on this inferred scenario, we named the three opossum copies copy 1a (chromosome 3, positions 431,108,118–431,161,113), copy 1b (positions 431,298,625–431,342,040) and copy 1ψ (pseudogene, positions 431,228,446–431,291,545). Copy 1a was already reported by Ding et al. [39], and the presence of a second copy in opossum was noted by Mikkelsen et al. [28].

![Fig. 4.](https://www.biorxiv.org/content/biorxiv/early/2018/01/13/247643/F4.medium.gif)

Fig. 4. Phylogenetic tree showing the duplication of *DNMT1* in marsupials. Numbers in black represent bootstrap values. Numbers in blue or red above each branch represent *d*N/*d*S values according to the free-ratios model.
For branches under positive selection according to the branch-site test, *d*N/*d*S ratios are represented in red and are followed by an asterisk. Internal branches are labelled with capital letters.

All marsupial and monotreme sequences lack exons 7–12 (using the human Dnmt1s isoform as reference), consistent with the opossum sequence reported by Ding et al. [39] (Fig. 1). A BLASTP search (*E-value* < $10^{-3}$) against all proteomes available in the Ensembl database failed to find any significant hit in non-placentarians, indicating that these exons, which encode amino acids 201–320, were acquired in placentarians. These amino acids overlap with the following regions: region of interaction with the PRC2/EED-EZH2 complex (amino acids 1–606), region of interaction with Dnmt3b (positions 149–217), NLS (positions 177–205) and homodimerization region (positions 310–502).

In addition, koala’s copy 2 lacks exons 1–14 (first 347 amino acids), and Tasmanian devil’s copy 2 lacks exons 1–15 (amino acids 1–374). Thus, the encoded proteins lack the regions of interaction with DMAP (positions 18–103), Dnmt3a (positions 1–148), Dnmt3b (positions 149–217), and PCNA (positions 163–174), the NLS (positions 177–205), part of the homodimerization (positions 310–502) and RFTS (positions 331–550) regions, and part of the region of interaction with the PRC2/EED-EZH2 complex (positions 1–606). Nonetheless, all marsupial and monotreme Dnmt1s appear to include a complete CXXC domain, an autoinhibitory linker, the BAH1 and BAH2 domains, and the catalytic domain (Fig. 1), and are thus potentially functional.

The opossum pseudogene (copy 1ψ) lacks exons 1–16, 20, and 30–31, and contains five stop codons (two in exon 21, one in exon 23, one in exon 28, and one in the codon shared between exons 39 and 40) and two frameshift mutations (exons 18 and 26).

### Marsupial *DNMT1* copies are differentially expressed

We next attempted to determine in which tissues, and to what extent, each copy is expressed.
First, we searched the transcriptomes of a number of koala tissues [40] for transcripts corresponding to copy 1 and copy 2, finding only transcripts for copy 1. Second, we searched two Tasmanian devil transcriptomic datasets (lymph and spleen) for sequences similar to *DNMT1*, finding only reads for copy 1. Third, we mined RNA-seq data for 5 wallaby tissues (testes, male liver, female liver, male blood and female blood; ref. 32), and identified 11,267 reads specific to copy 1 and only 5 reads specific to copy 2 (another 735 reads matched both copies; Table 1). Finally, we mined RNA-seq data for 11 opossum tissues (testis and male and female brain, cerebral cortex, heart, kidney and liver; ref. 31). A total of 3831, 194 and 290 reads matched opossum’s copies 1a, 1b and 1ψ, respectively (Table 2).

[Table 1.](http://biorxiv.org/content/early/2018/01/13/247643/T1) Number of RNA-seq reads matching wallaby’s copies 1 and 2

[Table 2.](http://biorxiv.org/content/early/2018/01/13/247643/T2) Number of RNA-seq reads matching opossum’s copies 1a, 1b and 1ψ

### Both marsupial *DNMT1* copies are under purifying selection

We used PAML [41] to estimate the non-synonymous to synonymous divergence ratio (*d*N/*d*S) in each of the branches of the gene tree. We restricted this analysis to human and the four marsupials, as incomplete genomic data and annotation errors in many of the other species would have hindered our analyses. This ratio was substantially below one in all branches of the phylogeny, except in the internal branch leading to the most recent common ancestor (MRCA) of wallaby’s and koala’s copy 2 (Fig. 4). This indicates that nonsynonymous changes are under substantial purifying selection in all the sequences studied, suggesting that all copies are functional, or that they pseudogenized only recently, which is the case for opossum’s copy 1ψ (*d*N/*d*S = 0.618).
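The selection analyses in this and the following section rest on likelihood ratio tests, whose null distributions are stated in Methods. As a minimal sketch (not the PAML implementation), the Python snippet below converts a test statistic 2Δℓ into a *P* value for two of those cases: a chi-squared distribution with two degrees of freedom (M8 vs. M7) and a 50%:50% mixture of a point mass at 0 and a chi-squared distribution with one degree of freedom (branch-site test). The function names are hypothetical, and the closed-form tail probabilities are used here only to avoid depending on a statistics library.

```python
import math


def lrt_statistic(lnl_alternative: float, lnl_null: float) -> float:
    """Return 2 * (lnL_alternative - lnL_null) for two nested models."""
    return 2.0 * (lnl_alternative - lnl_null)


def chi2_tail_df1(stat: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def chi2_tail_df2(stat: float) -> float:
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


def site_test_p_value(stat: float) -> float:
    """P value for M8 vs. M7, assuming a chi-squared null with 2 degrees of freedom."""
    return chi2_tail_df2(max(stat, 0.0))


def branch_site_p_value(stat: float) -> float:
    """P value under a 50:50 mixture of a point mass at 0 and chi-squared (df = 1)."""
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_tail_df1(stat)


# Statistic reported in the text for the M8 vs. M7 comparison.
p_m8_vs_m7 = site_test_p_value(8.36)
print(f"M8 vs. M7: P = {p_m8_vs_m7:.3f}")
```

With the reported statistic of 8.36, the sketch gives a *P* value that rounds to 0.015, in agreement with the value reported for the M8 vs. M7 test. Under the mixture null, the branch-site *P* value for a positive statistic is half the one-degree-of-freedom chi-squared tail probability.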
The *d*N/*d*S ratios varied substantially among the different branches (Fig. 4). Indeed, the free-ratios model fit the data significantly better than the one-ratio model M0 (2Δ*ℓ* = 438.07, $P = 3.71 \times 10^{-83}$), indicating significant heterogeneity in the *d*N/*d*S ratios. Remarkably, *d*N/*d*S was substantially higher in copy 2 than in copy 1 (Fig. 4). In addition, *d*N/*d*S was 0.0019 in the branch leading to opossum’s copy 1a, and 0.7708 in the branch leading to opossum’s copy 1b. This increase in the *d*N/*d*S ratios of copy 2 (wallaby, koala and Tasmanian devil) and copy 1b (opossum) could be explained by a relaxation of purifying selection acting on protein sequences and/or by positive selection in these copies.

### Marsupials’ copy 2 of *DNMT1* is under positive selection

We then used PAML to test for signatures of positive selection. The M8 vs. M7 test was significant (2Δ*ℓ* = 8.36, *P* = 0.015), indicating that a fraction of codons were under positive selection. We then used a branch-site test (model A vs. null model A1; refs. 42, 43) to infer the action of positive selection at each of the branches of the phylogeny, except the branch leading to the opossum pseudogene. The test was significant for the external branches leading to koala’s copy 2, Tasmanian devil’s copy 2, and opossum’s copy 1b, and for the internal branch leading to the MRCA of the copy 2 of wallaby, koala and Tasmanian devil. The *d*N/*d*S values for these branches are represented in red and marked with an asterisk in Fig. 4, and more detailed results are provided in Table 3.

[Table 3.](http://biorxiv.org/content/early/2018/01/13/247643/T3) Branch-site tests of positive selection

A total of 9 codons were detected to be under positive selection: one in opossum’s copy 1b, three in koala’s copy 2, three in Tasmanian devil’s copy 2, and two in the internal branch leading to the MRCA of copy 2 of wallaby, koala and Tasmanian devil.
Sites under positive selection were different in each branch, and affected the catalytic domain (4 codons), the site of interaction with the PRC2/EED-EZH2 complex (6 codons), the BAH2 domain (2 codons) and the homodimerization domain (1 codon; Fig. 1; Table 3).

### Reanalysis removing incomplete sequences

The marsupial *DNMT1* coding sequences (CDSs) used in this study are complete or almost complete (Figs. 2 and 3). The only notable exceptions are wallaby’s copy 2, for which 409 codons remain unsequenced due to limited genome coverage (2×; ref. 25), and the opossum pseudogene, which lacks 19 exons. This means that our natural selection analyses were limited to only 826 codons. We repeated our analyses after removing these two sequences, rendering 1172 codons analyzable (present in all sequences). We obtained similar results. First, the *d*N/*d*S ratio was substantially higher in copy 2 than in copy 1, and in opossum’s copy 1b (*d*N/*d*S = 0.808) than in opossum’s copy 1a (*d*N/*d*S = 0.000; Fig. 5). Second, positive selection was detected in the external branches leading to opossum’s copy 1b, koala’s copy 2 and Tasmanian devil’s copy 2, and in the internal branch leading to the MRCA of koala’s copy 2 and Tasmanian devil’s copy 2. This analysis detected a total of 21 codons under positive selection (including the 9 detected before), which affected the catalytic domain (4 codons), the site of interaction with the PRC2/EED-EZH2 complex (10 codons), the BAH2 domain (3 codons), the homodimerization domain (2 codons), the autoinhibitory linker (1 codon), and the KG linker (1 codon; Fig. 1; Table S2).

![Fig. 5.](https://www.biorxiv.org/content/biorxiv/early/2018/01/13/247643/F5.medium.gif)

Fig. 5. Phylogenetic tree showing the duplication of *DNMT1* in marsupials, removing wallaby’s copy 2 and opossum’s copy 1ψ. Numbers in black represent bootstrap values.
Numbers in blue or red above each branch represent *d*N/*d*S values according to the free-ratios model. For branches under positive selection according to the branch-site test, *d*N/*d*S ratios are represented in red and are followed by an asterisk. Internal branches are labelled with capital letters.

## DISCUSSION

Our analyses indicate that the *DNMT1* gene duplicated in a common ancestor of marsupials, giving rise to two copies (copies 1 and 2). The opossum lineage and the wallaby/koala/Tasmanian devil lineage diverged ~75 million years ago [37, 38], implying that the *DNMT1* duplication occurred prior to that time. Copy 2 was subsequently lost in the opossum lineage. Copy 2 is expressed at very low, or even undetectable, levels, at least in the wide range of wallaby (Table 1), koala [40] and Tasmanian devil tissues examined. However, both copies exhibit *d*N/*d*S ratios lower than one, and neither exhibits signatures of pseudogenization (premature stop codons or frameshift mutations), indicating that they are functional and expressed, perhaps in tissues not included in our analyses, in early developmental stages or under certain environmental conditions. Otherwise, signatures of pseudogenization and a *d*N/*d*S close to 1 would be expected. Part of the regulatory region of koala’s and Tasmanian devil’s copy 2 appears to have been lost; however, all *DNMT1* copies retain the catalytic domain and a significant fraction of the regulatory region, suggesting that they are functional. Of note, the human Dnmt1o isoform is functional despite also lacking part of the regulatory region.

Remarkably, copy 2 exhibits a high *d*N/*d*S ratio compared to copy 1, in addition to signatures of positive selection. These results suggest a scenario of neofunctionalization, in which copy 1 may have retained the function of the ancestral *DNMT1*, and copy 2 may have acquired a new or modified function.
Signatures of positive selection can be detected in the branch leading to the MRCA of wallaby’s, koala’s, and Tasmanian devil’s copy 2, and in the external branches leading to koala’s and Tasmanian devil’s copy 2 (Figs. 4 and 5; Tables 3 and S2). These observations indicate that neofunctionalization occurred both before and after the divergence of wallaby, koala and Tasmanian devil (i.e., both before and after ~60 million years ago; refs. 37, 38). Substitutions under positive selection affect different domains, making it difficult to predict how they may have affected the function of copy 2.

Copy 1 recently underwent another two duplication events in the opossum lineage, which resulted in three genes (copies 1a, 1b and the pseudogene 1ψ) located in tandem in chromosome 3. Their high degree of similarity, along with our phylogenetic analyses (Fig. 4), indicates that these sequences are the result of the duplication of copy 1 of *DNMT1*, and that they are not remnants of the ancestral duplication identified in the other marsupials. Opossum’s copy 1b also exhibits an elevated *d*N/*d*S (compared to copy 1a) and signatures of positive selection, which would also suggest neofunctionalization in copy 1b. However, in this case we are skeptical about our inference of positive selection, because the only codon inferred to be under positive selection with high probability (V513 in the human protein, a tryptophan in opossum’s copy 1b) is located near an unsequenced region of the opossum genome (Fig. 2), and such regions are prone to sequencing errors. Opossum’s copy 1b is expressed at lower levels than copy 1a in the tissues included in our analyses (Table 2).

It is currently not possible to infer the functions of the derived marsupial *DNMT1* duplicates (copy 2 of wallaby, koala and Tasmanian devil and copy 1b of opossum). We propose three different possible scenarios.
First, as both marsupial *DNMT1* copies seem to be expressed in different sets of tissues (Tables 1 and 2), positive selection in the derived *DNMT1* copies may simply reflect subtle adjustments to the biochemistry of the tissue or tissues in which they are expressed. Second, assuming that the function of both marsupial *DNMT1* copies is similar to that of the ancestral *DNMT1* (maintenance of methylation patterns throughout the life of the animal after each DNA replication event), it is possible that an increased Dnmt1 abundance may cause marsupial methylomes to be particularly stable during aging; in other mammals, methylation patterns change during the lifespan of an organism [44]. This, however, would only apply to the unknown tissue or tissues (or developmental stages or environmental conditions) in which the derived copies are expressed at substantial levels. Third, the duplication of *DNMT1* may have caused marsupial genomes to be hypermethylated. Given that methylated cytosines have an increased mutation rate [45], this scenario might explain the low GC content of marsupial genomes [25, 27, 28, 46]. However, this scenario would require the derived *DNMT1* copies to act as *de novo* DNMTs rather than maintenance DNMTs, which is at odds with the presence of an autoinhibitory linker in the proteins encoded by both copies. Additional functional studies of marsupial Dnmt1s, and methylome data for marsupials (which are currently unavailable), will be required to establish their functions.

## CONCLUSIONS

Our analyses of 79 vertebrate genomes reveal that all studied species exhibit a single *DNMT1*, with the exception of tilapia and marsupials (wallaby, koala, Tasmanian devil and opossum), each of which exhibits two apparently functional *DNMT1* copies. Our phylogenetic analyses indicate that *DNMT1* duplicated before the divergence of marsupials (at least ~75 million years ago), thus giving rise to *DNMT1* copies 1 and 2.
Copy 2 was lost in the opossum lineage, and copy 1 recently duplicated again to generate three opossum genes: two putatively functional ones and one pseudogene. Both *DNMT1* copies are under purifying selection, and copy 2 is under positive selection. These results suggest a scenario of neofunctionalization.

## METHODS

### Gene identification and annotation

In order to identify *DNMT1* orthologs in the studied vertebrate genomes, we conducted TBLASTN searches against the Ensembl database (release 90; ref. 30), using the human Dnmt1s protein sequence as query and an *E-value* cut-off of $10^{-10}$. The koala genome was queried in the nr database, as it is not represented in Ensembl. Only scaffolds with at least 450 identities (added across the different TBLASTN hits) were considered.

Where necessary, wallaby, koala, Tasmanian devil, opossum and platypus sequences were manually re-annotated using the intron/exon structure of human *DNMT1* as reference. For that purpose, incorrectly annotated exons (those not showing significant similarity to the human sequence) were removed, and missing exons were searched for using TBLASTN and BLASTN searches. Putative stop codons and frameshift mutations were confirmed by visualization of the corresponding original reads in the trace archive database.

In the case of Tasmanian devil’s copy 2 and platypus’ *DNMT1*, exons present in different scaffolds were combined into a single gene annotation. The platypus *DNMT1* exons are distributed along two small contigs: Contig12710 (18.1 Kb) and Contig19880 (17.7 Kb) (Table S1). In the current Tasmanian devil assembly, the exons of copy 2 are distributed across three different scaffolds: exons 16–24 are located in scaffold GL841374.1 (4.0 Mb), exons 25–36 are located in GL843446.1 (17.2 Kb), and exons 37–39 are located in GL841404.1 (1.6 Mb); this is probably the result of assembly errors.
Some of the exons of wallaby’s copies 1 and 2, opossum’s copy 1b, and the single copy of platypus could not be recovered (or completely recovered) from available genome assemblies because they were located in unsequenced regions. We thus attempted to recover these exons from available RNA-seq datasets [31–33]. In the case of wallaby’s copy 2, this was not possible due to the very few reads available (Table 1), and in the case of opossum’s copy 1b it was not possible either, due to the high similarity between copies 1a and 1b.

### Gene expression levels in different tissues

We used koala’s copy 2 as query in a TBLASTN search against the koala transcriptome [40]; all retrieved transcripts, however, corresponded to copy 1. Similarly, we used Tasmanian devil’s copy 2 as query in a TBLASTN search against all the RNA-seq reads available for two Tasmanian devil tissues (lymph and spleen; SRA accession numbers: ERR695583 and ERR695584), finding again only reads corresponding to copy 1.

We next mined RNA-seq datasets for a number of tissues of wallaby [32] and opossum [31], in order to measure expression levels of each of the *DNMT1* copies in the different tissues. For each read, we determined whether it perfectly matched (i.e., was contained in) one or more of the copies in the genome of interest, using an in-house Perl script. Reads that matched more than one copy were not used to compute expression levels.

### Phylogenetic analyses

The CDSs of human, wallaby, koala, Tasmanian devil, opossum and platypus were translated in silico into protein sequences. The protein sequences were aligned using ProbCons version 1.12 [47], and the resulting protein alignment was used to guide the alignment of the CDSs. Alignments were visualized and, where necessary, manually edited using BioEdit version 7.2.5 [48]. A phylogenetic tree was obtained using the maximum-likelihood method implemented in MEGA7 [49], using the Tamura-Nei model [50] and 1000 bootstraps.
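The protein-guided alignment of CDSs mentioned above can be illustrated with a short Python sketch. It is not the procedure or software used in this study; it only shows the general idea of threading the codons of each unaligned CDS onto the corresponding aligned protein sequence, inserting three gap characters wherever the protein alignment has a gap. The function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes that each CDS is in frame and encodes exactly the residues of its aligned protein (incomplete codons, which are marked with “X” in Figs. 2 and 3, would need extra handling).

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Place the codons of an unaligned CDS under an aligned protein sequence."""
    n_residues = sum(1 for residue in aligned_protein if residue != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is shorter than the protein it should encode")
    codons = []
    position = 0  # index of the next unused nucleotide in the CDS
    for residue in aligned_protein:
        if residue == gap:
            # A gap in the protein alignment becomes a three-nucleotide gap.
            codons.append(gap * 3)
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)


# Toy input: two aligned proteins and their unaligned coding sequences.
protein_alignment = {"seqA": "MK-L", "seqB": "MKTL"}
coding_sequences = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCCTG"}

codon_alignment = {
    name: thread_codons(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}
for name, aligned_cds in codon_alignment.items():
    print(name, aligned_cds)
```

Because gaps are always inserted in multiples of three, the reading frame is preserved across the alignment, which is what codon-based analyses such as those described in the next section require.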
### Natural selection analyses

The codeml program in the PAML package, version 4.4d [41], was used to conduct natural selection analyses. The free-ratios model was used to calculate a separate *d*N/*d*S for each of the branches of the gene tree. Heterogeneity of *d*N/*d*S among branches was tested by comparing the likelihoods of the free-ratios model and model M0, which assumes a homogeneous *d*N/*d*S across all sites and branches. This comparison was conducted using a likelihood ratio test [51], assuming that twice the difference between the log-likelihoods of both models, $2\Delta\ell = 2(\ell_{\mathrm{FR}} - \ell_{\mathrm{M0}})$, where $\ell_i$ is the log-likelihood of model $i$, follows a chi-squared distribution with a number of degrees of freedom equal to the difference between the numbers of parameters of the two nested models.

To infer the presence of codons under positive selection, we first compared the likelihoods of models M8 and M7. Positive selection was inferred if model M8 (which allows for a class of codons with *d*N/*d*S > 1) fitted the data significantly better than model M7 (which allows *d*N/*d*S to vary between 0 and 1). The statistic $2\Delta\ell = 2(\ell_{\mathrm{M8}} - \ell_{\mathrm{M7}})$ was assumed to follow a chi-squared distribution with two degrees of freedom.

Next, for each of the branches in the gene tree, a branch-site test of positive selection (Test 2; refs. 42, 43) was conducted. Positive selection was inferred if model A fitted the data significantly better than null model A1. The statistic $2\Delta\ell = 2(\ell_{\mathrm{MA}} - \ell_{\mathrm{MA1}})$ was assumed to follow a 50%:50% mixture of a point mass at 0 and a chi-squared distribution with one degree of freedom. The Bayes Empirical Bayes approach [42] was used to identify codons under positive selection (posterior probability ≥ 95%).

## Authors’ contributions

DAP conceived the work and wrote the manuscript. All authors participated in the analysis, and read and approved the final submission.

## SUPPORTING INFORMATION

Table S1: DNMT1 copies in vertebrate genomes.
Table S2: Branch-site tests of positive selection excluding wallaby’s copy 2 and opossum’s copy 1ψ.

## ACKNOWLEDGEMENTS

The authors are grateful to Soojin Yi and Julio Rozas for helpful feedback. Computational resources were provided by Information Technology Operations of the University of Nevada, Reno. This work was supported by a Pilot Grant from the Smooth Muscle Plasticity COBRE of the University of Nevada, Reno, funded by the National Institutes of Health (grant 5P30GM110767-04). MTS was supported by an FPI predoctoral fellowship (BES-2013-062723) and a travel grant (EEBB-I-16-11395) from the Ministry of Economy and Competitiveness of Spain.

* Received January 13, 2018.
* Revision received January 13, 2018.
* Accepted January 13, 2018.
* © 2018, Posted by Cold Spring Harbor Laboratory. The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## REFERENCES

1. Bird AP. CpG-rich islands and the function of DNA methylation. Nature. 1986;321(6067):209–213.
2. Hodges E, Smith AD, Kendall J, Xuan Z, Ravi K, Rooks M, Zhang MQ, Ye K, Bhattacharjee A, Brizuela L et al. High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing. Genome Res. 2009;19(9):1593–1605.
3. Lee JT. Molecular links between X-inactivation and autosomal imprinting: X-inactivation as a driving force for the evolution of imprinting? Curr Biol. 2003;13(6):R242–254.
4. Okano M, Bell DW, Haber DA, Li E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell. 1999;99(3):247–257.
5. Okano M, Xie S, Li E. Cloning and characterization of a family of novel mammalian DNA (cytosine-5) methyltransferases. Nat Genet. 1998;19(3):219–220.
6. Leonhardt H, Page AW, Weier HU, Bestor TH. A targeting sequence directs DNA methyltransferase to sites of DNA replication in mammalian nuclei. Cell. 1992;71(5):865–873.
7. Jeltsch A, Nellen W, Lyko F. Two substrates are better than one: dual specificities for Dnmt2 methyltransferases. Trends Biochem Sci. 2006;31(6):306–308.
8. Goll MG, Kirpekar F, Maggert KA, Yoder JA, Hsieh CL, Zhang X, Golic KG, Jacobsen SE, Bestor TH. Methylation of tRNAAsp by the DNA methyltransferase homolog Dnmt2. Science. 2006;311(5759):395–398.
9. Borkovich KA, Alex LA, Yarden O, Freitag M, Turner GE, Read ND, Seiler S, Bell-Pedersen D, Paietta J, Plesofsky N et al. Lessons from the genome sequence of Neurospora crassa: tracing the path from genomic blueprint to multicellular organism. Microbiol Mol Biol Rev. 2004;68(1):1–108.
10. Feng S, Cokus SJ, Zhang X, Chen PY, Bostick M, Goll MG, Hetzel J, Jain J, Strauss SH, Halpern ME et al. Conservation and divergence of methylation patterning in plants and animals. Proc Natl Acad Sci U S A. 2010;107(19):8689–8694.
11. Zemach A, McDaniel IE, Silva P, Zilberman D. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science. 2010;328(5980):916–919.
12. Jeltsch A. Molecular biology. Phylogeny of methylomes. Science. 2010;328(5980):837–838.
13. Bewick AJ, Vogel KJ, Moore AJ, Schmitz RJ. Evolution of DNA methylation across insects. Mol Biol Evol. 2017;34(3):654–665.
14. Goll MG, Bestor TH. Eukaryotic cytosine methyltransferases. Annu Rev Biochem. 2005;74:481–514.
15. Barau J, Teissandier A, Zamudio N, Roy S, Nalesso V, Hérault Y, Guillou F, Bourc’his D. The DNA methyltransferase DNMT3C protects male germ cells from transposon activity. Science. 2016;354(6314):909–912.
16. Yokomine T, Hata K, Tsudzuki M, Sasaki H. Evolution of the vertebrate DNMT3 gene family: a possible link between existence of DNMT3L and genomic imprinting. Cytogenet Genome Res. 2006;113(1–4):75–80.
17. Campos C, Valente LM, Fernandes JM. Molecular evolution of zebrafish dnmt3 genes and thermal plasticity of their expression during embryonic development. Gene. 2012;500(1):93–100.
18. Qin W, Leonhardt H, Pichler G. Regulation of DNA methyltransferase 1 by interactions and modifications. Nucleus. 2011;2(5):392–402.
19. Fatemi M, Hermann A, Pradhan S, Jeltsch A. The activity of the murine DNA methyltransferase Dnmt1 is controlled by interaction of the catalytic domain with the N-terminal part of the enzyme leading to an allosteric activation of the enzyme after binding to methylated DNA. J Mol Biol. 2001;309(5):1189–1199.
20. Li E, Bestor TH, Jaenisch R. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell. 1992;69(6):915–926.
21. el-Deiry WS, Nelkin BD, Celano P, Yen RW, Falco JP, Hamilton SR, Baylin SB. High expression of the DNA methyltransferase gene characterizes human neoplastic cells and progression stages of colon cancer. Proc Natl Acad Sci U S A. 1991;88(8):3470–3474.
22. Oh BK, Kim H, Park HJ, Shim YH, Choi J, Park C, Park YN. DNA methyltransferase expression and DNA methylation in human hepatocellular carcinoma and their clinicopathological correlation. Int J Mol Med. 2007;20(1):65–73.
23. Robertson KD, Uzvolgyi E, Liang G, Talmadge C, Sumegi J, Gonzales FA, Jones PA. The human DNA methyltransferases (DNMTs) 1, 3a and 3b: coordinate mRNA expression in normal tissues and overexpression in tumors. Nucleic Acids Res. 1999;27(11):2291–2298.
24. Hermann A, Gowher H, Jeltsch A. Biochemistry and biology of mammalian DNA methyltransferases. Cell Mol Life Sci. 2004;61(19–20):2571–2587.
25. Renfree MB, Papenfuss AT, Deakin JE, Lindsay J, Heider T, Belov K, Rens W, Waters PD, Pharo EA, Shaw G et al. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development. Genome Biol. 2011;12(8):R81.
26. Johnson RN, Hobbs M, Eldridge MD, King AG, Colgan DJ, Wilkins MR, Chen Z, Prentis PJ, Pavasovic A, Polkinghorne A. The koala genome consortium. Technical Reports of the Australian Museum. 2014;24:91–92.
27. Murchison EP, Schulz-Trieglaff OB, Ning Z, Alexandrov LB, Bauer MJ, Fu B, Hims M, Ding Z, Ivakhno S, Stewart C. Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer. Cell. 2012;148(4):780–791.
28. Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A et al. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature. 2007;447(7141):167–177.
29. Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP, Grutzner F, Belov K, Miller W, Clarke L, Chinwalla AT et al. Genome analysis of the platypus reveals unique signatures of evolution. Nature. 2008;453(7192):175–183.
30. Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, Billis K, Carvalho-Silva D, Cummins C, Clapham P. Ensembl 2017. Nucleic Acids Res. 2016;45(D1):D635–D642.
31. Brawand D, Soumillon M, Necsulea A, Julien P, Csárdi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M. The evolution of gene expression levels in mammalian organs. Nature. 2011;478(7369):343–348.
32. Cortez D, Marin R, Toledo-Flores D, Froidevaux L, Liechti A, Waters PD, Gruetzner F, Kaessmann H. Origins and functional evolution of Y chromosomes across mammals. Nature. 2014;508(7497):488–493.
33. Necsulea A, Soumillon M, Warnefors M, Liechti A, Daish T, Zeller U, Baker JC, Grützner F, Kaessmann H. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature. 2014;505(7485):635–640.
34. Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3(5):418–426.
35. Jukes TH, Cantor CR. Evolution of protein molecules. In: Munro H, editor. Mammalian Protein Metabolism. 1969. pp. 21–132.
36. Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25(11):1451–1452.
37. Meredith RW, Westerman M, Case JA, Springer MS. A phylogeny and timescale for marsupial evolution based on sequences for five nuclear genes. J Mamm Evol. 2008;15(1):1–36.
38. Meredith RW, Westerman M, Springer MS. A phylogeny of Diprotodontia (Marsupialia) based on sequences for five nuclear genes. Mol Phylogenet Evol. 2009;51(3):554–571.
39. Ding F, Patel C, Ratnam S, McCarrey JR, Chaillet JR. Conservation of Dnmt1o cytosine methyltransferase in the marsupial Monodelphis domestica. Genesis. 2003;36(4):209–213.
40. Hobbs M, Pavasovic A, King AG, Prentis PJ, Eldridge MD, Chen Z, Colgan DJ, Polkinghorne A, Wilkins MR, Flanagan C. A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity. BMC Genomics. 2014;15(1):786.
41. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–1591.
42. Yang Z, Wong WS, Nielsen R. Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005;22(4):1107–1118.
43. Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005;22(12):2472–2479.
44. Richardson B. Impact of aging on DNA methylation. Ageing Res Rev. 2003;2(3):245–261.
45. Mugal CF, Arndt PF, Holm L, Ellegren H. Evolutionary consequences of DNA methylation on the GC content in vertebrate genomes. G3 (Bethesda). 2015;5(3):441–447.
46. Romiguier J, Ranwez V, Douzery EJ, Galtier N. Contrasting GC-content dynamics across 33 mammalian genomes: relationship with life-history traits and chromosome sizes. Genome Res. 2010;20(8):1001–1009.
47. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005;15(2):330–340.
48. Hall TA. BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser. 1999;41:95–98.
49. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–1874.
50. Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993;10(3):512–526.
51. Whelan S, Goldman N. Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics. Mol Biol Evol. 1999;16:1292–1299.