Abstract
Background Cross-species studies of epigenetic regulation have great potential, yet most epigenome mapping has focused on human, mouse, and a small number of other model organisms. Here we explore whether existing reference epigenome collections can be leveraged for analyzing other species, by extrapolation and predictive transfer of epigenome information from established model organisms to less well annotated non-model organisms.
Results We developed a methodology for cross-species mapping of epigenome data, which we used for predicting tissue-specific gene expression across twelve mammalian and one avian species. Specifically, we trained gradient boosting classifiers to predict gene expression status from reference epigenome data in human and mouse, and we applied these classifiers to epigenome profiles that were computationally transferred between species. The resulting predictions indeed identified tissue-specific differences in gene expression in the target species, thus providing initial validation of the concept of cross-species epigenome extrapolation.
Conclusions Our study establishes a workflow for cross-species epigenome mapping and epigenome-based prediction of gene expression, highlighting the future potential of using epigenome maps from reference species to annotate a potentially large number of target species.
Background
Cross-species genome analysis is widely used for dissecting evolutionary processes, identifying regulatory elements, improving genomic annotations, and studying the mechanisms underlying human diseases [1–7]. Recent progress with epigenome profiling technology has added a new dimension to genome comparisons. Large international consortia such as the International Human Epigenome Consortium [8] (which includes BLUEPRINT [9] and DEEP [10]) and the ENCODE project [11,12] produce an ever-increasing number of reference epigenomes, focused largely on human and mouse. The analysis of the resulting datasets has greatly enhanced our understanding of genomic functional elements and tissue-specific gene regulation [13–15].
Cross-species comparisons that incorporate epigenomics and functional genomics data have opened up new ways of investigating evolutionary processes, looking beyond genomic sequence conservation [16,17]. However, due to the high cost of generating epigenome data, as well as the cell-type specific and dynamic nature of epigenomic marks, current reference epigenome datasets have been limited to a handful of species, most notably human and mouse. Genome sequencing efforts for other vertebrate species rarely include epigenome profiling [18,19], which has hampered the investigation of epigenome regulation in non-model organisms and precluded systematic cross-species epigenome analyses (with a few notable exceptions [16,20,21]).
The goal of this study was to explore and evaluate cross-species extrapolation of epigenome data and epigenome-based inference of gene expression in a range of target species, based on existing reference epigenome maps for human and mouse. To that end, we established computational epigenome transfer and prediction of gene expression (Figure 1) for twelve mammalian and one avian species (Additional file 1: Figure S1). We first transferred epigenome data from our reference species (human or mouse) to the target species using whole-genome alignments. Based on the transferred epigenome data, we predicted tissue-specific gene expression in the target species using machine learning models trained and cross-validated on data from the reference species. For validation, the predictions were compared with tissue-specific expression profiles of the target species, confirming that such cross-species predictions can be useful and informative.
Results
Cross-species transfer of reference epigenome data using whole-genome alignments
To test the feasibility of extrapolating and transferring epigenome data across species, we assembled a dataset comprising two reference species with extensive epigenome data (human and mouse) as well as eleven additional target species, for which we required only a reference genome and whole-genome alignments with at least one of the two reference species (Additional file 1: Figure S1). While many more vertebrate genomes have been sequenced [19,22], our species selection was based on the availability of standardized genome resources for fair and consistent performance evaluation (see Materials and Methods). The epigenome data for the reference species were obtained from public sources and include the following cell types (Additional file 2: Table S1): embryonic stem cells (human and mouse) [11,23]; CD4+ T cells (human and mouse) [9,10,24]; hepatocytes (human) [10]; whole-organ samples of liver, kidney, and heart (mouse) [23]. For each cell type, ChIP-seq profiles for three histone marks were included in the analysis: histone H3K4me3 (which marks open chromatin at gene promoters), H3K27ac (open chromatin at promoter and enhancer regions), and H3K36me3 (actively transcribed regions).
Our bioinformatics pipeline uses epigenome profiles for the reference species as input and exploits whole-genome alignments to transfer these data to conserved genomic regions in the target species. To that end, we prepared reciprocally symmetric whole-genome alignments for each pair of reference species and target species, and we used these alignments to transfer histone signal intensities. Our approach is based on the hypothesis that sequence conservation is an indicator of broader regulatory conservation within a genomic region, and that tissue-specific patterns of epigenome regulation are frequently maintained across species in sequence-conserved regions. (We tested and confirmed this hypothesis by using the transferred epigenome data to predict tissue-specific gene expression in the target species – as described further below.)
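The core idea of alignment-based transfer can be illustrated with a minimal sketch. It assumes ungapped, same-strand alignment blocks given as (reference start, target start, length) tuples; this block layout and the function name `transfer_signal` are hypothetical and do not reproduce the actual pipeline, which operates on reciprocal best chains.

```python
import numpy as np

def transfer_signal(ref_signal, blocks, target_length):
    """Copy per-base signal from reference to target coordinates.

    blocks: iterable of (ref_start, target_start, length) tuples describing
    ungapped, same-strand alignment blocks (hypothetical layout, 0-based).
    Target bases outside of any block keep a signal of zero.
    """
    target_signal = np.zeros(target_length, dtype=float)
    for ref_start, target_start, length in blocks:
        target_signal[target_start:target_start + length] = \
            ref_signal[ref_start:ref_start + length]
    return target_signal

# Toy data: 10 reference bases and two aligned blocks
ref = np.arange(10, dtype=float)
blocks = [(2, 0, 3), (7, 5, 2)]
transferred = transfer_signal(ref, blocks, target_length=8)
print(transferred)
```

Bases in the target genome that are not covered by any alignment block remain at zero, which is why only the non-zero part of the transferred signal is informative.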
Using alignment-based epigenome transfer, we produced genome-wide, cell-type specific epigenome profiles for each target species. The non-zero part of the transferred histone signal covers a substantial amount of the overall aligned bases in the target species (ranging from human and mouse to opossum and chicken), based on the two reference species human (Figure 2) and mouse (Additional file 1: Figure S2). The distribution of the transferred histone signal followed patterns similar to that of the measured histone signal in the reference species: H3K4me3 signal was present at the smallest number of bases, consistent with this mark’s focus on gene promoters. The more widespread occurrence of H3K27ac and H3K36me3 is similarly consistent with the focus of H3K27ac on a broader set of regulatory regions (including enhancer elements) compared to H3K4me3, and H3K36me3 covers the gene body of actively transcribed genes.
Further analysis of average histone signal strength in gene promoters, gene bodies, and other genomic regions showed that, after cross-species transfer, the histone signal strength in the target species still resembles the distribution in the reference species (Additional file 1: Figure S3A/B): the transferred signal was generally strongest in gene bodies and in gene promoters, while it was weak outside of protein-coding genes. Since we only used the histone signal in gene promoters and in gene bodies for predictive modeling of gene expression, the low histone signal outside of these two region types is unlikely to impact the prediction analysis described below.
As a first validation for the cross-species mapping of epigenome data (Additional files 3 and 4), we examined whether the transferred epigenomes retain detectable cell-type specificity in gene promoters (for H3K4me3 and H3K27ac) and gene bodies (for H3K36me3). To that end, we focused on the comparison between human and mouse, where we have reference epigenome data for both species, and we found that correlations between transferred and measured epigenomes were usually highest when the cell type was matched (Figure 3). For example, the comparison between transferred and measured epigenomes in mouse resulted in a Pearson correlation of 0.75 for H3K4me3, 0.61 for H3K27ac, and 0.71 for H3K36me3 in CD4+ T cells.
To substantiate these findings, we performed region set enrichment using the LOLA software [25] on the top 5% gene promoters with the strongest transferred signal for H3K4me3, and on the top 5% gene bodies with strongest signal for H3K36me3. We consistently observed cell-type specific enrichments (Additional file 5), as illustrated by the LOLA enrichments for CD4+ T cells using mouse as reference and human as target species (Figure 4A/B) and by the LOLA enrichments for liver/hepatocytes using human as reference and mouse as target species (Figure 4C/D). In both cases, the LOLA results showed the expected enrichment for tissue-specific histone marks associated with active promoters (H3K4me3) and actively transcribed genes (H3K36me3), respectively. The annotated cell type of the enriched region sets was the same as (or closely related to) the cell type of the transferred epigenomes, supporting the conclusion that our cross-species epigenome transfer method retained the characteristic cell-type specificity of epigenome data.
Prediction of gene expression using epigenome data transferred across species
To assess the biological information carried by the transferred epigenomes in a larger number of species (namely in those for which no suitable epigenome data are available for validation), we tested whether we could predict gene expression in the target species based on the transferred epigenomes. It is well-established that gene expression can be predicted from epigenome data [26–28], and we would expect to observe better-than-random accuracies predicting gene expression based on transferred epigenome profiles if these profiles indeed captured relevant regulatory biology. Moreover, to make the validation more stringent, we can exploit the cell-type specific character of epigenome data and try to predict cell-type specific patterns of gene expression.
As our experimental reference dataset against which we evaluated the epigenome-based predictions, we obtained transcriptome data from various public sources covering a range of species and cell types: embryonic stem cells for human and mouse [11,12]; CD4+ T cells for human and mouse [24,29]; hepatocytes for human [10]; whole-organ samples of liver, kidney and heart for all species except human [30–32]; and a blood sample for opossum (Additional file 2). In this transcriptome dataset, we observed consistent clustering by cell type rather than by species, both for 1-to-1 gene orthologs [33] between each pair of species (Figure 5) and for those genes that were conserved across all 13 species (Additional file 1: Figure S4).
We implemented a machine learning approach to predict gene expression profiles in the target species based on epigenome data in the reference species (see Figure 1 for an overview). Using only data from the reference species, binary classifiers were trained to predict gene expression as either on/high or off/low. The prediction attributes were obtained by averaging histone signals for H3K4me3/H3K27ac in gene promoters and for H3K36me3 within the gene body, restricted to those subregions that were covered by the cross-species alignment used for epigenome transfer. A threshold of one transcript per million (1 TPM) was applied to label genes as on/high or off/low. We observed consistently high prediction performance for the two reference species, with cross-validated, test-set only, receiver operating characteristic (ROC) area under curve (AUC) values of 0.89 (human to mouse) and 0.90 (mouse to human), resulting in a sensitivity of 78% (human) and 76% (mouse) at a specificity of 90% (Figure 6A/B).
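The labeling step and the reported operating point (sensitivity at 90% specificity) can be sketched as follows. The function names `label_expression` and `sensitivity_at_specificity` are hypothetical, the cutoff rule (the smallest score cutoff at which at least 90% of off/low genes are called correctly) is one possible convention, and the toy values are invented for illustration rather than taken from the study.

```python
import numpy as np

def label_expression(tpm, threshold=1.0):
    """Label genes as on/high (1) if TPM >= threshold, else off/low (0)."""
    return (np.asarray(tpm, dtype=float) >= threshold).astype(int)

def sensitivity_at_specificity(labels, scores, specificity=0.9):
    """Sensitivity at the smallest score cutoff reaching the given specificity.

    A gene is called on/high if its score is strictly above the cutoff.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    negatives = np.sort(scores[labels == 0])
    # Index of the cutoff such that >= `specificity` of negatives are <= cutoff
    k = int(np.ceil(specificity * len(negatives))) - 1
    cutoff = negatives[k]
    positives = scores[labels == 1]
    return float(np.mean(positives > cutoff))

# Toy data: 10 off/low genes and 4 on/high genes (invented values)
tpm = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.5, 8.0, 30.0]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80,
          0.42, 0.60, 0.90, 0.95]

labels = label_expression(tpm)
sensitivity = sensitivity_at_specificity(labels, scores, specificity=0.9)
print(sensitivity)
```

In this toy example, nine of the ten off/low genes score at or below the cutoff (90% specificity), and three of the four on/high genes score above it.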
Having validated the epigenome-based classifiers in the reference species, we next applied them to all target species, using the transferred epigenome data as input. First, we predicted gene expression in each target species independent of cell type, averaging across transferred epigenome profiles and transcriptomes in each target species. We observed high prediction accuracies for all target species, with ROC AUC values ranging from 0.87 to 0.81 when using human as reference (Figure 7A), and from 0.89 to 0.83 when using mouse as reference (Figure 7B). The average sensitivity was 67% (human reference) and 73% (mouse reference) at a specificity of 90%.
As an additional evaluation, we tested to what degree our gene expression predictions were cell-type specific, and whether they indeed showed the highest accuracy for the matched cell type in the target species. When comparing AUC values for tissue-matched prediction to those obtained by cross-tissue prediction, the tissue-matched predictions consistently outperformed the cross-tissue predictions across all investigated target species and for both human (Figure 8A) and mouse (Figure 8B) as our reference species. Aggregating AUC values across species, the difference between the mean AUC values for the tissue-matched tests and the cross-tissue controls was highly statistically significant (one-sided Mann-Whitney U test, p-value < 10e-9).
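A one-sided Mann-Whitney U test of this kind can be run with SciPy as in the short sketch below. The AUC values are made up for illustration and are not results from this study; only the call pattern is the point.

```python
from scipy.stats import mannwhitneyu

# Invented AUC values, for illustration only
matched_auc = [0.88, 0.86, 0.87, 0.85, 0.84, 0.89]   # tissue-matched tests
cross_auc = [0.80, 0.78, 0.82, 0.79, 0.81, 0.77]     # cross-tissue controls

# alternative="greater" tests whether matched AUCs tend to be larger
stat, p_value = mannwhitneyu(matched_auc, cross_auc, alternative="greater")
print(stat, p_value)
```

The one-sided alternative matches the directional hypothesis that tissue-matched predictions reach higher AUC values than the cross-tissue controls.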
Comparison of epigenome-based and orthology-based prediction of tissue-specific expression
To benchmark our epigenome-based predictions of gene expression, we compared their performance to that of an alternative (and complementary) approach that is based on gene orthology. Specifically, for all 1-to-1 gene orthologs between each pair of species, the orthology-based method predicts a gene to be expressed in a certain cell type of the target species if, and only if, it is expressed in the corresponding cell type of the reference species. This method can predict expression only for genes that have an annotated 1-to-1 ortholog in the reference species, which limits its scope and applicability to a subset of genes (Additional file 1: Figure S5).
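The orthology-based baseline amounts to a dictionary lookup, as the following minimal sketch shows. The function name `predict_by_orthology` and all gene identifiers are hypothetical placeholders.

```python
def predict_by_orthology(reference_expressed, one_to_one_orthologs):
    """Predict expression status in the target species from orthologs.

    reference_expressed: dict mapping reference gene ID -> bool (on/off)
    one_to_one_orthologs: dict mapping target gene ID -> reference gene ID
    Target genes without an annotated ortholog receive no prediction.
    """
    return {
        target_gene: reference_expressed[reference_gene]
        for target_gene, reference_gene in one_to_one_orthologs.items()
        if reference_gene in reference_expressed
    }

# Hypothetical gene identifiers, for illustration only
reference_liver = {"refA": True, "refB": False, "refC": True}
orthologs = {"tgt1": "refA", "tgt2": "refB"}   # tgt3 has no 1-to-1 ortholog
target_genes = ["tgt1", "tgt2", "tgt3"]

predictions = predict_by_orthology(reference_liver, orthologs)
coverage = len(predictions) / len(target_genes)
print(predictions, coverage)
```

The `coverage` value makes the scope limitation explicit: genes without a 1-to-1 ortholog simply drop out of the prediction set.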
For a systematic comparison between the epigenome-based and orthology-based methods, we evaluated their performance on various subsets of genes selected using the following approach: we ordered all genes in the target species by increasing levels of DNA sequence conservation in the gene body, and we used all genes above a certain threshold as our evaluation set. We then plotted the performance gain of the epigenome-based method (calculated as the surplus of correct predictions over the orthology-based method across all target species) for human and mouse as our reference. In this analysis, the epigenome-based method resulted in approx. 20% more correct predictions than the orthology-based method (Figure 9A).
Both methods displayed stable prediction accuracy over a broad range of gene conservation values (Figure 9B). While the performance gain of epigenome-based prediction was higher for stringent thresholds on gene body conservation (Figure 9A), only relatively few genes pass these stringent thresholds, and a more inclusive threshold may increase the scope and utility of our predictions. For example, lowering the threshold on gene body conservation to 10-15% makes it possible for the epigenome-based method to predict the expression status for approx. 90% of all genes in the target species, compared to approx. 80% for the orthology-based method. Given that both methods showed similar median accuracies of 75.1% and 79.9% (human reference) and of 71.5% and 78.1% (mouse reference), the epigenome-based method provides a clear advantage over the orthology-based method by providing a substantially larger number of accurate predictions.
Finally, we evaluated the tradeoff between sensitivity and specificity in more detail for the epigenome-based method. We interpreted the class probabilities of its classifier as a measure of prediction confidence. We observed that the advantage of the epigenome-based method over the orthology-based method was strongest for relatively lenient thresholds (~0.5), at the cost of a slightly reduced accuracy (Figure 10; Additional file 1: Figure S6). For the most stringent thresholds (>0.9), the epigenome-based method continued to show a higher number of correct predictions than the orthology-based method, while the difference in accuracy between both methods approached zero. Based on the shape of the curves, a relatively stringent threshold of 0.75 is likely to constitute a suitable and broadly applicable tradeoff between sensitivity and specificity.
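The effect of a confidence threshold can be sketched as follows. This example assumes that confidence is the class probability of the predicted class, i.e. max(p, 1 - p) for a binary classifier; that reading, the function name `filter_by_confidence`, and the toy values are assumptions made for illustration.

```python
def filter_by_confidence(prob_on, labels, threshold=0.75):
    """Keep predictions whose confidence reaches the threshold.

    prob_on: predicted probability of the on/high class for each gene
    labels: true expression status (1 = on/high, 0 = off/low)
    Returns (number kept, number correct, accuracy among kept).
    """
    kept = correct = 0
    for p, y in zip(prob_on, labels):
        confidence = max(p, 1.0 - p)   # probability of the predicted class
        if confidence >= threshold:
            kept += 1
            correct += int((p >= 0.5) == bool(y))
    accuracy = correct / kept if kept else float("nan")
    return kept, correct, accuracy

# Invented probabilities and labels, for illustration only
prob_on = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05, 0.55, 0.10]
labels = [1, 1, 0, 0, 0, 0, 1, 1]

lenient = filter_by_confidence(prob_on, labels, threshold=0.5)
stringent = filter_by_confidence(prob_on, labels, threshold=0.75)
print(lenient, stringent)
```

In the toy data, the lenient threshold retains all eight predictions with six correct, while the stringent threshold retains five with four correct: fewer predictions at a higher accuracy, which is the tradeoff described above.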
Limitations of cross-species epigenome data transfer and prediction of gene expression
Despite these promising results using cross-species epigenome transfer and epigenome-based prediction of gene expression, the approach has certain limitations. Most importantly, whole-genome alignments cover only those genomic regions for which there is discernible conservation of the DNA sequence. For example, roughly 1 Gbp of DNA sequence was aligned between the human and mouse genomes (90 million years of evolutionary distance), which corresponds to approximately a third of the human genome. At a threshold of at least 100 conserved base pairs for both the gene promoter and gene body, 92.2% (mouse) and 97.2% (human) of genes were amenable to cross-species transfer of reference epigenome data. These values remained high across larger evolutionary distances (Figure 11; Additional file 1: Figure S7), for example amounting to 98.8% in the comparison between human and opossum (160 million years) and 90.7% between human and chicken (312 million years).
We investigated the 487 human and 1,435 mouse genes that failed to meet our minimum alignment thresholds and which we therefore cannot predict using the epigenome-based method. Functional characterization of these gene sets using the enrichR web service [34,35] identified an enrichment for Gene Ontology categories related to olfactory reception (Additional file 1: Figure S8A/B; Additional file 6), which is consistent with large species-specific differences in the repertoire of olfactory receptor genes between human and mouse [36]. We also analyzed the corresponding promoter regions for region set enrichment using the LOLA software [25], and we detected an enrichment of repetitive DNA elements (genomic duplications, satellite repeats, long terminal repeats, LINE repeats) as well as regions characterized by the heterochromatin mark H3K9me3 (Additional file 7). These results indicate that cross-species epigenome transfer is not well suited for analyzing species-specific gene families and repetitive heterochromatin regions.
Finally, we investigated whether the evolutionary age of individual genes may be associated with the accuracy of the epigenome-based predictions. Using a recently published dataset of gene age annotations [37], we found that genes whose expression status was predicted incorrectly tend to have a younger evolutionary age than those for which the expression status was predicted correctly (Figure 12; Additional file 1: Figure S9A/B). For example, the group of genes specific to mammals contained 52% more incorrectly predicted genes than would be expected based on the background distribution of gene ages; in contrast, the much larger group of genes shared across eukaryotes contained 7% more correctly predicted genes compared to expectation.
Discussion
Epigenome profiling is costly and labor-intensive, and comprehensive tissue-specific epigenome maps are currently available for only a few species. Here we explored cross-species extrapolation of epigenome data as a new approach that utilizes the large catalogs of human and mouse reference epigenomes to support research in non-model organisms.
We have developed a method for transferring epigenomes across species based on whole genome alignments, which we applied and validated in two complementary ways. First, we compared transferred and measured epigenomes for matched cell types between human and mouse (extensive epigenome profiles are available for validation in these two species), and we found that our method indeed retained the epigenome’s characteristic cell type specificity. Second, we combined cross-species epigenome transfer with epigenome-based prediction of gene expression, which adds an important dimension in the cross-species analysis of gene regulation and allowed us to validate our predictions in a larger number of target species for which transcriptome but no epigenome data were available. Again, we obtained high prediction accuracies, and our method accurately retained cell type specific regulatory patterns across species.
We observed broadly consistent results across twelve mammalian and one avian species included in our study, which span a spectrum of 20 million (mouse-rat) to 312 million (human-chicken and mouse-chicken) years of evolutionary distance. While it may be surprising that there was not a stronger deterioration of prediction performance as evolutionary distances increased, all of the species included in our study share a highly conserved epigenetic machinery, and their genomes are sufficiently conserved that it is still possible to identify cross-species homology in gene promoters and gene bodies using whole genome alignments. Moreover, we found that evolutionarily old genes were predicted with higher accuracy than more recently evolved genes, which is likely to contribute to the robustness of predictions across a wide evolutionary range.
Our method does not have major limitations that would hinder its broad application, except that it requires a reference genome for all included species. The more complete and more accurate the reference genome assemblies of the target species, the higher will be the quality of whole genome alignments, and – as a direct consequence – the performance of our method. Nevertheless, because our method transfers epigenome data between locally aligned regions, it is not restricted to high-quality genomes and can be expected to cope well with highly fragmented genome assemblies, thus facilitating its application to understudied non-model organisms.
Conclusion
Our results show that cross-species transfer of epigenome data is possible among mammalian (and avian) species, and that the transferred epigenomes not only retain tissue specificity but also enable tissue-specific prediction of gene expression. We can thus conclude that the tissue-specific links between epigenome profiles and gene expression are well conserved across the analyzed species. Cross-species epigenome transfer and prediction can help address the current dearth of epigenome data for non-model organisms. Importantly, computational prediction is not meant to replace experimental analysis, but to complement it by providing access to a larger number of species and more tissues within each species. It will also be interesting to compare predicted epigenome data with experimentally measured epigenome data, in order to find genomic regions in which the measurement deviates from the prediction. Such regions would be strong candidates for species-specific epigenome regulation and promising targets for in-depth biological investigation. We thus conclude that bioinformatic approaches for cross-species epigenome mapping can reasonably complement well-established experimental methods.
Materials and Methods
Included species, genome assemblies, and gene models
We included twelve mammalian species (human [hg19], rhesus [rheMac2], mouse [mm9], rat [rn5], rabbit [oryCun2], pig [susScr2], cow [bosTau7], sheep [oviAri3], horse [equCab2], dog [canFam3], cat [felCat5], opossum [monDom5]) and one avian species (chicken [galGal3]), based on the following selection criteria: (i) complete genome assemblies and whole-genome alignments with at least one of the two reference species (human, mouse) were available from the UCSC Genome Browser [39]; (ii) gene models for the relevant assemblies were available from one of three sources (GENCODE [40]: human v19, mouse vM1; The Bovine Genome Database [41]: cow Ensembl75; UCSC Genome Browser tables ensGene, ensemblSource and ensemblToGeneName: all other species); (iii) epigenome profiles including histone H3K4me3, H3K27ac, and H3K36me3 as well as transcriptome data were available for defined tissues / cell types in the reference species (Additional file 2); and (iv) transcriptome data were available for at least some of these tissues / cell types in the target species. An overview of the evolutionary relationships for all species included in this study is provided as a phylogenetic tree generated using the TimeTree service [42] (Additional file 1: Figure S1). To alleviate the effect of differences in annotation quality, all gene models were reduced to protein-coding transcripts/genes. Additionally, only transcripts tagged as “Consensus CDS (CCDS)” were selected in the GENCODE annotations. All analyses were restricted to genes located on the autosomes, excluding the sex chromosomes.
Promoter regions were defined as 1.5 kilobase windows around the transcription start site (−1,000 bp to +500 bp), and genes with a gene body length of less than 750 bp were discarded.
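The promoter definition depends on gene orientation, which the following sketch makes explicit. It assumes 0-based, half-open coordinates and "+"/"-" strand labels; the function names `promoter_window` and `has_min_body_length` are hypothetical.

```python
def promoter_window(tss, strand, upstream=1000, downstream=500):
    """Return the (start, end) promoter interval around a TSS.

    Upstream and downstream are relative to the direction of transcription,
    so the window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end   # do not run past the chromosome start

def has_min_body_length(gene_start, gene_end, min_length=750):
    """Genes with a shorter gene body are discarded."""
    return (gene_end - gene_start) >= min_length

plus_promoter = promoter_window(5000, "+")
minus_promoter = promoter_window(5000, "-")
print(plus_promoter, minus_promoter, has_min_body_length(5000, 5600))
```

Both windows span 1.5 kilobases; only their placement relative to the transcription start site differs between strands.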
Whole-genome alignments, gene orthologs, and evolutionary conservation
Whole-genome alignments between the reference species (human, mouse) and target species were downloaded from the UCSC Genome Browser in the form of chain files [43] and processed as described in the UCSC Genome Wiki to derive reciprocal best chains [44]. The reciprocal best chains were further processed using CrossMap [45] and custom scripts to build pairwise symmetric alignment blocks. Genes with fewer than 100 aligned bases in their promoter and in their body were considered weakly aligned. Information on gene orthologs was downloaded from OrthoDB [33], and lists of 1-to-1 orthologs for each pair of species and, separately, for all 13 species combined were extracted from the annotation data using custom scripts.
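Counting aligned bases per region can be sketched as an interval overlap sum. The example assumes non-overlapping alignment blocks in 0-based, half-open coordinates and implements the criterion as stated in the Results (at least 100 conserved base pairs for both the gene promoter and the gene body); the function names and coordinates are hypothetical.

```python
def aligned_bases(region, blocks):
    """Number of bases in `region` covered by alignment blocks.

    region: (start, end); blocks: list of non-overlapping (start, end)
    intervals on the same chromosome (hypothetical layout).
    """
    start, end = region
    return sum(
        max(0, min(end, block_end) - max(start, block_start))
        for block_start, block_end in blocks
    )

def meets_alignment_threshold(promoter, body, blocks, min_bases=100):
    """True if promoter and gene body each reach the minimum aligned bases."""
    return (aligned_bases(promoter, blocks) >= min_bases
            and aligned_bases(body, blocks) >= min_bases)

# Invented coordinates, for illustration only
blocks = [(100, 180), (300, 400), (900, 1500)]
promoter_aligned = aligned_bases((50, 350), blocks)
body_aligned = aligned_bases((350, 1000), blocks)
gene_ok = meets_alignment_threshold((50, 350), (350, 1000), blocks)
gene_weak = meets_alignment_threshold((150, 320), (350, 1000), blocks)
print(promoter_aligned, body_aligned, gene_ok, gene_weak)
```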
Epigenome and transcriptome data preprocessing
Publicly available reference epigenomes for the reference species (human, mouse) were obtained from ENCODE, DEEP, and BLUEPRINT projects (Additional file 2). The resulting dataset included three histone marks (H3K4me3, H3K27ac, H3K36me3), three cell types (embryonic stem cells, naïve CD4+ T cells and hepatocytes), and a total of nine epigenome profiles for human, as well as three histone marks (H3K4me3, H3K27ac, H3K36me3), five cell/tissue types (embryonic stem cells, naïve CD4+ T cells, whole liver, kidney and heart), and a total of 17 epigenome profiles for mouse. Tissue-specific transcriptome profiles were obtained from ENCODE, DEEP, and public repositories (SRA/ENA). Where available, epigenome profiles were downloaded in the form of histone signal tracks (bigWig format). For the BLUEPRINT mouse data, which were not available in preprocessed form, reads were mapped using bowtie2 v2.3.3.1 with the preset --sensitive, and signal tracks were generated using bamCoverage v2.5.3 from the deepTools software suite [46]. To prepare the epigenome profiles for the analysis, biological replicates from the same laboratory were merged by taking the mean. All resulting epigenome signal tracks were quantile normalized per project and clipped at the 99.95th percentile to alleviate the effect of outliers. Transcriptome data were processed with Salmon v0.8.2 using the following parameters: --forgettingFactor 0.8 --useVBOpt --seqBias --gcBias --geneMap, aggregating transcript-level abundance estimates into gene-level estimates. Finally, the gene expression values were subjected to quantile normalization, resulting in transcripts per million (TPM) values that were used for further analysis.
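Quantile normalization and percentile clipping can be sketched with NumPy as shown below. This is a textbook variant (rank-based, ties broken by sort order) and not necessarily the exact procedure used here; the function names `quantile_normalize` and `clip_upper` and the toy values are assumptions.

```python
import numpy as np

def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a 2D array.

    Each column is mapped onto the mean of the sorted columns, so that all
    samples share the same value distribution afterwards.
    """
    matrix = np.asarray(matrix, dtype=float)
    order = np.argsort(matrix, axis=0)
    ranks = np.argsort(order, axis=0)             # rank of each entry per column
    reference = np.sort(matrix, axis=0).mean(axis=1)
    return reference[ranks]

def clip_upper(values, percentile=99.95):
    """Cap values at the given percentile to dampen outliers."""
    values = np.asarray(values, dtype=float)
    return np.minimum(values, np.percentile(values, percentile))

# Toy matrix: 4 regions (rows) x 3 samples (columns), no ties
samples = [[5, 4, 3],
           [2, 1, 4],
           [3, 5, 6],
           [4, 2, 8]]
normalized = quantile_normalize(samples)

# Toy signal with a single extreme outlier
signal = np.ones(10000)
signal[-1] = 1000.0
clipped = clip_upper(signal)
print(normalized)
print(clipped.max())
```

After normalization, every column contains the same set of values, differing only in their order across regions.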
Epigenome-based prediction of gene expression
All prediction models were implemented in Python3, using libraries from the SciPy ecosystem for scientific computing [47,48]. Histone signal tracks were masked to exclude non-conserved regions according to pairwise genome alignments between the reference and target species. Prediction attributes were derived from these masked signal tracks by averaging the signal across each gene promoter (H3K4me3, H3K27ac) and gene body (H3K36me3). Gradient boosting classifiers from the scikit-learn library [49] were trained using these histone signal intensities as prediction attributes and the gene expression status (on/high: TPM ≥ 1; off/low: TPM < 1) as target variable (this binary thresholding strategy was motivated by previous studies [28, 50–52]). Each training dataset was randomly subsampled to balance class frequencies, and model hyperparameters were tuned using 5-fold cross-validation on this subsampled training dataset. The best model according to the results of the cross-validation was refit using the full set of training data.
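The training procedure described above can be sketched with scikit-learn as follows. The synthetic features stand in for the three averaged histone signals per gene, and the hyperparameter grid, random seeds, and helper name `balance_by_subsampling` are assumptions made for illustration; the sketch is not the code used in this study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for reference-species data: three features per gene
n_genes = 600
expressed = rng.random(n_genes) < 0.7            # imbalanced on/off labels
features = rng.normal(size=(n_genes, 3)) + expressed[:, None] * 1.5

def balance_by_subsampling(X, y, rng):
    """Randomly subsample the larger class to match the smaller class."""
    pos = np.flatnonzero(y)
    neg = np.flatnonzero(~y)
    n = min(len(pos), len(neg))
    keep = np.concatenate([rng.choice(pos, n, replace=False),
                           rng.choice(neg, n, replace=False)])
    return X[keep], y[keep]

X_bal, y_bal = balance_by_subsampling(features, expressed, rng)

# Tune hyperparameters with 5-fold cross-validation on the balanced subset
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},  # assumed grid
    scoring="roc_auc",
    cv=5,
    refit=False,
)
search.fit(X_bal, y_bal)

# Refit the best configuration on the full set of training data
model = GradientBoostingClassifier(random_state=0, **search.best_params_)
model.fit(features, expressed)
prob_on = model.predict_proba(features)[:, 1]    # class probability for on/high
print(search.best_params_)
```

Setting `refit=False` keeps the two stages separate: model selection on the balanced subsample, followed by an explicit refit on all training data.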
Genomic region enrichment analysis
Region sets were analyzed for significant enrichment using the LOLA software [25]. For the human genome, the LOLA Core region set was used. In addition, we created a custom region set for both the human and mouse genome, comprising various sets of transcription factor binding sites as well as histone modification peaks from the DeepBlue repository [53]. For each LOLA analysis, we filtered the results and retained enriched region sets only if the support (i.e., number of regions covered) was at least 5 and the multiple-testing corrected statistical significance (q-value) was below 0.05. We manually selected 10-15 entries from the top ranking region sets for visualization and provide the full list of LOLA enrichments in Additional files 5 and 7.
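The filtering rule applied to the enrichment results is simple enough to state as code. The record layout and the field names `support` and `q_value` are hypothetical placeholders for the corresponding columns of a LOLA result table.

```python
def filter_enrichments(results, min_support=5, max_q_value=0.05):
    """Retain region sets with support >= 5 and q-value < 0.05."""
    return [
        record for record in results
        if record["support"] >= min_support and record["q_value"] < max_q_value
    ]

# Invented records, for illustration only
results = [
    {"region_set": "set_A", "support": 120, "q_value": 1e-12},
    {"region_set": "set_B", "support": 3, "q_value": 1e-4},    # support too low
    {"region_set": "set_C", "support": 40, "q_value": 0.2},    # not significant
    {"region_set": "set_D", "support": 5, "q_value": 0.049},
]
retained = [record["region_set"] for record in filter_enrichments(results)]
print(retained)
```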
Gene age annotation
To evaluate the evolutionary age of each gene, we obtained gene age annotations for the following species from a recent publication [37]: chicken, cow, dog, human, mouse, opossum, rat, and rhesus macaque. UniProt identifiers were mapped to Ensembl gene identifiers using a web-based service [54]. For visualization, we selected all species for which at least 80% of the identifiers could be mapped (human 97%; mouse 83%; rhesus 81%) and plotted the observed gene age distribution relative to the expected distribution based on the priors for the different gene age labels (Figure 12, Additional file 1: Figure S9A/B).
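The comparison of observed and expected gene age distributions can be sketched as a relative enrichment per age class. The function name `relative_enrichment`, the age labels, and the counts are invented for illustration; the background distribution of all annotated genes serves as the expectation.

```python
from collections import Counter

def relative_enrichment(subset_ages, background_ages):
    """Observed over expected frequency per age class, minus one.

    A value of 0.5 means the age class is 50% more frequent in the subset
    than expected from the background distribution.
    """
    background = Counter(background_ages)
    subset = Counter(subset_ages)
    enrichment = {}
    for age, count in background.items():
        expected = count / len(background_ages)
        observed = subset.get(age, 0) / len(subset_ages)
        enrichment[age] = observed / expected - 1.0
    return enrichment

# Invented age labels and counts, for illustration only
background = ["eukaryota"] * 60 + ["vertebrata"] * 30 + ["mammalia"] * 10
incorrect = ["eukaryota"] * 5 + ["vertebrata"] * 3 + ["mammalia"] * 2

enrichment = relative_enrichment(incorrect, background)
print(enrichment)
```

In this toy example, the youngest age class is twice as frequent among incorrectly predicted genes as in the background, i.e. an enrichment of 100%.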
Declarations
Availability of data and material
All datasets used are listed in Additional file 2 and publicly available. Transferred epigenomes are provided in Additional files 3 and 4. Program code and instructions to replicate all results are openly available in a GitHub repository (https://doi.org/10.17617/1.69).
Competing interests
None declared.
Funding
This work was performed in the context of the German Epigenome Project (DEEP, German Science Ministry grant no. 01KU1216A) and the BLUEPRINT project (European Union’s Seventh Framework Programme grant 282510). C.B. is supported by a New Frontiers Group award of the Austrian Academy of Sciences and by an ERC Starting Grant (European Union’s Horizon 2020 research and innovation programme, grant agreement no. 679146).
Authors’ contributions
P.E. and C.B. conceptualized the project with input from T.L.; P.E. analyzed the data and wrote the first draft of the manuscript; all authors contributed to the writing of the manuscript.
Additional files
Name: Additional file 1
Format: doc text document (MS Word / LibreOffice Writer)
Title: Supplementary figures
Description: Supplementary figures
Name: Additional file 2
Format: tsv tab-separated values (MS Excel / LibreOffice Calc)
Title: Datasets overview
Description: Summary of epigenome and transcriptome datasets used in this study
Name: Additional file 3
Format: tsv tab-separated values (MS Excel / LibreOffice Calc)
Title: Epigenome predictions (human reference)
Description: Epigenome data for gene promoters and gene bodies transferred from human (as the reference species) to twelve target species
Name: Additional file 4
Format: tsv tab-separated values (MS Excel / LibreOffice Calc)
Title: Epigenome predictions (mouse reference)
Description: Epigenome data for gene promoters and gene bodies transferred from mouse (as the reference species) to twelve target species
Name: Additional file 5
Format: zip compressed tab-separated values (7-Zip, MS Excel / LibreOffice Calc)
Title: LOLA results for transferred epigenome data
Description: Region set enrichment analysis using LOLA for tissue-specific epigenome transfer between human and mouse (and vice versa)
Name: Additional file 6
Format: zip compressed tab-separated values (7-Zip, MS Excel / LibreOffice Calc)
Title: enrichR results for weakly aligned genes
Description: Gene set enrichment analysis using enrichR for weakly aligned genes in human and mouse
Name: Additional file 7
Format: zip compressed tab-separated values
Title: LOLA results for weakly aligned genes
Description: Region set enrichment analysis using LOLA for weakly aligned genes in human and mouse
Acknowledgements
We acknowledge the ENCODE, DEEP, and BLUEPRINT consortia for providing epigenome data for the reference species. Additionally, we thank Anna Hake (née Feldmann), Prabhav Kalaghatgi, Johanna Klughammer, and Nora Speicher for helpful discussions.