Abstract
In past decade, many reports have demonstrated that tissues in multi-cellular organisms may play important roles to shape the pattern of genome evolution. The tissue-driven hypothesis was then coined, claiming that tissue-specific factor as the common resource of functional constrain may underlie the positive correlations between tissue expression divergence, sequence divergence, or the expression tolerance of duplication divergence. However, the original version of tissue-driven hypothesis cannot rule out the tissue-specific effect of mutational variance. In this perspective, we solve this problem by modifying the evolutionary model that underlies the tissue expression evolution. Reanalysis of the microarray data reanalysis has revealed the relative importancebetween tissue-specific functional constraints and mutational variances in the tissue evolution. Finally, we outline how to utilize RNA-seq technology to further investigate the tissue expression evolution in the case of multiple tissues and species.
Tissue expression evolution
In multi-cellular organisms, understanding the roles of tissue-specific factors during the course of genome evolution is the first step to investigate the emergence of biological complexity (Arendt 2008) but the tissue evolution remains obscure and controversial (Chan et al. 2009; Yanai and Hunter 2009). Thanksto the invent of high throughput technologies especially the microarray chips, a number of reports in the past decade have revealed interesting evolutionary patterns of tissue expression divergence inprimates (Enard et al 2002; Gu and Gu 2003; Khaitovich et al. 2004a, 2004b; 2005a, 2005b, 2005c), in mammals (Duret and Mouchiroud 2000; Su et al. 2004; Gu and Su 2007), aswell as in fruitflies (Rifkin et al. 2003). The earliest work included Duret and Mouchiroud (2000) who showed that the rate of protein sequence divergence was negatively correlated with the tissue broadness of gene expression. In other words, broadly expressed proteins tend to evolve slowly, and vice versa. With the help of human Affymetric microarray chips, Enard et al. (2002) conducted a genome-wide expression analysis in the brains among primates, claiming a human lineage-specific acceleration of expression divergence. Follow-up analyses from our group (Gu and Gu 2003) and (Caceres et al. 2003; Uddin et al. 2004) suggested that up-regulation might be the majorpattern during the evolution of human brain. Moreover, a detailed analysis on expression profiles in primates (Gilad et al. 2006) revealed a rapid evolution of human transcription factors.Together, these studies have provided an updated version of the regulatory hypothesis of human-chimpanzee split (King and Wilson 1975). Noticeably, Babbin et al. (2010) showed that both noncoding and protein-coding RNAs contributed to gene expression evolution in the primate brain, and Chodroff et al. (2010) showed the role oflong noncoding RNA genes expressed in brains among diverse amniotes. Meanwhile, Khaitovich et al. (2005) addressed another important issue, that is, whether the rate of expression divergence among species differs among tissues. They analyzed genome-wide expression profiles in several tissues in primates, and observed that the brain or cerebellum tissue may have under stronger expression conservation than other tissuesunder study (testis, heart, liver and kidney); in particular, testis showed a rapid expressiondivergence.
Stabilizing selection model for tissue-driven hypothesis
Gu and Su (2007) coined the tissue-driven hypothesis, postulating that tissue-specific factors may serve as the common functional constraints that may be imposed on different aspects of genome evolution. Under this framework, three specific predictions were derived: (i) a positive correlation between tissue expression distance and protein sequence distance between species; (ii) a positive correlation of tissue expression distance between species with that between duplicate genes; and (iii) tissue-specific factors and tissue broadness are two independent, additive recourses that shape the pattern of genome evolution. Initial analyses (Gu and Su 2007; Su and Gu 2007) have provided strong evidence for supporting these predictions. Indeed, most recent related studies can be explained by the tissue-driven hypothesis, in spite of various terminologies used by different authors (Brawand et al. 2011).
Yet we have recently realized a theoretical drawback of the tissue-driven hypothesis recently. Shortly speaking, tissue expression evolution is driven by two factors: the mutational variance accessible to the tissue, and the functional constraints of the tissue. In other words, variation of each factor among tissues would results in the variation of expression divergence among tissues, and the original version of tissue-driven hypothesis (Gu and Su 2007) did not distinguish between these two possibilities.
Using tissue expression distances (Eti) between human and mouse for 29 orthologous tissues, Gu and Su (2007) found a considerable variation of human-mouse expression divergence among tissues. As discussed above, there are two alternative mechanisms that can explain this observation: The first one invokes the effect of tissue-specific functional constraint (functionally important tissues tend to have strong constraints on expression divergence), and the second one invokes the effect of tissue-specific mutational variance (tissue-specific regulatory networks shape the consequence of regulatory mutations.
It has been argued that gene expression may be optimized by natural selection (Bedford and Hartl 2009). Gu and Su (2007) invoked the stabilizing selection model (Hansen 1997) to describe the tissue-specific selection constraint on the expression divergence; though many other models were proposed (Gu 2004; Eng et al. 2009). For a gene expressed in a certain tissue (ti), the stabilizing selection on the expression level x follows a Gaussian fitness function fti(x) =exp[‐ wti(x‐ θ)2], where θ is the optimal expression level, wti is the coefficient of stabilizing selection on gene expression intissue ti; a large wti means a strong selection pressure, and vice versa. Under this model, the evolution of tissue expression follows an Ornstein-Uhlenback (OU) process, based on which the tissue expression distance is given by
Where Wti= 2NeWti is the strength of stabilizing selection against the expression divergence, and Ne is the effective population size; β= Wti ε2 is decay-rate of expression divergence, and ε2 is the mutational variance.
Owe attempt to solve this problem, arguing that the revised tissue-driven hypothesis should include two sub-hypotheses: the tissue-specific functional constraint, as well as its alternative tissue-specific mutational variance (She et al. 2009). Moreover, extending the underlying evolutionary model allows further data exploration. Microarray data reanalysis has revealed their relative importance in shaping the pattern of tissue evolution. Finally, we discuss how to utilize novel next-generation technologies (NGS) such as RNA-seq to investigate tissue expression evolution when genome-wide expression data are available in multiple tissues and multiple species.
Selection-mutation balance of tissue expression evolution
We solve this problem by formulating a stochastic model called the stationary Ornstein-Uhlenback (sOU) process model (Hansen 1997;Butler and King 2004). As shown in Material and Methods, the sOU model depends on two parameters: (i) The strength of tissue-specific functional constraint is measured by Wti; a large value indicates a strong constraint, and vice versa. It can be shown that tissue expression distance between twospecies is saturated to 1/Wti as the evolutionary time (t) is sufficiently large. And (ii) the mutational variance (2ε2t) measures the mutationalcapacity that drives the tissue-specific expression divergence between species. Moreover, the stationary assumption implies that the expression variance remains a constant during the expression divergence, under which we are able to estimate the two parameters.
In Eq.(1), we have only one observation (Eti) but two unknown parameters. To solve this problem, we need an additional equation that can be derived under the assumption of stationary OU process, which means that the expression variance remains invariant during the course of expression evolution. Let Rti be the coefficientof expression correlation between two species of the same tissue. Under the stationaryassumption, it is given by
Hence, one can easily estimate Wti from two observations Eti and Rti, that is, by replacing e−2βt in Eq.(B-l) with Rti according to Eq. (2), we have
Re-analysis of tissue-driven hypothesis
Estimation of tissue-driven factors (Wti)
We first showed that the expression variance remained roughly the same between the human and mouse in each of 29 tissues, indicating that the assumption of selection-mutation balance holds approximately. For each tissue of two species (human and mouse), we estimated Wti from the observed tissue expression distance (Eti) and the coefficient of expression correlation (Rti) (Table 1). The first question one may ask is to what extent the variation of expression divergence among tissues can be explained by the tissue factor. Simple calculation shows that the tissue expression distance is negatively correlated with the tissue-specific functional constraint (Wti) (R2=0.57,p-value < 0.001). We therefore conclude that tissue-specific function constraint andtissue-specific mutational variance may explain equally the variation of expression distance among tissues.
In Table 1, 29 tissues are classified intoseveral groups. We observed that neurorelated tissues, on average, have stronger tissue-specific function constraints (Wti) than neuro-unrelated tissues (p-value< 0.01, t-test), whereas no statistically significant difference in tissue-specific mutational variance (2ε2t) was found (p-value>0.10, t-test). Thoughwe confirmed that several tissues have relaxed functional constraints, i.e., a lowWti, including testis (Wti=0.33), CD4 (Wti=0.36), CD8 (Wti=0.34) and pancreas (Wti=0.39), we found no significant difference in either Wti or 2ε2t among other biological systems except for the neuro-system.
Decomposition of Eti-Dti (expression-sequence) correlation into Wti-Dti and 2ε2t-Dti correlations
Let Dti be the mean evolutionary distance of protein sequence (between the human and mouse) over a set of genes expressed in tissue ti. One important prediction of the tissue-driven hypothesis (Gu and Su 2007) is the existence of positive correlation between Eti and Dti, which reflects the common micro-environment of tissue (ti) on the expression divergence and protein sequence divergence, respectively. In Materials and Methods, we decomposed this prediction into the Wti-Dti correlation for the common tissue-specific functional constraints, and the 2ε2t-Dti correlation for tissue-specific mutational variance, respectively. Gu and Su (2007) considered two expression status of a gene in tissue ti, i.e., high expression or normal expression. In both cases, we indeed found a significant negative correlation between Wti and Dti. In the case of normal expression (Fig. 2, panel A), the coefficient of correlation is r=‐0.42 (p< 0.01), while r=‐0.31 (p< 0.01) in the case of high expression. Interestingly, we found thattissue-specific mutational variance showed a positive correlation with tissue protein distance (r=‐0.44, (p< 0.01 fornormally-expressed genes shown in Fig.2 panel B, and r= 0.38 for highly expressed gene, p< 0.01). We explain this positive 2ε2t-Dti correlation as the common tissue-specific genetic or epigenetic buffering against mutational effects.
Relationship between inter-species and inter-duplicate expression divergences
Expression divergence between duplicates has been thought as the major mechanism for duplicate gene preservation (Force et al. 1999; Lynch and Conery 2000; Gu et al. 2005; Kaessmann 2010). For a set of duplicate genes, let Tdupbe the mean expression distance between duplicate pairs, or tissue duplicate distance for short. Using 1312 duplicate pairs that were duplicated before the human-mouse split, Gu and Su (2007) estimated Tdup for each tissue, and found a highly significant correlation between tissue expression distance Eti and Tdup.Similar to the above analysis,we examined the Wti-Tdup correlation and the 2ε2t-Tdup correlation to further explore the underlying tissue factors (Fig.3). While we observed a highly significant Wti-Tdup negative correlation (r=‐ 0.95, p<0.001), no correlation between 2ε2t and Tdup (p> 0.10) was found. These observations suggest that when a tissue allows more inter-species expression divergence, it should also tolerate more extensive expression divergence between duplicated genes. Hence, duplicated genes tend to have more expression divergence in a tissue with relaxed developmental constraint, and vice versa.
Concluding remarks and outlook
In this perspective, we have extended the original tissue-driven hypothesis (Gu and Su 2007) to two sub-hypotheses (tissue-specific functional constraints versus tissue-specific mutational variance), and formulated amodel-based approach to testing these two alternatives. Applying to the human-mouse microarray data, our analysis reveals that both mechanisms may have played nontrivial roles on the tissue-driven evolution, though one should be cautious because of the high noises inherited in the microarray data (Yanai et al. 2004; Yanai et al. 2006). With the rapidincrease of high throughput transciroptome datasets next-generation sequencing (NGS) technologies such as RNA-seq datasets, we speculate that the tissue-driven hypothesis can be further tested in depthas follows.
First, the notion that tissue-specific mutational variance that drives tissue evolution may be shaped by tissue-specific RNA expression profiles (Wang et al 2009; Clark et al. 2011) and epigenetic patterns (She et al. 2009) can be investigated by the genomic analysis combining RNA-seq and other NGS techniques (Morozova et al. 2009). Indeed, previous studies (Tirosh et al. 2006; Cui et al. 2007) have shown that inter-speciesexpression variation may be affected by many gene-specific regulatory factors such as microRNA or TATA box. Second, Arendt (2008) has formulated a conceptual framework witha strong link between cell-type/tissue evolution and expression divergence after gene duplication (Prince and Pickett, 2002; Morkov and Li 2003; Huminiecki and Wolfe 2004; Gu et al. 2005). For instance, high constraint on inter-species expression variation for duplicate genes expressed in the central nerve system (CNS) may be the outcome after arapid evolution toward CNS-specific genes. Third, the relationship between tissue-driven hypothesis and tissue expression broadness can be studied more rigorously because RNA-seq has no the severe cross-hybridization problem that occurred in microarrays.Forth, it would be interesting to test whether the tissue-driven hypothesis still holds for intra-species expression variation such as in the human population (Pickell et al. 2010). And finally, with the availability of RNA-seq datasets in diverse tissue andorganisms, we are able to formulate a phylogeny-based framework to study the tissue expression, which would be more powerful than the two species-based analysis. With the help of a newly-developed statistical method (Gu et al. 2013), we have conducted a preliminary analysis by analyzing six tissues (brain, cerebellum, liver, heart, kidney an testis) in mammals (Brawand et al. 2011) and showed qualitatively the same results as shown above (not shown).
BOX-1
Stabilizing selection model
It has been argued that gene expression may be optimized by natural selection (Bedford and Hartl 2009). Gu and Su (2007) invoked the stabilizing selection model (Hansen 1997) to describe the tissue-specific selection constraint on the expression divergence; though many other models were proposed (Gu 2004; Eng et al. 2009). For a gene expressed in a certain tissue (ti), the stabilizing selection on the expression level x follows a Gaussian fitness function fti(x) = exp[‐wti(x‐ θ)2], where ff is the optimal expressionlevel, wti is the coefficient of stabilizing selection on gene expression in tissue ti; a large wtimeans a strong selection pressure, and vice versa. Under this model, the evolution of tissue expression follows an Ornstein-Uhlenback (OU) process, based on which the tissue expression distance is given by where Wti=2Ne wti is the strength of stabilizing selection against theexpression divergence, and Ne is the effective population size; β=Wtiε2 is decay-rate of expressi divergence, and ε2 is the mutational variance.
Tissue-specific function-constraint
The hypothesis of tissue-specific constraint postulates that, in multicellular organisms, tissue factors may play important roles in the functional constraint on the rate of expression evolution. As shown in Eq.(1), this effect can be characterized by the tissue-specific parameter Wti. First, Wti measures the strength of stabilizing selection on expression evolution. And second, Wti determines the saturated tissue expression distance, i.e., when t→∞, Eti→ 1/Wti Hence, the tissue-driven hypothesis predicts that, generally, the parameter Wti varies among tissues.
Tissue-specific mutation-accessibility
One of most-exciting findings in genome sciences is that genome-wide epigenetic patterns, such as DNA methylation, histone modification or miRNAs, may have played fundamental roles in shaping the tissue-specific expression profiles in multicellular organisms. It raises the possibility that the among-tissue variation of expression divergence between species could be considerably affected by these epigenetic factors. Tentatively, one may call this phenomenon tissue-specific mutational accessibility, predictinga variation of mutational variance (ε2) among tissues.
Distinguish between two tissue-specific mechanisms
In Eq.(1), we have only one observation (Eti) but two unknown parameters. To solve this problem, we need an additional equation that can be derived under the assumption of stationary OU process, which means that the expression variance remains invariant during the course of expression evolution. Let Rti be the coefficientof expression correlation between two species of the same tissue. Under the stationaryassumption, it is given by where R∞ is the coefficient of expression correlation as t→∞. In the case of R∞= 0, one caneasily estimate Wti from two observations Eti and Rti, that is, by replacing e‐2‐t in E q.(B-1) with Rti according to Eq. (2), we have Wti = (1‐ Rti)/Eti. Meanwhile, from Eq.(B-2) one can estimate 2βt=ln Rti, which leading to the estimation of 2ε2t=(ln Rti)/Wti.
Acknowledgements
We are grateful to Zhan Zhou for constructive comments in the early version of the manuscript, and Jingqi Zhou for technical assistances.
Footnotes
↵† These authors have equal contributions to the work, as attended students of graduate course GDCB576X (Comparative genomics and phenomics), Spring 2016.