Abstract
Allopolyploidy has played a major role in plant evolution but its impact on genome diversity and expression patterns remains to be understood. Some studies found important genomic and transcriptomic changes in allopolyploids, whereas others detected a strong parental legacy and more subtle changes. The allotetraploid C. bursa-pastoris originated around 100,000 years ago and one could expect the genetic polymorphism of the two subgenomes to become more similar and their transcriptomes to start functioning together. To test this hypothesis, we sequenced the genomes and the transcriptomes (three tissues) of allote-traploid C. bursa-pastoris and its parental species, the outcrossing C. grandiflora and the self-fertilizing C. orientalis. Comparison of the divergence in expression between subgenomes, on the one hand, and divergence in expression between the parental species, on the other hand, indicated a strong parental legacy with a majority of genes exhibiting a conserved pattern and cis-regulation. However, a large proportion of the genes that were differentially expressed between the two subgenomes, were also under trans-regulation reflecting the establishment of a new regulatory pattern. Parental dominance varied among tissues: expression in flowers was closer to that of C. orientalis and expression in root and leaf to that of C. grandiflora. Since deleterious mutations accumulated preferentially on the C. orientalis subgenome, the bias in expression towards C. orientalis observed in flowers suggests that expression changes could be adaptive and related to the selfing syndrome, while biases in the roots and leaves towards the C. grandiflora subgenome may be reflective of the differential genetic load.
Introduction
Polyploidy, and in particular allopolyploidy, whereby a novel species is created by the merger of the genomes of two species, is considered to be a common mode of speciation in plants (1) as it induces an instant reproductive isolation, the difference in chromosome number impeding reproduction with the parental species. In the case of allopolyploidy, the daughter species thus has at inception two divergent sub-genomes, one inherited from each parental species. Such an increase in genome copy number can be advantageous and could partly explain the apparent evolutionary success of allopolyploid species (2, 3). For instance, genome doubling creates genetic redundancy, thereby increasing genetic diversity and allowing the masking of deleterious mutations through compensation. Genome doubling and initial redundancy also offer new possibilities for the evolution of genes over time: one copy can degenerate, both can be conserved by dosage compensation through, for instance, compensatory drift (4) or their pattern of expression can diverge and even lead to the evolution of new functions (see (5) and references therein). Moreover, gene redundancy also potentially allows tissue-specific expression of different gene copies (6, 7). On the other hand, as pointed out by many authors (8–11), the evolutionary success of allopolyploids can also appear para-doxical since the birth of a new allopolyploid species will also be accompanied by numerous challenges. These challenges are first associated with the initial hybridization between two already divergent genomes that have now to start working together, implying, among other things, important changes of the meiotic machinery and of gene expression patterns (12).
The magnitude of gene expression changes has been reported to vary substantially across polyploid species, from minor modifications (13, 14) to so-called “transcriptomic shock” (7). The balance in expression pattern between the two subgenomes also seems to be highly variable and ranges from non-additivity, such as extreme expression dominance of one of the ancestral genomes over the other, to the additivity of their expression contributions (15, 16), and it also evolves through time. For example, in Mimulus peregrinus gene expression dominance was established early on but also in-creased over successive generations (17). However, the generality, timing, and causes of changes in expression pattern of the two parental genomes remain poorly known beyond a few case studies (16, 18) and may, to a large extent, depend on parental legacy because a part of the observed differences between the two subgenomes of the allopolyploid species may have already been present between the parental species (3).
Ultimately, changes in patterns of gene expression will follow from modifications in gene expression regulation. Differences in gene expression can be due to changes in cis-and trans-regulatory elements. Cis-regulatory elements alter allele-specific expression and are generally located close to the gene they regulate (e.g., promoters), whereas trans-regulatory elements affect both alleles and can be located anywhere in the genome (19–22). In the case of a newly formed allopolyploid species, one would expect the two copies of a gene to be under the influence of trans-regulatory elements inherited from both parents and its expression level to first move towards the mean expression of the two parental species. Retaining the parental pattern of expression in each subgenome would imply that only cis-regulation takes place, or there are forces opposing the establishment of cross trans-regulation. For instance, one could expect purifying selection to have a larger impact on trans-acting mutations than on cis-acting ones because the former are more pleiotropic than the latter. If so, the residual variants will mostly be cis-acting ((23) but see (24)). It was also shown that a gene is often under the influence of both trans-and cis-regulatory elements that act in opposite directions (22), leading to a cis-trans compensation that prevents overshooting optimal over-all expression level. Such compensation between cis-and trans-regulatory elements is one of the predictions of the en-hancer runaway (ER) model proposed by Fyon et al. (25). Under the ER model, and especially in outcrossing species where heterozygotes are frequent, cis-regulatory variants facilitate the exposure of alleles to purifying selection. If the enhancer and the gene they regulate are linked then the up-regulating variants will hitch-hike with the allele carrying the lowest number of deleterious mutations, leading to an open-ended escalation in enhancer strength (25). As selection on expression appears to be primarily stabilizing (22, 26, 27), at least at intermediate evolutionary timescales (28), a compensatory effect of expression in trans is predicted (25, 29). The relative importance of cis-and trans-regulation can be examined by comparing the relative expression in the parental species with the relative expression of homeologous genes in the newly formed tetraploid (19, 30, 31).
Finally, differential expression between the two genomes could result from a differential accumulation of deleterious or slightly deleterious mutations between the two subgenomes or, alternatively, be also related to biased phenotypic or adaptive changes associated to the differences between the two parental species. If the differential expression is only due to differential accumulations of deleterious mutations, we would expect to see the same differential expression pattern across different tissues, whereas if it is related to biased phe-notypic or adaptive changes then we may expect to see differences depending on the tissue considered.
The shepherd’s purse C. bursa-pastoris is an allotetraploid selfing species that originated some 100-300 kya from the hybridization of the ancestors of C. orientalis and C. grandi-flora (14) (Fig. 1A). The two parental species are strikingly different: C. orientalis (hereafter CO), a genetically depauperate selfer, occurs across the steppes of Central Asia and Eastern Europe (32), whereas C. grandiflora (hereafter CG), an obligate outcrosser with a particularly high genetic diversity, is primarily confined to a tiny distribution range in the mountains of Northwest Greece and Albania (32) (Fig. 1). Among Capsella species, only C. bursa-pastoris has a world-wide distribution (32), some of which might be due to ex-tremely recent colonization events associated with human population movements (32, 33). In Eurasia, the native range of C. bursa-pastoris is divided into three genetic clusters - Asia, Europe, and the Middle East (hereafter ASI, EUR and ME, respectively) - with low gene flow among them and strong differentiation both at the nucleotide and gene expression levels (33, 34). Reconstruction of the colonization history suggested that C. bursa-pastoris spread from the Middle East towards Europe and then expanded into Eastern Asia. This colonization history resulted in a typical reduction of nucleotide diversity with the lowest diversity being found in the most recent Asian population (33).
It has been possible to phase the sub-genomes by assigning each genome sequence (or transcript) to a parental species sequence (35). The phased data suggested that the differences in deleterious variants between the two subgenomes of C. bursa-pastoris are largely a legacy of the differences between the two parental species and that biased fractionation, the biased loss of ancestral genomes in an allopoly-ploid, is limited (14, 36). A recent study further demonstrated that the evolutionary history of the two subgenomes varies across the different populations (35): for example, selective sweeps were more common on the subgenome descended from C. grandiflora (hereafter CbpCg) than on the subgenome descended from C. orientalis (hereafter CbpCo) in Europe and the Middle East, while the opposite pattern was observed in Asia (35). There were also differences in gene expression: the two subgenomes showed no significant difference in the levels of expression in Asia, whereas the CbpCg subgenome was slightly more over-expressed than the CbpCo subgenome in Europe and the Middle East. The study by Kryvokhyzha et al. (35), however, did not include expression of the two parental species and considered only the expression in one tissue (seedlings). We thus do not know yet how the current state was established since the origin of the polyploid.
The aim of the present study was to address questions on the evolution of the pattern of expression of two subgenomes of the allotetraploid shepherd’s purse C. bursa-pastoris since they derived from the two parental species. We focused on two main questions. First, has the relative contribution of cis- and trans-regulation been altered by polyploidization? Second, could differential expression between the two subgenomes only results from a differential accumulation of deleterious/slightly deleterious mutations between the two subgenomes (nearly neutral hypothesis) or is it also related to phenotypic differences between the two parents (adaptive hypothesis)? Since one parent is outcrossing (C. grandiflora) and the other self-fertilizing (C. orientalis), the former with large flowers and the latter with tiny ones, one may expect differential expression in flower tissues of selfing C. bursa-pastoris to be biased towards the C. orientalis expression levels under the adaptive hypothesis whereas tissues that have not experienced adaptive specialization might show an expression bias towards C. grandiflora.
To address these questions and, more generally, to characterize the expression pattern of C. bursa-pastoris, we analyzed the genomes and the transcriptomes of three tissues (flowers, leaves, and roots) of 16 accessions coming from different populations of the C. bursa-pastoris natural range and compared them with those of the parental lineages C. grandiflora and C. orientalis (four accessions each). In total, 72 transcriptomes and 24 genomes were analyzed.
One hundred thousand generations after its inception, C. bursa-pastoris does not show any sign of a transcriptomic shock. Instead, our data revealed highly concerted changes with the expression levels of the two subgenomes converging towards a median value. This was achieved by a balance between cis-and trans-regulation and a strong parental legacy that was also observed for the accumulation of deleterious mutations over the two subgenomes. While the differential accumulation of deleterious mutations between subgenomes could explain part of the differential expression between them there were also significant tissue-specific differences in subgenome dominance and convergence, indicating that adaptive changes may also have contributed to the evolution of the expression patterns of the two subgenomes.
Material and methods
Samples, sequencing and data preparation
We obtained the whole genome and RNA-Seq data from flower, leaf and root tissues of (i) 16 accessions of C. bursa-pastoris coming from already characterized populations from Europe (EU), the Middle East (ME) and Eastern Asia (ASI) (33) and from hitherto unstudied Central Asian populations (CASI) and (ii) four accessions each of C. grandiflora and C. orientalis (Fig. 1). The genomic data included both published and newly sequenced genomes (Table S1). For newly sequenced genomes, DNA was extracted from leaves with the Qiagen DNeasy Plant Mini Kit, libraries were prepared using the TruSeq Nano DNA kit, and 150-bp paired-end reads were sequenced on Illumina HiSeqX platform (SciLife, Stockholm, Sweden). All 72 RNA-Seq libraries (24 accessions×three tissues) were sequenced in this study. For RNA sequencing, seeds were surface-sterilized and germinated as dexsscribed in (34). Seedlings were then transplanted into pots (10×10×10cm) filled with soil seven days after germination and cultivated in one growth chamber (22°C, 16:8h light/dark period, light intensity 150 µmol/m2/s). Seven days after the onset of flowering, flower buds, leaves, and roots were collected, snap-frozen in liquid nitrogen, and stored at −80°C before extraction following manufacturer protocol (Plant Total RNA Kit (Spectrum) for flower buds and leaves, and RNeasy Plant Mini Kit (Qiagen) for roots). RNA sequencing libraries were prepared using the TruSeq stranded mRNA library preparation kit including polyA selection and sequenced for 125-bp paired-end reads on Illumina HiSeq 2500 platform (SciLife, Stockholm, Sweden). Sequencing of new samples yielded an average library size of 57 million reads for DNA sequencing and 59 million reads for RNA-Seq.
DNA and RNA-Seq reads were mapped to the C. rubella reference genome (37) with Stampy v1.0.22 (38). To account for the divergence from the reference genome, the substitution rate was set to 0.025 for C. bursa-pastoris, 0.02 for C. grandiflora, and 0.04 for C. orientalis. On average, 85%, 90% and 85% of the DNA reads were successfully mapped for the corresponding three species and 98% in all species for RNA mapping. This yielded an average coverage of 51x and 52x for DNA and RNA data, respectively. Genotyping of DNA and RNA-Seq alignments was performed using HaplotypeCaller from the Genome Analysis Tool Kit (GATK) v3.5 (39) as described in (35). The subgenomes of C. bursa-pastoris were phased with HapCUT version 0.7 (40) following the procedure by (35). The quality of this phasing procedure was ascertained by comparing the phased subgenomes with the subgenome assembly obtained by (36). The un-phased expression data was generated for non-overlapping feature positions (option: -m union) using the htseq-count program from HTSeq v0.6.1 (41). To compare the expression between the two subgenomes of C. bursa-pastoris, homeologue-specific counting of alleles was performed using ASEReadCounter from GATK and phased according to the phased genomic data. We analyzed only the counts of SNPs that showed no strong deviation from the 0.5 mapping ratio in DNA data defined with a statistical model developed by (42). To make the homeologue-specific count data of C. bursapastoris comparable with parental read count data (allelic counting underestimates the expression of genes with a low number of heterozygous sites), we scaled the homeologue-specific counts using the unphased data and the allelic ratio in the phased data.
Population structure
In order to assess the relationship of the newly obtained Central Asian samples with other populations, we characterized population structure through a principal component analysis (ade4 R package (43)) and by reconstructing a phylogenetic tree from the genomic Single Nucleotide Polymorphisms (SNPs) of the 24 accessions (neighbor-joining algorithm on absolute genetic distance, ape R package (44)). Relationships between samples were also explored in the expression data using a principal component analysis (ade4 R package (43)) and hierarchical distance clustering with bootstrap support (pvclust R package, (45)).
Gene expression analyses
Given that the gene expression patterns in homeologue-specific and total expression can produce different results (46), we performed the differential gene expression analyses on both the unphased data and phased data. We first assessed the differences between populations of C. bursa-pastoris and parental species in unphased data by partitioning the analysis between tissues and populations. In a second step, the phased data was used to assess homeologue-specific expression differences between subgenomes in different tissues and populations. Finally, we analyzed the differences in expression patterns in parental diploid species and C. bursa-pastoris by classifying expression patterns into categories in both phased and unphased data.
Differential gene expression analyses were carried out in edgeR (47). The TMM normalization for different library sizes (47) was used for differential gene expression analyses, while for all other analyses, we used the count per million (CPM) normalization (one was added to every gene count to bypass log-transformation of zero expression). Phased counts were normalized by the mean library size of the two subgenomes and only genes showing no strong mapping bias were retained (see below). For both datasets (unphased or phased), only genes with at least one sample having a non-zero expression in every population/species were kept.
Differences between the two subgenomes (homeologue-specific expression were assessed with the integration of the information from both RNA and DNA data to exclude highly biased SNPs and to account for the noise in read counts due to statistical variability. The data were analyzed using the threestage hierarchical Bayesian model for allelic read counts developed by (42). The model was implemented using Markov chain Monte Carlo (MCMC) with 200,000 iterations with burn-in of 20,000 and thinning interval of 100. Each analysis was run three times to assess convergence. The significance specific expression (HSE) was defined from a Bayesian analog of the false discovery rate (FDR < 0.05). Expression patterns in C. bursa-pastoris and its parental species were classified into categories based on significant and non-significant differential expression defined with edgeR (47). We considered the four genomes/subgenomes (CG, CO, Cbpcg, and Cbpco) and three possibilities for each of the six pairwise comparisons (significantly over, under or equally expressed, FDR < 0.05), and grouped the resulting combinations into seven main categories: No difference, Intermediate, Legacy, Reverse, Dominance, Compensatory drift, and Transgressive (see the results for categories description). We also performed similar analysis for the un-phased total C. bursa-pastoris expression (thus considering only three pairwise comparisons) by classifying the expres-sion patterns into four major categories: 1) no differential expression, when no significant differences are detected in any of the three pairwise comparisons, 2) intermediate, when the expression of C. bursa-pastoris (Cbp) is intermediate between C. grandiflora (CG) and C. orientalis (CO), 3) dominance of one of the parents over the other, when the mean expression of C. bursa-pastoris is equal to only one parental species and the two parents are significantly different, and finally 4) transgressive, when the mean expression of C. bursapastoris is outside the range of expression of the two parents and significantly different from at least one parent.
Similarity and Convergence indices
To quantify the similarity between each subgenome expression level and the expression level in the parental species, we developed a similarity index (S). For each transcript i and each subgenome j, S was computed as the subgenome relative expression deviation from the mean expression level (E) in the parental species, µi = (EiCO + EiCG)/2:
This index is centered on 0, so that if Sij < 0 or Sij > 0, the expression of a given transcript in a given subgenome is more similar to the expression of that transcript in CG or CO, respectively. The difference between the absolute values of median Si for CbpCg and CbpCo was used as a measure of dominance between the subgenomes, ∆S= |SCbpCo| − |SCbpCg|.
Finally, for each gene that was differentially expressed between the two parental species, a convergence index, C, was computed from the absolute difference in expression for:
subgenomes:
parental species:
each subgenome and the opposite parental species:
These differences correspond to the phylogenetic distances (Fig. 1A). In principle, if the regulation of gene expression in CbpCg is independent of the regulation of gene expression in CbpCo, then the overall ∆sub, ∆par, ∆Cgand ∆Coare expected to be equal. To compare these quantities, for each transcript i, we used a convergence index (Ci):
So, CCbp measures the expression convergence of CbpCg toward CbpCo, CCbpComeasures the expression convergence of CbpCo toward CbpCg, and CCbp measures the over-all subgenomes convergence within Cbp. ∆xstands for either ∆Co, ∆Cgor ∆sub, respectively. Ci thus ranges from −1 to 1, with positive values indicating more similar expression between the subgenomes of C. bursa-pastoris than between parental species, and negative values indicating increased differences between subgenomes; the closer Ci to 0, the more similar are the expression patterns to parental species.
Gene ontology enrichment test
For various lists of genes of interest detected in the analyses described above, gene ontology (GO) enrichment tests were performed using the topGO R package (48). The GO term annotation was downloaded from PlantRegMap (http://plantregmap.cbi.pku.edu.cn/download.php#go-annotation) and used as a reference set for topGO (i.e., custom input). Fisher’s exact-test procedure (weight algorithm) was performed to assess the enrichment (p < 0.05) for either molecular functions (MF) or biological processes (BP). Finally, the REViGO software (49) was used to remove GO terms redundancy and to cluster remaining terms in a two-dimensional space derived by applying multidi-mensional scaling to a matrix of the GO terms semantic similarities. Cytoscape v3.6.1 was used to visualize GO terms networks (50).
Difference between species and subgenomes in dele-terious mutations
To compare the number of deleterious mutations between the two subgenomes of C. bursa-pastoris, we classified mutations into tolerated and deleterious ones (DEL) using SIFT4G (51). We used C. rubella (35) and Ara-bidopsis thaliana (TAIR10.22) SIFT4G reference databases. This helps avoid reference bias towards C.rubella away from calling mutations to be deleterious in the C. grandiflora homeologue. We considered only the mutations that accumulated after speciation of C. bursa-pastoris and identified mutations specific to C. grandiflora, C. orientalis, the two subgenomes of C. bursa-pastoris, and Neslia paniculata that was used as an outgroup here. All estimates were relative to the total number of SIFT4G annotated sites to minimize the bias associated with variation in missing data as in (35). Only the European and Middle Eastern populations were used in further analysis of the distribution of deleterious mutations, in order to exclude the effect of gene flow between C. orientalis and the Asian population of C. bursa-pastoris (35).
We assessed the distribution of deleterious mutations between the two subgenomes of C. bursa-pastoris to test whether they accumulated (i) more in one gene copy than in the other at the homeologue level, as would be expected under a pseudogenization process, (ii) more in one subgenome than in the other as expected if one subgenome predominates. Under the null hypothesis (random accumulation with-out subgenome bias) the distribution of deleterious mutations between the two subgenomes should follow a binomial distri-bution with mean 1/2. Under the first hypothesis, the distribution should be more dispersed with the same mean, which can be modeled by a Beta-binomial distribution. Under the sec-ond hypothesis, the mean should differ from 1/2. However, over-dispersion and bias can also occur because of missing data and sampling error, we thus used synonymous mutations (SYN) to control for this and built the correct null distribution. To do so, we developed a maximum likelihood method implemented in R (52) as follows. First, we identified a most likely probability distribution model by fitting four models to the SYN dataset, where nSY N is the sum of SY N mutations occurring on both homeologous genes and kSY N is the number of SY N mutations occurring on CbpCg genes. The four models are:
M1: kSY N ~ B(nSY N, 0.5), a binomial distribution with no bias between CbpCg and CbpCo,
M2: kSY N ~ B(nSY N, 0.5 + b), a binomial distribution with bias,
M3: kSY N ~ BB(nSY N, 0, φ), a beta-binomial distribution with no bias,
M4: kSY N ~ BB(nSY N, b, φ), a beta-binomial distribution with bias.
For convenience, the beta-binomial distribution: was re-parameterized as:
where and (53, 54). In this way, the parameter b was a measure of the bias towards the CbpCg genes, and φ was a measure of the variance of the probability that a mutation is found within the CbpCg homeologues, and can be interpreted as an index of overdispersion. A large value of φ indicates that mutations tend to accumulate preferentially in one of the two homeologous genes, and a small value of φ indicates that mutations are more evenly distributed between them. We calculated the likelihood of each model and chose the best-fitting model with a hierarchical likelihood ratio test (hLRTs).
After choosing the beta-binomial distribution with bias as the most likely null distribution, we estimated the parameters b and φ. We introduced a new set of models to test for the specific features of the distribution of deleterious mutations:
The null model assumes that both parameters b and φ are the same for the SY N and DEL datasets, while the alternative models allow the DEL dataset to have different parameters from the SY N dataset: only bDEL, only φDEL, or both bDEL and φDEL were allowed to vary. We calculated the likelihood of each model, chose the best fitting model with hierarchical likelihood ratio tests (hLRTs) and estimated the parameters of the selected model. Bootstrap estimates of confidence intervals were estimated with 1000 bootstrap replicates.
Relationship between deleterious mutations and gene expression
The SIFT4G annotation of the C. rubella database was used to match the gene IDs of the mutation and expression data. For each tissue, the relationship between the bias in the number of deleterious mutations between subgenomes and the bias in homeologue expression was investigated by calculating, for gene i in accession j, the difference (dij) in the number of deleterious mutations (DEL) between homeologous gene pairs:
The expression ratio between the homeologues of genes with significant HSE was used as a measure of homeologue expression bias:
Genes were further classified into four categories according to the deleterious mutations bias, d, and homeologue expression bias, e:
Genes with no bias in the distribution of deleterious mutation (d = 0) or no significant HSE (FDR < 0.05) were removed from the analysis. Fisher’s exact test was then used to test for independence between the difference in the number of deleterious mutations (d) and homeologue expression bias (e). As a control, the whole analysis was reproduced with dij computed from the number of synonymous mutations in genes with no DEL mutations. In addition, we also compared the number of silenced genes (genes with zero expression values) of each subgenome of C. bursa-pastoris, to check if there was a relationship between genetic load and silenced genes.
Results
Population genetic structure
The SNP-based PCA (670K genomic SNPs without any missing data) confirmed the phylogenetic relationships between C. grandiflora (CG), C. orientalis (CO) and C. bursa-pastoris (Cbp) described in (33– 35). The first principal component (PC) explained most of the variance (66%) and clearly discriminated CG and the CbpCg subgenome from CO and the CbpCo subgenome (Fig. 2, top-left panel). To investigate further the population structure within C. bursa-pastoris, we then focused on genetic variation in each subgenome (Fig. 2, middle and bottom-left panels, respectively for CbpCg and CbpCo). In both cases, there were three main clusters gathering accessions from Europe (EUR), Asia (ASI), and the Middle East (ME), respectively. Accessions from Central Asia (CASI) tended to cluster with European accessions for both subgenomes, even if they were more scattered (especially for the CbpCg subgenome). A phylogenetic analysis also confirmed that the new samples from Central Asia were most similar to the European genetic cluster and showed that they did not form a separate genetic cluster (Fig. S1).
Global variation in gene expression reflects genetic relationships
Pairwise comparisons of a number of differentially expressed (DE) genes between species (unphased data, 16,039 genes) showed that patterns of expression varied across tissues. First, the number of differentially expressed genes between the parental species was the highest in flower tissues, while leaf tissues were the least differentiated (Table S2). Second, in flowers, overall gene expression of C. bursa-pastoris was the closest to C. orientalis, while in the two other tissues it was the closest to C. grandiflora (Table S2). At the population level, no clear pattern appeared: for instance, ME accessions were the closest to C. grandiflora in roots, while ASI accessions were the closest to C. grandiflora in leaves and CASI accessions in flowers (Table S3).
Gene expression variation was then surveyed in 11,931 genes for which phased expression of the two subgenomes was available in all populations of C. bursa-pastoris. Clustering of population/species mean expression values confirmed that the main difference in overall expression variation was between tissues (Fig. S3). The principal component analyses of the three tissues separately (Fig. 3) revealed that the global variation pattern in gene expression reflected phylogenetic relationships (Fig. 3 and Fig. S2). The two subgenomes of C. bursa-pastoris were most similar to their corresponding parental genome along the Dim1, i.e. expression in the CbpCg subgenome grouped with C. grandiflora, and the CbpCo subgenome grouped with C. orientalis. The Dim2 reflected population structure; here again CASI accessions grouped with EUR accessions.
Testing for homeologue-specific expression (HSE) in C. bursa-pastoris showed that on average 4,096 genes (~34%) per sample were significantly differentially expressed between the two subgenomes (FDR < 0.05). The expression ratio between subgenomes (defined as was on average 0.496 across all genes and 0.493 across genes with significant HSE indicating no strong bias towards one of the subgenomes (Table S4). The ratio in DNA reads was 0.497 and thus there was no strong mapping bias towards either subgenome. Analyses of differential expression revealed no bias in the number of differentially expressed genes toward one subgenome either when comparing tissues (Table S5A, flowers and leaves being the most differentiated tissues and leaves and roots the least) or Cbp populations (Table S5B, Middle East and Asia being the most distant, except for CbpCo in flowers, while Europe and Central Asia are the closest).
Strong parental legacy and both cis- and trans-regulatory changes
In order to investigate thetotal expression level changes in C. bursa-pastoris after C. grandiflora and C. orientalis hybridization, expression patterns of unphased data across the three species were classified into four categories: No difference, Intermediate/Additivity, Dominance and Transgressive (Table 1). Up to 55-80% of the genes in C. bursa-pastoris were expressed at the same total level as in the parental species and 5 to 10% showed levels of expression intermediate to that of parental species. The dominance of one parental species over the other was most evident in flowers and roots. In flowers, ~14% of C. bursa-pastoris genes were expressed at the same level as in C. orientalis but differed significantly from C. grandiflora, and ~8% were expressed at the same level as in C. grandiflora but at a different level than in C. orientalis. The opposite dominance pattern was detected in the root tissue. Finally, a transgressive expression pattern, when expression levels in C. bursa-pastoris exceeded or were lower than the expression level of both parents, was detected in 8-16% of genes.
Gene expression in C. bursa-pastoris was further investigated by assessing the relative importance of cis- and trans-regulatory elements. The expression ratio of the two subgenomes was compared to the expression ratio between the two parental species (Fig. 4A). For a given gene, if its expression in the homeologous genes of C. bursa-pastoris is only regulated by cis-regulatory changes, it should be completely explained by the divergence between the parental species (the diagonal line in Fig. 4A). On the other hand, if homeologous genes are equally expressed in C. bursapastoris but not in the parental species, this means that Cbp expression is mainly controlled by trans-regulatory elements (the horizontal line in Fig. 4A) (19). First, the relationship between expression ratios in C. bursa-pastoris and parental species was positive and highly significant for all three tissues (p < 0.001), and the slope was intermediate between what would be expected if there were either only cis- or only trans-regulatory changes (β = 0.37, 0.42 and 0.46, respectively for flowers, leaves and roots). This indicates a strong parental legacy effect in expression of the two subgenomes of C. bursa-pastoris and suggests a joint effect of cis- and trans-regulation. Second, the variance of the expression ratio between subgenomes was significantly smaller than the variance of the expression ratio between parental genomes (Fisher’s variance test, all p < 0.001), indicating that the two subgenomes are closer to each other than the parental genomes are, therefore supporting a co-regulation of the two subgenomes through a mixture of trans- and cis-regulation (19, 30). Finally, the slope of the regression between the two expression ratios was the weakest in flowers, suggesting a slightly stronger trans-regulation and a higher level of constraints in this tissue than in roots and leaves (30).
As mentioned above, subgenome expression level relative to parental species expression can help to disentangle the role of cis- and trans-component on overall gene regulation. We thus classified the expression patterns between the two subgenomes and parental species in seven main categories (see Fig. 5, an example for flower tissues). The majority of the transcripts was not differentially expressed between parental genomes and subgenomes (No difference category), ranging from 60% in flowers to 81% in leaves (Table 2).
However, the slope of the regression between relative expression of subgenomes and parental species clearly indicated that, even if the expression levels were not significantly different between parental species and C. bursa-pastoris subgenomes, crossed trans-regulation tended to make the two subgenomes expression closer to each other than to either parental species (Fig. 4B “No difference” and Table 2). About 9% of genes had an Intermediate/Additive expression, i.e., the expression of both sub-genomes being in between the expression of the two parental species. As expected this pattern was due to a combination of both cis- and trans-regulation (β 0.3 0.4). Only 3% showed a strict legacy of parental species expression which is primarily due to cis-regulation (β 1). About 4% of the genes showed a Dominance pattern of either CG or CO genetic background (categories 6a and 6b, Fig. 5) but the relative proportion of each background varied largely among tissues: in flowers, 76% of the transcripts showed a dominance of CO, while there were only 45% and 34% genes with the dominance of CO in leaf and root tissues (Table 2). This pattern seems to be due to a dominance of transcription factors from one subgenome over the other (β ≈ 0.5 0.1); in favor of CO genetic background in flowers and CG in leaves and roots (Fig. 4B and Table 2). Finally, 3% of the genes had a Compensatory-drift profile (parental species expressions are similar but subgenome expressions diverge), a mere 0.4% showed a Reverse profile (each subgenome expression is similar to the opposite parental species) and about 10% of the transcripts showed a Transgressive pattern, either because of one (categories 7a and 7b) or of both subgenomes expression (category 7c) (Fig. 4B and Table 2). These last profiles are less straightforward to interpret in terms of cis-and trans-regulation pattern as they involve more complex post-hybridization regulation processes.
Finally, if the relative proportion of the different categories were globally conserved across tissues (Table 2), expression patterns of individual genes were strongly tissue-specific. In our data, only half of the genes showed the same expression pattern in all the three tissues. The most conserved category was No difference 77% and the least conserved one was Compensatory-drift 3%. Pairwise comparisons between tissues revealed that the number of genes with expression pattern changed between tissues was the largest between flowers and roots tissues (42%) and the smallest between leaf and root tissues (33%).
To conclude, only about 10% of the 11,931 transcripts had a transgressive or a reverse expression pattern. Expression patterns were poorly conserved between tissues except for the No difference category, indicating that the evolution of expression regulation is highly tissue-specific. Flower tissue differed the most from the two other tissues. In addition to a lower proportion of differentially expressed genes, flower tissues also had the lowest proportion of Transgressive category in the differentially expressed genes, indicating that when expression changes occurred, they either took place within the expression range of the parental species or they were compensated by the other subgenome (Compensatory-drift). This suggests a higher level of constraints on gene expression in flower tissues than in leaves and roots. Moreover, in flowers, the CO genetic background clearly dominates over the CG background, in striking contrast with the dominance of the CG genetic background in the other two tissues. Finally, expression profiles are more conserved between leaves and roots than between flowers and roots.
Expression similarity and convergence between subgenomes: flowers differ from roots and leaves
To understand better the joint dynamics of expression in the two subgenomes across tissues, we defined a new similarity index, S, that measures the relative expression deviation of a given subgenome from the average parental expression. S indices of both subgenomes were systematically biased towards the corresponding parental genome, i.e. CbpCg towards CG and CbpCo towards CO (binomial test, all p < 0.001) but the strength of this bias differed between subgenomes and across tissues (Fig. 6A). The distributions of S values for leaf and root tissues were more spread than the distribution for flowers, meaning that the relative expression in the two sub-genomes was globally less constrained in these tissues than in the flower tissue (Fig. S4).
In flowers, median S values for genes that showed significant differential expression between parental species (FDR < 0.05) showed dominance of the CbpCo over the CbpCg subgenome (∆S= 0.07), while the opposite pattern – i.e. dominance of CbpCg over CbpCo – was observed in leaves and roots (∆S= −0.08 and −0.14, respectively; Fig. 6A). This pattern was also observed when considering all genes, though it was less pronounced (Fig. S4). Such a dominance cannot only be due to the genes showing strict dominance of one genetic background (Dominance category, ~3 5%), but rather indicate a more global dominance of trans-regulation of one subgenome. Indeed, even if S indices tended to show a large legacy of parental genome expression, positive correlations between SCg and SCo (Spearman’s ρ, all p < 0.001) confirmed that both subgenomes were co-regulated in the same direction (Fig. S4), towards C. orientalis in flower tissues and towards C. grandiflora in leaf and root tissues.
Finally, in all tissues, most convergence indices were positive (Fig. 6B and S5), indicating that the difference in gene expression between subgenomes (∆sub) was generally lower than the difference between parental species (∆par); also, the larger the difference in expression between parental species, ∆par, the stronger the convergence between subgenomes, CCbp (Spearman’s ρ = 0.63, ρ = 0.74, ρ = 0.66, respectively for flowers, leaves and roots; all p < 0.001). However, although the overall degree of convergence was the same in the three tissues, the convergence was not symmetrical between the two subgenomes. In flowers, CbpCg tended to shift more towards CbpCo than the converse (CCg > CCo, Fig. 6B), while the opposite was true in the two other tissues (CCo > CCg, Fig. 6B). This explains the dominance patterns observed through the S indices and confirms the role of un-balanced trans-regulation in that system.
Genes showing converging expression patterns are enriched for specific functions
Regardless of the tissue considered, the expression profiles did not correspond to specific physical clusters along the genome with transcripts belonging to a given profile being spread across the genome: for each scaffold and each category, the average distance (bp) between two transcripts randomly sampled within a given category was not significantly different than that of two transcripts randomly sampled in different categories (Wilcoxon-Mann-Whitney’s test, all p > 0.05, Fig. S6). This suggests that the differential expression is not driven by large-scale epigenetic changes along chromosomes.
Gene ontology analyses, revealed, however, that the different expression profile categories (Fig. 5) were enriched for different molecular functions (MF, average overlap between categories: 17, 20 and 19% for flowers, leaves and roots tissues, respectively, Table S6A) and biological processes (average overlap, 10, 7 and 13%, Table S6B), though neither MF nor BP of a given category tended to cluster into specific networks). At the tissue level, the different expression profile categories were enriched for different MF and BP with a small average overlap between tissues (MF, 7% and BP, 7%, Table S7A and B), highlighting the specificity of expression regulation in different tissues.
We showed above that the main difference in expression between tissues was in the convergence of the two subgenomes: in flowers, CbpCg expression pattern converged toward that of CbpCo, while for the two other tissues convergence was in the opposite direction (CbpCo toward CbpCg). We tested whether the transcripts showing a convergence of CbpCg toward CbpCo (hereafter, ConvCo genes) or a convergence of CbpCo toward CbpCg (hereafter, ConvCg genes) were enriched for different molecular functions and biological processes. The two gene sets, ConvCo or ConvCg genes, were indeed enriched for GO terms belonging to different clusters (Fig. S7). For instance, in the flower tissues, ConvCo genes are enriched for biological processes involved in the transition between vegetative and reproductive phases, the dormancy of floral meristems and male meiosis, while ConvCg genes were enriched for cell redox homeostasis and related biological processes (Fig. S7 and Fig. S8). As expected, underlying molecular functions also tended to group into distinct clusters corresponding to different functional networks (Fig. S7 and Fig. S8). Finally, the two gene sets were also enriched for the same biological processes (e.g., fatty acid biosynthesis in flowers, sucrose and carbohydrate metabolisms in leaves and general metabolism in roots) or molecular functions (e.g, RNA, nucleotide and GTP binding or MF related to transporter activity, Fig. S7 and Fig. S8) indicating concerted changes of gene expression between the two subgenomes.
Deleterious mutations accumulate preferentially on the C. orientalis subgenome and are associated with the level of expression
Among the 11 million genomic sites segregating across the five genomes, about 3 million alleles were specific to the Capsella species, and 669,675 of these species-specific alleles were annotated by SIFT4G with the A. thaliana SIFT database, and 432,354 of them were annotated with the C. rubella database.
The estimated proportion of deleterious mutations among species and among the four populations of C. bursa-pastoris were similar independently of whether A. thaliana or C. rubella was used for SIFT4G annotation (Fig. 7A and Fig. S9A). Despite a lower number of accessions, the same pattern as in (35) was observed: i) the C. grandiflora genome had a lower proportion of deleterious mutations than C. orientalis or either subgenome of C. bursa-pastoris ii) within C. bursa-pastoris, the CbpCg subgenome always had a lower proportion of deleterious mutations than the CbpCo subgenome of the same population and iii) among the C. bursa-pastoris populations, both subgenomes of the Asian population had a higher proportion of deleterious mutations than the corresponding subgenomes in the other three populations, indicating a higher rate of mutation accumulation in this population. The proportion of deleterious mutations of the newly added CASI population was most similar to that of the EUR population with a larger variance of the proportion of deleterious mutations carried by CbpCg subgenome of CASI accessions (Fig. 7A).
Mutation accumulation pattern between the two subgenomes was further investigated by estimating the mutation accumulation bias towards CbpCg, b, and the overdispersion parameter φ. b was positive for SY N indicating a mapping bias towards CbpCg. b was also positive for DEL mutations in all accessions (Fig. S10), but much smaller than SYN and therefore bDEL bSYN was negative (Fig. 7B). This indicates a general bias towards more DEL mutations in the CbpCo subgenome (bDEL < bSYN). The same pattern was observed for φ (φDEL < φSYN), the difference φDEL φSYN being also negative (Fig. 7B and Fig. S10). Hence, contrary to the expectation of the pseudogenization process, the distribution of deleterious mutations was less over-dispersed than expected at random, suggesting that the accumulation of too many deleterious mutations per gene is prevented, a mechanism that might contribute to the maintenance of both homeologue copies. However, it should be noted that more silenced genes were observed in CbpCo than in CbpCg. (Fig. S11). Finally, a significant association between the deleterious mutations bias (dDEL) and the homeologue expression bias (e) was found for all three tissues (Fisher’s exact test, all p < 0.001): the categories where deleterious mutations and expression bias varied in the same direction (i.e., dDEL > 0 and e > 0 or dDEL and e < 0) were over-represented (Table S8). The homeologue copy carrying the highest number of deleterious mutations thus tends to show the lowest expression level. No such association was found when considering only synonymous mutations (Fisher’s exact test, p = 0.57, 0.74 and 0.27 for flowers, leaves and roots tissues, respectively), confirming that the association between deleterious mutations and expression level was not the result of a mapping or annotation bias toward one of the two subgenomes (Table S8).
Discussion
The events accompanying the birth of a polyploid species have often been described in rather dramatic terms, with expressions such as “transcriptomic shock” or “massive genome-wide transcriptomic response” often used (e.g. (7, 55, 56)). The early and formative years of a young polyploid might indeed be eventful, but what happens afterward may well be less dramatic, especially for tetraploid species with a disomic inheritance such as the shepherd’s purse. In the present study, we compared some of the genomic and transcriptomic changes that occurred between C. bursa-pastoris and its two parental species C. grandiflora and C. orientalis. Overall, the emerging picture is one of an orderly and rather conservative transition towards a new “normal” state. A conservative transition, because after around 100,000 generations we can still detect a significant parental legacy effect on both the number of deleterious mutations accumulated and gene expression patterns. And an orderly one too, since the emerging pattern of expression involves a balance between cis-and trans-regulatory changes suggesting the emergence of coordinated functioning of the two subgenomes. This general impression of a non-stochastic transition process to polyploidy (57) is reinforced by the variation in patterns of gene expression across the three tissues: as one would expect, the expression of both subgenomes in selfing C. bursa-pastoris was biased towards the selfing parent C. orientalis in flower, whereas in leaf expression of the two subgenomes were mostly similar, and in roots expression was biased towards C. grandiflora. This expression bias towards the C. orientalis subgenome in flowers despite a higher accumulation of deleterious mutations in this subgenome suggest that the evolution of gene expression is not entirely random.
Demography and expression: a limited effect of introgression?
Previous studies have stressed the importance of population structure and demographic history in genomic and transcriptomic studies of C. bursa-pastoris (34, 35); (35), for instance, showed a significant introgression of C. orientalis genetic background into Asian populations of C. bursapastoris. In the present study, we indeed showed that overall gene expression pattern reflected the main phylogenetic relationships. Each subgenome was the closest to the parental species it was inherited from and populations from close ge-ographic areas tended to cluster together, except for Central Asian accessions (CASI), which clustered with European ones even if they were geographically closer to the Asian or Middle-East ones. Most likely these samples were recently introduced to Central Asia, as it was suggested for C. burs pastoris accessions with European ancestry inhabiting the Russian Far East (33).
When comparing the number of differentially expressed genes between C. bursa-pastoris and parental species, no specific trend was detected and Asian accessions were not the closest to C. orientalis as one would have expected because of introgression. In leaf and roots tissues ASI was even closer to C. grandiflora than to C. orientalis. This can be explained by the fact that the vast majority of the genes (up to 80%) did not show any difference in expression (thus hiding a more subtle signal). Assessing the influence of introgression on expression pattern would require a more thorough investigation, for instance by focusing on genes for which in-trogression was actually characterized.
Transition to polyploidy: compensatory cis-trans-effects and stabilizing selection
As mentioned above, in the case of a newly formed allopolyploid species one would expect the two copies of a gene to be under the influence of trans-regulatory elements inherited from both parents and its expression level to first move towards the mean expression of the two parental species. However, different forces could lead to an excess of divergence in subgenome expression compared to what would be expected under a pure drift model. Polyploidy creates a large redundancy in gene function that should free one of the copies from purifying selection. Generally, the copy carrying more deleterious mutations is expected to degenerate, biasing the expression pattern toward one of the two parental species, even if sub-or neo-functionalization can still occur but to a much lower extent. This ought to be particularly true for C. bursa-pastoris as one of its parental species, C. orientalis, is a selfer that has accumulated more deleterious mutations than the other parent, the outcrossing C. grandiflora (35). This process will be reinforced by the enhancer runways process (25), that should strengthen cis-acting elements from the CbpCg subgenome as the CbpCg subgenome has a higher heterozygosity and a lower genetic load than the CbpCo subgenome.
In our study, however, we did not observe any “transcriptomic shock” (as for instance in, (7, 55)) neither major homeologue expression remodeling and/or subgenome expression asymmetry (as in e.g. (58)). In contrast, our study, like some others before it (15, 57, 59, 60), instead suggests an overall conservation of the expression pattern in polyploids and hybrids. And even if a “transcriptomic shock” did take place during the formation of the tetraploid, expression changes have stabilized since then. Some 100,000 years later parental legacy on subgenome expression is still detectable and the two subgenomes’ expression patterns are still closer to each other than that of parental species, clearly indicating that none of the subgenomes has degenerated; as expected, however, the CbpCo subgenome carries more silenced genes and a higher proportion of deleterious mutations than CbpCg. Most of the genes were under both cis- and trans-acting elements; the No difference and Intermediate expression categories represented up to 70 to 80% of genes depending on the tissue considered. Only a small fraction (5 to 10%) of genes showed either almost pure cis- (Legacy category) or trans- regulation (Dominance category). While the former can be explained by the absence of crossed trans-regulation, the latter could be due to the dominance of transcription factor of one subgenome over the other; though, in both cases, post-hybridization mutations affecting either cis-or trans-acting elements or both could have evolved. The remaining fraction (up to 15%, Reverse, Compensatory-drift and Transgressive) showed a more complex pattern that is hard to assign to a simple factor but could be in part due to new, intertwined cis- and trans-regulation across subgenomes. It should be noted that such patterns can naturally emerge after hybridization as a byproduct of stabilizing selection on diverging optima (61) for Transgressive profiles, on the overall amount of protein produced for Compensatory-drift profile, and on intermediate level of expression for Reverse profile, without invoking additional specific processes. To address further this question, it would be interesting to compare auto-and allopolyploids to tease apart the effects of hybridization and genome doubling.
Even though this does not, in any way, alter the conclusion above, we also would like to note here that the classification of overall expression patterns used in Table 1 and 2 in different categories is somewhat arbitrary as some expression patterns are ambiguous and could have been classified in different categories. It should also be pointed out that these classifications were dependent on the chosen False Discovery Rate (FDR). As a control, we reproduced the analysis based on unphased data of Cbp expression, with FDR< 0.01 and 0.1 (Table S9). It indicated that the number of genes within the different categories can vary substantially with the different FDR level (mainly because of variation in No difference category), however, the main patterns were not altered. Moreover, the main pattern of variation we described was a change in dominance between tissue that is obviously not affected by the bias described before. In part to overcome the limitations inherent to any a priori classification, we developed the expression similarity index, S, that confirmed our conclusions.
Level of expression dominance varies across tissues and functions
Allopolyploid species are often examined for unequal expression between homeologous genes because of their hybrid nature but other aspects of gene expression have been less extensively studied. For example, there might be no difference in the relative expression of subgenomes (balanced homeologue expression), but the total amount of transcripts can vary and reflect the dominance of the level of expression of one of the parents (62). C. bursa-pastoris exhibits rather balanced homeologue expression, but the summed expression of the two homeologues shows differentiation across tissues with the dominance of C. orientalis expression level in flowers, and C. grandiflora level in leaves and roots. The genes with significant expression bias between subgenomes also show strong dominance of CbpCo expression over CbpCg in flower. However, a positive cor-relation between the expression deviation indices of the two subgenomes indicates that this dominance is not primarily caused by up-regulation or down-regulation of one parental copy, but rather unidirectional regulation of homeologous genes as it has been observed, for instance, in cotton and coffee (2, 30, 63). This convergence could be possible because of the low divergence between the subgenomes of C. bursapastoris and, hence, the absence of barriers for trans-acting regulation of homeologous genes.
An intuitive explanation of this bias in flower tissues could be that this simply reflects the fact that both C. orientalis and C. bursa-pastoris are selfing species with tiny flowers, in contrast to C. grandiflora, an outcrossing species that has large flowers. A way to test this hypothesis would be to compare C. orientalis with both C. grandiflora and C. rubella for the genes implicated in the bias towards C. orientalis using root tissues as a control. In contrast, in the non-reproductive leaf and root tissues, expression is biased towards the genome of the outcrossing C. grandiflora. Although this interpretation needs further validation, it stands against the genomic shock pattern that implies a disruption of expression patterns. Finally, although the bias of expression observed between homeologous genes is not strongly shifted towards either subgenome, it is not random either: one subgenome can dominate over the other for a given function or pathway in a given tissue, suggesting constrained evolution in gene expression regulation at a tissue/function level. In many cases, it is not straightforward to explain why a particular subgenome dominates for a particular function, and this could simply be the result of coincidence in neutral evolution of gene regulation networks. In other cases such as flower tissues, however, the observed dominance makes biological sense.
Both subgenomes of C. bursa-pastoris are maintained, but they are not equal
Redundancy of polyploid genomes often assumes evolution of non-functionalization of duplicated genes (64–66) or even of a whole subgenome (67–69). When one gene copy of a duplicated gene starts to degenerate, the purifying selection on that copy becomes weaker and the deleterious mutations accumulate further, while the other copy of the gene remains functional and under purifying selection. If non-functionalization is prevalent, deleterious mutations are expected to be more unevenly distributed between the homeologous genes and even between the two subgenomes. We indeed observed more deleterious load in the CbpCo subgenome with the absolute load comparison and with the estimated parameter b indicating its degeneration. However, the dispersion for deleterious mutations indicated that they tend to be more evenly distributed between the homeologous genes than expected at random. This suggests that CbpCo genes cannot degenerate further after a certain amount of genetic load is accumulated. Thus, although the amount of accumulated genetic load differs between subgenomes of C. bursa-pastoris, both subgenomes are maintained and there is no large-scale non-functionalization at the gene and subgenome levels.
One might expect the differences between homeologues in accumulation of deleterious mutations would lead to bias in gene expression. For example, Arabidopsis suecica, like C. bursa-pastoris, is an allopolyploid species with parents characterized by different mating systems: the outcrossing Arabidopsis arenosa, and the selfing Arabidopsis thaliana (70). Chang et al. (71) observed a bias in expression in favor of the A. arenosa subgenome and, among other hypotheses, suggested that this bias could be due to the fact that mildly deleterious alleles are not purged as efficiently from the A. thaliana subgenome as from the A. arenosa subgenome. In C. bursa-pastoris, the CbpCo subgenome had a higher proportion of deleterious mutations than the CbpCg subgenome, but there was no strong bias in expression between subgenomes. However, when we paired the amount of derived deleterious mutations with the expression level of each gene and compared homeologous genes, we found that there was a significant association between deleterious mutation bias and expression bias (Table S8). The homeologous gene with more deleterious mutations tends to have a lower expression level than the other one. Moreover, we also found that there are more silenced genes in CbpCo, which is the subgenome with a higher proportion of deleterious mutations. These results are in accordance with the hypothesis that the bias in expression is linked to the accumulation of deleterious mutations. Yet, it is worth noting that the expression bias may not necessarily be the result of the biased distribution of deleterious mutations. The homeologue expression bias could also be the cause of the observed deleterious mutation bias, especially considering that we have only investigated the deleterious mutations in coding regions. Purifying selection on the homeologue with lower expression can be weaker (72), therefore it is less efficient in eliminating deleterious mutations. At any rate, the fact that we have a relative dominance of expression of CbpCo in flowers and of CbpCg in other tissues, despite CbpCo subgenome having a higher proportion of deleterious mutations than CbpCg, suggests that parental legacy and functional constraints may also play a major role.
Conclusion
In 1929, George Shull, one of the most prominent geneticists of his time (73), wrote: “It is considered a matter of fundamental significance that the increase in a number of chromosomes in the bursa-pastoris group is correlated with greater variability, greater adaptability, greater vigor, and greater hardiness”. In the present study, the merging of the two parental genomes was not accompanied by major disruptions of the transcriptome. Instead, there was a strong parental legacy and the emergence of a shift in the subgenome expression pattern towards a new “equilibrium” state reflecting the composite nature of the new species. Hence, being a selfer like its C. orientalis parent, there was a shift in flower tissues of the expression pattern of the C. grandiflora subgenome towards that of C. orientalis. Similarly, it seems also possible that the dominance of the C. grandiflora inherited subgenome in roots and leaves contributed to the high competitive ability of C. bursa-pastoris, which was similar to that of C. grandiflora but much higher than that of C. orientalis and C. rubella, its two self-fertilizing congeners (74, 75). It therefore seems that the present study, together with those more focused on fitness of C. bursa-pastoris (74, 75) contributed to better understanding of the causes of the correlation pointed out al-most 100 years ago by Shull.
AUTHOR CONTRIBUTIONS
ML, SG, and DK planned the study. DK obtained the data. DK and PM analyzed the genomic data. DK, PM, and MO analyzed the gene expression data. TD performed the genetic load analyses. DK, PM, TD, and ML wrote the article with inputs from all authors. SG, SIW, and ML supervised the project.
ACKNOWLEDGEMENTS
We thank Karl Holm, Kerstin Dalman, and Kerstin Jeppsson for help in the lab. Most of the analyses were carried out at Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under the project b2016212. This study was supported by grants from the Swedish Research Council and the Philip Sörensens Foundation to ML.