Abstract
Recent advances in genome resequencing have led to increased interest in predicting the functional consequences of genetic variants. Variants at phylogenetically conserved sites are of particular interest because they are more likely than variants at phylogenetically variable sites to have deleterious effects on fitness and to contribute to phenotypic variation. Numerous comparative genomic approaches have been developed to predict deleterious variants, but they are nearly always judged by their ability to identify known disease-causing mutations in humans. Determining the accuracy of deleterious variant predictions in nonhuman species is important for understanding evolution and domestication and, potentially, for improving crop quality and yield. To examine our ability to predict deleterious variants in plants, we generated a curated database of 2,910 Arabidopsis thaliana mutants with known phenotypes. We evaluated seven approaches and found that, while all performed well, the single best-performing approach was a likelihood ratio test applied to homologs identified in 42 plant genomes. Although the approaches did not always agree, we found only slight differences in performance when comparing mutations with gross versus biochemical phenotypes, duplicated versus single-copy genes, and single approaches versus ensemble predictions. We conclude that deleterious mutations can be reliably predicted in A. thaliana and likely in other plant species, but that the relative performance of various approaches can depend on the organism to which they are applied.
The dramatically increased number of reference genomes, whole-genome resequencing data sets, and gene annotations has facilitated the discovery of sequence variants and increased interest in annotating functional variants in many organisms. Functional annotation can yield insight into the genetic basis of phenotypic variation and is often a critical step in identifying the genes and variants underlying human disease (Ahituv et al. 2007; Cooper and Shendure 2011). In particular, interest in identifying putatively deleterious variants has increased, because these variants may contribute substantially to phenotypic variation (Manolio et al. 2009; Thornton et al. 2013). Most approaches assume that variants that disrupt a phylogenetically conserved site are more likely to be deleterious (Ng and Henikoff 2006). Single nucleotide polymorphisms (SNPs) are the most abundant class of sequence variants, and SNPs that alter amino acid sequences are more often associated with phenotypic variation than other classes of variants (1000 Genomes Project Consortium 2012; Fay 2013; Stenson et al. 2014). Amino acid substitutions in protein-coding sequences are also the most readily identifiable class of variants likely to have a biological impact; thus, they have been the primary focus of variant annotation efforts.
Annotation of deleterious alleles is also relevant to understanding the genetic basis of phenotypic variation in other species. Complementation of recessive deleterious variants between haplotypes is thought to be one of the primary mechanisms underlying heterosis (Charlesworth and Willis 2009). This suggests that identification of deleterious alleles may be applied to hybrid breeding strategies (Yang et al. 2016). Annotation of deleterious variants improves prediction accuracy of complex traits (Dudley et al. 2012). Elevated proportions of deleterious relative to neutral variants in domesticated species suggest a cost of domestication (Cruz et al. 2008; Liu et al. 2017; Lu et al. 2006; Rodgers-Melnick et al. 2015). Studies of the genomic distribution and genetic contribution of deleterious variants can contribute both to understanding the origin and domestication of crop species and to advancing breeding and crop improvement strategies (Morrell et al. 2012).
Accurate prediction of deleterious variants is a key component of assessing their contribution to phenotypic variation. Numerous approaches for predicting deleterious variants have been developed (Ng and Henikoff 2006). The performance of an approach is typically assessed by the proportion of known human disease-causing variants that it correctly classifies as deleterious. Benchmarking of various approaches on uniform test sets has shown substantial variability among approaches, and improved performance is often achieved with ensemble predictions that combine the output of multiple approaches (González-Pérez and López-Bigas 2011; Grimm et al. 2015; Olatubosun et al. 2012; Thusberg et al. 2011). However, the causes of performance differences across approaches are not well understood. While all approaches rely on phylogenetic sequence conservation to identify deleterious variants, some also incorporate protein structure, physical or biochemical properties of amino acid changes, or other attributes of the protein sequence when these are available. The earliest conservation metrics used heuristic measures, sometimes including filtering or weighting to account for phylogenetic distance (Ng and Henikoff 2003). More recent approaches have incorporated evolutionary models that account for phylogenetic distance using putatively neutrally evolving nucleotide sites (Davydov et al. 2010; Chun and Fay 2009). Reference bias and the alignments used to calculate conservation metrics are not often emphasized, but they are important for accurate predictions and may account for some of the variability among predictions (Adzhubei et al. 2013; Chun and Fay 2009; Hicks et al. 2011). The accuracy of predictions depends especially on the availability of annotated genomes from related species and on the ability to generate sequence alignments, particularly for protein-coding regions of the genome.
Studies of deleterious variants in non-human species are limited to a small subset of approaches that are not human-specific. Even so, a growing body of research uses predicted deleterious variants to understand genomic patterns of variation and their contribution to complex traits, especially in plants. Patterns of deleterious variation have been examined in Arabidopsis thaliana (Cao et al. 2011), rice (Günther and Schmid 2010; Liu et al. 2017), maize (Mezmouk and Ross-Ibarra 2014; Rodgers-Melnick et al. 2015), sunflower (Renaut and Rieseberg 2015), poplar (Zhang et al. 2016), barley, and soybean (Kono et al. 2016). However, the accuracy of predictions in plants has been examined for only a small number of known variants (Günther and Schmid 2010), and only in the past few years has a diverse set of plant genomes and protein homologs become available (Goodstein et al. 2012). Furthermore, plants are known to have more multi-gene families and a higher frequency of polyploidy than mammals (Lockton and Gaut 2005). Such genome-specific factors influence whether a sequence variant is truly deleterious (Charlesworth 2012). The model system A. thaliana is a particularly attractive plant species for evaluating approaches that predict deleterious variants, because decades of basic research in development, physiology, cell biology, and plant-pathogen interactions have identified large numbers of amino acid altering mutations with phenotypic consequences.
To evaluate tools for predicting deleterious variants in plants, we generated a curated database of 2,910 A. thaliana mutants with known phenotypic alterations. We identified seven approaches that can predict deleterious variants outside of humans. Among these, SIFT (Ng and Henikoff 2003), PolyPhen2 (Adzhubei et al. 2013) and PROVEAN (Choi et al. 2012) generate their own alignments from non-redundant protein databases, whereas MAPP (Stone and Sidow 2005), GERP++ (Davydov et al. 2010), and two versions of a likelihood ratio test (Chun and Fay 2009) take pre-specified alignments as input. For the latter, we used the BAD_Mutations pipeline to identify homologs and generate alignments based on 42 plant genomes (Kono et al. 2016). We found that all approaches performed better than reported in similar assessments in humans, but that the relative ranking and the highest-performing approach differed from previously reported comparisons using human data. We did not identify any single factor that was a major determinant of the differences among approaches. Our results demonstrate that reliable prediction of deleterious variants can be achieved in A. thaliana and likely in other plant species, expanding the potential value of deleterious variants for understanding naturally occurring variation and improving crop breeding strategies.
Results
A database of literature-curated Arabidopsis thaliana mutants
To evaluate approaches that predict deleterious variants, we generated a database of A. thaliana amino acid substitutions from (i) mutants with described phenotypic alterations and (ii) common amino acid polymorphisms unlikely to affect fitness. Of 2,910 mutants in 995 genes, 81% were from manually curated entries in UniProtKB/Swiss-Prot (n = 2,368), 10% were from our own literature curation (n = 293), and 8.6% were independently identified in both sets (n = 249) (Table S1). Within the same 995 genes, 1,583 common amino acid polymorphisms were identified in 80 accessions (Cao et al. 2011). For our analyses, we assume that mutations causing a deviation from the wild-type phenotype are likely deleterious.
Performance of approaches designed to identify deleterious variants
Using the database of A. thaliana mutations, we assessed seven approaches for their ability to distinguish deleterious from neutral changes. The approaches were selected because they can generate predictions in non-human organisms. Plots of sensitivity against specificity showed that each approach could reliably distinguish deleterious and neutral substitutions (Figure 1). A likelihood ratio test (LRT) implemented in the BAD_Mutations pipeline performed significantly better than all other approaches, as measured both by the area under the curve (AUC) of sensitivity versus specificity and at thresholds of 95% sensitivity and 95% specificity (Figure 1, Table S1). A reference-masked version of the LRT (LRTm), designed to eliminate reference bias (Simons et al. 2014), had the second-highest performance. PROVEAN and PolyPhen2 had similar AUCs, both significantly higher than those of SIFT, GERP++ and MAPP. The relative ranking by AUC was unchanged when the 1,050 mutations lacking a prediction from at least one approach were removed (Table S1).
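To make the fixed-threshold comparison concrete, the following minimal Python sketch shows one way to choose a score cutoff that holds specificity at 95% on the neutral set and then measure sensitivity on the mutant set. It assumes that higher scores mean "more likely deleterious" (for approaches such as SIFT or PROVEAN, where low scores indicate damage, scores would need to be negated first). The function name is hypothetical, and this is not the pROC-based code used in the analysis.

```python
import math


def sensitivity_at_specificity(deleterious_scores, neutral_scores, target_spec=0.95):
    """Return (threshold, sensitivity) at a cutoff giving at least
    `target_spec` specificity on the neutral variants.

    Assumes higher scores indicate a more likely deleterious variant;
    variants scoring strictly above the threshold are called deleterious.
    """
    neutral = sorted(neutral_scores)
    # Cutoff = the ceil(target_spec * n)-th smallest neutral score, so at
    # least that many neutral variants score at or below it (called neutral).
    k = math.ceil(target_spec * len(neutral)) - 1
    threshold = neutral[max(k, 0)]
    called = sum(1 for s in deleterious_scores if s > threshold)
    return threshold, called / len(deleterious_scores)
```

With 20 neutral scores of 0 to 19, the cutoff is 18 (19 of 20 neutral variants fall at or below it), so mutant scores of 10, 18.5, 19 and 25 give a sensitivity of 0.75.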
A second means of assessing performance is to compare predictions for rare versus common variants. Common variants are likely to be neutral or nearly neutral, whereas deleterious alleles are kept at low frequency by purifying selection (Ewens 2004). Using SNPs identified in a set of 80 A. thaliana strains, we found that each approach predicted a larger proportion of SNPs to be deleterious at low frequencies than at common frequencies (Figure 2). At minor allele frequencies between 2/80 (2.5%) and 8/80 (10%), LRTm and SIFT predicted a lower proportion of deleterious SNPs than the other approaches, suggesting that they are less sensitive to alleles under weak selection. At the lowest frequency, 1/80 (1.25%), which is expected to include many rare and potentially strongly deleterious variants, LRT called the largest proportion of SNPs deleterious.
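The frequency-binned comparison can be sketched as follows. This minimal Python example assumes each SNP is summarized as its minor allele count among the 80 strains plus a boolean prediction at some fixed threshold; counts above 8/80 are pooled into a single "common" bin. The function and its input format are hypothetical illustrations, not the code used to produce Figure 2.

```python
from collections import defaultdict


def deleterious_fraction_by_count(snps, n_strains=80, max_rare_count=8):
    """Fraction of SNPs predicted deleterious in each minor-allele-count bin.

    snps: iterable of (minor_allele_count, predicted_deleterious) pairs.
    Counts 1..max_rare_count get their own bin (e.g. "1/80"); higher counts
    are pooled into a single "common" bin.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for count, is_deleterious in snps:
        label = f"{count}/{n_strains}" if count <= max_rare_count else "common"
        totals[label] += 1
        hits[label] += int(is_deleterious)
    return {label: hits[label] / totals[label] for label in totals}
```

Running the same binning on the calls from each approach yields one curve per approach, which is the kind of comparison summarized in Figure 2.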
Performance across phenotypic and duplicate gene categories
To further characterize differences in performance, we compared classes of variants identified either by genome-wide mutant screens or by directly targeting individual proteins. In general, mutants identified from screens have gross morphological or otherwise easily observable phenotypes and are assigned allele names, whereas directed mutants are typically not given allele names and have biochemical phenotypes. To compare these two groups, we split the data into mutants with allele names (1,910), as a proxy for gross phenotypes, and those without allele names (1,000), as a proxy for biochemical phenotypes. As measured by AUC, some approaches performed better on the gross phenotypic class than on the biochemical class, and performance was more similar among approaches for the gross phenotypic class (Figure 3a). SIFT and PolyPhen2 showed the largest increases in performance for mutations with gross phenotypic alterations; for this class, the performance of PolyPhen2 was comparable to that of the LRT.
Gene duplication may relax prior selective constraints on a protein, allowing variants to occur at previously conserved sites (Kondrashov et al. 2002). Duplicated genes may thus pose challenges for predicting deleterious alleles, particularly because none of the approaches explicitly distinguishes orthologs from paralogs. We identified 466 of the 995 genes as duplicated in A. thaliana based on blastp hits with 60% or greater identity, and we compared performance on these genes with performance on the remaining single-copy genes. Each approach showed equal or better performance for duplicated than for single-copy genes, with SIFT showing the largest increase in performance (Figure 3b).
Approach dissimilarity and composite predictions
As reported previously (Chun and Fay 2009; Doniger et al. 2008; González-Pérez and López-Bigas 2011; Olatubosun et al. 2012), we found substantial disagreement in predictions among the approaches. At a 95% specificity threshold, 93.6% of mutants were predicted deleterious by one or more approaches, but only 51.3% were predicted deleterious by six or more of the seven approaches. Similarly, only 0.25% of common SNPs were predicted deleterious by all approaches, but 16.6% were predicted deleterious by at least one. Comparing the disagreement between approaches, we found that LRT and LRTm produce very similar predictions but are distinct from most of the other approaches (Figure 4). We used five models that combined the predictions of all approaches except SIFT, which had a higher proportion of missing calls. Only two of these ensemble models, a linear discriminant analysis and a generalized linear model with penalized maximum likelihood, performed significantly better than LRT as measured by AUC (Table S2).
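One simple way to tabulate this kind of disagreement is sketched below in Python. For binary calls at a fixed threshold, it counts how many approaches call each variant deleterious and computes, for every pair of approaches, the fraction of variants on which they disagree. The input layout and function name are assumptions, and pairwise discordance is only one possible dissimilarity measure; it is not necessarily the metric shown in Figure 4.

```python
from itertools import combinations


def agreement_summary(calls):
    """Summarize agreement among binary deleterious calls.

    calls: {approach_name: [bool, ...]}, one call per variant, aligned by index.
    Returns (histogram, discordance) where histogram[k] is the number of
    variants called deleterious by exactly k approaches and discordance maps
    each pair of approaches to the fraction of variants they call differently.
    """
    names = sorted(calls)
    n_variants = len(calls[names[0]])
    n_deleterious = [sum(calls[name][i] for name in names) for i in range(n_variants)]
    histogram = {k: n_deleterious.count(k) for k in range(len(names) + 1)}
    discordance = {
        (a, b): sum(x != y for x, y in zip(calls[a], calls[b])) / n_variants
        for a, b in combinations(names, 2)
    }
    return histogram, discordance
```

From the histogram, the share of variants called deleterious by at least one approach, or by six or more, follows directly by summing the relevant bins.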
Discussion
In this study, we benchmarked the ability of several widely used approaches to distinguish putatively deleterious from neutral amino acid substitutions in A. thaliana. Prior evaluations of performance focused on large sets of mutants for single proteins or on known human disease variants (Adzhubei et al. 2013; Ng and Henikoff 2003). Overall, we find high performance across approaches in distinguishing neutral and deleterious variants, validating their use in plants. The highest performance is achieved by a likelihood ratio test (LRT) implemented in the BAD_Mutations pipeline, in this case using alignments from 42 plant genomes. Despite considerable variation among prediction approaches, no single factor explains the differences in performance.
Below, we discuss our results along with characteristics of the approaches and test data that may contribute to differences in predictions and performance when the approaches are applied to non-human species. One important consideration is the distinction between deleterious variants and variants that impact protein function and have phenotypic consequences. These two groups overlap, but they are not identical. Because conservation and divergence between species are directly related to fitness, we have used the term “deleterious” when referring to the prediction approaches. However, the test sets used to evaluate approaches are composed of variants known to affect protein function or phenotype. Thus, regardless of nomenclature, any evaluation of approach performance necessarily assumes a large overlap between conserved amino acid positions and positions that affect protein function as measured by phenotype.
Phylogenetic power, alignments, and reference databases
Phylogenetic power is critical to all comparative genomic approaches that predict deleterious variants. When homologs are too closely related, not enough time has passed for neutral sites to accumulate amino acid substitutions. When homologs are too distantly related, functional sites may not be conserved owing to compensatory changes or divergence in homolog function (Breen et al. 2012; Jordan et al. 2015; Marini et al. 2010). The LRT differs from the other approaches examined in that it uses synonymous sites as an internal control for the expected amount of protein divergence under a neutral model. As such, even homologs with nearly identical amino acid sequences are informative, so long as they have accumulated changes at synonymous sites. However, distantly related homologs are uninformative if divergence at synonymous sites is saturated, so the LRT should only be applied to organisms for which a sufficient number of related genomes are available. GERP++ is similar to the LRT in that it uses a neutral substitution rate to make its predictions, but differs in that the neutral rate must be specified rather than estimated from synonymous sites within the alignment. GERP++ also does not use the genetic code to distinguish synonymous from nonsynonymous changes. In this regard, GERP++ was not applied as intended, since we used a fixed neutral rate for all genes rather than an alignment-specific neutral rate. Among the approaches compared, phylogenetic power cannot explain the differences between the LRT, MAPP and GERP++, since all three used the same alignments.
All approaches studied here use alignments to make their predictions, making the protein database and the choice of homologs to include in each alignment critical. For MAPP, GERP++, and LRT, we used alignments generated by the BAD_Mutations pipeline, which queries proteins from sequenced plant genomes, in this case from 42 angiosperm species. SIFT and PolyPhen2 use the UniRef database (2011), whereas PROVEAN uses the most recent non-redundant protein database from NCBI. Both PROVEAN and PolyPhen2 are known to be sensitive to the choice of reference database and the criteria for including homologs (Adzhubei et al. 2013; Choi et al. 2012). Although the choice of homologs is an important step in predicting deleterious substitutions, whether a plant-specific or an entire non-redundant database is used does not appear to be a major contributor to performance differences (Figure 1).
Training and test sets
The performance of an individual approach depends on both the training and test sets used to measure it. Because performance is typically measured using common SNPs and known disease variants in humans, there has been some concern over the lack of independence between training and test sets (Dong et al. 2015; Grimm et al. 2015). Another consideration that has not yet been examined is whether performance in one species translates to distantly related species, which may not have comparable availability of homologs from sequenced genomes spanning a range of phylogenetic distances. The performance of individual approaches could depend on the study system, in that some approaches may assume homologs at certain phylogenetic distances, low rates of compensatory change, or low rates of gene duplication.
Previous studies of the accuracy of prediction approaches used five human test datasets (Dong et al. 2015; Grimm et al. 2015). We find better performance across approaches in our A. thaliana dataset than reported for humans (Table 1). It is unclear why the approaches uniformly perform better in A. thaliana; one possibility is that neutral and deleterious variants in A. thaliana are more distinct from one another than those in humans. Because such a large proportion of the phenotype-changing variants in our A. thaliana test set are identified as deleterious, this test set contains few cases that are difficult to predict correctly and is therefore less useful for distinguishing among approaches.
Population and gene-specific performance
Because nearly all measures of performance use either common polymorphisms or recently fixed amino acid substitutions as a proxy for neutral variants, population- and gene-specific factors that influence neutral polymorphism are expected to influence measures of performance. Humans have a small effective population size relative to other mammals (Leffler et al. 2012) and consequently a high ratio of nonsynonymous to synonymous diversity (Fay et al. 2001; Kosiol et al. 2008). Thus, distinguishing neutral and deleterious variants may be more difficult in humans than in other species, and approaches trained using human polymorphism may be more conservative with respect to weakly deleterious variants. In comparison, predicting deleterious variants in A. thaliana may be facilitated by the fact that it is a selfing species with an effective population size larger than that of humans (Cao et al. 2011).
Both demographic history and local adaptation could also play important roles in the distribution of deleterious variants. In populations that are colonizing or expanding into novel environments, the selection coefficients against individual variants may change (Slotte et al. 2013), and locally adaptive variants may become appreciably enriched. Both humans and A. thaliana are known to have undergone demographic expansion in their recent evolutionary histories (Hoffmann 2002; Finlayson 2005). While the relative extent of local adaptation in these two species is difficult to quantify, both exhibit an excess of low-frequency amino acid polymorphisms, a pattern characteristic of deleterious variants (Lohmueller et al. 2008; Henn et al. 2016; Cao et al. 2011).
Gene duplication is another potentially important factor in predicting deleterious variants. A. thaliana carries remnants of a whole-genome duplication along with numerous single-gene duplications (The Arabidopsis Genome Initiative 2000), more than are present in the human genome (Lynch and Conery 2000). Gene duplication can lead to relaxed selection during subfunctionalization or pseudogenization (Ohno 1970), allowing amino acid variants to accumulate in recently duplicated genes. However, we found very similar performance for duplicate and single-copy genes, consistent with a similar finding in humans using PolyPhen2 (Adzhubei et al. 2013). Because we only included genes with known mutant phenotypes, our sample of recently duplicated genes is limited. Recent duplicates are more likely to accumulate common variants that would appear deleterious, and so may be among the genes for which predictions are most difficult.
Conclusions and future directions
Most approaches developed to predict deleterious mutations were trained using human data and in many cases can only be used for human proteins (e.g., Kircher et al. 2014; Li et al. 2009; Schwarz et al. 2010). This study demonstrates that several generalized approaches perform exceptionally well in A. thaliana, implying that they should also work well for other plant species. Despite this high performance, further improvements are likely achievable. Notably, LRT requires longer run times than any of the other approaches, typically 5.2 hours of compute time per gene. Although we did not investigate whether a faster approach could be implemented without a loss in performance, we acknowledge that the long run time of the LRT may limit its application to large genomic datasets. One potential avenue is to test whether faster heuristic measures of site-specific conservation, based on the BAD_Mutations alignments, could achieve similarly high performance. However, further study would be needed to test whether heuristic measures of amino acid conservation are robust to the reference species and protein alignments to which they are applied. A second avenue is to find a more effective means of generating predictions from the combined output of multiple prediction approaches, as this has been shown to be highly effective in humans (e.g., González-Pérez and López-Bigas 2011). Although we did not find an ensemble predictor that greatly improved performance, this might reflect the relatively small number of variables used to generate ensemble predictions.
Methods
Mutations with phenotypic effects were obtained from two sources. First, we generated a manually curated set of 542 amino acid altering mutations in 155 genes with phenotypic effects described in the literature. These mutations were found by searching The Arabidopsis Information Resource (http://www.arabidopsis.org) for genes with dominant or recessive alleles caused by nucleotide substitutions, and by a literature search in Google Scholar (http://scholar.google.com). For each variant, we recorded the amino acid substitution, its position, and a link to the published paper (Table S3). We excluded nonsense mutations because they frequently eliminate gene function entirely. Second, we identified 2,617 amino acid altering mutations in 960 genes from the manually curated UniProt/Swiss-Prot database (http://www.uniprot.org/; Boutet et al. 2016). The two sets were generated independently and overlapped by 249 mutants. Using named alleles as an indicator of gross versus biochemical phenotypes, 65% of our manually curated set and 33% of the Swiss-Prot set had macroscopic phenotypes. Duplicated genes were defined as proteins with a significant blastp hit (E-value < 0.05) to another A. thaliana protein with greater than 60% identity. By this criterion, 466/995 proteins were classified as duplicated.
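The duplicate-gene criterion can be applied to an all-against-all blastp search as in the Python sketch below. It assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns, in which percent identity is the third column and the E-value the eleventh, and it assumes one representative protein per gene so that hits between splice isoforms of the same gene are not mistaken for duplicates. The function name is hypothetical.

```python
def duplicated_proteins(blast_tab_lines, min_identity=60.0, max_evalue=0.05):
    """Return queries with a non-self blastp hit passing both cutoffs.

    Assumes BLAST+ -outfmt 6 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    duplicated = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, evalue = float(fields[2]), float(fields[10])
        # A hit of a protein to itself does not indicate duplication.
        if query != subject and evalue < max_evalue and pident > min_identity:
            duplicated.add(query)
    return duplicated
```

Because every protein is also a query in an all-against-all search, both members of a duplicate pair end up in the returned set.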
Single nucleotide polymorphisms (SNPs) without any known phenotype were obtained from a set of 80 sequenced A. thaliana strains (Ensembl version 81, “Cao_SNPs”; Cao et al. 2011). At the time of download, this was the only SNP set available with unrestricted use. After filtering out sites with heterozygous or missing genotype calls, there were 10,797 biallelic amino acid altering SNPs in the 995 proteins. We used the subset of 1,583 common SNPs (minor allele frequency > 10%) as those least likely to have phenotypic effects.
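The genotype filtering and frequency cutoff described above can be sketched in Python as follows. Because A. thaliana strains are highly inbred, this sketch treats each strain as contributing a single allele and represents each call as one nucleotide; any other symbol (for example `N` or an IUPAC ambiguity code) is treated as missing or heterozygous, and the whole site is dropped. The input format and names are assumptions for illustration.

```python
from collections import Counter

# Assumed encoding: a homozygous call is a single nucleotide letter.
HOMOZYGOUS_CALLS = {"A", "C", "G", "T"}


def common_biallelic_snps(genotypes_by_site, min_maf=0.10):
    """Return {site_id: minor allele frequency} for common biallelic sites.

    genotypes_by_site: {site_id: [call, ...]} with one call per strain.
    """
    keep = {}
    for site, calls in genotypes_by_site.items():
        # Drop any site with a missing or heterozygous call in any strain.
        if any(call not in HOMOZYGOUS_CALLS for call in calls):
            continue
        counts = Counter(calls)
        if len(counts) != 2:  # keep biallelic sites only
            continue
        maf = min(counts.values()) / len(calls)
        if maf > min_maf:  # "common" means strictly above 10%
            keep[site] = maf
    return keep
```

Under this rule a site with 10 minor alleles among 80 strains (12.5%) is kept, while one with 8 (exactly 10%) is not.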
We assessed amino acid substitutions using six approaches: LRT (Chun and Fay 2009), PolyPhen2 (Adzhubei et al. 2010), SIFT 4G (Vaser et al. 2016), PROVEAN (Choi et al. 2012), MAPP (Stone and Sidow 2005) and GERP++ (Davydov et al. 2010); the LRT was run both with and without reference masking, giving the seven sets of predictions compared in the Results. PolyPhen2 predictions were generated using the standalone software (v2.2.2) with the bundled non-redundant database (uniref100-release 2011_12) and the probabilistic variant classifier with the default HumDiv model. Precomputed SIFT 4G predictions for A. thaliana (TAIR10.23) were obtained from http://sift.bii.a-star.edu.sg and are based on the UniRef90 database (2011). SIFT 4G predictions were not available for 855 substitutions, predominantly because the amino acid change required more than one nucleotide change within a codon. PROVEAN predictions (v1.1.5) were generated for all mutations using NCBI's non-redundant database (04/02/2016). MAPP predictions were generated using BAD_Mutations alignments and trees (see below). GERP++ generates predictions for single nucleotide positions rather than codons. To assess GERP++ performance, we used the GERP++ score at the first, second or third codon position if the amino acid substitution could occur by a single change at that position, and the average of the GERP++ scores at the first and second positions for all other changes. In addition, because GERP++ did not perform well using neutral substitution rates estimated from each alignment (the default), we used a uniform neutral rate of 10 substitutions per site across all genes.
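The rule used to convert nucleotide-level GERP++ scores into one score per amino acid substitution can be expressed as in the following Python sketch, which uses the standard genetic code. The text does not say what happens when the change is reachable by a single substitution at more than one codon position; this sketch simply takes the first such position, which is an assumption. Function and variable names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_gerp_score(ref_codon, alt_aa, position_scores):
    """Map three per-nucleotide GERP++ scores to one score for a substitution.

    ref_codon: reference codon, e.g. "CTT"; alt_aa: one-letter mutant residue;
    position_scores: GERP++ scores at codon positions 1, 2 and 3.
    """
    ref_codon = ref_codon.upper()
    for pos in range(3):
        for base in "ACGT":
            if base == ref_codon[pos]:
                continue
            mutant = ref_codon[:pos] + base + ref_codon[pos + 1:]
            if CODON_TABLE[mutant] == alt_aa:
                # Reachable by one change at this position (first hit wins).
                return position_scores[pos]
    # Not reachable by a single change: average positions 1 and 2.
    return (position_scores[0] + position_scores[1]) / 2
```

For example, Leu (CTT) to Phe requires a change at the first position (TTT), so the first-position score is used, whereas Leu (CTT) to Trp (TGG) needs more than one change and falls back to the average of the first two positions.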
Predictions using the likelihood ratio test (LRT) were made with the BAD_Mutations pipeline (Kono et al. 2016), which uses sequenced and annotated genomes. We used BLAST to search 42 angiosperm genomes, retaining the top hit from each genome with an E-value threshold of 0.05. Only angiosperms were used, to avoid extensive saturation of synonymous sites. PASTA protein alignments (Mirarab et al. 2015) were generated from the homologs, and for each codon of interest the likelihood under dN = ωdS was compared with that under dN = dS using HyPhy (Pond et al. 2005), where dN and dS are the nonsynonymous and synonymous substitution rates and ω is a free parameter. Sequences with Ns or other ambiguous nucleotides were discarded prior to the likelihood ratio test. The LRT differs from its original formulation (Chun and Fay 2009) in that: (i) dS was estimated separately for each gene using all codons, (ii) query sequences were optionally masked in the likelihood calculation to avoid reference bias, and (iii) branches with dS greater than 3 were set to 3 to avoid spuriously high estimates of dS. Additionally, the original LRT used heuristics to eliminate sites with dN > dS, sites where the derived allele was present in another species, and sites with fewer than 10 species in the alignment. Rather than eliminating such sites, we used logistic regression to combine the LRT result and these additional pieces of information into a single probability of being deleterious.
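For reference, the last step of a likelihood ratio test on two nested models can be written as below. The two models described above differ by one free parameter (ω), so this sketch assumes the statistic 2(lnL1 − lnL0) is compared with a χ² distribution with one degree of freedom, whose upper-tail probability is erfc(√(x/2)). The log-likelihoods would come from HyPhy; the function is a hypothetical stand-alone illustration, not code from BAD_Mutations.

```python
import math


def lrt_pvalue(lnl_free_omega, lnl_neutral):
    """P-value for dN = omega*dS (omega free) versus dN = dS at one codon.

    lnl_free_omega: maximized log-likelihood with omega estimated.
    lnl_neutral: log-likelihood with omega fixed at 1.
    """
    # Nested models: the free-omega fit cannot be worse, so clamp tiny
    # negative differences from numerical optimization to zero.
    stat = max(0.0, 2.0 * (lnl_free_omega - lnl_neutral))
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))
```

A log-likelihood gain of about 1.92 (statistic 3.84) corresponds to p ≈ 0.05; no gain gives p = 1.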
Logistic regression was applied using both the masked and unmasked LRT p-values, where the masked p-values were generated from alignments without the A. thaliana reference allele. For the unmasked logistic regression, we used the terms log10(LRT p-value), constraint (dN/dS), Rn and An, where Rn and An are the numbers of A. thaliana reference and alternative (i.e., mutant) amino acids observed in the alignment, respectively. For the masked model, we replaced An and Rn with the absolute value of Rn − An and the maximum of Rn and An, respectively. For both models, p-values less than 1e-16 were set to 1e-16, and constraint values greater than 10 were set to 10. Ten-fold cross-validation was used to assess the fit of the logistic regression. The average area under the ROC curve based on cross-validation was 0.9575 (unmasked) and 0.9471 (masked). Because these values were nearly identical to the performance of the model fit to the entire dataset, 0.9581 (unmasked) and 0.9471 (masked), we used the logistic regression coefficients from the full dataset:
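The predictor construction described in this paragraph can be sketched as follows. The clamping thresholds and the masked-model substitutions follow the text; the fitted coefficients are not reproduced here, so `intercept` and `coefficients` are placeholders that must be supplied from a fitted model, and the function names are hypothetical.

```python
import math


def lrt_features(p_value, constraint, rn, an, masked):
    """Predictor vector for the LRT logistic regression.

    rn, an: counts of the reference and alternative residues in the alignment.
    """
    p = max(p_value, 1e-16)      # p-values below 1e-16 set to 1e-16
    c = min(constraint, 10.0)    # constraint values above 10 set to 10
    if masked:
        # Rn -> max(Rn, An); An -> |Rn - An|
        return [math.log10(p), c, max(rn, an), abs(rn - an)]
    return [math.log10(p), c, rn, an]


def deleterious_probability(features, intercept, coefficients):
    """Logistic model; intercept and coefficients come from a fitted model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```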
Sensitivity, specificity and area under the curve (AUC) were calculated for each approach using the pROC package in R (Robin et al. 2011). Confidence intervals for each were calculated by stratified bootstrapping (n = 2000).
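pROC computes these quantities in R. As a language-neutral illustration of the same ideas, the Python sketch below computes AUC from ranks (the Mann-Whitney formulation, in which ties count one half) and a percentile confidence interval from a stratified bootstrap that resamples deleterious and neutral variants separately, so each replicate keeps the original class sizes. It is a sketch under these assumptions, not a reimplementation of pROC, and the function names are hypothetical.

```python
import random


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic.

    Higher scores are assumed to indicate the positive (deleterious) class.
    """
    labeled = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
                     key=lambda t: t[0])
    rank_sum = 0.0
    i = 0
    while i < len(labeled):
        j = i
        while j < len(labeled) and labeled[j][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # tied values share the mean of ranks i+1..j
        rank_sum += avg_rank * sum(lab for _, lab in labeled[i:j])
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def stratified_bootstrap_ci(pos_scores, neg_scores, n_boot=2000, level=0.95, seed=1):
    """Percentile CI for the AUC, resampling each class separately."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(pos_scores, k=len(pos_scores)),
            rng.choices(neg_scores, k=len(neg_scores)))
        for _ in range(n_boot)
    )
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

The rank-based AUC avoids comparing every deleterious/neutral pair, which matters when it is recomputed for 2,000 bootstrap replicates.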
Combined predictions were generated from the scores of six approaches: LRT, LRT-masked, PolyPhen2, PROVEAN, GERP++ and MAPP. Sites with missing predictions from one or more approaches (n = 215) were removed. Combined predictions were generated using: 1) logistic regression with each approach's score as a predictor variable, 2) a support vector machine, 3) random forest, 4) linear discriminant analysis and 5) a generalized linear model with penalized maximum likelihood implemented in the glmnet package in R (Friedman et al. 2010). The performance of each model was assessed by AUC values obtained from 10-fold cross-validation.
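The text does not say whether the ten folds were stratified by class. The Python sketch below assumes they were and shows one simple way to assign variants to folds so that each fold keeps roughly the same ratio of deleterious to neutral variants; each ensemble model would then be fit on nine folds and scored by AUC on the held-out fold. Names are hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=1):
    """Assign each sample index to one of k folds, stratified by label.

    labels: sequence of class labels (e.g. 1 = deleterious, 0 = neutral).
    Returns a list with the fold number (0..k-1) for each sample.
    """
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled members of this class round-robin across folds.
        for j, i in enumerate(idx):
            folds[i] = j % k
    return folds
```

Usage: for fold f, train on indices whose fold is not f, predict the indices in fold f, and average the ten held-out AUCs.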
Data access
LRT predictions are implemented in the Python package BAD_Mutations, which is freely available from http://github.com/MorrellLAB/BAD_Mutations.git.
Author contributions
T.J.Y.K. and P.J.H. wrote code for BAD_Mutations. J.C.F., L.L. and C.H.S. analyzed the data. L.L., J.C.F., T.J.Y.K., and P.L.M. wrote the initial draft of the manuscript. All authors contributed to final manuscript preparation.
Acknowledgements
We acknowledge funding support from the US National Science Foundation Plant Genome Program grant (DBI-1339393 to JCF and PLM) and from a University of Minnesota Doctoral Dissertation Fellowship supporting TJYK. We thank members of the Morrell Lab for discussion and software testing. Finally, we would like to thank Dr. Danelle Seymour from UC Irvine for comments.