Abstract
The phylogeny of seed plants remains one of the most enigmatic problems in evolutionary plant biology, with morphological phylogenies (which include fossils) and molecular phylogenies pointing to very distinct topologies. Almost all morphology-based phylogenies support the so-called anthophyte hypothesis, grouping the angiosperms with Gnetales and several extinct seed plant lineages, while most molecular phylogenies link Gnetales with conifers. In this study, we investigate the phylogenetic signal present in seed plant morphological datasets. We use maximum parsimony and Bayesian inference, combined with a number of experiments with all available seed plant morphological matrices to address the morphological-molecular conflict. First, we ask whether the lack of association of Gnetales with conifers in morphological analyses is due to an absence of signal or to the presence of competing signals, and second, we compare the performance of parsimony and Bayesian approaches with morphological datasets. Our results imply that the grouping of Gnetales and angiosperms is largely the result of long branch attraction, consistent across a range of methodological approaches. Thus, the signal for the grouping of Gnetales with conifers in morphological matrices was swamped by convergence between angiosperms and Gnetales, both situated on long branches, in previous analyses. However, this effect becomes weaker in more recent analyses, as a result of addition and critical reassessment of characters. Bayesian inference proves to be more resistant to long branch attraction, and the use of parsimony is largely responsible for persistence of the anthophyte topology. Our analyses finally reconcile morphology with molecules in the context of the seed plant phylogeny, and show that morphology may therefore be useful in reconstructing other aspects of the phylogenetic history of the seed plants.
INTRODUCTION
The use of morphology as a source of data for reconstructing phylogenetic relationships has lost most of its ground since the advent of molecular phylogenetics, except in paleontology. However, in more recent times there has been renewed interest in morphological phylogenetics (Pyron 2015; Lee and Palci 2015). A major impetus for this renaissance has been an increased interest in the phylogenetic placement of fossil taxa in trees of living organisms, stimulated by the growing necessity of accurate calibrations for dating the molecular trees that represent the main basis for modern comparative evolutionary studies. Other factors have been the development of new methods for dating phylogenies that can integrate phylogenetic inference of the placement of fossils in the dating process, i.e., tip-dating (Pyron 2011; Ronquist et al. 2012; Zhang et al. 2016), as well as renewed interest in the application of statistical phylogenetics to morphological data both on a theoretical (Wright et al. 2014, 2015; O’Reilly et al. 2016) and an empirical level (Lee and Worthy 2012; Godefroit et al. 2013; Cau et al. 2015). To these motivations may be added the long-recognized value of fossils for elucidating the homologies of novel structures (such as the seed plant ovule and eustele) and the order of origin of the morphological synapomorphies of extant (crown) groups. This is critical because major groups, such as angiosperms, are often separated from their closest living relatives by major morphological gaps (numbers of character changes), even if the incorporation of fossils does not affect inferred relationships among living taxa (Doyle and Donoghue 1987; Donoghue et al. 1989).
Many phylogenies based on morphology have been recently published for important groups with both living and fossil representatives, including mammals (O’Leary et al. 2013), squamate reptiles (Gauthier et al. 2012), arthropods (Legg et al. 2013), and the genus Homo (Dembo et al. 2016). However, the validity and use of morphological data in reconstructing phylogeny have been severely criticized, notably by Scotland et al. (2003), based on supposed diminishing returns in the discovery of new morphological characters and the prevalence of functional convergence. The painstaking acquisition of morphological characters, which requires a relatively large amount of training and time, could turn out to be systematically worthless if the phylogenetic signal present in these data is either insufficient or misleading. Indeed, the number of characters that can be coded for morphological datasets represents a major limit to the use of morphology and its integration with molecular data, especially in the age of phylogenomics, where the ever-increasing amount of molecular signal could simply “swamp” the weak signal present in morphological datasets (Doyle and Endress 2000; Bateman et al. 2006). Morphological data may also be afflicted to a higher degree than molecules by functional convergence and parallelism (Givnish and Sytsma 1997), which could lead a morphological dataset to infer a wrong phylogenetic tree. Even though the confounding effect of convergence has been formally tested only in a few studies (Wiens et al. 2003), it seems to be at the base of one of the deepest cases of conflict between molecules and morphology in the reconstruction of evolutionary history, namely the phylogeny of placental mammals (Foley et al. 2016). In this case, the strong effect of selection on general morphology caused by similar lifestyle seems to hinder attempts to use morphology to reconstruct phylogenetic history in this group (Springer et al. 2007), and it affects even large “phenomic” datasets (Springer et al. 2013).
Another example of conflict between morphology and molecular data involves the relationships among seed plants. Before the advent of cladistics, some authors proposed that angiosperms were related to the highly derived living seed plant order Gnetales, while others argued that these two groups were strictly convergent and Gnetales were instead related to conifers (for a review, see Doyle and Donoghue 1986). However, the view that angiosperms are related to Gnetales and fossil Bennettitales, called the anthophyte hypothesis, is one of the oldest and seemingly most stable results of the morphologically based parsimony analyses of seed plant phylogeny. Since Hill and Crane (1982) and Crane (1985), the grouping of Bennettitales, Gnetales, the fossil Pentoxylon, and angiosperms (sometimes with the fossil Caytonia as the closest outgroup of angiosperms) was retrieved in almost all successive analyses (Doyle and Donoghue 1986, 1992; Nixon et al. 1994; Rothwell and Serbet 1994; Doyle 1996, 2006, 2008; Hilton and Bateman 2006; Friis et al. 2007; Rothwell et al. 2009; Rothwell and Stockey 2016; Fig. 1). Some analyses associated anthophytes with “Mesozoic seed ferns” (glossopterids, corystosperms, and Caytonia), others with “coniferophytes” (conifers, Ginkgo, and fossil cordaites). By contrast, since the advent of molecular phylogenetics, the anthophyte hypothesis has lost most of its support among plant biologists. Although molecular analyses cannot directly evaluate the status of presumed fossil anthophytes, they can address the relationship of angiosperms and Gnetales. Molecular data from different genomes analyzed with different approaches do not yield a Gnetales plus angiosperm clade, with the exception of a few maximum parsimony (MP) and neighbor joining analyses of nuclear ribosomal RNA or DNA (Hamby and Zimmer 1992; Stefanovic et al. 1998; Rydin et al. 2002) and one MP analysis of rbcL (Rydin and Källersjö 2002).
The majority of molecular trees retrieve a clade of Gnetales plus Pinaceae (Bowe et al. 2000; Chaw et al. 2000; Gugerli et al. 2001; Qiu et al. 2007; Zhong et al. 2011), conifers other than Pinaceae (cupressophytes) (Nickrent et al. 2000; Rydin and Källersjö 2002), or conifers as a whole (Wickett et al. 2014), which we refer to collectively as “Gnetales-conifer” trees. In most of these trees angiosperms are the sister group of all other living seed plants (acrogymnosperms). The main exceptions are “Gnetales-basal” trees, in which Gnetales are sister to all other living seed plants (e.g., Albert et al. 1994; Rydin and Källersjö 2002).
Several potential issues have been identified with both sorts of data. Regarding molecules, these include limited taxonomic sampling resulting from extinction of the majority of seed plant lineages, loss of phylogenetic signal due to saturation (particularly at third codon positions), strong rate heterogeneity among sites across lineages and conflict between gene trees (Mathews 2009), composition biases among synonymous substitutions (Cox et al. 2014), as well as systematic errors and biases (Magallón and Sanderson 2002; Burleigh and Mathews 2007; Zhong et al. 2011), leading to a plethora of conflicting signals. In analyzing datasets that yielded Gnetales-basal trees, studies that have attempted to correct for these biases have generally favored trees in which Gnetales are associated with conifers (Magallón and Sanderson 2002; Burleigh and Mathews 2007). Regarding morphology, it has been shown that different taxon sampling strategies, particularly regarding choice of the closest progymnosperm outgroup of seed plants (Hilton and Bateman 2006), can lead to different results concerning the rooting of the seed plants.
The conflict between molecules and morphology has led to different attitudes toward morphological data within the botanical community (Donoghue and Doyle 2000; Bateman et al. 2006; Rothwell et al. 2009). Following suggestions of Donoghue and Doyle (2000), Doyle (2006, 2008) reconsidered several supposed homologies between angiosperms and Gnetales in the light of the molecular results. These studies and the analysis of Hilton and Bateman (2006) also incorporated newly recognized similarities between Gnetales and conifers, for example in wood anatomy (Carlquist 1996), as well as improved evidence on the morphology of the seed-bearing cupules in fossil taxa. When building a morphological matrix, dissecting a character into more character states may represent an improvement by distinguishing convergent states during primary homology assessment (Jenner 2004; Zou and Zhang, 2016), although it may also lead to a lack of resolution when the number of states becomes excessive. In the phylogeny of seed plants, there are many special factors that complicate character coding. Among living taxa, the assessment of homology is complicated by the plastic and modular nature of plant development (Mathews and Kramer 2012). Among fossil taxa, the mode of preservation of many key fossils has critical consequences for the amount of data available. This affects not only the number of missing characters, but also the process of primary homology assessment and character coding. Although these issues with coding are most severe in fossils preserved as compressions, such as Caytonia (Doyle 2008; Rothwell et al. 2009) and Archaefructus (Sun et al. 2002; Friis et al. 2003; Doyle 2008; Rudall and Bateman 2008; Endress and Doyle 2009), even fossil groups that are exquisitely preserved as permineralizations (e.g., Bennettitales) are not immune to conflicting interpretations (Friis et al. 2007; Rothwell et al. 2009; Crepet and Stevenson 2010; Doyle 2012; Pott 2016). 
Indeed, even after careful reconsideration of potentially convergent traits between Gnetales and angiosperms, maximum parsimony seemed to continue to favor the anthophyte hypothesis (Doyle 2006; Hilton and Bateman 2006; Rothwell et al. 2009). The possibility that morphological data are inadequate to resolve the phylogeny of seed plants would represent a severe hindrance, especially in the light of the small number of extant lineages that survived extinction during the Paleozoic and Mesozoic (Mathews 2009) and the great morphological gaps among these surviving lineages. However, there have been signs that the conflicts with molecular data are weakening: Doyle (2006) found that trees in which Gnetales were nested in conifers were only one step less parsimonious than anthophyte trees, and in Doyle (2008) trees of the two types became equally parsimonious.
In this study, we attempt to elucidate the phylogenetic signal present in published morphological datasets of the seed plants. We test whether the potential convergence between angiosperms and Gnetales represents a major issue in morphological datasets of seed plants by reanalyzing the matrices that were driven by earlier homology assumptions concerning characters of the two groups (i.e., the matrices compiled before the advent of the molecular results) as well as the matrices that revised such assumptions (the matrices of Doyle 2006 and Hilton and Bateman 2006, and datasets derived from them), and testing whether the signal and support for the anthophytes change between these two sets of matrices. Then we investigate whether the fact that these analyses did not place Gnetales in or near the conifers was due to the absence of signal or the presence of competing signals by investigating the relative support for the anthophytes and the Gnetales-conifer clade in all the matrices. After revealing a more coherent signal supporting a Gnetales-conifer clade in the latest matrices, we investigate whether the retrieval of an anthophyte topology by maximum parsimony was affected by methodological biases that could be overcome by using model-based Bayesian methods.
MATERIALS AND METHODS
Matrices
The Crane (1985), Doyle and Donoghue (1986, 1992), Nixon et al. (1994), Rothwell and Serbet (1994), and Doyle (1996, 2006, 2008) matrices were manually coded from the respective articles. The Hilton and Bateman (2006) matrix was kindly provided by Richard Bateman. The matrices from Analysis 3 of Rothwell et al. (2009) and from Rothwell and Stockey (2016) were downloaded from the supplementary materials of the respective articles.
Parsimony analyses
We performed maximum parsimony analyses of all matrices with PAUP 4.0a136 (Swofford 2003), using the heuristic search algorithm with random addition of taxa and 1000 replicates. Bootstrap analyses were conducted using 10,000 replicates, using the “asis” addition option and keeping one tree per replicate (Müller 2005).
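As an illustration of the optimality criterion used in these searches, the following minimal sketch counts the number of steps required by unordered characters on a fixed, fully bifurcating tree using Fitch's algorithm. It is not PAUP's implementation and performs no tree search; the nested-tuple tree encoding, the function names, and the treatment of "?" as compatible with any observed state are assumptions made for this example.

```python
def fitch_steps(tree, states):
    """Minimum number of changes for one unordered character on a fixed tree.

    tree   -- nested 2-tuples with taxon names (strings) at the tips
    states -- dict mapping taxon name to a state symbol; "?" means missing
    """
    observed = {s for s in states.values() if s != "?"}
    if not observed:
        return 0  # a character scored as missing everywhere needs no steps
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):
            # Assumption: a missing entry is compatible with every observed state.
            return set(observed) if states[node] == "?" else {states[node]}
        left, right = (state_set(child) for child in node)
        shared = left & right
        if shared:
            return shared
        steps += 1  # the two descendants share no state: one change is required
        return left | right

    state_set(tree)
    return steps


def tree_length(tree, matrix):
    """Sum of Fitch steps over all characters; matrix maps taxon -> state string."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )
```

Comparing `tree_length` for two candidate topologies on the same matrix gives the difference in steps between them, which is the quantity reported when one topology is said to be a given number of steps less parsimonious than another.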
We also conducted analyses with a topological constraint, forcing the Gnetales into a clade with the extant conifers. Significant differences between the constrained and unconstrained topologies were tested using the Templeton test (Templeton 1983) as implemented in PAUP v. 4.0a136 (Swofford 2003). We investigated the effects of recoding characters by Doyle (2006, 2008) in more detail by using MacClade (Maddison and Maddison 2003) to compare the number of steps in each character on trees with Gnetales nested in anthophytes and associated with conifers.
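The Templeton test is a Wilcoxon signed-rank test applied to the per-character differences in step counts between two trees. The sketch below shows the logic under simplifying assumptions: characters with identical step counts are dropped, tied absolute differences receive average ranks, and the two-sided p-value uses a normal approximation without tie or continuity correction. It is not PAUP's implementation, and the function name and return format are hypothetical.

```python
import math


def templeton_test(steps_a, steps_b):
    """Wilcoxon signed-rank test on per-character step counts for two trees.

    Returns (T, p): the smaller rank sum and an approximate two-sided p-value.
    """
    diffs = [a - b for a, b in zip(steps_a, steps_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # the trees require the same steps for every character

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_stat = min(w_plus, w_minus)

    # Normal approximation (an assumption; exact tables suit small n better).
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_stat - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return t_stat, p_value
```

The two input lists would hold the step counts of each character on the unconstrained and constrained trees, for example as obtained from a per-character parsimony count.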
Bayesian inference (BI)
Bayesian analyses relied on MrBayes v. 3.2.3 (Ronquist et al. 2012), under the Markov k-states (Mk) model (Lewis 2001).
For each matrix, we conducted two analyses, one with an equal rate of evolution among characters and another with gamma-distributed rate variation. In both cases, we used the MKpr-inf correction for parsimony informative characters. The analyses were run for 5,000,000 generations, sampling every 1000th generation. The first 10,000 runs were discarded as burn-in. Posterior traces were inspected using Tracer (Rambaut and Drummond 2007).
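For readers unfamiliar with the Mk model, the sketch below computes its transition probabilities for a character with k states along a branch of length v, measured in expected changes per character. Under Mk all changes between states are equally probable, so only two values are needed: the probability of ending in the starting state and the probability of ending in any one other state. The function name is hypothetical, and the sketch covers neither the correction for parsimony-informative characters nor gamma-distributed rate variation.

```python
import math


def mk_transition_probs(k, v):
    """Return (P_same, P_diff) under the Mk model.

    k -- number of character states (k >= 2)
    v -- branch length in expected changes per character (v >= 0)
    P_same is the probability of ending in the starting state;
    P_diff is the probability of ending in one particular other state.
    """
    if k < 2:
        raise ValueError("the Mk model needs at least two states")
    if v < 0:
        raise ValueError("branch length must be non-negative")
    decay = math.exp(-k * v / (k - 1))
    p_same = 1 / k + (k - 1) / k * decay
    p_diff = 1 / k - decay / k
    return p_same, p_diff
```

On a very long branch both probabilities approach 1/k, so the state at one end carries almost no information about the state at the other. This is the property that allows a likelihood-based method to treat similarities between two long branches as possible convergence.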
Model testing and rate variation
We also conducted stepping stone analyses (SS) (Xie et al. 2011; Ronquist et al. 2012) in order to evaluate the most appropriate model of rate variation among characters (equal rates vs. gamma-distributed rates). We used 4 independent runs with 2 chains with the default MrBayes parameters, run for 5,000,000 generations and sampling every 1000th generation. Using the marginal likelihoods from the SS analysis, we then calculated the support for the two models using Bayes factors (BF) (Kass and Raftery 1995).
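The comparison of the two models from their marginal likelihoods can be sketched as follows. The function computes 2 ln BF from two marginal log-likelihoods and attaches the verbal categories of Kass and Raftery (1995). The function names are hypothetical, and the numerical values in the usage note are invented for illustration, not results from this study.

```python
def two_ln_bayes_factor(ln_marginal_m1, ln_marginal_m0):
    """Return 2 ln BF in favor of model 1 over model 0."""
    return 2.0 * (ln_marginal_m1 - ln_marginal_m0)


def kass_raftery_label(two_ln_bf):
    """Verbal category for evidence in favor of model 1 (Kass and Raftery 1995)."""
    if two_ln_bf < 0:
        return "negative (evidence favors model 0)"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"
```

For example, with invented marginal log-likelihoods of -1000.0 for the gamma model and -1006.5 for the equal-rates model, `two_ln_bayes_factor(-1000.0, -1006.5)` gives 13.0, which falls in the "very strong" category.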
Exploring conflict in the data
To explore phylogenetic conflict in the data, we employed the software SplitsTree 4 (Huson and Bryant 2006). We used this program to visualize conflicts among the bootstrap replicates from the MP analysis and among the posterior tree samples from the BI analysis. A consensus network (Holland et al. 2004) was built using the “count” option. The cut-off for visualizing the splits was set at 0.05.
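The idea behind the cut-off can be sketched as follows: groupings are tallied across a sample of trees (bootstrap replicates or posterior samples), and only those present in at least a given fraction of the trees are retained for display. This is a simplified illustration, not the SplitsTree algorithm. It treats rooted clades in nested-tuple trees as a stand-in for splits, and the function names and the use of "greater than or equal to" at the threshold are assumptions.

```python
from collections import Counter


def clades(tree):
    """Return the non-trivial clades (frozensets of taxa) of a nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    root = tips(tree)
    found.discard(root)  # the set of all taxa is uninformative
    return found


def frequent_splits(trees, cutoff=0.05):
    """Map each clade to its frequency, keeping those at or above the cut-off."""
    counts = Counter(clade for tree in trees for clade in clades(tree))
    n_trees = len(trees)
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if count / n_trees >= cutoff
    }
```

With a low cut-off, mutually incompatible groupings can both survive the filter, which is what produces the box-like structures in a consensus network where the tree sample contains conflicting signal.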
Long branch attraction tests
We modified the matrices to perform tests for long branch attraction (LBA), following the suggestions of Bergsten (2005). Two matrices were created to test the potentially destabilizing effect of the two long-branched groups suspected to create this artifact, angiosperms and Gnetales, by successively removing them (long branch extraction analysis, LBE). To test further the hypothesis of an LBA artifact exerted by angiosperms, we followed a similar approach to the sampling experiment in Rota-Stabelli et al. (2010): another matrix was created to elongate the branch subtending angiosperms by removing non-angiospermous fossil outgroups (Pentoxylon, Bennettitales, and Caytonia) (branch elongation analysis, BE). To test the effect of including fossil data in the matrices, we created a set of matrices in which all fossil taxa were removed (extant experiment, EX).
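The construction of the reduced matrices can be sketched as below, with a matrix held as a mapping from taxon name to a string of character states. The first function removes a set of taxa, as in the LBE, BE, and EX experiments; the second lists the characters that remain parsimony-informative afterwards, since removing taxa can leave some characters uninformative. Whether characters were re-screened in this way in the original analyses is not stated, so that step is an assumption, as are the function names. The scores in the usage note are invented.

```python
from collections import Counter


def drop_taxa(matrix, taxa_to_remove):
    """Return a copy of the matrix without the listed taxa."""
    removed = set(taxa_to_remove)
    unknown = removed - set(matrix)
    if unknown:
        raise KeyError(f"taxa not in matrix: {sorted(unknown)}")
    return {taxon: row for taxon, row in matrix.items() if taxon not in removed}


def informative_characters(matrix, missing="?-"):
    """Indices of characters with at least two states each scored in >= 2 taxa."""
    n_chars = len(next(iter(matrix.values())))
    keep = []
    for i in range(n_chars):
        counts = Counter(
            row[i] for row in matrix.values() if row[i] not in missing
        )
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            keep.append(i)
    return keep
```

For instance, with a toy matrix `{"Angiosperms": "110", "Gnetales": "111", "Pinus": "011", "Ginkgo": "001", "Cycas": "000"}` (invented scores), all three characters are informative, but after `drop_taxa(toy, ["Angiosperms"])` only the second character remains so.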
Morphospace analysis
To visualize morphological patterns in the different matrices, we conducted principal coordinates (PCO) analyses using the R package Claddis (Lloyd 2016). The taxa were then plotted on the first two PCO axes.
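A PCO operates on a matrix of pairwise distances between taxa. The sketch below computes one simple distance for discrete characters, the proportion of mismatches among characters scored in both taxa, so that missing data in fossils do not count as differences. It illustrates only the input to the ordination and is not the distance metric or code used by Claddis; the function names and the handling of taxon pairs with no comparable characters (returned as `None`) are assumptions.

```python
def pairwise_distance(row_a, row_b, missing="?-"):
    """Proportion of differing states among characters scored in both taxa."""
    comparable = [
        (a, b)
        for a, b in zip(row_a, row_b)
        if a not in missing and b not in missing
    ]
    if not comparable:
        return None  # no character is scored in both taxa
    return sum(a != b for a, b in comparable) / len(comparable)


def distance_matrix(matrix):
    """Return (taxa, D), where D[i][j] is the distance between taxa i and j."""
    taxa = sorted(matrix)
    dist = [
        [pairwise_distance(matrix[s], matrix[t]) for t in taxa]
        for s in taxa
    ]
    return taxa, dist
```

The resulting matrix would then be passed to an ordination routine, and the scores of each taxon on the first two axes plotted.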
RESULTS
Our re-analyses of the historical morphological matrices of seed plants resulted in trees identical to the published trees (Table 1). The MP trees and the consensus trees always show an anthophyte clade (with or without Caytonia), except trees based on the Doyle (2008) matrix, in which anthophyte and Gnetales-conifer topologies are equally parsimonious. However, bootstrap analysis shows that the anthophyte clade is not strongly supported in any of the matrices, with the exception of the Nixon et al. (1994) matrix (Fig. 2).
Constraining Gnetales and conifers to form a clade always results in trees longer than the most parsimonious trees, except in the trees based on the Doyle (2008) matrix (Table 2). The Templeton test of the best trees against the worst of the constrained trees (i.e., the most parsimonious constrained tree that is statistically most different from the most parsimonious unconstrained tree) does however show that this difference is only significant in the Nixon et al. (1994) matrix.
The stepping stone analysis shows strong support for rate variation among characters in all matrices except those of Crane (1985) and Doyle and Donoghue (1986) (Table 3). The strength of the support seems to be correlated with both the number of characters and the number of taxa (Supplementary Fig. 1), which were lowest in the oldest analyses.
The trees obtained from the BI analyses show a much sharper differentiation between early and late matrices. With the pre-2000 matrices, support and topology are mostly in agreement with the MP analyses. However, with the post-2000 matrices we observe a shift in support from the anthophytes to a clade of Gnetales and coniferophytes (Figs. 2, 3).
To test whether the anthophyte topology could be the result of LBA, we first performed removal experiments. The removal of the angiosperms has different effects on the pre- and post-2000 matrices. With the Crane (1985) matrix, a topology with Bennettitales, Pentoxylon and the Gnetales diverging after Lyginopteris and before the other taxa becomes as parsimonious as the topology with the anthophytes nested among Mesozoic seed ferns that was retrieved with the full matrix. With the Doyle and Donoghue (1986) matrix, Bennettitales, Pentoxylon, and Gnetales are nested within coniferophytes. With the Doyle and Donoghue (1992) and Rothwell and Serbet (1994) matrices, the consensus tree is identical to the trimmed consensus of the full matrix. With the Nixon et al. (1994) matrix, cordaites and Ginkgo are successive outgroups to a conifer + anthophyte clade, whereas with the full matrix they are equally parsimoniously placed as successive outgroups to the conifers, in a clade that is sister to monophyletic anthophytes. The inverse happens with the Doyle (1996) matrix, where the position of Ginkgo and cordaites is destabilized by the removal of the angiosperms, with these taxa being either successive outgroups to extant and fossil conifers or sister to a clade composed of anthophytes, conifers, Peltaspermum, and Autunia. The position of the Gnetales in an anthophyte clade is maintained in all matrices.
With the post-2000 matrices, the effect of removal of the angiosperms is consistent among different matrices (Fig. 4d-f). With the Hilton and Bateman (2006), Doyle (2006), and Doyle (2008) datasets, the resulting trees see the Gnetales nested within the coniferophytes, with or without Bennettitales. With the Rothwell et al. (2009) matrix, a topology with a clade of Gnetales and conifers that excludes Bennettitales becomes most parsimonious (Fig. 4e). With the Rothwell and Stockey (2016) matrix, Gnetales are sister to Taxus in a coniferophyte clade that also includes Doylea.
The removal of the Gnetales has no impact at all on trees based on the Crane (1985), Doyle and Donoghue (1986), and Doyle and Donoghue (1992) matrices, in which the topology is identical to the trimmed topology of the consensus in the full analysis. With the Nixon et al. (1994) matrix, the removal of the Gnetales results in a coniferophyte clade (including Ginkgo and Cordaitales) becoming the most parsimonious topology. With the Rothwell and Serbet (1994) matrix, the removal of Gnetales results in a breakup of the Caytonia-Glossopteris-corystosperm clade. With the Doyle (1996) matrix, the only difference lies in the placement of the corystosperms, Autunia, and Peltaspermum, which are sister to a coniferophyte clade in the analysis without Gnetales.
With the post-2000 matrices, the removal of the Gnetales results in a shift of the anthophyte clade to a position outside a coniferophyte clade (Fig. 4f). With the Doyle (2006) and Doyle (2008) matrices, an extended anthophyte clade including Cycadales and glossopterids is sister to a clade of Callistophyton, Peltaspermum, Autunia, and corystosperms plus coniferophytes. The analysis of the Rothwell and Stockey (2016) matrix represents an exception, where the placement of the anthophytes is not affected by the removal of the Gnetales. The removal of Doylea in addition to Gnetales results in a similar pattern to the other post-2000 matrices.
In the branch elongation experiment, we observed that MP bootstrap support for the angiosperm plus Gnetales clade increases with decreasing taxon sampling in all matrices (Fig. 4g). This effect is even stronger in the extant experiment matrices, where a split including angiosperms plus Gnetales is strongly supported by the MP bootstrap in all matrices.
BI analysis of the BE and EX matrices shows a less linear pattern (Fig. 4h, i). In the BE analyses, the signal for the anthophytes decreases in the Doyle and Donoghue (1986, 1992) matrices, reaching less than 0.5 posterior probability (pp) in the analysis with gamma-distributed rate variation. In the Nixon et al. (1994), Rothwell and Serbet (1994) and Doyle (1996) matrices, the pp of the anthophytes in the BE matrices is comparable to that from the full matrices. In the post-2000 BE matrices, BI support for the anthophytes is almost null in the Hilton and Bateman (2006) and Doyle (2006) matrices (<0.07 pp) and increases in the Doyle (2008) and Rothwell et al. (2009) matrices analyzed using gamma-rate variation (0.55 and 0.51 respectively) and in the Rothwell and Stockey (2016) matrix (0.23 for the equal-rate analysis, 0.37 for the gamma analysis).
The analyses of the EX matrices all show high to moderate support (1-0.75 pp) for the split containing angiosperms plus Gnetales. With the post-2000 matrices, the use of the gamma-distributed model recovers a higher pp for the anthophytes.
The morphospace analyses (Fig. 5) provide a graphic confirmation of the morphological separation of both Gnetales and angiosperms from other seed plants and the impression that Gnetales share competing morphological similarities with both angiosperms and conifers. In the morphospace generated from most of the pre-2000 matrices, Gnetales lie closer to angiosperms (data not shown). With the Doyle (1996) matrix and the post-2000 matrices, the first axis of the PCO appears to separate angiosperm-like and non-angiosperm-like taxa, whereas the second axis seems to represent a tendency from a seed fern-like towards a conifer-like morphology. The placement of the Gnetales is always closer to the conifers than to the angiosperms (Fig. 5). However, in all cases, Gnetales seem to have higher levels of “angiosperm-like” morphology than do conifers, represented by their rightward placement on the first PCO axis. This is shared by Doylea in the Rothwell and Stockey (2016) matrix. Between the analyses of the Doyle (1996) and Doyle (2008) matrices (Fig. 5a, b), there is a modest shift of Gnetales away from angiosperms and towards conifers.
DISCUSSION
Morphology and the phylogeny of the seed plants
The results of our analyses help to unravel some of the main issues regarding the phylogenetic signal for the anthophyte clade in morphological matrices of seed plants. MP bootstrap analyses, the Templeton test on constrained topologies, and BI analyses all agree in showing that support for assignment of Gnetales to an anthophyte clade did not increase with increasing taxon or character sampling, as noted by Donoghue and Doyle (2000). One of the most interesting results is the switch in support between matrices compiled before the main molecular analyses of seed plant phylogeny (pre-2000) and afterwards (i.e., Doyle 2006 and Hilton and Bateman 2006). These two matrices, which both used Doyle (1996) as a starting point but were modified independently, with only limited discussion at later stages of the two projects, and made different choices regarding character coding, taxon sampling, and splitting of higher-level taxa, both show a very similar pattern. Whereas an anthophyte topology is more parsimonious under the MP criterion, although without significant support, Bayesian inference favors a grouping of Gnetales and conifers. This phenomenon was already reported by Mathews et al. (2010), who reanalyzed the matrix of Doyle (2008) using BI, but their result passed mostly unnoticed. The matrices descended from Doyle (2006) (i.e., Doyle 2008) and Hilton and Bateman (2006) (i.e., Rothwell et al. 2009; Rothwell and Stockey 2016) exhibit a similar pattern.
Examination of the behavior of characters on anthophyte and Gnetales-conifer trees illustrates how changes in character analysis between the studies of Doyle (1996) and Doyle (2006, 2008) increased support for Gnetales-conifer trees. Some changes were the result of doubts concerning the homology of anthophyte characters. For example, character 14 of Doyle (1996), which contrasted the absence of a tunica layer in the apical meristem in cycads, Ginkgo, and most conifers with its presence in Gnetales, angiosperms, and Araucariaceae, underwent one less step on anthophyte trees. However, the tunica consists of one layer of cells in Gnetales, but two layers in angiosperms, suggesting that it may not be homologous in the two groups. Doyle (2006, 2008) therefore split presence of a tunica into two states, and the resulting character (4) underwent the same number of steps with Gnetales in both positions. The same is true for redefinition of the megaspore membrane character (120), from thick vs. reduced to present vs. absent; the megaspore membrane is thin in Gnetales, but absent in angiosperms, Caytonia, and probably Bennettitales. Other changes involved newly recognized conifer-like features of Gnetales. For example, Doyle (2006, 2008) added a character for presence of a torus in the pit membranes of xylem elements in conifers and Gnetales (character 12, based on Carlquist 1996) and rescored Gnetales as having a tiered proembryo (character 130), as in conifers; both characters undergo one less step on Gnetales-conifer trees than on most anthophyte trees (except some with major rearrangements elsewhere in seed plants). 
Doyle (1996) scored Gnetales as having pinnate/paddle-shaped microsporophylls (character 37, state 0), which favored an anthophyte tree by one step, but when Doyle (2008) rescored microsporophylls in Gnetales as simple and one-veined (character 55, state 1), as in conifers, based on developmental studies by Mundry and Stützel (2004), the character favored the Gnetales-conifer topology by one or two steps. The shift of Gnetales away from angiosperms and towards conifers in the morphospace analyses based on Doyle (1996) and Doyle (2008) (Fig. 5a, b) is presumably the result of these changes in character analysis.
These trends show that reconsideration of potentially convergent characters between angiosperms and Gnetales and recognition of previously overlooked similarities between Gnetales and conifers succeeded in generating a matrix containing a signal that agreed with the molecular signal associating Gnetales with extant conifers. This result clearly contradicts the view that morphology and molecules are in strong conflict with each other (Bateman et al. 2006, Rothwell et al. 2009) and validates the arguments to this effect advanced by Doyle (2006, 2008) on a parsimony basis. Indeed, in all post-2000 matrices a topology with Gnetales linked with conifers requires the addition of only a few steps to the length of the anthophyte trees, and in the Doyle (2008) matrix both topologies became equally parsimonious. The common focus on the MP consensus tree and the lack of exploration of almost equally parsimonious alternatives may have tended to inflate the perceived conflict between molecules and morphology (e.g., Rothwell et al. 2009). Our analyses show that the signal retrieved using MP is more correctly characterized as ambiguous.
On the other hand, our BI analyses of all post-2000 matrices converge on a similar result. The placement of Gnetales in an extended coniferophyte clade including Ginkgoales, cordaites, and extant and extinct conifers becomes favored in all BI analyses, with stronger support obtained in analyses with gamma rate variation among sites implemented in the model. A signal for linking Gnetales and angiosperms in an anthophyte clade seems to be much weaker, especially compared with the results of the MP analyses. The presence of a coherent signal in the BI analyses of post-2000 morphological matrices of seed plants favoring the placement of Gnetales in or near conifers has interesting implications regarding stem relatives of the angiosperms. Indeed, most post-2000 matrices are broadly congruent in attaching Pentoxylon, glossopterids, Bennettitales, and Caytonia to the stem lineage of the angiosperms (Fig. 3).
Parsimony and Bayesian inference perform differently with seed plant datasets
Our results also add new empirical evidence to the debate concerning the usefulness of morphological data in reconstructing phylogenetic relationships, as well as to the discussion of the best method to analyze such data (Wright and Hillis 2014; O’Reilly et al. 2016; Puttick et al. 2017). One of the causes of the incompatibility between MP and BI could be the presence of long branches in the tree, which could lead to LBA phenomena (Felsenstein 1978; Bergsten 2005). Analyses based on simulated matrices and real data have repeatedly shown that probabilistic, model-based approaches are more robust to LBA than MP (Swofford et al. 2001; Brinkmann et al. 2005, and references therein). The BI trees show that both angiosperms and Gnetales are situated on very long morphological branches, especially in the post-2000 matrices. After following some of the suggestions by Bergsten (2005) and other methodologies (Rota-Stabelli et al. 2011), we conclude that LBA is responsible at least in part for the continuing support for the anthophyte clade in MP analyses of the post-2000 matrices. We base this conclusion on several lines of evidence. First, BI recovers a Gnetales-conifer topology with higher probability than a topology with Gnetales in anthophytes, thus favoring a topology that separates the long branches over a topology that unites them. Second, more complex and better-fitting models recover a higher posterior probability for the topology in which angiosperms and Gnetales are separated (Figs. 2, 3). Third, removing Gnetales or angiosperms results in a rearrangement of the MP topologies in which the other long branch “flies away” from its original position. Fourth, support for the Gnetales plus angiosperms clade increases with decreased taxon sampling on the branch leading to the angiosperms (Fig. 4g-i). However, relationships in many other parts of the trees obtained with MP and BI are similar, suggesting that MP is not necessarily misleading where long branch effects are lacking.
To our knowledge, this represents the first reported case of LBA in a morphological analysis that is supported by multiple tests (Bergsten 2005), with much stronger support than in previously reported cases (Lockhart and Cameron 2001; Wiens and Hollingsworth 2000). The nature of this phenomenon can be easily visualized using a principal coordinates analysis, in which the positions of the three taxa in the plot of the first two PCO axes are congruent with both the presumed close relationship between Gnetales and conifers and the convergence of Gnetales with the angiosperms (Fig. 5). Such a tool could represent an interesting option for exploring the structure of the data in future phylogenetic analyses.
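As an illustration of how such an exploratory ordination can be set up, the following is a minimal sketch of a classical principal coordinates analysis on a discrete character matrix. It is not the procedure used for Fig. 5: the distance measure (the proportion of mismatching characters among those scored in both taxa), the function names `pairwise_mismatch` and `pcoa`, and the four-taxon toy matrix are all assumptions made for the example, and the toy scores are not taken from any seed plant dataset.

```python
import numpy as np


def pairwise_mismatch(matrix):
    """Proportion of differing states among characters scored in both taxa.

    `matrix` is taxa x characters, with np.nan marking missing scores.
    This distance is an assumption of the sketch, not the paper's measure.
    """
    n = matrix.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(matrix[i]) & ~np.isnan(matrix[j])
            # With no shared scored characters the distance is undefined;
            # here it is arbitrarily set to 0.
            d = np.mean(matrix[i, both] != matrix[j, both]) if both.any() else 0.0
            dist[i, j] = dist[j, i] = d
    return dist


def pcoa(dist, n_axes=2):
    """Classical PCoA: double-center squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy matrix: 4 taxa x 6 binary characters, nan = missing.
toy = np.array([
    [0, 0, 1, 1, 0, np.nan],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, np.nan, 0],
])
coords = pcoa(pairwise_mismatch(toy))
for name, (x, y) in zip(["taxon_A", "taxon_B", "taxon_C", "taxon_D"], coords):
    print(f"{name}: PCO1={x:.3f}, PCO2={y:.3f}")
```

Taxa that plot close together on the first axes share many character states, whether through common ancestry or convergence, so a plot of this kind can only flag candidate long branch effects for further testing and cannot distinguish between the two causes on its own.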
In conclusion, our analyses show that morphological data agree broadly with the results of the molecular analyses regarding the position of the Gnetales in seed plant phylogeny. This strongly suggests that morphology carries a phylogenetic signal that is consistent with molecular data, and may therefore be useful in reconstructing other aspects of the phylogenetic history of the seed plants, especially the position of fossils relative to living taxa. The supposed conflict between the two sorts of data on the phylogeny of seed plants (Bateman et al. 2006; Rothwell et al. 2009) therefore seems less deep than previously thought, and is partly due to methodological issues. Since data from the fossil record are particularly important for resolving the evolutionary history of seed plants, because of the wide gaps that separate extant groups and the potential biases in analysis of such sparsely sampled taxa (Burleigh and Mathews 2007; Mathews 2009; Magallón et al. 2013), our results give new hope for the possibility of integrating fossils and molecules in a coherent way. This is even more important in light of new fossil discoveries (e.g., Rothwell and Stockey 2013, 2016) and the reconstruction of new species-level taxa that show similarities to fossils previously associated with angiosperms (e.g., the Triassic Petriellaea plant, which shares leaf and cupule features with Caytonia: Bomfleur et al. 2014).
Another aspect that emerges from our study is the importance of signal dissection in all phylogenetic analyses involving morphology. Although most phylogenetic analyses based on morphology are still conducted in a parsimony framework, some authors have already underlined the potential of model-based approaches in this field (Lee and Worthy 2012; Lee et al. 2014). Our analyses show that BI yields more robust results under different taxon sampling strategies, and is particularly promising for correcting errors due to long branch effects. Our study converges with previous work indicating that the use of model-based techniques could allow the successful integration of taxa with a high proportion of missing data (Wiens 2005; Wiens and Tiu 2012), which would be extremely useful given the nature of the paleobotanical record.
SUPPLEMENTARY MATERIAL
The supplementary material is available as an online appendix.
ACKNOWLEDGMENTS
MC acknowledges H. Peter Linder for his fundamental support to this work, and for important comments on this manuscript. We would like to thank Richard Bateman and Gar Rothwell for making their matrices available, and Omar Rota-Stabelli for useful discussions about long branch attraction. Tanja Stadler, Susanne Renner, Elisabeth Truernit, Gavin George, and Frank Anderson are gratefully acknowledged for comments on a previous version of this manuscript, and Guy Atchison and Yanis Bouchenak-Khelladi for useful comments on the present version.