Abstract
New techniques for the species-level sorting of millions of specimens need to be developed in order to answer the question of how many species live on earth. These methods should be reliable, scalable, and cost-effective, as well as largely insensitive to the low-quality genomic DNA commonly obtained from museum specimens. Mini-barcodes seem to satisfy these criteria, but it is unclear whether they are sufficiently informative for species-level sorting. We test this here using 20 datasets covering ca. 30,000 specimens of 5,500 species. All specimens had been sorted based on morphology before being barcoded with full-length cox1 barcodes. Mini-barcodes of different lengths and positions were then obtained in silico from the full-length barcodes using a sliding window approach (three window sizes: 100-bp, 200-bp, 300-bp) as well as nine published mini-barcode primers (length: 94–407-bp). We then determined whether barcode length and/or position reduces congruence between morphospecies and molecular Operational Taxonomic Units (mOTUs) that were obtained using three different species delimitation techniques (ABGD, PTP, objective clustering). We find no significant difference in performance between full-length barcodes and mini-barcodes as long as the latter are of moderate length (>200-bp). Only very short mini-barcodes (<200-bp) perform poorly, especially when they are located near the 5’ end of the Folmer region. Overall, congruence between morphospecies and mOTUs is ca. 80% for barcodes >200-bp. The congruent mOTUs contain ca. 75% of the specimens, and we estimate that most of the conflict is caused by ca. 10% of the specimens, which should be targeted for re-examination. For barcodes >200-bp, barcode length and the choice of species delimitation method have only minor effects on congruence. Our study suggests that large-scale species discovery and metabarcoding can utilize mini-barcodes without significant loss of information compared to full-length barcodes. 
This is good news given that mini-barcodes can be obtained via cost-effective tagged amplicon sequencing using short-read sequencing platforms (Illumina: “NGS barcodes”).
Introduction
The question of how many species live on earth has intrigued biologists for a very long time, but we are nowhere close to having a robust answer. We do know that fewer than 2 million of the estimated 10-100 million multicellular species have been described and that many are currently being extirpated by the “sixth mass extinction” (Ceballos et al., 2015; Sánchez-Bayo & Wyckhuys, 2019) with potentially catastrophic consequences for the environment (Cafaro, 2015). Monitoring, halting, and perhaps even reversing this process is hampered by the “taxonomic impediment”. This impediment is particularly severe for “invertebrates” that collectively contribute much of the animal biomass (e.g., arthropods, annelids, nematodes: Stork et al., 2015; Bar-On et al., 2018). Most biologists thus agree that there is a pressing need for accelerating species discovery and description. This is very likely to require the development of new molecular methods. We would argue that they should not only be accurate, but also (1) rapid, (2) cost-effective, and (3) largely insensitive to DNA quality. Speed is essential because tackling the earth’s biodiversity would likely require the processing of >500 million specimens even under the very conservative assumption that there are only 10 million species and a new species is discovered with every 50 specimens processed. Cost-effectiveness is similarly important because millions of species are found in countries with only basic research facilities and the large-scale international transfer of specimens for molecular work is becoming increasingly difficult under the Nagoya protocol. Fortunately, many species are already represented in museum holdings, but such specimens often yield degraded DNA (Cooper, 1994). Therefore, methods that require DNA of high quality and quantity are not likely to be suitable for large-scale species discovery in invertebrates.
Conceptually, species discovery and description can be broken up into three steps: the first is obtaining specimens, the second species-level sorting, and the third species identification or description. Fortunately, centuries of collecting have generated many of the specimens that are needed for large-scale species discovery. Indeed, for many invertebrate groups it is likely that museum collections contain more specimens of undescribed than described species; i.e., this unsorted collection material represents a vast and still underutilized source for species discovery (Lister & Climate Change Research Group, 2011; Kemp, 2015; Yeates et al., 2016). The second step, species-level sorting, often involves various levels of sorting according to the taxonomic expertise available, and it is in dire need of acceleration. Traditionally, it starts with the sorting of unsorted material into major taxa (e.g., order-level in insects). This task can be accomplished by parataxonomists but may in the future be taken over by machine sorting utilizing neural networks (Valan et al., 2019). In contrast, the subsequent species-level sorting is usually time-limiting because the specimens of many invertebrate taxa have to be prepared by highly skilled specialists (e.g., dissected, slide-mounted) before the material can be sorted into putative species. This means that the traditional techniques are neither rapid nor cost-effective for many invertebrate groups. This impediment is likely to be largely responsible for the poor state of knowledge of certain taxa that are known to be abundant and species-rich (Bickel, 1999).
An alternative way to sort specimens to species-level would be with DNA sequences. This approach is particularly promising for metazoan species because most multicellular animal species can be distinguished based on cytochrome c oxidase subunit I (cox1) barcode sequences (Hebert et al., 2003). However, such sorting requires that each specimen is barcoded. This creates cost- and scalability problems when the barcodes are obtained with Sanger sequencing (see Taylor and Harris, 2012). Such sequencing is currently still the standard in barcoding studies because the animal barcode was defined as a 658-bp long fragment of cox1 (“Folmer region”: Folmer et al., 1994), although sequences >500-bp with <1% ambiguous bases are also considered BOLD-compliant (BOLDsystems.org). The 658-bp barcode was optimized for ABI capillary sequencers but has become a burden because it is not suitable for most new sequencing technologies, while Sanger sequencing remains expensive and only scalable when expensive liquid-handling robots are used. This approach is hence unlikely to become widely available in those countries that harbour most of the species diversity. Due to these constraints, very few studies have utilized DNA barcodes to sort entire samples into putative species (but see Fagan-Jeffries et al., 2018). Instead, most studies use a mixed approach where species-level sorting is carried out based on morphology before a select few specimens per morphospecies are barcoded (e.g., Riedel et al., 2010). This two-step process requires considerable amounts of skilled manpower and time.
Scalability and cost-effectiveness are hallmark features of the new short-read high-throughput sequencing technologies. In addition, these technologies are particularly suitable for sequencing the kind of degraded DNA that is typical for museum specimens. Indeed, anchored hybrid enrichment (AHE) has already been optimized for use with old museum specimens (Bi et al., 2013; Guschanski et al., 2013; Blaimer et al., 2016) and is likely to play a major role in the integration of rare species into taxonomic and systematic projects. It will be difficult, however, to apply AHE to millions of specimens because it requires time-consuming and expensive molecular protocols (e.g., specimen-specific libraries). Fortunately, for most species it is likely that species-level sorting does not require a very large number of markers. We would thus argue that the initial species-level sorting can be achieved using barcodes that are obtained via “tagged amplicon sequencing” on next-generation sequencing platforms (“NGS barcodes”; Wang et al., 2018; Yeo et al., 2018). Until recently, obtaining full-length NGS barcodes via tagged amplicon sequencing was difficult because the reads of most next-generation sequencing platforms were too short for sequencing the full-length barcode. This has now changed with the arrival of third-generation platforms (ONT: MinION: Srivathsan et al., 2018; PacBio: Sequel: Hebert et al., 2018). These platforms, however, come with drawbacks, viz., elevated sequencing error rates and higher cost. These problems are likely to be overcome in the future (e.g., Yang et al., 2018), but such solutions will not solve the main challenge posed by full-length barcodes, i.e., reliably obtaining amplicons from museum specimens with degraded DNA (e.g., Hajibabaei et al., 2006). We thus submit that barcode length should be optimized based on empirical evidence: barcodes should be only as long as needed for accurate pre-sorting of specimens into putative species.
Barcodes that are shorter than the full-length barcode are called mini-barcodes. They are obtained with primers that amplify shorter subsets of the original barcode region and have several advantages. Firstly, such amplicons are easier to obtain when the DNA in the sample is degraded (Hajibabaei & McKenna, 2012). Secondly, mini-barcodes can be sequenced at low cost using tagged amplicon sequencing on short-read sequencing platforms (e.g., Illumina). Thirdly, mini-barcode primers are available for a large number of species-rich metazoan clades (Hajibabaei et al., 2006; Meusnier et al., 2008; Hebert et al., 2013; Little, 2014) as well as for specific taxa such as fruit flies, catfish and sharks (Fan et al., 2009; Bhattacharjee & Ghosh, 2014; Fields et al., 2015). It is thus not surprising that short barcodes are already the barcodes of choice when the template DNA is degraded. This is often the case for museum specimens (Zuccon et al., 2012; Hebert et al., 2013) or for environmental DNA, which is usually analysed via metabarcoding (e.g., processed food: Armani et al., 2015; Shokralla et al., 2015; water, soil, fecal matter: Epp et al., 2012; Srivathsan et al., 2015; Lim et al., 2016). Mini-barcodes were initially obtained via Sanger sequencing, but they can now be sequenced much more efficiently via tagged amplicon sequencing on short-read platforms (“NGS barcoding”: Wang et al., 2018: sequencing cost < 4 cents). Wang et al. (2018) could thus implement a “reverse workflow” based on sequencing all specimens without any species-level pre-sorting based on morphology. Four thousand ant specimens were barcoded with a 313-bp mini-barcode and then pre-sorted into 89 molecular operational taxonomic units (mOTUs) that were largely congruent with morphospecies (86 species). 
However, it remained unclear whether full-length DNA barcodes would have further improved congruence, whether the results from this study can be generalized, and which mini-barcode is optimal for large-scale pre-sorting of specimens into putative species.
The answers to these questions remain elusive because mini-barcodes remain insufficiently tested despite their ubiquitous use in metabarcoding. Arguably, existing tests suffer from a lack of scale (the largest study includes 6695 barcodes for 1587 species: Meusnier et al., 2008) and limited taxonomic scope (usually only 1-2 family-level taxa: e.g., Hajibabaei et al., 2006; Yu & You, 2010). Furthermore, the tests yielded conflicting results. Hajibabaei et al. (2006) found high congruence with the full-length barcode when species were delimited based on mini-barcodes, and Meusnier et al. (2008) found similar BLAST identification rates for mini-barcodes and full-length barcodes in their in silico tests. However, Yu & You (2010) acknowledged that mini-barcodes may have lower accuracy despite close structural concordance with the full-length barcode. In addition, Sultana et al. (2018) concluded that the ability to identify species is compromised when barcodes are too short (<150-bp), but it remained unclear at which length and in which position mini-barcodes start performing well. Moreover, published tests of mini-barcodes compare their performance to results obtained with full-length barcodes. All conflict is then implicitly considered evidence for the failure of mini-barcodes to yield the “correct” mOTUs. However, results obtained with longer barcodes should not automatically be assumed to be accurate given that nucleotide variability is unevenly distributed across the Folmer region (Roe & Sperling, 2007). Lastly, the existing tests of mini-barcodes do not include a sufficiently large number of different mini-barcodes to detect positional and length effects across the 658-bp barcode region.
Here, we address the lack of scale by including 20 studies covering 5500 species represented by ca. 30,000 barcodes. We furthermore test a large number of different mini-barcodes by applying a sliding window approach to generate mini-barcodes of different sizes (100, 200, 300-bp window sizes, 60-bp intervals) and compare the results to the performance of nine mini-barcodes with published primers (mini-barcode length: 94 – 407-bp). The taxonomic scope of our study is broad enough to include a wide variety of metazoans ranging from earthworms to butterflies and birds. Lastly, we do not assume that mOTUs based on full-length barcodes are automatically more accurate than those obtained with mini-barcodes. Instead, we use morphology as an external criterion for assessing whether mOTUs obtained with different-length barcodes have different levels of congruence with morphospecies. Note that this does not imply that morphology is more suitable for species delimitation than molecular data. Instead, we test whether shortening barcodes influences congruence with morphology; i.e. morphology is treated as a constant while testing whether barcode length and/or position influences the number of morphospecies that are recovered. Given that morphology is a generally accepted type of data that can be used for species delimitation, mini-barcodes that significantly lower congruence with morphospecies are unlikely to be useful for accurate species-level sorting.
We also compare the performance of different species delimitation methods. There has been substantial interest in developing algorithms for mOTU estimation, leading to the emergence of various species delimitation algorithms over the past decade (e.g., objective clustering: Meier et al., 2006; BPP: Yang & Rannala, 2010; jmOTU: Jones et al., 2011; ABGD: Puillandre et al., 2012; BINs: Ratnasingham & Hebert, 2013; PTP: Zhang et al., 2013; etc.). For the purposes of this study, we selected three algorithms as representatives of distance- and tree-based methods: objective clustering, Automatic Barcode Gap Discovery (ABGD) and Poisson Tree Process (PTP). Objective clustering utilizes an a priori distance threshold to group sequences into clusters; ABGD groups sequences into clusters based on an initial prior and recursively uses incremental priors to find stable partitions; and PTP utilizes the branch lengths of the input phylogeny to delimit species units. Arguably, barcode data may not be appropriate for the application of PTP because a single marker is not likely to yield reliable phylogenetic trees (including branch lengths), but PTP has been frequently applied to barcode data in the literature (e.g., Ermakov et al., 2015; Han et al., 2016; Hollatz et al., 2016) and is thus included here. There are numerous additional techniques for species delimitation, but most require multiple markers and/or are even more reliant on accurately reconstructed phylogenetic trees, and may not be easily scalable to millions of specimens. They are therefore not included in this study.
Materials & Methods
Dataset selection
We surveyed the barcoding literature in order to identify publications that cited the original barcode paper by Hebert et al. (2003) and met the following criteria: (1) the barcoded specimens had been pre-sorted/identified based on morphology and (2) the dataset contained at least 500 specimens with cox1 barcodes >656-bp. We identified the 20 most recent suitable datasets, starting from 2017 (Table S1); all had >500 barcoded specimens even after removing specimens that were not sorted to species level (e.g., only identified to genus or higher) or had sequences shorter than 657-bp (the full-length barcode is technically 658-bp long, but a 1-bp concession was made to prevent the loss of too much data). The barcode sequences were downloaded from BOLDSystems or NCBI GenBank and aligned with MAFFT v7 (Katoh & Standley, 2013) using a gap opening penalty of 5.0.
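The two selection criteria can be expressed as a simple filter. The following Python sketch is illustrative only and is not the procedure used in this study: the record layout (specimen id, species label, aligned sequence), the use of None for specimens not identified to species, and the function name filter_dataset are hypothetical, and counting sequence length after removing alignment gaps is an assumption.

```python
# Hypothetical sketch of the dataset filter; names and record layout are assumptions.
MIN_LENGTH = 657      # 658-bp Folmer region minus the 1-bp concession
MIN_SPECIMENS = 500   # a dataset must retain at least this many specimens


def filter_dataset(records):
    """Keep specimens identified to species whose barcodes are >= MIN_LENGTH bp.

    records: iterable of (specimen_id, species_label, sequence) tuples, where
    species_label is None if the specimen was not identified to species.
    Returns (kept_records, dataset_qualifies).
    """
    kept = [
        (specimen_id, species, seq)
        for specimen_id, species, seq in records
        # Assumption: barcode length is counted without alignment gaps ("-").
        if species is not None and len(seq.replace("-", "")) >= MIN_LENGTH
    ]
    return kept, len(kept) >= MIN_SPECIMENS
```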
Using a custom Python script, we generated three sets of mini-barcodes along a “sliding window”, with window lengths of 100, 200 and 300-bp. The first window begins at the first base pair of the 658-bp barcode and each subsequent window is shifted by 60-bp, generating ten 100-bp windows, eight 200-bp windows and six 300-bp windows. Additionally, we identified nine mini-barcodes with published primers within the cox1 Folmer region (Fig. 1 & Table S2). These mini-barcodes have been used repeatedly in the literature published after 2003 and for a broad range of taxa. The primers for the various mini-barcodes were aligned to the homologous regions of each dataset with MAFFT v7 --addfragments (Katoh & Standley, 2013) in order to identify the precise position of each mini-barcode within the full-length barcode. The corresponding mini-barcode subsequences were then extracted from the aligned full-length barcodes. Note that most of the published primers are located in the 5’ region of the full-length barcode.
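The sliding-window scheme can be sketched in a few lines of Python. This is a minimal illustration under the assumption that all barcodes are aligned to a common 658-bp frame, so that alignment columns correspond to barcode positions; the function names sliding_windows and extract_minibarcodes are hypothetical and do not reproduce the custom script mentioned above.

```python
def sliding_windows(barcode_length=658, window=100, step=60):
    """Return 0-based, half-open (start, end) coordinates of every window
    of the given size that fits entirely within the barcode."""
    return [(start, start + window)
            for start in range(0, barcode_length - window + 1, step)]


def extract_minibarcodes(aligned_barcodes, window, step=60, barcode_length=658):
    """Cut each aligned full-length barcode into in-silico mini-barcodes.

    aligned_barcodes: dict mapping specimen id -> aligned barcode string.
    Returns a dict mapping (start, end) -> {specimen id: subsequence}.
    """
    return {
        (start, end): {sid: seq[start:end] for sid, seq in aligned_barcodes.items()}
        for start, end in sliding_windows(barcode_length, window, step)
    }


if __name__ == "__main__":
    for size in (100, 200, 300):
        print(size, len(sliding_windows(window=size)))
```

With the default 658-bp barcode and a 60-bp step, this yields ten 100-bp, eight 200-bp and six 300-bp windows, and the first three 100-bp windows have midpoints at 50, 110 and 170 bp, the positions discussed in the Results.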
Species delimitation
The mini-barcodes and the full-length barcodes were clustered into putative species using three species delimitation algorithms: objective clustering (Meier et al., 2006), ABGD (Puillandre et al., 2012) and PTP (Zhang et al., 2013). For objective clustering, the mOTUs were clustered at 2–4% uncorrected p-distance thresholds (Srivathsan & Meier, 2012) using a Python script that reimplements the objective clustering of Meier et al. (2006) and allows for batch processing. These are the distance thresholds typically used for species delimitation in the literature (Meier et al., 2006; Ratnasingham & Hebert, 2013; Meier et al., 2016). The same datasets were also clustered with ABGD (Puillandre et al., 2012) using the default range of priors and uncorrected p-distances; the minimum slope parameter (-X) was reduced stepwise (1.5, 1.0, 0.5, 0.1) if the algorithm could not find a partition. We then considered the ABGD clusters at priors P=0.001, P=0.01 and P=0.04. The prior (P) refers to the maximum intraspecific divergence and functions similarly to a p-distance threshold in the first iteration, before being refined by recursive application of the ABGD algorithm. Lastly, for PTP, maximum likelihood (ML) trees were generated for each dataset in RAxML v.8 (Stamatakis, 2014) via rapid bootstrapping (-f a) under the GTRCAT model. The best tree for each dataset was then used for species delimitation with PTP (Zhang et al., 2013) under default parameters.
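Objective clustering can be described as single-linkage clustering at a fixed distance threshold: two sequences end up in the same cluster if they are connected by a chain of pairwise uncorrected p-distances at or below the threshold. The Python sketch below illustrates this principle and is not the reimplementation used in this study; skipping sites with gaps or ambiguous bases, treating the threshold as inclusive (≤), and the function names are assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites among the sites
    where both sequences have an unambiguous base (A, C, G or T)."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    # No comparable sites: NaN, which never passes the threshold test below.
    return differences / compared if compared else float("nan")


def objective_clusters(seqs, threshold=0.02):
    """Single-linkage clustering of aligned sequences at a p-distance threshold.

    seqs: dict mapping specimen id -> aligned sequence.
    Returns the clusters as a sorted list of sorted id lists.
    """
    parent = {sid: sid for sid in seqs}

    def find(sid):
        # Follow parent links to the cluster root, halving the path on the way.
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]
            sid = parent[sid]
        return sid

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for sid in seqs:
        clusters.setdefault(find(sid), []).append(sid)
    return sorted(sorted(members) for members in clusters.values())
```

Because linkage is transitive, a sequence that is 3% away from one cluster member can still join the cluster through an intermediate sequence that is within 2% of both. The quadratic number of pairwise comparisons is acceptable for a sketch, but datasets with many thousands of barcodes would benefit from a more efficient implementation.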
Performance assessment
We assess the performance of mini-barcodes by using morphospecies as an external arbiter. Species-level congruence was quantified using match ratios between molecular and morphological groups (Ahrens et al., 2016). The ratio is defined as $\text{match ratio} = \frac{2 N_{match}}{N_1 + N_2}$, where $N_{match}$ is the number of clusters that are identical in both partitions being compared (here, mOTUs and morphospecies) and $N_1$ and $N_2$ are the numbers of clusters in the two partitions. Incongruence between morphospecies and mOTUs is usually caused by a few specimens that are assigned to the “incorrect” mOTUs. Conflict at the specimen level can thus be quantified as the number of specimens that are in mOTUs that cause conflict with morphospecies.
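The match ratio is straightforward to compute once both partitions are represented as sets of specimen identifiers. The Python function below is a minimal sketch of the formula given above; representing clusters as frozensets and the function name match_ratio are illustrative choices, not part of the original analysis.

```python
def match_ratio(partition_1, partition_2):
    """Match ratio (Ahrens et al., 2016): 2 * N_match / (N_1 + N_2).

    Each partition is an iterable of clusters; each cluster is an iterable of
    specimen ids. N_match counts clusters that are identical in both partitions.
    """
    clusters_1 = {frozenset(c) for c in partition_1}
    clusters_2 = {frozenset(c) for c in partition_2}
    n_match = len(clusters_1 & clusters_2)
    return 2 * n_match / (len(clusters_1) + len(clusters_2))


if __name__ == "__main__":
    morphospecies = [["a", "b"], ["c"], ["d", "e"]]
    motus = [["a", "b"], ["c", "d", "e"]]
    print(match_ratio(morphospecies, motus))  # 2 * 1 / (3 + 2) = 0.4
```

In this example, lumping the single specimen c with the mOTU of d and e leaves only one of three morphospecies congruent, which illustrates why the match ratio is complemented by specimen-level counts.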
In order to test whether barcode length is a significant predictor of congruence, MANOVA tests were carried out in R (R Core Team, 2017) with “match ratio” (species-level congruence) as the response variable and “dataset” and “mini-barcode” as categorical explanatory variables. We found that most of the variance in our study was generated by the “dataset” variable (P < 0.05 in MANOVA tests). Given that we were particularly interested in the effect of barcode length and position, “dataset” was subsequently treated as a random effect, with “mini-barcode” as the (categorical) explanatory variable, in a linear mixed effects model (R package lme4: Bates, 2010). The emmeans R package (Lenth, 2018) was then used to perform pairwise post-hoc Tukey tests between mini- and full-length barcodes in order to assess whether either barcode performed significantly better or worse. To compare the performance of objective clustering, ABGD and PTP, ANOVA tests were performed in R, followed by pairwise Tukey tests to determine which species delimitation method was responsible for significant differences. Lastly, in order to explore the reasons for positional effects, the proportion of conserved sites for each mini-barcode was obtained using MEGA6 (Tamura et al., 2013).
Match ratios indicate congruence at the species level, but it is also important to determine how many specimens have been placed in congruent units. Species- and specimen-level congruence are only identical when all mOTUs are represented by the same number of specimens. However, specimen abundances are rarely equal across species, and hence the match ratio alone is insufficient for characterizing congruence between mOTUs and morphospecies. It is straightforward to determine the number of congruent specimens as follows:
Congruence Class I specimens: if an mOTU $A$ and a morphospecies $B$ contain exactly the same specimens ($A = B$), the number of congruent specimens is $N_{c1} = |A| = |B|$.
Incongruence is caused by morphospecies that are split, lumped, or split and lumped in the mOTUs. However, a single mis-sorted specimen placed into a large mOTU causes all specimens in two mOTUs to be considered “incongruent” according to the criterion outlined above. Yet most of these specimens are congruent, and full congruence could be restored by re-examining the mis-sorted specimen. It is therefore also desirable to determine the number of specimens that require re-examination or, conversely, the number of specimens that would be congruent if one were to remove a few incongruent specimens. This number can be estimated by counting congruent specimens as follows:
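Congruence Class II specimens: specimens in mOTUs that are split or lumped relative to morphospecies. Here, the largest subset of congruently placed specimens is determined as follows. If a morphospecies $B$ is split into the mOTUs $A_1, A_2, \dots, A_x$ (i.e., $A_1 \cup A_2 \cup \dots \cup A_x = B$), then $N_{c2} = \max(|A_1|, |A_2|, \dots, |A_x|)$; for an mOTU that lumps several morphospecies, the roles of mOTUs and morphospecies are reversed.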
Congruence Class III specimens: this class covers specimens in sets of clusters that are both split and lumped relative to morphospecies. Here, only those specimens are considered potentially congruent that (1) are in one mOTU and one morphospecies and (2) together outnumber all other specimens in the set of clusters. In detail, if $A_1 \cup A_2 \cup \dots \cup A_x = B_1 \cup B_2 \cup \dots \cup B_y$, then $N_{c3} = \max_{i,j} |A_i \cap B_j|$ only if $N_{c3} > |B_1 \cup B_2 \cup \dots \cup B_y| - N_{c3}$; otherwise no specimens in the set are counted.
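The three congruence classes can be computed by first grouping mOTUs and morphospecies into connected "sets of clusters" (clusters linked by shared specimens) and then applying the rules above to each set. The Python sketch below is a hypothetical implementation, not the authors' code; it assumes that every specimen carries exactly one mOTU label and one morphospecies label, and the function and variable names are illustrative.

```python
from collections import defaultdict


def congruence_classes(motu_of, morpho_of):
    """Count Class I, II and III congruent specimens.

    motu_of, morpho_of: dicts mapping each specimen id to its mOTU label and
    its morphospecies label (both assumed to cover the same specimens).
    """
    # Union-find over cluster nodes; the tags keep mOTU and species labels apart.
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # An mOTU and a morphospecies sharing a specimen join the same set of clusters.
    for specimen, motu in motu_of.items():
        parent[find(("mOTU", motu))] = find(("species", morpho_of[specimen]))

    sets_of_clusters = defaultdict(list)
    for specimen, motu in motu_of.items():
        sets_of_clusters[find(("mOTU", motu))].append(specimen)

    totals = {"I": 0, "II": 0, "III": 0}
    for specimens in sets_of_clusters.values():
        # Size of every non-empty intersection |A_i ∩ B_j| in this set.
        overlap = defaultdict(int)
        for s in specimens:
            overlap[(motu_of[s], morpho_of[s])] += 1
        n_motus = len({motu for motu, _ in overlap})
        n_species = len({sp for _, sp in overlap})
        largest = max(overlap.values())
        if n_motus == 1 and n_species == 1:       # A = B
            totals["I"] += len(specimens)
        elif n_motus == 1 or n_species == 1:      # split only or lumped only
            totals["II"] += largest
        elif largest > len(specimens) - largest:  # split and lumped: majority rule
            totals["III"] += largest
    return totals
```

Each connected set is evaluated exactly once. Within a conflicting set, the specimens that are not counted are the candidates for re-examination.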
Results
For species delimitation with objective clustering, we found that the 2% p-distance threshold yielded the highest congruence across the datasets. It was hence used as the upper-bound estimator for species- and specimen-level congruence. The corresponding results for the 3 and 4% p-distance clusters are reported in the supplementary materials. For ABGD, the P=0.001 prior yielded the highest average match ratio, and hence the clusters generated under this prior were used in the main analysis (see supplementary material for results under P=0.01 and P=0.04). PTP does not require parameter choices beyond the input tree.
The MANOVA tests performed on all treatments (species delimitation method and distance threshold/prior) indicated that the variable “dataset” explained much more of the observed variance in “match ratio” than the choice of mini-barcode or mOTU algorithm, which was of secondary importance (Table S3). After accounting for “dataset”, we find that only mini-barcodes <200-bp perform significantly worse than full-length barcodes (Fig. 2); for all other mini-barcodes (>200-bp), congruence with morphospecies does not differ significantly and is occasionally superior to that observed for the full-length barcode. This is evident in the large number of significant differences (p < 0.05 and p < 0.001) in pairwise post-hoc Tukey tests between the 100-bp mini-barcodes and the 657-bp full-length barcodes. Only the 100-bp mini-barcodes have a mean performance that is worse than that of the full-length barcode (negative match ratio deviation). Conversely, there is no significant difference between the 200- and 300-bp mini-barcodes and the full-length barcode when objective clustering or PTP is used to estimate mOTUs. Under ABGD, the mini-barcodes outperform the full-length barcodes. For all mOTU delimitation methods, the variance across datasets appears to decline as mini-barcode length increases (Fig. 2). The results obtained for in silico mini-barcodes are consistent with the performance of mini-barcodes with published primers: the 94-bp, 130-bp, and 145-bp mini-barcodes tend to perform worse than the longer mini-barcodes (Fig. 3). The results are also similar for specimen-level congruence (Table 1 & Fig. S8). There are, however, some exceptions; for example, short mini-barcodes improve performance for European marine fish and Northwest Pacific molluscs when grouped with objective clustering.
When the performance of the three different clustering methods was compared, significant differences (p < 0.05 in ANOVA test) were found only for the 100-bp mini-barcode set (Fig. 4). Here, pairwise post-hoc Tukey tests find that objective clustering performs significantly better than the other delimitation methods (p < 0.001) while ABGD and PTP do not differ significantly (p = 0.88) but behave erratically for short mini-barcodes (Fig. 2).
Mini-barcodes situated at the 5’ end of the full-length barcode appear to perform somewhat worse than those situated in the middle or at the 3’ end (Fig. 2). For example, the 100-bp mini-barcodes at the 5’ end perform poorly for objective clustering (mini-barcode midpoints at 50, 110 & 170-bp), ABGD (mini-barcode midpoints at 110 & 170-bp) and PTP (mini-barcode midpoint at 110-bp). This effect is, however, only statistically significant when the mini-barcodes are very short (100-bp). This positional effect is present across all species delimitation techniques. Note that the 5’ end of the full-length barcode appears to contain a large proportion of conserved sites, particularly around the 170-bp and 230-bp midpoints of the 100-bp mini-barcodes (Fig. 5). This positional effect averages out as the mini-barcodes increase in length.
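The proportion of conserved sites per mini-barcode was obtained with MEGA6; the Python sketch below only approximates this by counting alignment columns within a window at which all sequences carry the same nucleotide. The function name is hypothetical, and skipping columns with gaps or ambiguous bases is an assumption that may not match MEGA's site-coverage settings.

```python
def conserved_proportion(aligned_seqs, start, end):
    """Proportion of conserved sites within alignment columns [start, end).

    A column counts as conserved if every sequence has the same base.
    Assumption: columns containing gaps or ambiguity codes are skipped.
    """
    usable = conserved = 0
    for column in zip(*(seq[start:end].upper() for seq in aligned_seqs)):
        if all(base in "ACGT" for base in column):
            usable += 1
            conserved += len(set(column)) == 1
    return conserved / usable if usable else float("nan")
```

Evaluating this function for each window's start and end coordinates gives one value per mini-barcode, which can then be plotted against window midpoints.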
With regard to specimen-based congruence, we evaluated the mini-barcodes with published primers and here report the results for those with a length >200-bp. Approximately three quarters of all specimens are in “Congruence Class I” (Tables 1 & S8); i.e., their placement is congruent between mOTUs and morphospecies (average/median: OC at 2%: 75/75%; ABGD P=0.001: 71/72%; PTP: 75/75%). The remaining specimens are placed in mOTUs that are split, lumped, or split and lumped. The specimens that are predominantly responsible for the splitting and lumping are here classified as Congruence Class II and III specimens. Overall, fewer than 10% of the specimens fall into these categories (Table 1: see Class II specimens across species delimitation methods). These are the specimens that should be studied when addressing conflict between morphospecies and mOTUs.
Discussion
Accelerating species discovery and description is arguably one of the foremost challenges in modern systematics. Material for many undescribed species is already in the world’s natural history museums, but the specimens need to be sorted to species level before they become available for species identification/description and can be used for large-scale analyses of biodiversity patterns. Pre-sorting specimens with DNA barcodes is a promising solution because it is scalable, can be applied to millions of specimens, and much of the specimen handling can be automated. However, for this approach to be suitable, a sufficiently large proportion of the pre-sorted units needs to accurately reflect species boundaries, and the methods for obtaining the sequences need to be suitable for processing large numbers of specimens whose DNA is degraded.
The main source of variance in congruence: datasets
We here find that the average congruence between mOTUs and morphospecies is 80% for all barcodes >200-bp; the median is higher (83%) because of outlier datasets with congruence <65% (OC at 2%; ABGD P=0.001; PTP). These outlier datasets are also likely to be responsible for the observation that much of the variance in congruence throughout our study is explained by the variable “dataset”. Despite the outliers, 72-75% (median) of the ca. 30,000 specimens are assigned to species that are supported by molecular and morphological data. This is a very high proportion when compared to species-level sorting by parataxonomists (Krell, 2004). Unfortunately, this specimen-based perspective on congruence is often underappreciated when mOTUs and morphospecies are compared. However, specimen-level congruence is an important criterion for evaluating the suitability of species-level sorting with barcodes. After all, the basic units in museum collections or ecological surveys are specimens, not species. The correct placement of specimens into species is thus important for systematists and biodiversity researchers alike: the former would like to see most of the specimens in a collection correctly placed, and the latter often need abundance and biomass information at species-level resolution.
The remaining ca. 25% of specimens are placed in mOTUs whose boundaries do not agree with morphospecies. One may initially consider this an unacceptably high proportion, but it is important to keep in mind that the misplacement of one specimen (e.g., due to a contamination of a PCR) will render two mOTUs incongruent; i.e., all specimens in these mOTUs will be considered incongruent and included in the 25%. Arguably, one should instead estimate how many specimens are causing the conflict. These are the specimens that should be targeted in reconciliation studies. The proportion across the 20 datasets in our study is fairly low and ranges from 10-12% (median) depending on which mOTU delimitation technique is used.
Conflict between mOTUs and morphospecies can be caused by technical error or biology. A typical technical factor would be accidental misplacement of specimens due to lab contamination or error during morphospecies sorting. Indeed, the literature is replete with cases where mOTUs that were initially in conflict with morphospecies became congruent once the study of additional morphological characters led to the revision of morphospecies boundaries (e.g., Smith et al., 2008; Tan et al., 2010; Baldwin et al., 2011; Ang et al., 2017). But there are also numerous biological reasons why one should not expect perfect congruence between mOTUs and species. Incomplete lineage sorting, fast speciation, large amounts of intraspecific variability, and introgression are known to negatively affect the accuracy of DNA barcodes (Will & Rubinoff, 2004; Rubinoff et al., 2006; Meier, 2008). It is thus somewhat surprising that, regardless of these issues, the final levels of congruence between morphospecies and DNA sequences are often quite high in animals (Ball et al., 2005; Cywinska et al., 2006; Renaud et al., 2012; Landi et al., 2014; Wang et al., 2018). This implies that pre-sorting specimens into species-level units based on mini-barcodes is worth pursuing for many metazoan clades. High levels of congruence are, however, not a universal observation across all of life. This approach to specimen sorting is unlikely to be useful in groups with widespread barcode sharing between species. This phenomenon occurs within Metazoa (e.g., Anthozoa: Huang et al., 2008) and is likely to be the default outside of Metazoa (e.g., Chase & Fay, 2009; Hollingsworth et al., 2011).
Barcode length and species delimitation methods
We here tested the widespread assumption that mOTUs based on full-length barcodes are more reliable than those based on mini-barcodes (Burns et al., 2007; Min & Hickey, 2007). If this assumption were confirmed, the use of mini-barcodes would have to be discouraged despite their higher amplification success rates, their better suitability for degraded starting material, and the availability of cost-effective sequencing on short-read high-throughput platforms. However, we find that the performance of cox1 mini-barcodes >200-bp does not differ significantly from that of full-length barcodes. Indeed, compared to the dataset effect, barcode length is largely secondary. This conclusion is robust across 20 diverse datasets and holds across the different species delimitation algorithms.
We also find that the choice of species delimitation algorithm matters little for mini-barcodes >200-bp (Fig. 4). This is fortunate because objective clustering and ABGD are less computationally demanding than PTP, which requires the reconstruction of a maximum-likelihood (ML) tree. However, there are some exceptions. When the mini-barcodes are extremely short (~100-bp), objective clustering tends to outperform ABGD and PTP. PTP’s poor performance for the 100-bp mini-barcodes is not surprising given that it relies on tree topologies, which cannot be estimated with confidence from so little data. ABGD’s poor performance is mostly observed for certain priors (e.g., P = 0.04: Fig. S5 & S6), under which ABGD tends to lump most of the 100-bp barcodes into one or a few large clusters. Prior choice also affects ABGD’s performance for full-length barcodes: ABGD does not perform well with very low priors (P = 0.001: Fig. 2 & 3 vs. P = 0.01 and P = 0.04: Fig. S5). Overall, we conclude that selecting the best priors and/or clustering thresholds remains a significant challenge for studies of largely unknown faunas, which lack the morphological information that could serve as an a posteriori check when choosing priors or thresholds. We therefore recommend using multiple methods and thresholds in order to distinguish robust mOTUs from labile ones whose boundaries depend heavily on threshold or prior choice.
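The recommendation to compare thresholds can be sketched in Python: cluster sequences at several uncorrected p-distance thresholds and keep only the mOTUs recovered with identical membership at every threshold. This sketch assumes that objective clustering can be approximated by single-linkage clustering (two sequences share a cluster if a chain of pairwise distances at or below the threshold connects them) and that gaps and Ns are ignored when computing distances. Function names, thresholds, and the toy sequences are hypothetical; the sketch is not intended to reproduce the software used in this study.

```python
from itertools import combinations

def p_distance(a, b):
    """Uncorrected p-distance, skipping columns where either sequence has a gap or N."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-N" and y not in "-N"]
    if not pairs:
        return 1.0  # assumption: no comparable sites -> treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)

def single_linkage(seqs, threshold):
    """Partition sequence IDs: any pair with p-distance <= threshold ends up in one cluster."""
    parent = {i: i for i in seqs}  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for i in seqs:
        clusters.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in clusters.values()}

def robust_motus(seqs, thresholds):
    """mOTUs recovered with identical membership at every threshold."""
    partitions = [single_linkage(seqs, t) for t in thresholds]
    return set.intersection(*partitions)

# Hypothetical 20-bp toy sequences.
seqs = {
    "a1": "ACGTACGTACGTACGTACGT",
    "a2": "ACGTACGTACGTACGTACGA",  # 1/20 from a1
    "c1": "ACGTACGTTTGTACGTACGT",  # 2/20 from a1
    "b1": "TTTTACGTACGTACGTTTTT",
    "b2": "TTTTTCGTACGTACGTTTTT",  # 1/20 from b1
}
for t in (0.05, 0.10):
    print(t, sorted(sorted(c) for c in single_linkage(seqs, t)))
print("robust:", robust_motus(seqs, [0.05, 0.10]))
```

Here b1 and b2 form a robust mOTU, whereas the cluster containing a1 and a2 changes membership once c1 is linked at the 10% threshold; such labile mOTUs would be candidates for re-examination.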
Positional effects
We find that, in general, mini-barcodes at the 3’ end of the Folmer region outperform those at the 5’ end. This is consistent across all three species delimitation methods and was also reported by Shokralla et al. (2015), who concluded that mini-barcodes at the 5’ end have poorer species resolution for fish. This positional effect is apparent when match ratios are compared across a “sliding window” (Fig. 2). The lowest congruence with morphology is observed for 100-bp mini-barcodes with midpoints at the 50, 110 and 170-bp marks. However, this positional effect is only significant when the barcodes are very short (<200-bp). Once the mini-barcodes are sufficiently long (>200-bp), there seems to be no appreciable difference in performance. This is not surprising because sampling more nucleotides buffers against regional changes in nucleotide variability across the Folmer region. These changes may be related to the conformation of the Cox1 protein in the inner mitochondrial membrane. The Folmer region of Cox1 contains six transmembrane α-helices connected by five loops (Tsukihara et al. 1996; Pentinsaari et al. 2016). Pentinsaari et al. (2016) compared 292 Cox1 sequences across 26 animal phyla and found high amino acid variability in helix I and the loop connecting helices I and II (corresponding to positions 1-102 of cox1), as well as at the end of helix IV and in the loop connecting helices IV and V (corresponding to positions ~448-498). These regions of high variability are distant from the active sites and are thus less likely to affect Cox1 function (Pentinsaari et al. 2016). The resulting lower selection pressure may explain the high variability in these regions, which could in turn affect the performance of mini-barcodes for species delimitation.
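The sliding-window extraction of in-silico mini-barcodes, together with a crude per-window measure of variability, can be sketched as follows. The sketch assumes an aligned 658-bp Folmer fragment, 0-based half-open coordinates, and a midpoint defined as (start + end) / 2. The step size is a hypothetical parameter: a 60-bp step happens to reproduce the 50, 110 and 170-bp midpoints mentioned above for 100-bp windows, but this section does not state the step actually used. The fraction of variable alignment columns is only a rough proxy for regional nucleotide variability.

```python
def sliding_windows(alignment_length, window, step):
    """Yield (start, end, midpoint) for every window that fits entirely in the alignment.

    Coordinates are 0-based and half-open: the window covers columns start..end-1.
    """
    for start in range(0, alignment_length - window + 1, step):
        end = start + window
        yield start, end, (start + end) / 2

def extract_mini_barcodes(alignment, start, end):
    """In-silico mini-barcodes: the same column slice taken from every aligned sequence."""
    return [seq[start:end] for seq in alignment]

def variable_site_fraction(alignment, start, end):
    """Fraction of columns in [start, end) showing more than one nucleotide (gaps/N ignored)."""
    variable = 0
    for i in range(start, end):
        states = {seq[i] for seq in alignment if seq[i] not in "-N"}
        if len(states) > 1:
            variable += 1
    return variable / (end - start)

# Window layout for a 658-bp alignment, 100-bp windows, hypothetical 60-bp step.
windows = list(sliding_windows(658, 100, 60))
print("number of windows:", len(windows))
print("first midpoints:", [mid for _, _, mid in windows[:3]])

# Hypothetical toy alignment with 4-bp windows and a 2-bp step.
toy = ["ACGTACGT", "ACGAACGA", "ACTTACGT"]
for start, end, mid in sliding_windows(len(toy[0]), 4, 2):
    print(mid, extract_mini_barcodes(toy, start, end), variable_site_fraction(toy, start, end))
```

Each window's mini-barcodes would then be passed to the species delimitation methods, and the resulting congruence plotted against the window midpoint.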
Accelerating biodiversity discovery and description
We have previously argued that species discovery and description can be broken up into three steps: (1) obtaining specimens, (2) species-level sorting, and (3) species identification or description. Here we only address species-level sorting. This might suggest that the impediments caused by slow species identification and description remain unresolved. However, this is only partially correct. Firstly, some mOTUs delimited via barcodes can be identified using barcode databases. The proportion of successful identifications depends on how well a particular fauna has been studied. This is illustrated by our recent work on dragonflies and damselflies (Odonata), ants (Formicidae), and non-biting midges (Chironomidae) in Singapore (Wang et al., 2018; Yeo et al., 2018; Baloğlu et al., 2018). For odonates, BLAST searches identified more than half of the 95 mOTUs and >75% of the specimens to species. The corresponding numbers for ants and midges were ca. 20% and 10% at the mOTU level, and 9% and 40% at the specimen level. Secondly, mOTUs discovered via barcodes can be readily compared across studies and borders (Ratnasingham et al., 2013). In contrast, species newly discovered based on morphological evidence usually remain unavailable to the scientific community until they are published. This is a very significant difference because a large amount of downstream biodiversity analysis can be carried out on mOTUs instead of identified or described species. This includes studying species richness and abundance over time, a task that is becoming increasingly important in the 21st century. Consequently, only moderate harm is done if species identification or description is completed at a later time.
Incorrectly delimited mOTUs will affect analyses of biodiversity patterns. In our study, we find that ca. 80% of the mOTUs are congruent with morphospecies. This figure is obtained prior to a reconciliation stage in which the morphology of specimens with conflicting assignments is revisited in order to rule out that the morphological evidence was misinterpreted and/or that an insufficient number of characters was studied; i.e., we predict that overall congruence levels will be higher after reconciliation. Ideally, we would like to know what proportion of the mOTUs that conflict with morphospecies will eventually be rejected, but we still know fairly little about congruence levels between morphology and barcodes after reconciliation. This is because rigorous studies would require datasets with dense taxon and geographic sampling, in which morphological and DNA sequence information is obtained for all specimens and all cases of conflict are re-studied. Unfortunately, very few datasets satisfy these criteria, presumably because the high cost of full-length barcodes has prevented biologists from sequencing all specimens.
Conclusions
We here show that mini-barcodes can be used for pre-sorting specimens into putative species and that they are arguably the preferred choice because (1) they are obtained more readily from specimens that only yield degraded DNA (Hajibabaei et al., 2006) and (2) they are much cheaper. In particular, we recommend the use of mini-barcodes >200-bp at the 3’ end of the Folmer region. It is encouraging that such mini-barcodes perform well across a large range of metazoan taxa. These conclusions are based on three species delimitation algorithms (objective clustering, ABGD and PTP), which overall show no appreciable differences in performance for such mini-barcodes. If the DNA of the specimens is so degraded that only very short mini-barcodes can be obtained, we advise against the use of PTP and ABGD (especially with high priors) in order to reduce the likelihood that species are lumped.
Acknowledgements
We would like to acknowledge support from a Ministry of Education grant on biodiversity discovery (R-154-000-A22-112). We would also like to thank Athira Adom for data processing and Emily Hartop for proofreading.