ABSTRACT
Phenotypic evolution and speciation depend on recombination in many ways. Within populations, recombination can promote adaptation by bringing together favorable mutations and decoupling beneficial and deleterious alleles. As populations diverge, cross-over can give rise to maladapted recombinants and impede or reverse diversification. Suppressed recombination due to genomic rearrangements, modifier alleles, and intrinsic chromosomal properties may offer a shield against maladaptive gene flow eroding co-adapted gene complexes. Both theoretical and empirical results support this relationship. However, little is known about this relationship in the context of behavioral isolation, where co-evolving signals and preferences are the major hybridization barrier. Here we examine the genomic architecture of recently diverged, sexually isolated Hawaiian swordtail crickets (Laupala). We assemble a de novo genome and generate three dense linkage maps from interspecies crosses. In line with expectations based on the species’ recent divergence and successful interbreeding in the lab, the linkage maps are highly collinear and show no evidence for large-scale chromosomal rearrangements. The maps were then used to anchor the assembly to pseudomolecules and estimate recombination rates across the genome. We tested the hypothesis that loci involved in behavioral isolation (song and preference divergence) are in regions of low interspecific recombination. Contrary to our expectations, a genomic region where a male song QTL co-localizes with a female preference QTL was not associated with particularly low recombination rates. This study provides important novel genomic resources for an emerging evolutionary genetics model system and suggests that trait-preference co-evolution is not necessarily facilitated by locally suppressed recombination.
INTRODUCTION
Speciation is contingent on the accumulation of genomic variation and the formation of barriers that prevent gene flow between populations. Genomes diverge under the influence of selection and drift, while gene flow counteracts this divergence by homogenizing the genome (Felsenstein 1981; Kirkpatrick and Ravigne 2002; Gavrilets 2003). To appreciate the speciation process and the origin of the fascinating diversity of life on earth, we need to understand the interaction between the mechanisms that change allele frequencies and the mechanisms that govern the association of beneficial and deleterious alleles with other alleles. A key process in this interaction is recombination, which creates new allelic combinations during meiosis in sexually reproducing organisms.
Any association between loci that underlie environmental adaptation or between loci underlying co-evolving (sexual) signals and signal responses (i.e. co-adapted gene complexes) will be affected by recombination (Felsenstein 1981). Within populations, recombination can mitigate Hill-Robertson interference by combining locally adaptive alleles from different genomic backgrounds and by decoupling beneficial and deleterious alleles (Hill and Robertson 1966); recombination can also influence the covariance between sexual traits and preference across sexes (Smith and Haigh 1974; Smith 1978; Gillespie 2000; Otto 2009). As such, recombination might increase the efficiency of background selection (purging deleterious alleles), sexual selection (through signal-preference co-evolution), and local adaptation in the earliest stages of speciation (by linking locally adapted alleles; Noor et al. 2001; Rieseberg 2001; Kirkpatrick and Ravigne 2002; Yeaman and Whitlock 2011).
Between divergent populations with some (but incomplete) reproductive isolation, recombination can also counteract population divergence and prevent the closure of a reproductive boundary by creating combinations of alleles that are favorable in different contexts (Noor et al. 2001; Rieseberg 2001; Coyne and Orr 2004; Ortiz-Barrientos et al. 2016). It is important to realize that interspecific recombination is constrained both by intrinsic properties of the species’ genomes that also constrain intraspecific recombination, as well as by the effects from (divergent) selection and alternatively fixed chromosomal rearrangements (Yeaman and Whitlock 2011; Feder et al. 2012). The most intensely studied chromosomal rearrangements suppressing recombination between divergent populations are inversions. Inversions can suppress recombination locally in the genome and, thus, promote reproductive isolation, by trapping genetic incompatibilities in linkage blocks (Noor et al. 2001), acting synergistically with other genes causing isolation (Rieseberg 2001), or by linking locally adaptive alleles (Kirkpatrick and Barton 2006). Other chromosomal rearrangements, such as translocations and transposable elements, can likewise contribute to ‘chromosomal speciation’ (Rieseberg 2001) as well as to preventing gene flow and furthering genetic divergence among heterospecifics.
Interestingly, there is ubiquitous among-species variation in recombination rates (Wilfert et al. 2007; Smukowski and Noor 2011). In insects, for example, rates vary from 16.1 cM/Mb (centi-Morgans per megabase) in Apis melifera to 0.1 cM/Mb in the mosquito Armigeres subalbatus (Wilfert et al. 2007). There is also variation across the genome within individuals. For example, 50-fold differences have been observed within single chromosomes of humans and birds (Myers et al. 2005; Singhal et al. 2015). These patterns of variation underline that the efficacy of selection acting within species may differ across taxa and across genomes of the same species.
A major prediction following from theoretical work is that favorable allele combinations that promote ecological adaptation are more likely to reside in regions of low recombination. Recombination frustrates natural selection by breaking up associations between segregating alleles that are locally adaptive within the resident population and counteracts divergent selection if there is gene flow between recently diverged populations (Bürger and Akerman 2011; Yeaman and Whitlock 2011; Yeaman 2013). So far, empirical evidence for the prediction that locally adaptive alleles reside in regions of low recombination is not conclusive (Roesti et al. 2013; Burri et al. 2015; Marques et al. 2016). However, a recent study indicated that the interaction between gene flow and divergent selection is a strong predictor for the association between adaptive alleles and regions of low recombination in multiple species of stickleback fish (Samuk et al. 2017).
However, it is unclear how these predictions apply to the evolution of behavioral isolation. Theoretical models of speciation by sexual selection depend on linkage disequilibrium between sexual signaling traits and corresponding preference genes (Fisher 1930; Lande 1981; Kirkpatrick 1982). Linkage disequilibrium between trait and preference genes can come about by assortative mating (Lande 1981; Andersson and Simmons 2006) or by physical linkage (Kirkpatrick and Hall 2004), either through closely linked loci or through pleiotropy (a single gene affecting both signal and preference phenotypes). Here, the role of recombination is more complex: On the one hand, recombination can help consolidate loci brought together by nonrandom mating and as such facilitate linkage disequilibrium between trait and preference (Kirkpatrick and Ravigne 2002). On the other hand, recombination can also tear apart co-adapted trait and preference alleles if genes are exchanged between populations that differ in mating phenotypes. Therefore, recombination between sexually divergent populations in sympatry and parapatry often compromises differentiation in mating phenotypes and hinders speciation (Arnegard et al. 2004; Servedio 2009, 2015; Servedio and Burger 2014). However, there has been limited empirical insight into the relationship between trait-preference co-evolution and genome-wide variation in recombination rates (see Davey et al. 2017 for a recent exception).
Here, we examine the genomic architecture, specifically structural variation and heterogeneity in interspecific recombination, of four closely related, sexually isolated species of Hawaiian swordtail crickets from the genus Laupala. Laupala is one of the fastest speciating taxa known to date (Mendelson and Shaw 2005). The 38 morphologically cryptic species, each endemic to a single island of the Hawaiian archipelago (Otte 1994; Shaw 2000a) are the product of a recent evolutionary radiation. Evidence suggests that speciation by sexual selection on the acoustic communication system has driven this rapid diversification, as both male mating song and female acoustic preferences have diverged extensively among Laupala species (Otte 1994; Shaw 2000b; Mendelson and Shaw 2002). Sexual trait evolution strongly contributes to the onset and maintenance of reproductive isolation (Mendelson and Shaw 2002; Grace and Shaw 2011). Quantitative variation in one key temporal property of male song (pulse rate) and corresponding female preference strongly covaries across species and across populations within species (Shaw 2000b; Grace and Shaw 2011). Although the mechanisms of trait-preference co-evolution require further study, there is evidence that both are associated with a polygenic basis and that genetic loci controlling quantitative variation in traits and preferences are physically linked in the genome (Shaw and Lesnick 2009; Wiley et al. 2012). Notably, one of the major song quantitative trait loci (QTL; haploid effect size ∼ 9%) co-localizes with the first mapped preference QTL (haploid effect size ∼ 14%). Directional effects of song QTL provide additional evidence that (sexual) selection is driving divergence between species (Shaw et al. 2007).
The species pairs involved in this study, L. kohalensis and L. pruna, and L. paranigra and L. kona, are endemic to the Big Island, the youngest island of the Hawaiian archipelago (Fig 1A, B). Although these species pairs have apparently diverged in allopatry within the Big Island, past or future migration is likely, given their geographical proximity. Indeed, although allopatric and more closely related to L. kohalensis, L. pruna currently overlaps in distribution with L. paranigra (Fig 1B). The discordance between nuclear and mitochondrial phylogenies (Shaw 2002) and the limited degree of postzygotic isolation between some species pairs further emphasize the possibility of gene flow across natural populations. Together, the biogeography and the genetics of song and preference variation in this system provide a unique opportunity to explore the interaction between interspecific recombination rate variation, co-evolution of mating traits, and speciation.
We first assemble a de novo L. kohalensis draft genome and then obtain thousands of SNP markers for heterogeneously hybrid offspring from three laboratory-generated interspecific crosses. We then generate three dense linkage maps and compare these maps to test the hypothesis that the genomic architectures of young, sexually differentiated species are largely collinear (similar marker order) and have conserved interspecific recombination frequencies (similar marker distances). There is some variation in the level of overall differentiation in the species pairs studied here, but all lineages are young (approximately 0.5 million years or less, Fig 1). It is commonly expected that strong prezygotic isolation can evolve rapidly and largely in the absence of intrinsic postzygotic isolating mechanisms (Coyne and Orr 2004), but explicit comparisons of chromosomal architectures across behaviorally isolated species are rare. We compare the maps visually and use variation in maker order and length (measured in genetic distance, or centi-Morgans [cM]) as indicators of possible chromosomal rearrangements affecting the recombination rates differently in different crosses (Fig 1C). Then, from the large amount of information on linkage across many genomic markers from three hybrid crosses, we anchor the draft genome assembly to pseudomolecules and estimate the landscape of recombination across the genome. Finally, using an additional map that integrates the amplified fragment length polymorphism (AFLP) markers from previous QTL studies in L. kohalensis and L. paranigra, we approximate the location of known male song QTL, including one co-localizing with a female acoustic preference QTL, on the pseudomolecules. We examine local variation in recombination rates across the genome and in relation to the location of the song and preference QTL to test the hypothesis that song-preference co-evolution is facilitated by suppressed interspecific recombination. This study provides important insight into the role of the genomic architecture during divergence of closely related species separated by premating barriers.
MATERIAL & METHODS
De novo genome assembly
The Laupala kohalensis draft genome (estimated genome size ∼ 1.9 Gb; Petrov et al. 2000) was sequenced using the Illumina HiSeq 2500 platform. DNA was isolated with the DNeasy Blood & Tissue Kits (Qiagen Inc., Valencia, CA, USA) from six immature female crickets (c. five months of age) chosen randomly from a laboratory stock population (approximate lab generation=14). Females were chosen to balance DNA content of sex chromosomes to autosomes (female crickets are XX; male crickets are XO). DNA was subsequently pooled for sequencing. Four different libraries were created: a paired-end library with an estimated insert size of 200 bp (sequenced by Cornell Biotechnology Resource Center), a paired-end library with an estimated insert size of 500 bp, and two mate-pair libraries with insert sizes of 2 and 5 Kb (sequenced by Cornell Weill College Genomics Resources Core Facility).
Reads were processed using Fastq-mcf from the Ea-Utils package (Aronesty 2011) with the parameters -q 30 (trim nucleotides from the extremes of the read with qscore below 30) and -l 50 (discard reads with lengths below 50 bp). Read duplications were removed using PrinSeq (Schmieder and Edwards 2011) and reads were corrected using Musket with the default parameters (Liu et al. 2013).
Reads were assembled using SoapDeNovo2 (Luo et al. 2012). The reads were assembled using different Kmer sizes (k = 31, 39, 47, 55, 63, 71, 79 and 87). The 87-mer assembly produced the best assembly (based on N50/L50, assembly size, and number of scaffolds). Scaffolds and contigs were renamed using an in-house Perl script. Gaps were filled using GapCloser from the SoapDeNovo2 package.
The gene space covered by the assembly was evaluated using three different approaches. (1) Laupala kohalensis unigenes produced by the Gene Index initiative (Cricket release 2.0: http://compbio.dfci.harvard.edu/cgi-bin/tgi/gimain.pl?gudb=cricket) were mapped using Blat (Kent 2002). Only unigenes mapping with 90% or more of their length were considered; (2) 50 bp paired-end RNA-seq reads from a congeneric species, L. cerasina were mapped using Tophat2 (Kim et al. 2013). Reads were processed using the same methodology described above, but using a minimum length of 30 bp; (3) using BUSCO (Simão et al. 2015) to search for conserved eukaryotic and arthropod genes.
Samples
We generated three F2 interspecies hybrid families to estimate genetic maps. Multiple F1 male and sibling females were intercrossed to generate F2 mapping populations for the following species crosses: (1) a L. kohalensis female and L. paranigra male (“ParKoh", 178 genotyped F2 hybrid offspring; previously reported in Shaw et al. 2007); (2) a L. kohlanesis female and a L. pruna male (“PruKoh", 193 genotyped F2 hybrid offspring); (3) a L. paranigra female and a L. kona male (“KonPar", 263 genotyped F2 hybrid offspring). These four species are part of a recently radiated clade showing conspicuous mating song divergence (Mendelson and Shaw 2005). Approximate geographic distributions of the species, phylogenetic relationships and parent collection localities are shown in Fig 1 and in Table S1. Crickets used in crosses were a combination of lab stock and outbred individuals (L. kohalensis [for ParKoh] and L. paranigra [for ParKoh and KonPar] were both lab reared for 3-15 generations; L. kohalensis [for PruKoh], L. pruna and L. kona were wild-caught). All parental and hybrid generations were reared in a temperature-controlled room (20°C) on Purina cricket chow and provided water ad libitum.
Genotyping
DNA was extracted from whole adults using the DNeasy Blood & Tissue Kits (Qiagen, Valencia, CA, USA). Genotype-by-Sequencing library preparation and sequencing were done in 2014 at the Genomic Diversity Facility at Cornell University following Elshire et al. (2011). The Pst I restriction enzyme was used for sequence digestion and DNA was sequenced on the Illumina HiSeq 2500 platform (Illumina Inc., USA).
Reads were trimmed and demultiplexed using Flexbar (Dodt et al. 2012) and then mapped to the L. kohalensis de novo draft genome using Bowtie2 (Langmead and Salzberg 2012) with default parameters. We then called SNPs using two different pipelines: The Genome Analysis Toolkit (GATK; DePristo et al. 2011; Van der Auwera et al. 2013) and FreeBayes (Garrison and Marth 2012). For GATK we used individual BAM files to generate gVCF files using ‘HaplotypeCaller’ followed by the joint genotyping step ‘GenotypeGVCF’. We then evaluated variation in SNP quality across all genotypes using custom R scripts to determine appropriate settings for hard filtering based on the following metrics (based on the recommendations for hard filtering section “Understanding and adapting the generic hard-filtering recommendations” at https://software.broadinstitute.org/gatk/ accessed on 28 February 2017): quality-by-depth, Phred-scaled P-value using Fisher’s Exact Test to detect strand bias, root mean square of the mapping quality of the reads, u-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities, u-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele. For FreeBayes we called variants from a merged BAM file using standard filters. After variant calling we filtered the SNPs using ‘vcffilter’, a Perl library part of the VCFtools package (Danecek et al. 2011) based on the following metrics: quality (> 30), depth of coverage (> 10), and strand bias for the alternative and reference alleles (SAP and SRP, both > 0.0001). Finally, the variant files from the GATK pipeline and the FreeBayes pipeline were filtered to only contain biallelic SNPs with less than 10% missing genotypes using VCFtools.
We retained two final variant sets: a high-confidence set including only SNPs with identical genotype calls between the two variant discovery pipelines and the full set of SNPs which included all variants called using FreeBayes but limited to positions that were shared among the GATK and FreeBayes pipelines.
Linkage mapping
The genotype information from the parental lines was used to assign ancestry to the SNP loci. The parents of the crosses were heterogeneously heterozygous and only ancestry informative loci were retained, i.e. all loci for which one or more of the parents was heterozygous were discarded. We were unable to obtain sequence data from the parents for PruKoh, but used sequence data from a single, non-parental L. pruna female, and three available L. kohalensis females, all from the same populations as the parents. Ancestry was inferred if all three L. kohalensis individuals were homozygous for one allele and the L. pruna individual was homozygous for the alternative allele. All other loci were discarded. The loci were then further filtered based on genotype similarity and segregation distortion (see below for details).
The linkage maps deriving from the three species crosses were generated independently and by taking a three-step approach, employing both the regression mapping and the maximum likelihood (ML) mapping functions in JoinMap 4.0 (van Ooijen 2006) as well as the three-point error-corrected ML mapping function in MapMaker 3.0 (Lander et al. 1987; Lincoln et al. 1993).
In the first step, we estimated “initial” maps that are relatively low resolution (5 cM) but with high marker order certainty. For initial maps, we first grouped (3.0 ≤ LOD ≤ 5.0) and then ordered the high-confidence markers that showed no segregation distortion (markers with χ2-square associated P-value for deviation from Mendelian inheritance < 0.05 were discarded) and for which no marker had more than 95% similarity in genotypes across individuals compared to other markers (otherwise, one of each pair was excluded). When excluding similar loci, we favored those marker loci shared among the three mapping populations over markers unique to one or two crosses. We then checked for concordance among the three mapping algorithms. In most cases, the maps were highly concordant (in ordering of the markers; with respect to cM among markers, distances differed depending on the algorithm, especially between the regression and ML methods in JoinMap). Discrepancies among the maps produced by the different algorithms for the same cross were resolved by optimizing the likelihood and total length of a given map as well as by using the information in JoinMap’s “Genotype Probabilities” and “Plausible Positions".
These initial maps were then filled out using MapMaker with marker loci passing slightly more lenient criteria: markers drawn from the full set of SNPs, with false discovery rate (Benjamini and Hochberg 1995) corrected P value for χ2-square test of deviation from Mendelian inheritance ≤ 0.05 and fewer than 99% of their genotypes in common with other markers loci. First, more informative markers (no missing genotypes, > 2.0 cM distance from other markers) were added satisfying a log-likelihood threshold of 4.0 for the positioning of the marker (i.e., assigned marker position is 10,000 times more likely than any other position in the map). Remaining markers were added at the same threshold, followed by a second round for all markers at a log-likelihood threshold of 3.0. We then used the ripple algorithm on 5-marker windows and explored alternative orders.
In the second step, “comprehensive” maps were obtained in MapMaker by sequentially adding markers from the full set of SNPs that met the more lenient criteria described above to the initial map. Markers were added if they satisfied a log-likelihood threshold of 2.0 for the marker positions, followed by a second round with a log-likelihood threshold of 1.0. We then used the ripple algorithm again on 5-marker windows and explored alternative orders. Typically, MapMaker successfully juxtaposes SNP markers from the same scaffold. However, in marker dense regions with low recombination rates, the likelihoods of alternative marker orders coalesce. In such regions, when multiple markers from the same genomic scaffold were interspersed by markers from a different scaffold, we repositioned the former markers by forcing them in the map together. If the log-likelihood of the map decreased by more than 3.0 (factor 1000), only one of the markers from that scaffold was used in the map. The comprehensive maps provide a balance between marker density and confidence in marker ordering and spacing.
The third step was to create “dense” maps. We added all remaining markers that were not yet incorporated in step two, first at a log-likelihood threshold of 0.5, followed by another round at a log-likelihood threshold of 0.1. We then used the ripple command as described above. The dense maps are useful for anchoring of scaffolds and for obtaining the highest possible resolution of variation in recombination rates, but with the caveat that there is some uncertainty in marker order. Uncertainty is expected to be higher towards the centers of the linkage groups where crossing over events between adjacent markers become substantially less frequent (see Results).
Comparative analyses
Based on the recent divergence times and high interbreeding successes, we predict a large degree of collinearity of the linkage maps. We note that interpretations must take into account the non-independence of the ParKoh and PruKoh/KonPar maps, as only comparing PruKoh and KonPar comprises a fully independent contrast. We first examined whether inversions or other chromosomal rearrangements were common (affecting linkage map lengths and marker orders) or whether maps were generally collinear by comparing among the initial and comprehensive linkage maps visually using map graphs from MapChart (Voorrips 2002). Inverted or transposed markers present in two or all maps can be detected by connecting “homologs” in MapChart (a homolog in this case means a scaffold that is represented in two or more maps). Then, we tested whether linkage maps are generally collinear across the species pairs quantitatively. We used Spearman’s rank order correlation (ρ) test to examine the strength of correlation between the order in shared markers (the homologs in MapChart). We calculated ρ and the corresponding P-value (the probability of observing the measured or stronger correlation given no true correlation exists) by using the cor.test() function in R (R Development Core Team 2016).
We then tested for genetic incompatibilities among the genomes of the four species, by measuring segregation distortion in sliding, 10 cM windows. Although we filtered out markers with very high levels of segregation distortion (using a 5% FDR cutoff) to purge markers with potential sequencing errors, groups of distorted markers in a single region of a linkage group represent genomic regions with biased parental allele contributions, suggesting genetic incompatibilities (or, less common, selfish alleles and other active segregation distorters). Because L. kohalensis and L. paranigra are more distantly related to each other (and, thus, allowing more time for genetic incompatibilities to accumulate) than they are to L. pruna and L. kona, respectively (Mendelson and Shaw 2005; see Fig 1.), we expected more regions with significant segregation distortion in the ParKoh map relative to the KonPar and PruKoh maps. We calculated genotype frequency and the negative 10-base logarithm of the P value for the χ 2-square test of deviation from Mendelian inheritance across the linkage groups in R using the R/qtl package (Broman et al. 2003). Windows with P < 0.01 were considered to have significant segregation distortion, und thus potentially reflecting genetic incompatibilities.
After establishing that the linkage maps were generally collinear (see Results), we merged the maps and examined patterns of variation in crossing over along the Laupala genome. Maps were consolidated using ALLMAPS (Tang et al. 2015). Then, we calculated species-specific average recombination rates for the linkage groups by dividing the total length of the linkage group (in cM) by the physical length of the pseudomolecule (in million bases, Mb) obtained by merging homologous linkage groups using ALLMAPS. Lastly, to evaluate recombination rate variation along the linkage groups, we fitted smoothing splines (with 10 degrees of freedom, based on the fit of the spline to the observed data) in R to describe the relationship between the consensus physical distance (as per the anchored scaffolds) and the genetic distance specific to each linkage map. Variation in the recombination rate was then assessed by taking the first derivative (i.e. the rate) of the fitted spline function. The estimated recombination rates are likely to be an overestimate of the true recombination rate, because unplaced/unordered parts of the assembly do not contribute to the physical length of the pseudomolecules but are reflected in the genetic distances obtained from crossing-over events in the recombining hybrids.
To test the hypothesis that linked trait and preference genes reside in low recombination regions, we integrated the AFLP map and song and preference QTL peaks identified in previous work on L. kohalensis and L. paranigra (Shaw and Lesnick 2009) with the current ParKoh SNP map and projected the QTL peaks onto the anchored genome. The SNPs used in the present study were obtained from the same mapping population (same individuals) as in the 2009 AFLP study. Therefore, we combined the high confidence SNPs described above (for the “initial” map) with the AFLP markers reported in (Shaw et al. 2007) that were of the same individuals as the SNP markers used in this study and created a new linkage map using the same stringent criteria as for the “initial” maps described above. We projected this map onto the anchored draft genome based on common markers (scaffolds). We then approximated the physical location of the QTL peaks by looking for SNP markers on scaffolds present in the draft genome flanking AFPL markers underneath the QTL peaks identified in the 2009 study.
DATA ACCESIBILITY
Supplementary files are available on FigShare. See section “supplementary materials” for details. Raw data (vcf files, linkage maps, pseudomolecule agp file), and R-scripts will be deposited on FigShare after final acceptance and are available upon request. The genome assembly and sequencing reads are available on NCBI’s GenBank under BioProject number PRJNA392944. The Genotype-by-sequencing reads will be made available in NCBI’s short read archive under BioProject number PRJNA429815
RESULTS
De novo genome assembly
The sequencing of the four libraries yielded 162.5 Gb of raw sequences (Table 1). After read processing, 145.5 Gb was used for the sequence assembly. We compared among assemblies resulting from different Kmer sizes (k = 31, 39, 47, 55, 63, 71, 79 and 87). Based on the N50/L50 and the total assembly size, the assembly produced with k = 87 was retained for the final draft genome. Despite a large number of scaffolds in the final assembly (149,424), the median length of the scaffolds was high and the total length of the assembly covers about 83% of the expected complete genome in Laupala.
Gene space coverage in the assembly was evaluated using the L. kohalensis cricket gene index (Danley et al. 2007) (release 2.0), RNASeq from Laupala cerasina (Blankers et al. 2018), and by performing a BUSCO search using eukaryotic and arthropod specific conserved genes. Respectively 95% and 92% of the Laupala gene index and RNAseq sequences mapped to the current genome. In addition, the BUSCO search indicated very few missing genes in either database (Table 1).
Collinear linkage maps
We obtained 815,109,126; 522,378,849; and 311,558,401 reads after demultiplexing for ParKoh, KonPar, and PruKoh, respectively. Average sequencing depth ± standard deviation across all individuals in the F2 mapping population after filtering, (before and) after marker selection based on segregation distortion and ancestry information for linkage mapping was (62.4 x ± 162.5) 52.2 x ± 31.4, (44.3 x ± 58.5) 38.1 x ± 23.8, and (56.1 x ± 105.7) 41.8 x ± 29.3, respectively.
In the initial maps, 158 (ParKoh), 170 (KonPar), and 138 (PruKoh) markers were grouped into eight linkage groups at a LOD score of 5.0, corresponding to the seven autosomes and one X-chromosome in Laupala. The corresponding marker spacing was 5.14, 4.85, and 7.33 cM. The comprehensive maps contained 526, 650, and 325 markers with an average marker spacing of 1.91, 1.37, and 3.25 cM and on the dense maps we placed 608, 823, and 383 markers with on average 1.69, 1.37, and 3.25 cM. between markers
The recent divergence times and the limited levels of post-zygotic isolation observed in this system led us to hypothesize that the linkage maps would show a high degree of collinearity. The visual comparison of marker positioning showed that the relative locations of shared scaffolds were similar across the linkage maps in both the initial and the comprehensive maps (Fig 2, Fig S1). However, we also observe substantial variation in the total genetic length of homologous linkage groups, indicating recombination rate variation (Fig 2, Fig S1). This variation may in part result from chromosomal rearrangements. However, we can only reliably detect rearrangements in our maps if they are not segregating within species and are fixed for alternative arrangements between L. pruna and L. kohalensis on the one side and L. paranigra and L. kona on the other side. In that specific scenario the inverted marker order is visible when contrasting the PruKoh and KonPar maps, while the ParKoh map would show reduced recombination in that area (Fig 1C). Despite the apparent variation in recombination rates among homologous linkage groups, Spearman’s rank correlation of pairwise linkage group comparisons was high (ρ varied between 0.91 and 1.00) and similar to values seen in comparisons of intraspecific linkage maps (e.g. Poursarebani et al. 2013); the quantitative measure of collinearity was largely consistent across linkage groups and across cross types (Table 2). Finally, merging the maps into a consensus pseudomolecule assembly allowed us to measure the error between individual maps and the merged assembly. Correlations between linkage maps and the pseudomolecule assembly were generally high (> 0.95), indicating substantial synteny (Fig S2).
Limited heterogeneity in segregation distortion
We expected genetic incompatibilities to be more likely to occur in the ParKoh cross than in the KonPar and PruKoh cross, because L. kohalensis and L. paranigra are more distantly related than any of the other species pairs (Fig 1). We tested this hypothesis by examining the degree of segregation distortion in markers within 10 cM sliding windows across the linkage maps. Overall, segregation distortion was limited and average genotype frequencies were close to Mendelian expectations (Fig 3). However, LG3 showed a bias against L. kohalensis homozygotes in the ParKoh cross but not in any of the other crosses. Additionally, there was significant variation in the frequency of heterozygotes across the linkage groups (linear model Freq[heterozygotes]∼LG x cross: R2 = 0.21, F20,1547 = 20.7, P < 0.0001). Post-hoc Tukey Honest Significant Differences revealed that linkage group 7 had the lowest abundance of heterozygotes overall and within each of the intercrosses and that levels of heterozygosity on LG 7 were similar across the maps (Table S2). Together, these results show that from some LGs and in some crosses, certain genotype combinations were less common than expected, potentially as a result for genetic incompatibilities or meiotic drive.
Variable recombination rates across the genome
We anchored a total of 1054 scaffolds covering 720 million base pairs, a little below half the current genome assembly (see Table S3 for scaffold number, N50, and assembly size per LG and Fig S3 for coverage variation across the linkage groups). This gives us enough power to make inferences about broad-scale recombination rate variation, but not about the existence of small-scale recombination hotspots. Average recombination rates (cM/Mb) varied from between 0.75 (KonPar) and 0.93 (ParKoh) on the X chromosome to between 3.12 (KonPar) and 4.24 (PruKoh) on LG6 (Table 3). We note that the recombination rate for LG6 might be artificially inflated because of lower assembly quality (expressed as N50) of this LG relative to the other LGs in all linkage maps and in the pseudochromosomes (Table S3). Both including and excluding the sex chromosome, there is a significant linear relationship between chromosome size and genetic length (linear mixed effect model with cross as random variable; With X: ß = 0.62, F1,23 = 14.95, P = 0.0008; without X: ß = 0.69, F1,20 = 29.7, P < 0.0001) and between chromosome size and broad-scale recombination rate (with X: ß = −34.1, F1,23 = 29.1, P < 0.0001).; without X: ß = - 24.0, F1,23 = 63.7, P < 0.0001).
Most linkage groups showed wide regions of strongly reduced recombination rates in the center of the linkage groups (Fig. 4). The general pattern of peripheral peaks in recombination rates juxtaposing large recombination “desserts” was similar among the three intercrosses, but some additional cross-specific peaks in recombination rates were observed on almost all linkage groups (Fig 4).
Trait-preference co-evolution despite high recombination
Contrary to our expectation, the approximate location of the colocalizing song and preference QTL peak from (Shaw and Lesnick 2009) was associated with average recombination rates in the ParKoh and KonPar map and low recombination rates in the PruKoh map (Fig 4; Table S4). However, most other QTL peaks are located in regions of low recombination (Fig 4; Table S4).
DISCUSSION
The evolutionary trajectory of diverging populations and the likelihood of speciation can be heavily influenced by recombination. Within species, recombination can create favorable combinations of alleles or decouple deleterious from beneficial alleles. Among species, regions with low recombination can provide a genetic shield against introgression of maladaptive loci (Noor et al. 2001; Rieseberg 2001; Butlin 2005; Slatkin 2008; Noor and Bennett 2009; Cutter and Payseur 2013; Ortiz-Barrientos et al. 2016). Understanding recombination is thus critical to understanding adaptation and speciation. Recombination also has important implications for the analysis of genotype-phenotype relationships (Mackay 2001), demographic inference (Li and Durbin 2011), and analyses of genomic variation (Cutter and Payseur 2013; Wolf and Ellegren 2016). However, we still have limited insight into the patterns of recombination rate variation among species and across genomes, in particular for radiations powered largely by behavioral isolation.
Here, we study four species of sexually divergent Hawaiian swordtail crickets and generate the first pseudomolecule-level assembly for Orthoptera and the first published genome assembly for crickets, an important model system in neurobiology, behavioral ecology, and evolutionary genetics (Horch et al. 2017). Below, we discuss how our results provide insight into the potential for structural variation (linkage map collinearity) and genetic incompatibilities to drive reproductive isolation among closely related Laupala species. We also elaborate on the patterns of variation in recombination rates across the genome. We then discuss the surprising finding that colocalizing male song and female preference QTL did not fall in a region with particularly low recombination. This is important because it challenges the hypothesis that co-evolution of traits and preferences is facilitated by locally reduced recombination between recently diverged populations.
Collinearity of Genetic Maps
Based on the recent divergence (Mendelson and Shaw 2005) and strong premating isolation of Laupala species in the absence of conspicuous morphological and ecological differences (Otte 1994; Shaw 1996; Mendelson and Shaw 2005), we expected limited variation in chromosome structure and few signatures of genetic incompatibility between the species. In line with these expectations, we found that linkage groups are collinear across interspecies crosses (Fig 2). This was true both for comparisons of non-independent species pairs (between ParKoh and the other two crosses), as well as for the independent contrast of the PruKoh map versus the KonPar map. We saw some instances where markers occupied regions that may have been translocated or inverted. However, these instances were rare (Fig 2, Fig S1) and recombination rates were similar among homologous linkage groups across the hybrid families (Fig 4). Moreover, quantitative measures of correlation (Spearman’s rank correlation among maps, Pearson’s correlation coefficient between maps and the pseudomolecule assembly) as well as limited segregation distortion (but see discussion of one exception below) both supported the collinearity hypothesis.
Variation in the organization and structure of chromosomes can contribute to postzygotic reproductive isolation after speciation as well as to the speciation process directly (Noor et al. 2001; Rieseberg 2001). We conclude that at least for the Laupala group that radiated on the Big Island of Hawaii in the last 500,000 years, structural rearrangements have not played a major role in the evolution of reproductive isolation. This is because, similar to two hybridizing Heliconius species (Davey et al. 2017), we observed that chromosome-wide recombination rates are relatively conserved and large chromosomal rearrangements are absent. We hypothesize that for Laupala on the Big Island premating isolation combined with (partial) geographic separation (i.e. low migration rates) provides a sufficiently strong barrier to gene flow between sister species. Indeed, as has been shown in recent models of the role of inversions in speciation, genomic rearrangements can only invade and spread in diverging populations if levels of gene flow and the contribution of structural variation to isolation (by linking adaptive alleles or incompatibilities) is high relative to the strength of assortative mating (Feder et al. 2014; Dagilis and Kirkpatrick 2016).
We acknowledge that the power to detect rearrangements and changes in recombination rates is limited by the resolution of our maps. The average spacing of markers is between 1.37 and 3.25 cM. Thus, the upper limit of the magnitude of intervals within which we can detect rearrangements is on the order of 105 and 106 bp. Due to constraints on the sample size and sequencing strategy, it is thus difficult to attribute subtle variation in marker order and genetic distance between the maps to genomic rearrangements versus mapping errors and sampling variance. Closely related organisms typically show conserved recombination rates within 500 kb intervals; more heterogeneity might be revealed at higher resolution (Stevison et al. 2017).
Genetic incompatibilities
We expected genetic incompatibilities to be more likely between genomes of more distantly related species. Accordingly, we discovered a single region covering approximately half of linkage group 3 with high segregation distortion in ParKoh; we found no such deviations in the other two crosses (Fig 3). Inspection of genotype frequencies indicated that there were fewer individuals than expected that were homozygous for L. kohalensis alleles for loci in this region. In a controlled cross, segregation distortion can be caused by prezygotic effects such as meiotic drive of selfish genetic elements and distorter genes (e.g. like sd in Drosophila melanogaster (Larracuente and Presgraves 2012)), and by postzygotic genetic incompatibilities (Dobzhansky 1937; Muller 1942; Burt and Trivers 2006; Presgraves 2010; Hallmann et al. 2017). Genotypic errors may produce superficially similar patterns but are unlikely to distort segregation over large genomic regions and with consistent bias towards the same genotypes. Although meiotic drive is a possible alternative to genetic incompatibilities, we do not see the same effect in the other cross involving L. paranigra, where selfish genetic elements or segregation distorters ought to have a similar effect. Overall, the large region on linkage group 3 reveals a potential local post-mating barrier to gene flow that could contribute to strengthening existing prezygotic barriers in secondary contact zones or following episodes of migration.
Recombination landscape
Chromosomal rearrangements influence genomic divergence by locally altering recombination rates within and among species. Felsenstein (1974, 1981) illuminated the role of intraspecific recombination in purging deleterious alleles and the role of interspecific recombination in decoupling co-adapted alleles. In recent years, the role of recombination and its interaction with divergent selection and adaptation on genomic scales have received considerable attention (e.g. Yeaman and Whitlock 2011; Feder et al. 2012; Samuk et al. 2017) and technological advances are shifting focus towards characterizing the recombination landscape across species (Butlin 2005; Slatkin 2008; Noor and Bennett 2009; Barb et al. 2014; Burri et al. 2015).
Here, we show that there is limited variation in recombination rates across the maps of three interspecific crosses (Fig 4), but strong heterogeneity in recombination rates across the genome. Genome-wide average interspecific recombination rate varied between 1.3 and 1.5 cM/Mb (Table 3), similar to intraspecific rates observed in dipterans and substantially lower than social hymenopterans and lepidopterans (Wilfert et al. 2007). We note that our estimates are derived from interspecific maps, which may lead to somewhat lower estimates compared to intraspecific maps (e.g. Beukeboom et al. 2010), where genetic incompatibilities and rearrangements may reduce rates of crossing over; however, differences between intra and interspecific recombination might be negligible if rearrangements are rare (e.g., Davey et al. 2017). Moreover, we anchored about 50% of the nucleotides in the draft assembly to linkage groups, and there remain many scaffolds not mapped to a genomic position. These ‘missing’ scaffolds are expected to add to the physical length of the chromosomes more so than to the genetic length of the chromosomes, thus lowering the recombination rate. However, our study emphasized relative patterns of recombination, which should not be affected by our sampling. And while we can only approximate intraspecific recombination rates at this point, we note that recent divergence of the species involved and collinearity of the linkage maps support conservation of recombination landscapes across intraspecific and interspecific comparisons.
Interestingly, for all three species pairs we document high variability in interspecific recombination across genomic regions. We found large regions of low recombination in all three maps, with recombination rates well below 1 cM/Mb and occasionally approaching zero, flanked by steep inclines reaching rates up to 6 cM/Mb (Fig 4). This pattern is consistent with earlier findings in plants (Anderson et al. 2003), invertebrates (Rockman and Kruglyak 2009; Niehuis et al. 2010), and vertebrates (Backström et al. 2010; Roesti et al. 2013; Singhal et al. 2015), but differs from observations in, for example, Drosophila (Kulathinal et al. 2008) and humans (Myers et al. 2005), that show heterogeneity in recombination rates, but not necessarily much higher rates on the periphery of the chromosomes. Commonly invoked drivers of local recombination suppression, such as selection against recombination due to negative epistasis or the maintenance of linkage disequilibrium between mutually beneficial alleles (Smukowski and Noor 2011; Stevison et al. 2011; Smukowski Heil et al. 2015; Ortiz-Barrientos et al. 2016), are not likely to leave chromosome wide signatures. Rather, the observed pattern is more likely attributable to structural properties of chromosomes, such as the location of the centromere and heterochromatin-rich regions (Copenhaver et al. 1999; Haupt et al. 2001). Roesti et al. (2013) observe similar recombination landscapes in stickleback and suggest it might be due to peripheral clustering during meiosis prophase I to facilitate homolog pairing (Harper et al. 2004; Brown et al. 2005). Regardless of the mechanism, the observed genomic architecture will drive substantial heterogeneity in the propensity of favorable and/or maladaptive alleles to come together, break apart, and introgress in heterospecific backgrounds.
Trait-Preference Co-evolution
On way in which recombination heterogeneity may be important in the study system is in facilitating trait-preference co-evolution. If trait and preference genes are coupled through physical linkage (Kirkpatrick and Hall 2004), linkage can be stronger and span wider physical distances in regions with reduced recombination. We hypothesized that recombination facilitates linkage between trait and preference genes in Laupala because a previous study showed that a major song QTL (∼ 9% of the parental difference in male song) co-localizes with a preference QTL (∼14% of parental difference for female preference) in a cross between L. kohalensis and L. paranigra (Shaw and Lesnick 2009). Contrary to our expectation, we show that the co-localizing QTL fall in a region with intermediate to high recombination rates (> 2.0 cM) compared to chromosomal averages (typically 1 - 2 cM). This suggests that reduced recombination over larger physical distances is unlikely to be driving trait-preference co-evolution in this system. Importantly, a high speciation rate and wide-spread divergence in sexual signaling phenotypes suggest a primary role for trait-preference co-evolution in Laupala speciation (Mendelson and Shaw 2005; Shaw et al. 2011). Additionally, although these species have likely diverged in allopatry (Mendelson and Shaw 2005), some level of interspecific gene exchange is likely given historical biogeography, widespread secondary contact and evidence derived from discordant nuclear and mitochondrial gene trees (Shaw 2002).
How then is linkage disequilibrium between traits and preferences maintained? First, QTL may co-localize due to very tight physical linkage or pleiotropy instead of looser linkage. Under these two mechanisms, a lack of physical space for crossing over to occur rather than low recombination rates maintains linkage disequilibrium. Linkage disequilibrium might also persist in the face of recombination if strong assortative mating results from female mate preference. In this case, genetic correlations between the sexes will evolve, coupling signal and preference independent of their genetic distance (Fisher 1930; Lande 1981). Recent simulation studies showed that the probability with which recombination rate modifiers that link co-adaptive alleles spread in a populations is lower when assortative mating is strong, recombination between loci is low, and selection on the loci themselves is strong (Feder et al. 2014; Dagilis and Kirkpatrick 2016). Third, the current test involves only a single locus and additional tests are required to more robustly examine the relationship between recombination and trait-preference co-evolution. We observed that several known male song QTL on other linkage groups fall in regions of low recombination. Additional female preference QTL covary with these song QTL as well (Wiley et al. 2012) although precise map locations are not yet known.
In summary, we find limited variation in chromosome structure among species, but strong heterogeneity in the recombination landscape across the genome. We present a de novo genome assembly and anchor a substantial part of the L. kohalensis genome to pseudomolecules. Crickets are an important model system for evolutionary and neurobiological research (Horch et al. 2017). but limited genomic resources are available. The first Orthopteran pseudomolecule-level draft genome and recombination rate map are thus important new contributions to future speciation genomics research. This study further provides important insight into the extent to which structural variation and genetic incompatibilities contribute to isolation among closely related, sexually divergent species. We also shine light on the role of recombination in trait-preference co-evolution and argue that current evidence supports that, at least in Laupala, the evolution of behavioral isolation is not contingent on structural genomic variation and locally reduced recombination.
ACKNOWLEDGEMENTS
We thank Stephen Chenoweth and two anonymous reviewers for helpful comments that strongly improved the quality of this manuscript. We further thank the Shaw lab, in particular Mingzi Xu, as well as Michael Sheehan and other members from Cornell’s Neurobiology and Behavior department for input that contributed to the interpretation of the findings. This work was supported by the National Science Foundation (DEB 1241060, IOS 1257682 and IOS 0843528).