ABSTRACT
Speciation depends on the (local) reduction of recombination between the genomes of partially isolated, diverging populations. Chromosomal rearrangements and generic molecular mechanisms in the genome affect the recombination rate thereby constraining the efficacy of (linked) selection. This can have profound impacts on trait divergence, in particular on complex phenotypes associated with speciation by sexual selection. Because of the obligate co-evolution of traits and preferences in sexual communication signals, it may be expected that these co-adapted gene complexes reside in regions of low recombination, because of the increased potential for linked selection. Here we test this hypothesis in Laupala, a genus of crickets distributed across the Hawaiian archipelago that underwent recent and rapid speciation. We generate three dense linkage maps from interspecies crosses and use the linkage information to anchor a substantial portion of a de novo genome assembly to chromosomes. Local recombination rates were then estimated as a function of the genetic and physical distance of anchored markers. These data provide important genomic resources for Orthopteran model systems in molecular biology. In line with expectations based on the species’ recent divergence and successful interbreeding in the lab, the linkage maps are highly collinear and show no evidence for large scale chromosomal rearrangements. Contrary to our expectations, a genomic region where a male song QTL peak co-localizes with a female preference QTL peak was not associated with particularly low recombination rates. This study shows that trait-preference co-evolution in sexual selection is not necessarily constrained by local recombination rates.
INTRODUCTION
Speciation is a complex process contingent on the accumulation of genomic divergence over time. Genomes diverge under the influence of selection and drift, while gene flow and recombination counteract this divergence by homogenizing the genome (Kirkpatrick and Ravigne 2002; Gavrilets 2003). The efficacy of these processes as well as the signatures they leave on the genome are expected to be constrained not only by the genetic architecture of traits under selection but also by structural properties of the chromosomes.
Chromosomal rearrangements, transpositions, and the location of GC-rich regions, heterochromatin, or centromeres shape the genomic background and generate local variation in recombination rates (Fullerton et al. 2001; Nachman 2002; Noor and Bennett 2009; Ortiz-Barrientos et al. 2016). The resulting recombination landscape in turn affects local levels of linkage disequilibrium (Felsenstein 1974, 1981) and, as such, the potential for linked selection and genetic hitchhiking (Smith and Haigh 1974; Gillespie 2000; Cutter et al. 2014; Wolf and Ellegren 2016). The recombination landscape also affects the genomic signatures of positive selection on advantageous alleles we aim to detect in genomic scans (Roesti et al. 2013; Cruickshank and Hahn 2014; Burri et al. 2015). However, the role of variation in chromosome structure and recombination across the genome and among species during speciation remains understudied (Wolf and Ellegren 2016), constraining our ability to understand the patterns of genomic divergence associated with natural and sexual selection (Slatkin 2008; Noor and Bennett 2009; Wolf et al. 2010; Wolf and Ellegren 2016).
Genetic mapping of genome-scale variation can provide powerful insights into the genomic architecture among closely related species. The earliest genetic maps were used to identify inversions (Sturtevant 1921). Considerably more common than originally thought (Kirkpatrick 2010), inversions can effectively halt gene flow locally in the genome by suppressing recombination and can thus have dramatic effects on genomic divergence and speciation (Sturtevant 1921; Noor et al. 2001; Rieseberg 2001; Kirkpatrick 2010). Inversions have been found to influence the divergence between species (Kulathinal et al. 2008; Barb et al. 2014) as well as between different sexual morphs within species (Küpper et al. 2015; Lamichhaney et al. 2015). Other chromosomal rearrangements, such as genetic transpositions or translocations, can likewise contribute to ‘chromosomal speciation’ (Rieseberg 2001) by preventing gene flow and furthering genetic divergence among heterospecifics.
Marker orders in linkage maps constitute important preliminary evidence for inversions and other chromosomal rearrangements. In addition, deviation in allele frequencies from 1:2:1 ratios, i.e. segregation distortion, can reveal meiotic drive and local genetic incompatibilities. It is common practice to filter out strongly distorted loci in the mapping process as they may be indicative of sequencing errors. However, closely linked groups of markers with intermediate levels of segregation distortion can result from meiotic drive associated with selfish genetic elements and other active segregation distorters as well as from the incompatibility of genomic material from one parent into the background of another parent (Burt and Trivers 2006; Presgraves 2010). When sufficiently dense, genetic maps can also be used to anchor and order scaffolds of genome assemblies (Fierst 2015). The combination of classical (genetic mapping) and next-gen (high throughput sequencing) approaches offers the opportunity to estimate recombination rate variation along the genome by observing local variation in the relationship between physical distance (usually measured in megabases, Mb) and genetic distance (measured in centimorgans, cM) at various scales and across sexes, species, and populations (Kong et al. 2010; Smukowski and Noor 2011).
There is substantial variation in recombination rates among taxa (Wilfert et al. 2007; Smukowski and Noor 2011), with social hymenopterans showing the highest rates in insects (16.1 cM/Mb for Apis mellifera) and dipterans among the lowest (0.1 cM/Mb for the mosquito Armigeres subalbatus); however, just within the genus Drosophila recombination rates can vary from 0.3 cM/Mb to 3.1 cM/Mb (Wilfert et al. 2007) and 50-fold differences have been observed within single chromosomes of humans and birds (Myers et al. 2005; Singhal et al. 2015). Variation in recombination is intimately associated with adaptation (Hill and Robertson 1966; Felsenstein 1974), sex chromosome evolution (Wilson and Makova 2009), and genomic heterogeneity in genetic variation (Smith and Haigh 1974; Begun and Aquadro 1992; Nachman 2002; Cutter et al. 2014). In addition, the role of recombination in generating or eroding co-adapted gene complexes, and thereby promoting or discouraging speciation, is well known in the context of sympatric speciation (Felsenstein 1981) and reinforcement (Stevison et al. 2011). Therefore, characterizing the recombination landscape of closely related species is fundamental to understanding how genome architecture and evolution contribute to the mechanics of speciation.
Here, we generate and compare three marker-dense interspecific linkage maps and infer genome-wide and local recombination rates using four closely related species of Hawaiian sword-tail crickets of the genus Laupala. The genus Laupala is one of the fastest speciating taxa known to date (Mendelson and Shaw 2005) and the product of a very recent evolutionary radiation, with each of the 38 species endemic to a single island of the Hawaiian archipelago (Otte 1994; Shaw 2000a); moreover, the four focal species in this study are each endemic to Hawaii Island, the youngest island in the chain. Evidence suggests that speciation by sexual selection on the acoustic communication system has driven this rapid diversification, as both male mating song and female acoustic preferences have diverged extensively among Laupala species (Otte 1994; Shaw 2000b; Mendelson and Shaw 2002). Sexual trait evolution strongly contributes to the onset and maintenance of reproductive isolation (Mendelson and Shaw 2002; Grace and Shaw 2011). As a consequence, substantial co-evolution of male song and female preference exists among species and among populations within species (Grace and Shaw 2011). Although the mechanisms of trait-preference co-evolution require further study, there is evidence that genetic loci controlling quantitative variation in traits and preferences are physically linked in the genome (Shaw and Lesnick 2009; Wiley et al. 2012).
A handful of other speciation model systems have suggested a critical role for genome architecture in promoting speciation (Ortiz-Barrientos et al. 2016; Wolf and Ellegren 2016) but to date these have not included systems characterized by sexual selection. We have a limited basis to expect that genetic co-evolution of the sexual traits may be accompanied by chromosomal rearrangements (e.g. Küpper et al. 2015; Lamichhaney et al. 2015). In addition, theoretical models of speciation by sexual selection imply strong genetic linkage (linkage disequilibrium) of trait and preference genes (Fisher 1930; Lande 1981; Kirkpatrick 1982). The colocalization of a moderate effect quantitative trait locus (QTL) of the male song rhythm and a corresponding female preference QTL (Shaw and Lesnick 2009) thus provides a unique opportunity to test whether genetic linkage of these coevolving traits has been facilitated by chromosomal rearrangements or locally reduced recombination rates.
To characterize the genomic architecture and estimate rates of recombination in Laupala we first produce a de novo L. kohalensis draft genome and obtain thousands of SNP markers for hybrid offspring from interspecific crosses. We then generate three medium to high density genetic (linkage) maps and compare these maps to examine collinearity, chromosomal rearrangements, and genetic incompatibilities across the genome. There is some variation in the level of overall differentiation in the species pairs studied here, but all lineages are young (approximately 0.5 million years or less, Fig 1). We thus expect limited chromosomal differentiation. We also expect inversions and levels of genetic incompatibility to be more pronounced in the crosses of the more distantly related species. We then anchor our assembled scaffolds to chromosomes, and infer recombination rates across the chromosomes. Using an additional map that integrates the AFLP markers from previous QTL studies in L. kohalensis and L. paranigra, we identify the location of known male song QTL, including one co-localizing with a female acoustic preference QTL, to the chromosomes. We examine local variation in recombination rates across the genome and in relation to the location of the genetic architecture of song and preference divergence. This study provides important insight into the role of the genomic architecture in the divergence of closely related, sexually divergent species as well as pivotal resources for future studies.
MATERIAL & METHODS
De novo genome assembly
The Laupala kohalensis draft genome was sequenced using the Illumina HiSeq2500 platform. DNA was isolated with the DNeasy Blood & Tissue Kits (Qiagen Inc., Valencia, CA, USA) from six immature female crickets (c. five months of age) chosen randomly from a laboratory stock population (approximately lab generation=14). Females were chosen to balance DNA content of sex chromosomes to autosomes (female crickets are XX; male crickets are XO). DNA was subsequently pooled for sequencing. Four different libraries were created: a paired-end library with an estimated insert size of 200 bp (sequenced by Cornell Biotechnology Resource Center), a paired-end library with an estimated insert size of 500 bp, and two mate-pair libraries with insert sizes of 2 and 5 Kb (sequenced by Cornell Weill College Genomics Resources Core Facility).
Reads were processed using Fastq-mcf from the Ea-Utils package (Aronesty 2011) with the parameters -q 30 (trim nucleotides from the extremes of the read with qscore below 30) and -l 50 (discard reads with lengths below 50 bp). Read duplications were removed using PrinSeq (Schmieder and Edwards 2011) and reads were corrected using Musket with the default parameters (Liu et al. 2013).
Reads were assembled using SoapDeNovo2 (Luo et al. 2012). The reads were assembled using different Kmers (k = 31, 39, 47, 55, 63, 71, 79 and 87). The 87-mer assembly produced the best assembly (based on N50/L50, assembly size, and number of scaffolds). Scaffolds and contigs were renamed using an in-house Perl script. Gaps were filled using GapCloser from the SoapDeNovo2 package.
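The contiguity statistics used to compare the Kmer assemblies can be illustrated with a minimal sketch. It assumes the common definitions of N50 (the length of the scaffold at which the cumulative length of the size-sorted scaffolds first reaches half the assembly size) and L50 (the number of scaffolds needed to reach that point). The scaffold lengths below are hypothetical and are not taken from the Laupala assemblies.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'count' scaffolds were needed to get here (L50).
            return length, count
    raise ValueError("no scaffold lengths supplied")


# Hypothetical scaffold lengths (bp) for two candidate assemblies.
assemblies = {
    "k=31": [400, 300, 200, 100, 100, 100],
    "k=87": [900, 500, 300, 100],
}

summary = {name: n50_l50(lengths) for name, lengths in assemblies.items()}
for name, (n50, l50) in summary.items():
    print(f"{name}: N50 = {n50} bp, L50 = {l50} scaffolds")
```

A higher N50 combined with a lower L50 indicates a more contiguous assembly, which is one of the criteria named above alongside assembly size and number of scaffolds.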
The gene space covered by the assembly was evaluated using three different approaches: (1) Laupala kohalensis unigenes produced by the Gene Index initiative (Cricket release 2.0: http://compbio.dfci.harvard.edu/cgi-bin/tgi/gimain.pl?gudb=cricket) were mapped using Blat (Kent 2002), and only unigenes mapping with 90% or more of their length were considered; (2) RNASeq reads from a congeneric species, L. cerasina, were mapped using Tophat2 (Kim et al. 2013) after processing with the same methodology described above, but using a minimum length of 30 bp; (3) conserved eukaryotic and arthropod genes were searched for using BUSCO (Simão et al. 2015).
Samples
We generated three F2 interspecies hybrid families to estimate genetic maps. F1 males and females from one parental pair per hybrid cross type were intercrossed to generate an F2 mapping population, including the following: (1) a L. kohalensis female and a L. paranigra male (“ParKoh”, 178 genotyped F2 hybrid offspring; previously reported in Shaw et al. 2007); (2) a L. kohalensis female and a L. pruna male (“PruKoh”, 193 genotyped F2 hybrid offspring); (3) a L. paranigra female and a L. kona male (“ParKon”, 263 genotyped F2 hybrid offspring). Crickets used in crosses were a combination of lab stock and wild-caught juveniles from their respective species and reared to adulthood in a temperature-controlled room (20°C) with access to Purina cricket chow and water ad libitum. Population and geographical locations from which individuals were sampled are shown in Fig 1 and in Table S1.
We lost the parents for PruKoh, so we used sequence data from a single, non-parental L. pruna individual, and three available L. kohalensis individuals, each from the same populations as the parents, to select segregation patterns and establish informativeness of the markers (resulting in a smaller number of available markers compared to the other crosses, see Results). These ‘surrogate parental samples’ were included in the genotyping pipeline described below instead of the actual parents to this cross.
Genotyping
DNA was extracted from whole adults using the DNeasy Blood & Tissue Kits (Qiagen, Valencia, CA, USA). Genotype-by-Sequencing library preparation and sequencing were done in 2014 at the Genomic Diversity Facility at Cornell University following Elshire et al. (2011). The methylation-sensitive restriction enzyme ApeKI was used for sequence digestion and DNA was sequenced on the Illumina HiSeq 2500 platform (Illumina Inc., USA).
Reads were trimmed and demultiplexed using Flexbar (Dodt et al. 2012) and then mapped to the L. kohalensis de novo draft genome using Bowtie2 (Langmead and Salzberg 2012) with default parameters. We then called SNPs using two different pipelines: the Genome Analysis Toolkit (GATK; DePristo et al. 2011; Van der Auwera et al. 2013) and FreeBayes (Garrison and Marth 2012). For GATK we used individual BAM files to generate gVCF files using ‘HaplotypeCaller’ followed by the joint genotyping step ‘GenotypeGVCFs’. We then evaluated variation in SNP quality across all genotypes using custom R scripts to determine appropriate settings for hard filtering based on the following metrics (based on the recommendations for hard filtering section “Understanding and adapting the generic hard-filtering recommendations” at https://software.broadinstitute.org/gatk/ accessed on 28 February 2017): quality-by-depth, Phred-scaled P-value using Fisher’s Exact Test to detect strand bias, root mean square of the mapping quality of the reads, u-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities, u-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele. For FreeBayes we called variants from a merged BAM file using standard filters. After variant calling we filtered the SNPs using ‘vcffilter’, a Perl library part of the VCFtools package (Danecek et al. 2011) based on the following metrics: quality (> 30), depth of coverage (> 10), and strand bias for the alternative and reference alleles (SAP and SRP, both > 0.0001). Finally, the variant files from the GATK pipeline and the FreeBayes pipeline were filtered to only contain biallelic SNPs with less than 10% missing genotypes using VCFtools.
We retained two final variant sets: a ‘high-confidence’ set including only SNPs with identical genotype calls between the two variant discovery pipelines and a “full” set of SNPs which included all variants called using FreeBayes but limited to positions that were shared among the GATK and FreeBayes pipelines.
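The construction of the two variant sets can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used here: the genotype calls are toy data, the dictionary layout and the function name build_variant_sets are hypothetical, and genotype strings are assumed to be encoded identically by both callers (for example, heterozygotes always written as 0/1).

```python
# Toy genotype calls keyed by (scaffold, position); values map sample -> genotype.
gatk_calls = {
    ("scaf1", 100): {"F2_01": "0/1", "F2_02": "1/1"},
    ("scaf1", 250): {"F2_01": "0/0", "F2_02": "0/1"},
    ("scaf2", 40): {"F2_01": "0/1", "F2_02": "0/1"},
}
freebayes_calls = {
    ("scaf1", 100): {"F2_01": "0/1", "F2_02": "1/1"},
    ("scaf1", 250): {"F2_01": "0/0", "F2_02": "1/1"},
    ("scaf3", 7): {"F2_01": "0/1", "F2_02": "0/0"},
}


def build_variant_sets(gatk, freebayes):
    """Return (high_confidence, full) dictionaries of FreeBayes calls."""
    # Positions discovered by both pipelines.
    shared = sorted(set(gatk) & set(freebayes))
    # "Full" set: FreeBayes genotypes at every shared position.
    full = {site: freebayes[site] for site in shared}
    # "High-confidence" set: shared positions where all genotype calls agree.
    high_confidence = {
        site: freebayes[site] for site in shared if gatk[site] == freebayes[site]
    }
    return high_confidence, full


high_confidence, full = build_variant_sets(gatk_calls, freebayes_calls)
print(len(high_confidence), "high-confidence SNPs;", len(full), "SNPs in the full set")
```

In this toy example the SNP at scaf1:250 is kept in the full set but dropped from the high-confidence set, because the two pipelines disagree on one genotype.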
Linkage mapping
The linkage maps deriving from the three species crosses were generated independently and by taking a three-step approach, employing both the regression mapping and the maximum likelihood (ML) mapping functions in JoinMap 4.0 (van Ooijen 2006) as well as the three-point error-corrected ML mapping function in MapMaker 3.0 (Lander et al. 1987; Lincoln et al. 1993).
In the first step, we estimated “initial” maps that are relatively low (5 centimorgan, cM) resolution but with high marker order certainty. For initial maps, we first grouped (3.0 ≤ LOD ≤ 5.0) and then ordered a subset of evenly and well-spaced, highly informative (see above) markers with no segregation distortion (χ2-associated P-value ≥ 0.05) and fewer than 95% of their genotypes in common. We then checked for concordance among the three mapping algorithms. In most cases, the maps were highly concordant in the ordering of the markers, although the genetic distances (in cM) between markers differed depending on the algorithm, especially between the regression and ML methods in JoinMap. Discrepancies among the maps produced by the different algorithms for the same cross were resolved by optimizing the likelihood and total length of a given map as well as by using the information in JoinMap’s “Genotype Probabilities” and “Plausible Positions”.
These initial maps were then filled out using MapMaker with marker loci passing more lenient criteria (drawn from the full set of SNPs, with segregation distortion χ2-associated q-value ≥ 0.05, i.e. a 5% false discovery rate [Benjamini & Hochberg, 1995], and fewer than 99% of their genotypes in common with other marker loci, i.e. excluding marker loci with identical genotypes for all individuals), favoring those marker loci shared among the three mapping populations. First, more informative markers (no missing genotypes, > 2.0 cM distance from other markers) were added satisfying a log-likelihood threshold of 4.0 for the positioning of the marker (i.e., assigned marker position is 10,000 times more likely than any other position in the map). Remaining markers were added at the same threshold, followed by a second round for all markers at a log-likelihood threshold of 3.0.
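The segregation distortion filter can be illustrated with a minimal sketch. It assumes a χ2 goodness-of-fit test of F2 genotype counts against the expected 1:2:1 ratio (two degrees of freedom) and the Benjamini-Hochberg adjustment, and it treats markers with a q-value below 0.05 as distorted, in line with the 5% false discovery rate mentioned in the text. The marker names and genotype counts are hypothetical, and this is not the JoinMap or R code used for the maps.

```python
import math


def segregation_p_value(n_aa, n_ab, n_bb):
    """Chi-square test of F2 genotype counts against a 1:2:1 ratio (2 d.f.)."""
    total = n_aa + n_ab + n_bb
    expected = (total / 4, total / 2, total / 4)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return math.exp(-chi2 / 2)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical genotype counts (AA, AB, BB) for three markers in 100 F2 offspring.
counts = {"m1": (25, 50, 25), "m2": (40, 50, 10), "m3": (30, 45, 25)}
names = list(counts)
p_values = [segregation_p_value(*counts[name]) for name in names]
q_values = benjamini_hochberg(p_values)
retained = [name for name, q in zip(names, q_values) if q >= 0.05]
print(retained)
```

Here only the strongly skewed marker m2 falls below the 5% threshold; markers with mild deviations from 1:2:1 remain available for mapping.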
In the second step, “comprehensive” maps were obtained in MapMaker by sequentially adding markers (meeting the lenient criteria described above) to the initial map, satisfying a log-likelihood threshold of 2.0 for the marker positions, followed by a second round with log-likelihood threshold equaling 1.0. We then used the ripple algorithm on 5-marker windows and explored alternative orders. Typically, MapMaker successfully juxtaposes SNP markers from the same scaffold. However, in marker dense regions with low recombination rates, the likelihoods of alternative marker orders coalesce. In such regions, when multiple markers from the same genomic scaffold were interspersed by markers from a different scaffold, we repositioned the former markers by forcing them in the map together. If the log-likelihood of the map decreased by more than 3.0 (factor 1000), only one of the markers from that scaffold was used in the map. The comprehensive maps provide a balance between marker density and confidence in marker ordering and spacing.
The third step was to create “dense” maps. We added all remaining markers that were not yet incorporated in step two, first at a log-likelihood threshold of 0.5, followed by another round at a log-likelihood threshold of 0.1. We then used the ripple command as described above. The dense maps are useful for anchoring of scaffolds and for obtaining the highest possible resolution of variation in recombination rates, but with the caveat that there is some uncertainty in marker order. Uncertainty is expected to be higher towards the centers of the linkage groups where crossing over events between adjacent markers become substantially less frequent (see Results).
Comparative analyses
Based on the recent divergence times and high interbreeding successes, we anticipated few chromosomal rearrangements and, accordingly, a large degree of collinearity of the linkage maps. We first searched for evidence for inversions or other chromosomal rearrangements by comparing marker ordering among the initial and comprehensive linkage maps visually using map graphs from MapChart (Voorrips 2002). Inverted or transposed markers present in two or all maps can be detected by connecting “homologs” in MapChart. Then, we tested whether marker order and marker spacing (the genetic distance between pairs of markers) were conserved across the maps using linear models. If the marker order and genetic distance among markers is similar in a pair of linkage maps, a linear relationship with regression slope equal to unity is expected between marker positions in the two maps. We regressed the marker positions in the ParKon and PruKoh maps, respectively, against the marker positions in the ParKoh map. Deviations from linearity (reflected by R2 < 1.00 of the model) indicate that relative marker positions within the maps differ. The slope of the fitted regression curve indicates whether marker distances increase at higher rates in the PruKoh or ParKon map relative to the shared markers in the ParKoh map. If the slope of the regression curve equals 1, the markers are spaced out equally in the two maps and deviations from 1 indicate that markers are closer together or further apart in one map relative to the other. The null hypothesis of collinear maps thus predicts R2 ≈ 1.00 and unit slope in both comparisons.
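The logic of the collinearity test can be shown with a small worked sketch. It fits an ordinary least squares line to the positions of shared markers in two maps and reports the slope and R2. The marker positions are hypothetical, the function name fit_line is illustrative, and the analyses reported here were run with linear models in R rather than with this code.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical positions (cM) of five shared markers on one linkage group.
reference_map = [0.0, 10.0, 20.0, 30.0, 40.0]   # e.g. the ParKoh map
collinear_map = [0.0, 12.0, 24.0, 36.0, 48.0]   # same order, map 20% longer
rearranged_map = [0.0, 12.0, 36.0, 24.0, 48.0]  # two adjacent markers swapped

slope_c, _, r2_c = fit_line(reference_map, collinear_map)
slope_r, _, r2_r = fit_line(reference_map, rearranged_map)
print(f"collinear:  slope = {slope_c:.2f}, R2 = {r2_c:.2f}")
print(f"rearranged: slope = {slope_r:.2f}, R2 = {r2_r:.2f}")
```

In the collinear case R2 equals 1 and the slope of 1.2 reflects only the longer map, whereas swapping two markers lowers R2 (to 0.81 in this toy example) without any change in total map length.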
Genetic incompatibilities are expected to increase with increasing differentiation of the genome and the chromosomal architecture. In a linkage map, local genetic incompatibility is indicated by an excess of one of the parental alleles or a deficiency in heterozygotes in groups of adjacent markers. Because L. kohalensis and L. paranigra are more distantly related to each other than they are to L. pruna and L. kona, respectively (Mendelson and Shaw 2005; see Fig 1), we expected more regions with significant segregation distortion in the ParKoh map relative to the ParKon and PruKoh maps. We checked for regions of elevated segregation distortion across the linkage groups in R (R Development Core Team 2016) using the R/qtl package (Broman et al. 2003). Although we filtered out markers with very high levels of segregation distortion (using a 5% FDR cutoff), markers with medium to high levels of segregation distortion were still present. Grouped markers in a given region of a linkage group with medium to high levels of segregation distortion represent genomic regions with biased parental allele contributions, suggesting genetic incompatibilities (or, less commonly, selfish alleles and other active segregation distorters).
The chromosomal rearrangements and genetic incompatibilities as well as several other properties of genomes result in variation in recombination rates across and among chromosomes, constraining genetic divergence. To examine the patterns of variation in crossing over in the Laupala genome we first merged the three maps using ALLMAPS (Tang et al. 2015). Then, we calculated species-specific average recombination rates for the linkage groups by dividing the total length of the linkage group (in cM) by the physical length of the chromosome obtained using ALLMAPS (in million bases, Mb). Lastly, to evaluate recombination rate variation along the linkage groups, we fitted smoothing splines (with 10 degrees of freedom) in R to obtain functions that describe the relationship between the consensus physical distance (as per the anchored scaffolds) and the genetic distance specific to each linkage map. Variation in the recombination rate was then assessed by taking the first derivative of the fitted spline function. The estimated recombination rates are likely to be an overestimate of the true recombination rate, because unplaced/unordered parts of the assembly do not contribute to the physical length of the chromosomes but are reflected in the genetic distances obtained from crossing-over events in the recombining hybrids.
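The two recombination rate estimates can be illustrated with a minimal sketch. The chromosome-wide rate follows the definition above (map length in cM divided by physical length in Mb). For the local rate, the sketch swaps in a simpler technique than the one used here: instead of the first derivative of a smoothing spline with 10 degrees of freedom, it takes a central finite difference of genetic over physical distance between the markers flanking each interior marker. The marker coordinates are hypothetical.

```python
def average_rate(map_length_cm, physical_length_mb):
    """Chromosome-wide recombination rate in cM/Mb."""
    return map_length_cm / physical_length_mb


def local_rates(physical_mb, genetic_cm):
    """Central finite-difference slope (cM/Mb) at each interior marker.

    A crude stand-in for the derivative of a fitted smoothing spline.
    Markers must be sorted by physical position.
    """
    rates = []
    for i in range(1, len(physical_mb) - 1):
        d_cm = genetic_cm[i + 1] - genetic_cm[i - 1]
        d_mb = physical_mb[i + 1] - physical_mb[i - 1]
        rates.append(d_cm / d_mb)
    return rates


# Hypothetical anchored markers on one chromosome, sorted by physical position.
physical_mb = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
genetic_cm = [0.0, 30.0, 50.0, 52.0, 54.0, 74.0, 104.0]

chromosome_rate = average_rate(genetic_cm[-1], physical_mb[-1])
rates = local_rates(physical_mb, genetic_cm)
print(f"average: {chromosome_rate:.2f} cM/Mb")
print("local:  ", [round(r, 2) for r in rates])
```

The toy data mimic a chromosome with high recombination near the ends and a central region of reduced recombination. Unlike a smoothing spline, finite differences do not dampen noise in marker positions, so they are best read as a qualitative illustration of how the slope of the cM versus Mb relationship translates into a local rate.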
To test the hypothesis that linked trait and preference genes reside in low recombination regions to facilitate linkage, we integrated the AFLP map and song and preference QTL peaks identified in previous work (Shaw and Lesnick 2009) with the current ParKoh SNP map and projected the QTL peaks onto the anchored genome. The SNPs used in the present study were obtained from the same mapping population (same individuals) as in the 2009 AFLP study. Therefore, we simply combined the high-confidence SNPs described above with the AFLP markers and created a new linkage map using the stringent criteria for the “initial” maps. We projected this map onto the anchored draft genome based on common contigs. We then approximated the physical location of the QTL peaks by looking for SNP markers on scaffolds present in the draft genome flanking AFLP markers underneath the QTL peaks identified in the 2009 study.
DATA ACCESSIBILITY
Raw data and R-scripts will be deposited in Dryad upon final acceptance and are available upon request. The genome sequences and chromosome assembly are available on NCBI’s GenBank under BioProject number PRJNA392944 (as of the date of manuscript submission no accession numbers are available, but these will be included in a next version of this manuscript).
RESULTS
De novo genome assembly
The sequencing of the four libraries yielded 162.5 Gb of raw sequences (Table 1). After read processing, 145.5 Gb was used for the sequence assembly. We compared among assemblies resulting from different Kmer sizes (k = 31, 39, 47, 55, 63, 71, 79 and 87). Based on the N50/L50 and the total assembly size, the assembly produced with k = 87 was retained for the final draft genome. Despite a large number of scaffolds in the final assembly, the median length of the scaffolds was high and the total length of the assembly covers about 83% of the expected complete genome in Laupala.
Gene space coverage in the assembly was evaluated using the L. kohalensis cricket gene index (Danley et al. 2007) (release 2.0), RNASeq from Laupala cerasina (unpublished) and by performing a BUSCO search using eukaryotic and arthropod specific conserved genes. Respectively 95% and 92% of the Laupala gene index and RNAseq sequences mapped to the current genome. In addition, the BUSCO search indicated very few missing genes in either database (Table 1).
Linkage maps
We obtained 815,109,126; 522,378,849; and 311,558,401 reads after demultiplexing for ParKoh, ParKon, and PruKoh, respectively. Average sequencing depth ± standard deviation was 105.6 x ± 52.0, 35.5 x ± 16.5, and 44.4 x ± 13.1, respectively.
In the initial maps, 158 (ParKoh), 170 (ParKon), and 138 (PruKoh) markers were grouped into eight linkage groups at a LOD score of 5.0, as expected based on the existence of seven autosomes and one X-chromosome in Laupala. The corresponding average marker spacings were 5.14, 4.85, and 7.33 cM per marker. The comprehensive maps contained 526, 650, and 325 markers at 1.91, 1.37, and 3.25 cM per marker, and on the dense maps we placed 608, 823, and 383 markers at 1.69, 1.15, and 2.80 cM per marker.
Collinear linkage maps with few rearrangements
We expected few rearrangements and a high degree of collinearity among the linkage maps. The visual comparison of marker positioning showed that relative locations of shared scaffolds were similar across the linkage maps in both the initial and the comprehensive maps, with only a few putative inversions or genetic transpositions (Fig 2, Fig S1). Using linear regression, we then tested whether the marker order (represented by the R2) and the relative distance between markers (represented by the slope, β) shared across the maps were similar. Our results generally supported the null hypothesis of high collinearity. Although there was some variation in marker order and map length among the three maps, marker positions on most linkage groups differed only marginally from linearity and increased in genetic distance at similar rates (Table 2).
Further support came from Pearson correlation coefficients between the linkage groups from the cross-specific maps and the assembled chromosomes. These were generally high (> 0.95), indicating substantial synteny between the different genetic maps and the (consensus) anchored reference genome (Fig. S2). We did, however, observe some evidence for inverted regions on all linkage groups except LG 4, mostly in the peripheral regions of the chromosomes (Fig. 2, Fig. S1). Additionally, there was some variation in the length of the linkage groups among the crosses most notably for LG 3, 6, and 2 (Table 3).
Limited heterogeneity in segregation distortion
We expected genetic incompatibility to be more likely in the ParKoh cross than in the ParKon and PruKoh cross, because L. kohalensis and L. paranigra are more distantly related than any of the other species pairs (Fig 1). We tested this hypothesis by examining the degree of segregation distortion in groups of adjacent markers among the linkage maps. There were few patterns of regionally high segregation distortion in all three crosses (Fig 3). However, LG 1 showed a bias against L. kohalensis homozygotes in the ParKoh cross but not in any of the other crosses, lending some support to the hypothesis of limited genetic incompatibility. Additionally, there was significant variation in the frequency of heterozygotes across the linkage groups (linear model Freq[heterozygotes]~LG x cross: R2 = 0.21, F20,1547 = 20.7, P < 0.0001). Post-hoc Tukey Honest Significant Differences revealed that linkage group 6 had the lowest abundance of heterozygotes overall and within each of the intercrosses and that levels of heterozygosity on LG 6 were similar across the maps (Table S2).
Variable recombination rates across the genome
We anchored a total of 1054 scaffolds covering 720 million base pairs, a little below half the current genome assembly (see Table S3 for anchoring statistics). Average recombination rates (cM/Mb) varied from between 0.75 (ParKon) and 0.93 (ParKoh) on the X chromosome to between 3.12 (ParKon) and 4.24 (PruKoh) on LG 7 (Table 3). Both including and excluding the sex chromosome, there is a significant negative relationship (with X: β = -0.016, R2 = 0.72, F1,5 = 15.500, P = 0.0076; without X: β = -0.015, R2 = 0.66, F1,5 = 9.754, P = 0.0261) between chromosome size and broad-scale recombination rate.
Most linkage groups showed wide regions of strongly reduced recombination rates in the center of the linkage groups, revealing the metacentric location of the centromeres and the coincidence of the centromere with large recombination ‘deserts’ (Fig. 4). The general pattern of recombination rate variation across the genome was similar among the three intercrosses, but some cross-specific peaks in recombination rates were observed on almost all linkage groups (Fig 4).
The colocalizing trait and preference QTL are not associated with low recombination rates
Contrary to our expectation, the approximate location of the colocalizing song and preference QTL peak reported by Shaw and Lesnick (2009) was associated with average recombination rates in the ParKoh and ParKon maps and low recombination rates in the PruKoh map (Fig 4; Table S4). The majority of the other QTL peaks are located in regions of low recombination (Fig 4; Table S4).
DISCUSSION
The likelihood of speciation and the evolutionary trajectory of diverging populations are contingent on the chromosomal architecture, both directly, through chromosomal rearrangements that invoke speciation events, and indirectly, through recombination rate variation that locally affects the efficacy of selection on complex traits (Noor et al. 2001; Rieseberg 2001; Butlin 2005; Slatkin 2008; Noor and Bennett 2009; Cutter et al. 2014; Ortiz-Barrientos et al. 2016). We find large, centric regions with low recombination rates, but a major male song rhythm QTL that co-localizes with a female preference QTL did not map to a region with particularly low recombination. This is important because it challenges the hypothesis that co-evolution of traits and preferences is constrained by local recombination rates. Additionally, we projected three high density linkage maps onto a de novo assembly of the Laupala draft genome (the first published draft genome for crickets) to generate the first assembly of pseudomolecules for Orthoptera. This is a major step forward in creating genomic resources for an important model system in neurobiological, behavioral ecological, and evolutionary genetics studies (Horch et al. 2017).
Chromosomal rearrangements
Variation in the organization and structure of chromosomes contributes to postzygotic reproductive isolation after speciation and has been suggested to contribute to the speciation process directly (Noor et al. 2001; Rieseberg 2001). The Laupala radiation comprises many morphologically and ecologically cryptic, but sexually divergent species that can be successfully interbred and have diverged only recently (Otte 1994; Shaw 1996; Mendelson and Shaw 2005). We therefore expected limited variation in chromosome structure and number and few signatures of genetic incompatibility between the species. In line with these expectations, we observed limited large scale structural variation and found that linkage maps were collinear across interspecies crosses (Fig 2). In addition, apart from linkage group 1 in the cross of the most distantly related species pair, we found no sign of broadly distorted allele frequencies in the F2 generations (Fig 3). Structural genomic variation thus appears to be an unlikely contributor to isolation in the recent evolutionary history of Laupala.
However, we did find signatures of several inversions when comparing the interspecific linkage maps. Inversions can locally suppress recombination and put heterozygotes at a selective disadvantage, thereby contributing to reproductive barriers between species as well as initiating the speciation process (Noor et al. 2001; Rieseberg 2001; Stevison et al. 2011). Regions showing inversions between the genetic maps were mostly located in the peripheral regions of the linkage groups. Linkage group 6 in particular showed substantial variation in marker order and genetic map length across the intercrosses. The observed deficit in heterozygotes on LG 6 compared to the other linkage groups might thus be related to the substantial variation in marker order in the periphery as well as to the variability in the total length of LG 6. Laupala species hybrids have not yet been discovered in nature. However, these findings indicate a potential for underdominant selection, i.e. selection against heterozygotes for rearranged chromosomal regions (Ortiz-Barrientos et al. 2016), to act against interspecific introgression in the case of secondary contact.
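One simple way to screen two linkage maps for candidate inversions is to list the shared markers in the order of one map and look for runs in which their positions in the other map decrease. The sketch below implements this idea; the function name, the run-length threshold, and the example positions are all hypothetical, and the approach is a simplified screen rather than the comparison procedure behind Fig 2.

```python
def reversed_runs(positions_b, min_markers=3):
    """Find runs of markers whose order is reversed between two maps.

    positions_b: genetic positions (cM) in map B of the shared markers,
    listed in the order in which they occur in map A.
    Returns (start_index, end_index) pairs, inclusive, for each run of at
    least min_markers markers whose map B positions strictly decrease.
    """
    runs = []
    start = 0
    for i in range(1, len(positions_b) + 1):
        # A run ends at the end of the list or when positions stop decreasing.
        if i == len(positions_b) or positions_b[i] >= positions_b[i - 1]:
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = i
    return runs


# Hypothetical linkage group: markers 3 to 6 appear in reverse order in map B.
map_b = [0, 5, 10, 30, 25, 20, 15, 35, 40]
print(reversed_runs(map_b))
```

Requiring several consecutive reversed markers reduces the chance that local ordering errors between tightly linked markers are mistaken for a rearrangement.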
Segregation distortion
We found a large region on linkage group 1 showing substantial segregation distortion due to an excess of L. paranigra alleles when crossed with L. kohalensis, but not in the other two crosses (Fig 3). This observation is in line with expectations of stronger genetic incompatibility between more distantly related species pairs. Segregation distortion by itself does not complicate the ordering of genetic markers (Hackett and Broadfoot 2002), except if it is the result of genotyping errors; accordingly, we filtered out loci with high levels of segregation distortion prior to creating the maps. Any remaining variability in levels of segregation distortion is expected to be due to random variation and genetic incompatibility (Burt and Trivers 2006; Presgraves 2010), which plays an important role in the reinforcement of species boundaries. Inversions and other rearrangements are an important cause of genetic incompatibilities. However, our results do not indicate that linkage group 1 is particularly variable among the crosses. Other genetic elements, such as selfish alleles and distorting gene complexes such as the sd complex in Drosophila (Larracuente and Presgraves 2012), could also lead to the observed pattern on LG 1. However, these are likely less common and would be expected to affect the different crosses in similar ways.
Recombination landscape
Chromosomal rearrangements specifically, and the structural variability within the genome in general, influence genomic divergence by locally altering recombination rates. Felsenstein drew important attention to the role of recombination rates in speciation (Felsenstein 1974, 1981). In recent years, characterization of the recombination landscape has received increasing attention (Butlin 2005; Slatkin 2008; Noor and Bennett 2009; Barb et al. 2014; Burri et al. 2015), in particular in relation to its role in linked and background selection – linkage of neutral loci to positively selected alleles and linkage of neutral loci to deleterious alleles, respectively (Cutter et al. 2014). Here, we show that there is limited variation in recombination rates among the species pairs (Fig 4). The genome-wide average varied between 1.3 and 1.5 cM/Mb (Table 3), similar to dipterans and substantially lower than social hymenopterans and lepidopterans (Wilfert et al. 2007).
However, for all three species pairs we document high variability in recombination across genomic regions. We found large regions of low recombination in all three maps, with recombination rates well below 1 cM/Mb and occasionally approaching zero, flanked by steep inclines reaching rates up to 6 cM/Mb (Fig 4). This pattern is consistent with earlier findings in plants (Anderson et al. 2003), invertebrates (Rockman and Kruglyak 2009; Niehuis et al. 2010), and vertebrates (Backström et al. 2010; Roesti et al. 2013; Singhal et al. 2015), but differs from observations in, for example, Drosophila (Kulathinal et al. 2008) and humans (Myers et al. 2005).
We found that the wide regions of low recombination were characteristic of the larger linkage groups (size measured as the summed length of the anchored scaffolds), which also had lower average recombination rates. The smaller linkage groups also show substantial recombination rate heterogeneity, with decreasing recombination rates closer to the centromere, but low levels of recombination are limited to only a narrow, centric region. Although recombination rate heterogeneity can arise from selection against recombination due to negative epistasis or the maintenance of linkage disequilibrium between mutually beneficial alleles (Smukowski and Noor 2011; Stevison et al. 2011; Smukowski Heil et al. 2015; Ortiz-Barrientos et al. 2016), the generality of this pattern across the genome and across taxa implies there are generic molecular mechanisms underlying the peripheral bias in crossing over.
In addition to the relation between recombination rates and linked and background selection, linkage disequilibrium as a result of reduced crossing over along extensive genomic regions could have important evolutionary consequences for sexual trait evolution in this system. There is known sexual trait-preference matching in this system, with co-localizing QTL for male song and female preference (Shaw and Lesnick 2009; Wiley et al. 2012). The highly polygenic genetic architectures associated with sexual (acoustic) communication in crickets would likely require genetic linkage (physical or otherwise) of trait and preference genes to co-evolve at the elevated rates observed in Hawaiian crickets (Mendelson and Shaw 2005; Shaw et al. 2011). Low recombination rates might enhance or facilitate linkage disequilibrium, thus easing co-evolution of signals and preference.
However, our data lend no support for this hypothesis, as the colocalized song and preference QTL peaks in the L. kohalensis x L. paranigra cross map to a region of relatively high recombination in the genome (Fig 4). Interestingly, many of the remaining song QTL do map to the center of their respective chromosomes, which is associated with low recombination. Our results thus suggest that, on the one hand, low recombination is not a likely driving force in the establishment of linkage between the trait and preference genes in this system. On the other hand, limited recombination around most other song QTL suggests that interspecific gene flow in these regions is likely limited.
There is some uncertainty in the estimated recombination rates in this study. Using interspecific maps may lead to somewhat lower estimates compared to intraspecific maps. For example, in Nasonia vitripennis recombination rates estimated from interspecific maps were 1.8% lower than estimates from intraspecific maps (Beukeboom et al. 2010). In addition, although we were able to anchor about 50% of the assembly in the first version of the Laupala reference genome, there are still many scaffolds unaccounted for. These ‘missing’ scaffolds are expected to add to the length of the physical map (from which they are absent) more than to the length of the genetic map (where they are largely accounted for by genetic distance of the surrounding markers), thus lowering the recombination rate. However, our study focuses on patterns of recombination, which should not be affected, rather than aiming to infer the true recombination rate, which we can only approximate at this point.
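The effect of unanchored scaffolds on the estimated rate can be made concrete with a small calculation. In the sketch below, the genetic map length and the amount of unanchored sequence are hypothetical round numbers chosen for illustration; only the 720 Mb of anchored sequence comes from the results above, and the function name is an assumption.

```python
def average_rate(map_length_cm, anchored_mb, unanchored_mb=0.0, extra_cm=0.0):
    """Average recombination rate (cM/Mb) for a chromosome set.

    unanchored_mb: physical sequence missing from the anchored map.
    extra_cm: genetic length that the missing sequence would add; expected
    to be small because flanking markers already span most of it.
    """
    return (map_length_cm + extra_cm) / (anchored_mb + unanchored_mb)


# Hypothetical total map length of 1000 cM over the 720 Mb anchored.
anchored_only = average_rate(1000.0, 720.0)
# Suppose 280 Mb of unanchored scaffolds belong between anchored markers.
with_missing = average_rate(1000.0, 720.0, unanchored_mb=280.0)

print(f"anchored only: {anchored_only:.2f} cM/Mb")
print(f"with missing:  {with_missing:.2f} cM/Mb")
```

Because the added sequence inflates the denominator far more than the numerator, the estimate drops (here from about 1.39 to 1.00 cM/Mb), while the relative differences between regions are largely preserved as long as the missing sequence is not concentrated in particular regions.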
In summary, comparing three interspecific linkage maps from a rapid insect radiation, we find limited variation in chromosome structure among species, but similarly strong heterogeneity in the recombination landscape across their genomes. We present a de novo genome assembly and anchor a substantial part of the L. kohalensis genome to putative chromosomes. Crickets are an important model system for evolutionary and neurobiological research, but genomic resources have been limited. This first Orthopteran chromosome-level draft genome and recombination rate map offer important background information for future research on the role of selection in genetic divergence and the genomic basis of speciation. In this study, we further provide insight into the extent to which chromosomal rearrangements and structural variation contribute to genetic incompatibilities in closely related but sexually divergent populations. Importantly, we show that trait-preference co-evolution is not necessarily coupled to regions of low recombination, emphasizing that, at least in Laupala, other processes drive interspecific variation in sexual communication phenotypes during divergence.
ACKNOWLEDGEMENTS
This work was supported by the National Science Foundation grant number IOS1257682 “The Genomic Architecture underlying Behavioral Isolation and Speciation”.