Abstract
Recombination is a complex biological process that results from a cascade of multiple events during meiosis. Understanding the genetic determinism of recombination can help to understand if and how these events are interacting. To tackle this question, we studied the patterns of recombination in sheep, using multiple approaches and datasets. We constructed male recombination maps in a dairy breed from the south of France (the Lacaune breed) at a fine scale by combining meiotic recombination rates from a large pedigree genotyped with a 50K SNP array and historical recombination rates from a sample of unrelated individuals genotyped with a 600K SNP array. This analysis revealed recombination patterns in sheep similar to other mammals but also genome regions that have likely been affected by directional and diversifying selection. We estimated the average recombination rate of Lacaune sheep at 1.5 cM/Mb, identified about 50,000 crossover hotspots on the genome and found a high correlation between historical and meiotic recombination rate estimates. A genome-wide association study revealed two major loci affecting inter-individual variation in recombination rate in Lacaune, including the RNF212 and HEI10 genes and possibly 2 other loci of smaller effects including the KCNJ15 and FSHR genes. Finally, we compared our results to those obtained previously in a distantly related population of domestic sheep, the Soay. This comparison revealed that Soay and Lacaune males have a very similar distribution of recombination along the genome and that the two datasets can be combined to create more precise male meiotic recombination maps in sheep. Despite their similar recombination maps, we show that Soay and Lacaune males exhibit different heritabilities and QTL effects for inter-individual variation in genome-wide recombination rates.
Introduction
Meiotic recombination is a fundamental biological process that brings a major contribution to the genetic diversity and the evolution of eukaryotic genomes (Baudat and al., 2013). During meiosis, recombination enables chromosomal alignment resulting in proper disjunction and segregation of chromosomes, avoiding deleterious outcomes such as aneuploidy (Hassold, Hall, and Hunt 2007). Over generations, recombination contributes to shaping genetic diversity in a population by creating new allelic combinations and preventing the accumulation of deleterious mutations. Over large evolutionary timescales, divergence in recombination landscapes can lead to speciation: the action of a key actor in the recombination process in many mammals, the gene PRDM9, has been shown to have a major contribution to the infertility between two mouse species, making it the only known speciation gene in mammals today (Mihola et al. 2009).
Genetics studies on recombination were first used to infer the organisation of genes along the genome (Sturtevant 1913). With the advance in molecular techniques, more detailed physical maps and eventually whole genome assemblies are now available in many species. The establishment of highly resolutive recombination maps remains of fundamental importance for the validation of the physical ordering of markers, obtained from sequencing experiments (Groenen et al. 2012; Jiang et al. 2014). From an evolutionary perspective the relevant distance between loci is the genetic distance and recombination maps are essential tools for the genetic studies of a species, for estimation of past demography (H. Li and Durbin 2011; Boitard et al. 2016), detection of selection signatures (Sabeti et al. 2002; Voight et al. 2006), QTL mapping (Cox et al. 2009) and imputation of genotypes (Howie, Donnelly, and Marchini 2009) for genome-wide association studies (GWAS) or genomic selection. Precise recombination maps can be estimated using different approaches. Meiotic recombination rates can be estimated from the observation of markers’ segregation in families. Although this is a widespread approach, its resolution is limited by the number of meioses that can be collected within a population and the number of markers that can be genotyped. Consequently highly resolutive meiotic maps have been produced in situations where large segregating families can be studied and genotyped densely (Shifman et al. 2006; Mancera et al. 2008; Groenen et al. 2009; Rockman and Kruglyak 2009; Augustine Kong et al. 2010) or by focusing on specific genomic regions (Cirulli, Kliman, and Noor 2007; Stevison and Noor 2010; Kaur and Rockman 2014). In livestock species, the recent availability of dense genotyping assays has fostered the production of highly resolutive recombination maps (Tortereau et al. 2012; Susan E. Johnston et al. 2016; S. E. Johnston et al. 2017) in particular by exploiting reference population data from genomic selection programs (Sandor et al. 2012a; Ma et al. 2015; Kadri et al. 2016a). Another approach to study the distribution of recombination on a genome is to exploit patterns of correlation between allele frequencies in a population (i.e. Linkage Disequilibrium, LD) to infer past (historical) recombination rates (McVean, Awadalla, and Fearnhead 2002; N. Li and Stephens 2003; Chan, Jenkins, and Song 2012). Because the LD-based approach exploits in essence meioses accumulated over many generations, it can provide more precise estimates of local variation in recombination rate. For example, until recently (Pratto et al. 2014; Lange et al. 2016) this was the only indirect known approach allowing to detect fine scale patterns of recombination genome-wide in species with large genomes. Several highly recombining intervals (recombination hotspots) were detected from historical recombination rate maps and confirmed or completed those discovered by sperm-typing experiments (Simon Myers et al. 2005; Crawford et al. 2004). One important caveat of LD-based approaches is that their recombination rate estimates are affected by other evolutionary processes, especially selection that affects LD patterns unevenly across the genome. Hence differences in historical recombination between distant genomic regions have to be interpreted with caution. Despite this, historical and meiotic recombination rates usually exhibit substantial positive correlation (Rockman and Kruglyak 2009; Chan, Jenkins, and Song 2012; Brunschwig et al. 2012; J. Wang et al. 2012).
The LD-based approach does not allow to study individual phenotypes and therefore to identify directly loci influencing inter-individual variation in recombination rates. In contrast, family-based studies in human (A. Kong et al. 2008a; Chowdhury et al. 2009), Drosophila (Stevison and Noor 2010; Chan, Jenkins, and Song 2012) mice (Shifman et al. 2006; Brunschwig et al. 2012) cattle (Sandor et al. 2012a; Ma et al. 2015; Kadri et al. 2016a) and sheep (Susan E. Johnston et al. 2016) have demonstrated that recombination exhibits inter-individual variation and that this variation is partly determined by genetic factors. Two recombination phenotypes have been described: the number of crossovers per meiosis (Genome-wide Recombination Rate, GRR herein) and the fine scale localization of crossovers (Individual Hotspot Usage, IHU herein). GRR has been shown to be influenced by several genes. For example, a recent genome-wide association study found evidence for association with 6 genome regions in cattle (Kadri et al. 2016a). Among them, one of the genomic regions consistently found associated to GRR in mammals is an interval containing the RNF212 gene. In contrast to GRR, the IHU phenotype seems mostly governed by a single gene in most mammals, PRDM9. This zinc-finger protein has a key role in recruiting SPO11, thereby directing DNA double-strand breaks (DSBs) that initiate meiotic recombination. Because PRDM9 recognizes a specific DNA motif, the crossover events happen in hotspots carrying this motif. This PRDM9 associated process is however not universal, as it is only active in some mammals; canids for example do not carry a functional copy of PRDM9 and exhibit different patterns for the localization of recombination hotspots (Auton et al. 2013).
As mentioned above, recombination was studied recently in sheep (Susan E. Johnston et al. 2016), which lead to the production of precise genome-wide recombination maps, revealed a similar genetic architecture of recombination rates in sheep as in other mammals and identified two major loci affecting individual variation. Quite interestingly, one the QTL identified in this study, localized near the RNF212 gene, was clearly demonstrated to have a sex specific effect. This study was performed in a feral population of sheep which is quite distantly related to continental populations (Kijas et al. 2012) and has not managed by humans for a long time. To understand how recombination patterns and genetic determinism can vary across populations, we conducted in this work a study in another sheep population, the Lacaune, from south of France. The Lacaune breed is the main dairy sheep population in France, its milk being mainly used for the production of Roquefort cheese. Starting in 2011, a large genotyping effort started in the breed to implement a genomic selection program (Baloche et al. 2014), and young selection candidates are now routinely genotyped for a medium density genotyping array (about 50K SNP). This constitutes a large dataset of genotyped families that can be used to study recombination, although limited to one sex as only males were used for genomic selection in this population. This dataset offers an opportunity to study variation in recombination and its genetic determinism between very diverged populations of the same species. Hence, a first objective of this study was to elucidate whether these two sheep populations had similar distribution of recombination on the genome and whether they shared the same genetic architecture of the trait, and in particular the same QTLs effects.
The second objective of this study was to compare different approaches to study recombination from independent data in the same population. To this end, in addition to the pedigree data, we exploit a sample of 51 unrelated individuals genotyped with a high-density genotyping array (about 500K SNP). While, the family data was used to establish meiotic recombination maps, the sample of densely genotyped individuals was used to create historical recombination maps of higher resolution. This offered the opportunity to evaluate to which extent sheep ancestral recombination patterns match contemporary ones.
Materials and Methods
Study Population and Genotype Data
In this work, we exploited two different datasets of sheep from the Lacaune breed: a pedigree dataset of 8,085 related animals genotyped with the medium density Illumina Ovine Beadchip® including 54,241 SNPs, and a diversity dataset of 70 unrelated Lacaune individuals selected as to represent population genetic diversity, genotyped with the high density Illumina Ovine Infinium® HD SNP Beadchip including (Rochus et al. 2017; Moreno-Romieux et al. 2017)
Standard data cleaning procedures were carried out on the pedigree dataset using plink 1.9 (Chang et al. 2015), excluding animals with call rates below 95% and SNPs with call freq below 98%. After quality controls we exploited genotypes at 46,813 SNPs and 5,940 meioses. For these animals, we only selected the sires which had their own sire known and at least 2 offspring and the sires which did not have their own sire known, but at least 4 offspring. Eventually, 345 male parents, called focal individuals (FIDs) hereafter, met these criteria: 210 FIDs had their father genotype known while the remaining 135 did not (Figure 1).
Recombination Maps
Meiotic recombination maps from pedigree data
Detection of crossovers
Crossover locations were detected using LINKPHASE (Druet and Georges 2015). From the LINKPHASE outputs (recombination_hmm files), we extracted crossovers boundaries. We then identified crossovers occurring in the same meiosis less than 3 Mb apart from each other (that we call double crossovers) and considered them as dubious. This number was chosen as it corresponded to clear outliers in the distribution of inter-crossover distances. They are also quite unlikely under crossover interference. We applied the following procedure: given a pair of double crossovers, we set the genotype of the corresponding offspring as missing in the region spanned by the most extreme boundaries and re-run the LINKPHASE analysis. After this quality control step, we used the final set of crossovers identified by LINKPHASE to estimate recombination rates. This dataset consisted of 213,615 crossovers in 5,940 meioses.
Estimation of recombination rates
Based on the inferred crossover locations, meiotic recombination rates were estimated in windows of one megabase and between marker intervals of the medium SNP array using the following statistical model, inspired by (Cheung et al. 2007). For small genetic intervals such as considered here, the recombination rate (termed c in the following), is usually expressed in centiMorgans per megabase and the probability that a crossover occurs in one meiosis in an interval j (measured in Morgans) is 0.01cjlj where lj is the length of the interval expressed in megabases. When considering M meioses, the expected number of crossovers in the interval is 0.01cjljM. When combining observations in multiple individuals, we want to account for the fact that they have different average numbers of crossovers per meiosis (termed Rs for individual s). To do so we multiply the expected number of crossovers in the interval by an individual specific factor equal to (Rs/R) where R is the average number of crossovers per meiosis among all individuals. Finally, for individual s in interval j the expected number of crossovers is 0.01cjljMsRs/R. Given this expected number, a natural distribution to model the number of crossovers observed in an interval is the Poisson distribution so that the number ysj of crossovers observed in the interval j for an individual s is modelled as:
To combine crossovers across individuals, the likelihood for cj is the product of poisson likelihoods from equation (1).
We then specify a prior distribution for cj:
To set α and ß, first raw cj estimates are computed using the method of (Sandor et al. 2012a) across the genome and then a gamma distribution is fitted to the resulting genome-wide distribution (Figure S1). Combining the prior (2) with the likelihoods in equation (1), the posterior distribution for cj is:
As the localization of crossovers was usually not good enough to assign them with certainty to a single genomic interval, final estimates of cj are obtained as follows:
for each crossover overlapping interval j and localized within a window of size L, let xc be an indicator variable that takes value 1 if the crossover occurred in interval j and 0 otherwise. Assuming that, locally, recombination rate is proportional to physical distance, set P(xc = 1) = min(lj/L, 1).
Using the probability in step (i), sample xc for each crossover overlapping interval j and set
Given ysj, sample cj from equation (3)
For each interval considered, perform step (ii) and (iii) above 1000 times to draw samples from the posterior distribution of cj thereby accounting for uncertainty in the localization of crossovers.
Historical recombination maps from the diversity data
The diversity data contains 70 Lacaune individuals genotyped for a High Density SNP array comprising 527,823 autosomal markers (Rochus et al. 2017). Nineteen of these individuals are FIDs in the pedigree data. To perform the LD-based analysis on individuals unrelated to the pedigree study, these individuals were therefore removed from the dataset and the subsequent analyses performed on the 51 remaining individuals. Population-scaled recombination rates were estimated using PHASE (N. Li and Stephens 2003). For computational reasons and to allow for varying effective population size along the genome, estimations were carried out in 2 Mb windows, with an additional 100 Kb on each side overlapping with neighbouring windows, to avoid border-effect in the PHASE inference. PHASE was run on each window with default options, except that the number of main iterations was increased to obtain larger posterior samples for recombination rate estimation (option −X10) as recommended in the documentation.
From the PHASE output, 1000 samples were obtained from the posterior distribution of:
The background recombination rate: ρw = 4Nwcw, where Nw is the effective population size in the window, cw is the recombination rate comparable to the family-based estimate.
An interval specific recombination intensity λj, for each marker interval j of length lj in the window, such that the population scaled genetic length of an interval is: δj = ρwλjlj
The medians were used as point estimates of parameters and, computed over the posterior λj and δj, computed over the posterior distributions
Identification of intervals harbouring crossover hotspots
Intervals that showed an outlying λj value compared to the genome-wide distribution of λj were considered as harbouring a crossover hotspot. Specifically, a mixture of Gaussian distribution was fitted to the genome-wide distribution of log10 (λj) using the mclust R package (Fraley and Raftery 2002), considering that the major component of the mixture modelled the background distribution of λj in non-hotspots intervals. From this background distribution, a p-value was computed for each interval that corresponded to the null hypothesis that it does not harbour a hotspot. Finally, hotspot harbouring intervals were defined as those for which FDR(λj) < 5%, estimating FDR with the (Storey and Tibshirani 2003) method, implemented in the R qvalue package. This procedure is illustrated in Figure S2.
Combination of meiotic and historical recombination rates and construction of a high resolution recombination map
To construct a meiotic recombination map of the HD SNP array requires that the historical recombination rate estimates be scaled by 4 times the effective population size. Due to evolutionary pressures, the effective population size varies along the genome, so it must be estimated locally. This can be done by exploiting the meiotic recombination rate inference obtained from the pedigree data analysis as explained below.
Consider a window of one megabase on the genome, using the approach described above, we can sample values cjk (window j, sample k) from the posterior distribution of the meiotic recombination rate cj. Similarly, using output from PHASE we can extract samples from the posterior ρjk from the posterior distribution of the historical recombination rates (ρj = ρw λj). Now, considering that ρj = 4Nejcj where Nej is the local effective population size of window j, we get log(ρj) = log(Nej) + log(cj). This justifies using a model on both cjk and ρjk values: where yijk is log(cjk) when i=1 (meiotic-recombination rate sample) and yijk is log(ρjk) when i=2 (historical recombination rate sample). In this model, μ estimates the log of the genome-wide recombination rate, xijk=1 if i=2 and 0 otherwise so that α estimates log(4Ne), where Ne is the average effective population size of the Lacaune population, μ + βj estimates log(cj) combining population and meiotic recombination rates, and α+(ν2j-ν1j) estimates log(4Nej). μ and were αi were considered as fixed effects while βj and νij and were considered as independent random effects. Using this approach allows to combine in a single model LD- and pedigree-based inferences, while accounting for their respective uncertainties as we exploit posterior distribution samples.
Model (4) was fitted on 20 samples of the posterior distributions of cj and ρj for all windows of one megabase covering the genome, with an additional fixed effect for each chromosome, using the lme4 R package (Bates et al. 2015). Windows lying less than 4 Mb from each chromosome end were not used because inference on cj was possibly biased in these regions (see Results). After estimating this model, historical recombination rate estimates of HD intervals were scaled within each window by dividing them by their estimated local effective population size (i.e. For windows lying within 4 Mb of the chromosome ends, historical recombination rate estimates were scaled using the genome-wide average effective population size This led eventually to estimates of the meiotic recombination rates, expressed in centiMorgans per megabase, for all intervals of the HD SNP array, which we termed a high resolution recombination map.
Effect of recombination hotspots on the recombination rate
For each interval of the medium density SNP array, we computed the number of significant hotspots detected as explained above and the hotspot density (number of hotspots per unit of physical distance). After having corrected for the chromosome effect, the GC content effect and for windows farther than 4 Mb of the chromosome end, we fitted a linear regression model to estimate the effect of hotspots density on the meiotic recombination rate.
Comparison with Soay sheep recombination maps and integration of the two datasets to produce new male recombination maps in Sheep
In order to compare the recombination maps in Lacaune with the previously established maps in Soay sheep (Susan E. Johnston et al. 2016), we downloaded the raw data from the dryad data repository (doi: 10.5061/dryad.pf4b7) and the additional information available on https://github.com/susjoh/GENETICS_2015_185553. As the approach used in (Susan E. Johnston et al. 2016) to establish recombination maps differs from the one used here, we chose to apply the method of this study to the Soay data to perform a comparison that would not be affected by difference in methods. As the Lacaune data consist only of male meioses, we also only considered male meioses in the Soay data. The final Soay dataset used consisted of 3,445 individuals among which were 299 male FIDs, defined as in the Lacaune analysis. After detecting crossovers with LINKPHASE, one FID exhibited a very high average number of crossover per meiosis (> 100) and was not considered in the analyses (Soay individual ID: RE4844), leaving 298 FIDs. The final dataset consisted of 88,683 crossovers in 2,609 male meiosis and was used to estimated meiotic recombination maps using the exact same approach as described above, both on intervals of one megabase and on the same intervals as the ones considered in the Lacaune meiotic maps on the medium density SNP array. Note that the Soay sheep are not necessarily polymorphic for the same markers as the Lacaune, but that our method is flexible and can nonetheless estimate recombination rates in intervals bordered by monomorphic markers: in such a case adjacent intervals will have the same estimated recombination rate. As the two populations were found to have very similar meiotic recombination maps (see Results), the two sets of crossovers were finally merged to create a combined dataset of 302,298 crossovers in 8,549 male meioses and to estimate new male sheep recombination maps, again on one megabase intervals and on intervals of the medium density SNP array.
Genome-Wide Association Study on Recombination Phenotypes
Genome-wide Recombination Rate (GRR)
The set of crossovers detected was used to estimate the genome-wide recombination rate (GRR) of each FID in the family dataset from their observed number of crossovers per meiosis, adjusting for covariates: year of birth of the parent, considered as a cofactor with 14 levels for years spanning from 1997 to 2010 and insemination month of the offspring’s ewe, treated as a cofactor with 7 levels for months spanning from February to August. We used a mixed-model for estimating the population average GRR μ, covariates fixed effects β and individual breeding values us, while controlling for non genetic individual specific effects as: with where yso is the number of crossovers in the meiosis between FID s and offspring o, A is the pedigree-based relationship matrix between FIDs and xso the line of the corresponding design matrix for observation yso. We fitted this model using BLUPf90 (Misztal et al., 2002) and extracted: (i) estimates of variance components σe2, σa2 and σs2, which allows to estimate the heritability of the trait (calculated as σs2/(σs2 + σa2 + σe2)) and (ii) prediction ũs of GRR deviation for each FID.
Genotype Imputation
Nineteen of the 345 FIDs are present in the diversity dataset of HD genotypes. For the 336 remaining FIDs, their HD genotypes at 507,784 SNPs were imputed with BimBam (Guan and Stephens 2008; Servin and Stephens 2007) using the 70 unrelated Lacaune individuals as a panel. To impute, BimBam uses the fastPHASE model (Scheet and Stephens 2006), which relies on methods using cluster of haplotypes to estimate missing genotypes and reconstruct haplotypes from unphased SNPs of unrelated animal. BimBam was run with 10 expectation-maximization (EM) starts, each EM was run 20 steps on panel data alone, and an additional 1 step on cohort data, with a number of clusters of 15. After imputation BimBam estimates for each SNP in each individual an average number of alleles, termed mean genotype, computed from the posterior distribution of the three possible genotypes. This mean genotype has been shown to be efficient for performing association tests (Guan and Stephens 2008). In subsequent analyses, we used the mean genotypes provided by BimBam of the 345 FIDs at all markers of the HD SNP array. To assess the quality of genotype imputation at the most associated regions, 10 markers of the HD SNP array, 1 in chromosome 6 associated region and 9 in the chromosome 7 associated region (see Results) were genotyped for 266 FIDs for which DNA samples were still available. We evaluated the quality of imputation for the most significant SNPs by comparing for each possible genotype its posterior probability estimated by Bimbam to the error rate implied by calling it. We observed a very good agreement between the two measures (Figure S3), which denoted good calibration of the imputed genotypes at top GWAS hits.
Single- and multi-QTLs GWAS on GRR
We first tested association of individual breeding values ũs with mean genotypes at 503,784 single SNPs imputed with BimBam. We tested these associations using the univariate mixed-model approach implemented in the Genome-wide Efficient Mixed Model Association (Gemma) software (Zhou and Stephens, 2012). To accounts for polygenic effects on the trait, the centered genomic relationship matrix calculated from the mean genotypes was used. The p-values reported in the results correspond to the Wald test.
To go beyond single SNP association tests, we also estimated a Bayesian sparse linear mixed-model (Zhou, Carbonetto, and Stephens 2013) as implemented in Gemma. This method allows to consider multiple QTLs in the model, together with polygenic effects at all SNPs. The principle of the method is to have for each SNP l an indicator variable that takes value 1 if the SNP is a QTL and 0 otherwise. The strength of evidence that a SNP is a QTL is measured by the posterior probability , called posterior inclusion probability (PIP). Note that all SNPs are included in the model when doing so. Inference of the model parameters is performed using an iterative MCMC algorithm: the number of iterations was set to 10 millions and inference was made on samples extracted every 100 iterations. When a genome region harbors a QTL, multiple SNP in the region can have elevated PIPs. To summarize the strength of evidence for a region to carry a QTL, we calculated a rolling sum of PIPs over 50 consecutive SNPs using the rollsum function of the R zoo package (Zeileis and Grothendieck 2005). Given that the average physical distance between SNPs on the high-density SNP array is about 5 kilobases, this procedure interrogates the probability of the presence of a QTL in overlapping windows of approximately 250 kilobases.
For the univariate analysis, the False Discovery Rate was estimated using the ash package (Stephens 2017) and SNPs corresponding to an FDR < 10% were deemed significant and annotated. For the multivariate analysis, regions where the rolling sum of PIPs exceeded 0.15 were further annotated. The annotation of the QTL regions consisted in extracting all genes from the Ensembl annotation v87 along with their Gene Ontology (GO) annotations and interrogated for their possible involvement in recombination.
Variant Discovery and Additional Genotyping in RNF212
Identification and assignation of the RNF212 sheep genome sequence
The RNF212 gene was not annotated on the Ovis aries v3.1 reference genome. Nevertheless, a full sequence of RNF212 was found in the scaffold01089 of Ovis orientalis (assembly Oori1, NCBI accession NW_011943327). By BLAST alignment of this scaffold, ovine RNF212 could be located with confidence on chromosome 6 in the interval OAR6:116426000-116448000 of Oari3.1 reference genome (Figure S4). This location was confirmed by BLAST alignment with the bovine RNF212 gene sequence. We also discovered that the Oari3.1 unplaced scaffold005259 (NCBI accession JH922970) contained the central part of RNF212 (exons 4-9) and it could be placed within a large assembly gap. Moreover, we also observed that automatically annotated non-coding RNA in the RNF212 interval matched exonic sequence of RNF212 (Figure S4).
Variant discovery in RNF212 in the Lacaune population
Based on the genomic sequence and structure of the RNF212 gene annotated in Ovis orientalis (NCBI accession NW_011943327), a large set of primers were designed using PRIMER3 software (Table S1) for amplification of each annotated exon and some intron part corresponding to exonic region annotated in Capra hircus (Chir_v1.0). PCR amplification (GoTaq, Promega) with each primer pair was realized on 50ng of genomic DNA from 4 selected homozygous Lacaune animals exhibiting the GG and AA (non imputed) genotypes at the most significant SNP of the medium density SNP array of the chromosome 6 QTL (rs418933055, p-value 2.56e-17). Each PCR product was sequenced via the BigDye Terminator v3.1 Cycle Sequencing kit and analyzed on an ABI3730 sequencing machine (Applied Biosystems). Sequenced reads were aligned against the Ovis orientalis RNF212 gene using CLC Main Workbench Version 7.6.4 (Qiagen Aarhus) in order to identify polymorphisms.
Genotyping of mutations in RNF212
The genotyping of 266 genomic DNA from Lacaune animals for the four identified polymorphisms within the ovine RNF212 gene was done by Restriction Fragment Length Polymorphism (RFLP) after PCR amplification using dedicated primers (Table S1) (GoTaq, Promega), restriction enzyme digestion (BsrBI for SNP_14431_AG; RsaI for SNP_18411_GA; and Bsu36I for both SNP_22570_CG and SNP_22594_AG; New England Biolabs) and resolution on 2% agarose gel.
Results
High-Resolution Recombination Maps
Meiotic recombination maps: genome-wide recombination patterns
We studied meiotic recombination using a pedigree of 6,230 individuals, genotyped for a medium density SNP array (50K) comprising around 54,000 markers. After quality controls we exploited genotypes at 46,813 SNPs and identified 213,615 crossovers in 5,940 meioses divided among 345 male parents (FIDs) (see Methods). The pedigree information available varied among focal individuals (Figure 1): 210 FIDs had their father genotype known while the remaining 135 did not. Having a missing parent genotype did not affect the detection of crossovers as the average number of crossovers per meiosis in the two groups was similar (36.1 with known father genotype and 35.8 otherwise) and the statistical effect of the number of offspring on the average number of crossovers per meiosis was not significant (p>0.23). This can be explained by the fact that individuals that lacked father genotype information typically had a large number of offspring (17.4 on average, ranging from 4 to 111), allowing to infer correctly their haplotype phase from their offspring genotypes only. Overall, given that the physical genome size covered by the medium density SNP array is 2.45 gigabases, we estimate that the mean recombination rate in our population is about 1.5 cM/megabase.
Based on the crossovers identified, we developed a statistical model to estimate meiotic recombination rates (see methods) and constructed meiotic recombination maps at two different scales: for windows of one megabase and for each interval of the medium density SNP array. As this statistical approach allowed to evaluate the uncertainty in recombination rate estimates, we provide respectively in File S1 and S2, along with the recombination rate estimates in each interval, their posterior variance and 90% credible intervals. Graphical representation of the meiotic recombination maps of all autosomes are given in File S3.
The recombination rate on a particular chromosome region was found to depend highly on its position relative to the telomere and to the centromere for metacentric chromosomes, i.e. chromosomes 1, 2 and 3 in sheep (Figure S5). Specifically, for acrocentric and metacentric chromosomes, recombination rate estimates were elevated near telomeres and centromeres, but very low within centromeres. In our analysis, recombination rate estimates were found low in intervals lying within 4 megabases of chromosome ends. While this could represent genuine reduction in recombination rates near chromosome ends it is also likely due to crossovers being undetected in our analysis as only few markers are informative to detect crossovers at chromosome ends. In the following analyses, we therefore did not consider regions lying within 4 Mb of the chromosomes ends.
From local recombination rate estimates in 1 Mb windows or medium SNP array intervals, we estimated chromosome specific recombination rates (Figure S6). Difference in recombination rates between chromosomes was relatively well explained by their physical size, larger chromosomes exhibiting smaller recombination rates. Even after accounting for their sizes, some chromosomes showed particularly low (chromosomes 9, 10 and 20) or particularly high (chromosomes 11 and 14) recombination rates. In low recombining chromosomes, large regions had very low recombination, between 9 and 14 Mb on chromosome 9, 36 and 46 Mb on chromosome 10 and between 27 and 31 Mb on chromosome 20. In highly recombining chromosomes, recombination rates were globally higher on chromosome 14, while chromosome 11 exhibited two very high recombination windows between 7 Mb and 8 Mb and between 53 and 54 Mb. In addition, we found, consistent with the literature, that GC content was quite significantly positively correlated with recombination rate both in medium SNP array intervals (p-value < 10 -16, r=0.20) and in 1 Mb intervals (p-value < 10 -16, r=0.28).
Estimation of historical recombination rates and identification of crossover hotspots
We used a different dataset, with 51 unrelated individuals from the same Lacaune population genotyped for the Illumina HD SNP array (600K) comprising 527,823 autosomal SNPs after quality controls. Using a multipoint model for LD patterns (N. Li and Stephens 2003), we estimated, for each marker interval of the HD SNP array, historical recombination rates (see Methods). Compared to meiotic maps, these estimates offer a greater precision as they in essence exploit meioses cumulated over many generations. However, the historical recombination rates obtained are scaled by the effective population size (ρ = 4N2c where Ne is the effective population size and c the meiotic recombination rate) which is unknown, and may vary along the genome due to evolutionary pressures, especially selection. Thanks to the higher precision in estimation of recombination rate, LD-based recombination maps offer the opportunity to detect genome intervals likely to harbour crossover hotspots. A statistical analysis of historical recombination rates (see Methods) identified about 50,000 intervals exhibiting elevated recombination intensities (Figure S2) as recombination hotspots, corresponding to an FDR of 5%. From our historical recombination map, we could conclude that 80% crossover events occurred in 40% of the genome and that 60% of crossover events occurred in only 20% of the genome (Figure S7).
High-resolution recombination maps combining family and population data
Having constructed recombination maps with two independent approaches and having datasets in the same population of Lacaune sheep allowed first to evaluate to which extent historical crossover hotspots explain meiotic recombination, and second to estimate the impact of evolutionary pressures on the historical recombination landscape of the Lacaune population. We present our results on these questions in turn.
We studied whether variation in meiotic recombination can be attributed to the historical crossover hotspots detected from LD patterns only. For each interval between two adjacent SNPs of the medium density array, we (i) extracted the number of significant historical hotspots and (ii) calculated the historical hotspot density (in number of hotspots per unit of physical distance). We found both covariates to be highly associated with meiotic recombination rate estimated on family data (r=0.15 with hotspot density (p < 10 -16) and r=0.19 with the number of hotspots (p<10 -16)). These correlations hold after correcting for chromosome and GC content effects (respectively r=0.14 (p<10 -16) and r=0.18 (p<10 -16)). Figure 2 illustrates this finding in two one-megabase intervals from chromosome 24, one that exhibits a very high recombination rate (7.08 cM/Mb) and the second a low one (0.46 cM/Mb). In this comparison, the highly recombining window carries 36 recombination hotspots while the low recombinant one exhibits none. As the historical background recombination rates in the two windows are similar (0.7/Kb for the one with a high recombination rate, and 0.2/Kb for the other), the difference in recombination rate between these two regions is largely due to their contrasted number of historical crossover hotspots.
In order to study more precisely the relationship between historical and meiotic recombination rates, we fitted a linear mixed model (see Methods) that allowed to estimate the average effective population size of the population, the correlation between meiotic and historical recombination rates and to identify genome regions where historical and meiotic recombination rates were significantly different. We found the effective population size of the Lacaune population to be about 7,000 individuals and a correlation of 0.73 between meiotic and historical recombination rates (Figure 3). We discovered 7 regions where historical recombination rates were much lower than meiotic ones and 3 regions where they were much higher (Table 1, Figure S8). Seven of these 10 regions have extreme recombination rates compared to other genomic regions. To quantify to which extent a window is extreme, we indicate in Table 1, for each window, the proportion of the genome with a lower recombination rate (qw). For 6 of these 7 regions, the historical recombination rate is more extreme than the meiotic rate: four regions have very low meiotic recombination rate and even lower historical recombination rates (the two regions on chromosome 3 and two regions on chromosome 10, between 36-37 megabases and between 42-44 megabases); two regions have very high meiotic recombinations rates and even higher historical recombination rates (on chromosome 12 and on chromosome 23). For these six regions, the discrepancy between meiotic recombination and historical recombination estimates can be explained by the fact that we used a genome-wide prior in our model to estimate meiotic recombination rates that has the effect of shrinking our estimates toward the mean. Because historical estimates were not shrunk in the same way, for these six outlying regions the two estimates did not concur and it is possible that our meiotic recombination rate estimates were slightly over (resp. under) estimated. Out of the four remaining outlying windows, three had a low historical recombination rate but did not have particularly extreme meiotic recombination rates, so that the effect of shrinkage is not likely to explain the discrepancy between meiotic and historical recombination rates. Indeed, these three regions corresponded to previously identified selection signatures in sheep: a region on chromosome 6 spanning 2 intervals between 36 and 38 megabases contains the ABCG2 gene, associated to milk production (Cohen-Zinder et al. 2005), and the LCORL gene associated to stature (recently reviewed in (Takasuga 2015)). This region has been shown to have been selected in the Lacaune breed (Fariello et al. 2014; Rochus et al. 2017); a region spanning one interval on chromosome 10, between 29 and 30 megabases contains the RXFP2 gene, associated to polledness and horn phenotypes (Susan E. Johnston et al. 2013) and found to be under selection in many sheep breeds (Fariello et al. 2014); and a region on chromosome 13 between 63 and 64 megabases that contains the ASIP gene responsible for coat color phenotypes in many breeds of sheep (Norris and Whan 2008), again previously demonstrated to have been under selection. For these three regions, we explain the low historical recombination estimates by a local reduction of the effective population size due to selection.
Finally, one of the three regions with a high historical recombination rate, on chromosome 20 between 28 and 29 megabases had a low meiotic recombination rate, so that the effect of shrinkage cannot explain the discrepancy. This region harbours a cluster of olfactory receptors genes and its high historical recombination rate could be explained by selective pressure for increased genetic diversity in these genes (i.e. diversifying selection), a phenomenon which has been shown in other species (e.g. pig (Groenen et al. 2012), human (Ignatieva et al. 2014), rodents (Stathopoulos, Bishop, and O’Ryan 2014)). Finally, we used the meiotic recombination rates to scale the historical recombination rate estimates and produce high-resolution recombination maps on the HD SNP array (Supporting File S4).
Improved male recombination maps by combining Lacaune and Soay sheep data
Recently, recombination maps have been estimated in another sheep population, the Soay (Susan E. Johnston et al. 2016). Soay sheep is a feral population of ancestral domestic sheep living on an island located northwest of Scotland. The Lacaune and Soay populations are genetically very distant, their genome-wide Fst, calculated using the sheephapmap data (Kijas et al. 2012), being about 0.4. Combining our results with results from the Soay offered a rare opportunity to study the evolution of recombination over a relatively short time scale as the two populations can be considered separated at most dating back to domestication, about 10,000 years ago. The methods used in the Soay study are different from those used here, but the two datasets are similar, although the Soay data has fewer male meioses (2,604 vs. 5,940 in the present study). In order to perform a comparison that would not be affected by differences in estimation methods, we ran the method developed for the Lacaune data to estimate recombination maps on the Soay data. As the Soay study showed a clear effect of sex on recombination rates, we estimated recombination maps on male meioses only. Figure 4 presents the comparison of recombination rates between the two populations in marker intervals of the medium density SNP array. The left panel shows that the two populations exhibit very similar recombination rates (r = 0.82, p<10 -16), although Soay recombination rates appear higher for low recombining intervals (c < 1.5 cM/Mb in gray on the figure). We explain this by the shrinkage effect of the prior, that is more pronounced in the Soay as the dataset is smaller: the right panel on Figure 4 shows that the posterior variance of the recombination rates are clearly higher in Soays for low recombining intervals while they are similar for more recombining intervals. Overall, our results on the comparison of the recombination maps in the two populations are consistent with the two populations having the same amplitude and distribution of recombination on the genome. We therefore analyzed the two populations together to create new male recombination maps based on 302,298 crossovers detected in 8,549 meioses (Supporting file S5). Combining the two dataset together lead to a clear reduction in the posterior variance of the recombination rates, i.e. an increase in their precision (Figure S9).
Genetic Determinism of Genome-wide Recombination Rate in Lacaune sheep
Our dataset provides information on the number of crossovers for a set of 5,940 meioses among 345 male individuals. Therefore, it allows to study the number of crossovers per meiosis (GRR) as a recombination phenotype.
Genetic and environmental effects on GRR
We used a linear mixed-model to study the genetic determinism of GRR. The contribution of additive genetic effects was estimated by including a random FID effect with covariance structure proportional to the matrix of kinship coefficients calculated from pedigree records (see Methods). We also included environmental fixed effects in the model: year of birth of the FID and insemination month of the ewe for each meiosis. We did not find significant differences between the FID year of birth, however the insemination month of the ewe was significant (p = 1.3 10 -3). There was a trend in increased recombination rates from February to May followed by a decrease until July and a regain in August although the number of inseminations in August is quite low, leading to a high standard error for this month (Figure S10). Based on the estimated variance components (Table 2), we estimated the heritability of GRR in the Lacaune male population at 0.23.
Genome-wide association study identifies three major loci affecting GRR in Lacaune sheep
The additive genetic values of FIDs, predicted from the above model were used as phenotypes in a genome-wide association study. Among the 345 FIDs with at least two offsprings, the distribution of the phenotype was found to be approximately normally distributed (Figure S11). To test for association of this phenotype with SNPs markers, we used a mixed-model approach correcting for relatedness effects with a genomic relationship matrix (see Methods). Using our panel of 70 unrelated Lacaune, we imputed the 345 FIDs for markers of the HD SNP array. With these imputed genotypes, we performed two analyses. The first was an association test with univariate linear mixed models, which tested the effect of each SNP in turn on the phenotype (results in Supporting File S6) the second fitted a Bayesian sparse linear mixed model, allowing multiple QTLs to be included in the model (results in Supporting File S7).
Figure 4 illustrates the GWAS results: the top plot shows the p-values of the single SNP analysis and the bottom plot, the posterior probability that a region harbours a QTL, calculated on overlapping windows of 20 SNPs. The single SNP analysis revealed six significant regions (FDR < 10 %): two on chromosome 1, one on chromosome 6, one on chromosome 7, one on chromosome 11 and one chromosome 19. Regions of chromosome 6 and 7 exhibited very low p-values whereas the other three showed less intense association signals. The multi-QTLs Bayesian analysis was conclusive for two regions (regions on chromosome 6 and chromosome 7) while the rightmost region on chromosome 1 was suggestive (Table 3). Two additional suggestive regions were discovered on chromosome 3. Using the multi-QTL approach of (Zhou, Carbonetto, and Stephens 2013) allowed to estimate that together, QTLs explain about 40% of the additive genetic variance for GRR, with a 95% credible interval ranging from 28 to 53 %.
The most significant region was located on the distal end of chromosome 6 and corresponded to a locus frequently associated to variation in recombination rate. In our study the significant region contained 10 genes: CTBP1, IDUA, DGKQ, GAK, CPLX1, UVSSA, MFSD7, PDE6B, PIGG and RNF212. For each of these genes, except RNF212 which was not annotated on the genome (see below), we extracted their gene ontology of the Ensembl v87 database, but none was clearly annotated as potentially involved in recombination. However, two genes were already reported as having a statistical association with recombination rate: CPLX1 and GAK (Augustine Kong et al. 2014). CPLX1 has no known function that can be linked to recombination (Augustine Kong et al. 2014) but GAK has been shown to form a complex with the cyclin-G, which could impact recombination (Nagel et al. 2012). However, RNF212 can be deem a more likely candidate due to its function and given that this gene was associated with recombination rate variation in human (Chowdhury et al. 2009), (A. Kong et al. 2008b), in bovine (Sandor et al. 2012b; Kadri et al. 2016b; Ma et al. 2015) and in mouse (Reynolds et al. 2013). RNF212 is not annotated in the sheep genome assembly oviAri3, however this chromosome 6 region corresponds to the bovine region that contains RNF212 (Figure S4). We found an unassigned scaffold (scaffold01089, NCBI accession NW_011943327) of Ovis orientalis musimon (assembly Oori1) that contained the full RNF212 sequence and that could be placed confidently in the QTL region. To confirm RNF212 as a valid positional candidate, we studied further the association of its polymorphisms with GRR in results presented below.
The second most significant region was located between 22.5 and 23.1 megabases on chromosome 7. All significant SNPs in the region were imputed, i.e. the association would not have been found based on association of the medium density array alone. It matched an association signal on GRR in Soay sheep (Susan E. Johnston et al. 2016). Consistent with our finding, in the Soay sheep study, this association was only found using regional heritability mapping and not using single SNP associations with the medium density SNP array. This locus could match previous findings in cattle (association on chromosome 10 at about 20 Mb on assembly btau3.1), however the candidate genes mentioned in this species (REC8 and RNF212B) were located respectively 2 and 1.5 megabases away from our strongest association signal. In addition, none of the SNPs located around these two candidate genes in cattle were significant in our analysis. Eleven genes were present in the region: OR10G2, OR10G3, TRAV5, TRAV4, SALL2, METTL3, TOX4, RAB2B, CHD8, SUPT16H and RPGRIP1. The study of their gene ontology, extracted from the Ensembl v87 database, revealed that none of them were associated with recombination, although SUPT16H could be involved in mitotic DSB repair (Kari et al. 2011). However another functional candidate, CCNB1IP1, also named HEI10, was located between positions 23,946,971 and 23,951,850 bp, about 500 Kb from our association peak. This gene is a good functional candidate as it has been shown to interact with RNF212: HEI10 allows to eliminate the RNF212 protein from early recombination sites and to recruit other recombination intermediates involved in crossover maturation (Qiao et al. 2014; Rao et al. 2016). Again SNPs located at the immediate proximity of HEI10 did not exhibit significant associations with GRR. Hence, our association signal did not allow to pinpoint any clear positional candidate among these functional candidates (see Figure S12). However, it was difficult to rule them out completely for three reasons. First, with only 345 individuals, our study may not be powerful enough to localize QTLs with the required precision. Second, the presence of causal regulatory variants, even at distances of several hundred kilobases is possible. Finally, the associated region of HEI10 exhibited apparent rearrangements with the human genome, possibly due to assembly problems in oviAri3. These assembly problems could be linked to the presence of genomic sequences coding for the T-cell receptor alpha chain. This genome region is in fact rich in repeated sequences making its assembly challenging. Overall, identifying a single positional and functional candidate gene in this gene-rich misassembled genomic region was not possible based on our data alone.
Our third associated locus was located on chromosome 1 between 268,600 and 268,700 kilobases. In cattle, the homologous region, located at the distal end of cattle chromosome 1, has also been shown to be associated with GRR (Kadri et al. 2016a; Ma et al. 2015). In these studies the PRDM9 gene has been proposed as a potential candidate gene, especially because it is a strong functional candidate given its proven effect on recombination phenotypes. In sheep, PRDM9 is located at the extreme end of chromosome 1, around 275 megabases, 7 megabases away from our association signal (Ahlawat et al. 2016). Hence, PRDM9 was not a good positional candidate for association with GRR in our sheep population. However, the associated region on chromosome 1 contains a single gene, KCNJ15, which has been associated with DNA double-strand breaks repair in human cells (Słabicki et al. 2010).
Finally, the two regions on the chromosome 3 were analyzed. The first was located between 75,162 and 75,319 kilobases and contains only one annotated gene coding for the receptor for follicle-stimulating hormone (FSHR). Though it does not affect recombination directly it is necessary for the initiation and maintenance of normal spermatogenesis in males (Tapanainen et al. 1997). The second region on the chromosome 3 was located between 201,198 and 201,341 kilobases but does contain any annotated gene.
Mutations in the RNF212 gene are strongly associated to Genome-wide Recombination Rate variation in Lacaune sheep
The QTL with the largest effect in our association study corresponded to a locus associated to GRR variation in other species and harbouring the RNF212 gene. As it was a clear positional and functional candidate gene, we carried out further experiments to interrogate specifically polymorphisms within this gene. As stated above, we used the sequence information available for the RNF212 gene from Ovis orientalis which revealed that RNF212 spanned 23,7 Kb on the genome and may be composed of 12 exons by homology with bovine RNF212. However, mRNA annotation indicated multiple alternative exons. Surprisingly, the genomic structure of ovine RNF212 was not well conserved with goat, human and mouse syntenic RNF212 genes (Figure S4). As a first approach, we designed primers for PCR amplification (see Methods) and sequencing of all annotated exons and some intronic regions corresponding to exonic sequences of Capra hircus RNF212. By sequencing RNF212 from 4 carefully chosen Lacaune animals homozygous GG or AA at the most significant SNP of the medium density SNP array on chromosome 6 QTL (rs418933055, p-value 2.56 10 -17), we evidenced 4 polymorphisms within the ovine RNF212 gene (2 SNPs in intron 9, and 2 SNPs in exon 10). The 4 mutations were genotyped in 266 individuals of our association study. We then tested their association with GRR using the same approach as explained above (results in Supporting File S8) and computed their linkage disequilibrium (genotypic r 2) with the most associated SNPs of the high-density genotyping array (see Figure S13) (Table 4). Two of these mutations were found highly associated with GRR, their p-values being of the same order of magnitude (p<10 -16) as the most associated SNP (rs412583165), one of them was even more significant than the most significant imputed SNP (p = 6.25 10 -17 vs p = 9.8 10 -17). We found a clear agreement between the amount of LD between a mutation and the most associated SNPs and their association p-value (see Figure S13). Overall, these results showed that polymorphisms within the RNF212 gene were strongly associated with GRR, and likely tagged the same causal mutation as the most associated SNP. This confirmed that RNF212, a very strong functional candidate, was also a very strong positional candidate gene underlying our association signal.
The genetic determinism of recombination differ between Soay and Lacaune males
GWAS in the Soay identified two major QTLs for GRR, with apparent sex-specific effects. These two QTLs were located in the same genomic regions as our QTLs on chromosome 6 and chromosome 7. The chromosome 6 QTL was only found significant in Soay females, while we detect a very strong signal in Lacaune males. Although the QTL was located in the same genomic region, the most significant SNPs were different in the two GWAS (Figure 6). Two possible explanations could be offered for these results: either the two populations have the same QTL segregating and the different GWAS hits correspond to different LD patterns between SNPs and QTLs in the two populations, or the two populations have different causal mutations in the same region. Denser genotyping data, for example by genotyping the RNF212 mutations identified in this work in the Soay population, would be needed to have a clear answer. For the chromosome 7 QTL, the signal was only found using regional heritability mapping (Nagamine et al. 2012) in the Soay, and after genotype imputation in our study, which makes it even more difficult to discriminate between a shared causal mutation or different causal mutations at the same location in the two populations.
Discussion
In this study, we studied the distribution of recombination along the sheep genome and its relationship to historical recombination rates. We showed that contemporary patterns of recombination are highly correlated to the presence of historical hotspots. We showed that the recombination patterns along the genome are conserved between distantly related sheep populations but that their genetic determinism of genome-wide recombination rates differ. In particular, we showed that polymorphisms within the RNF212 gene are strongly associated to male recombination in Lacaune whereas this genomic region shows no association in Soay males. Hence, combining three datasets, two pedigree datasets in distantly related domestic sheep populations and a densely genotyped sample of unrelated animals, revealed that recombination rate and its genetic determinism can evolve at short time scales, as we discuss below.
Fine-scale Recombination Maps
In this work, we were able to construct fine-scale genetic maps of the sheep autosomes by combining two independent inferences on recombination rate. Our study on meiotic recombination from a large pedigree dataset revealed that sheep recombination exhibits general patterns similar to other mammals (Shifman et al. 2006; Chowdhury et al. 2009; Tortereau et al. 2012). First, sheep recombination rates were elevated at the chromosome ends, both on acrocentric and metacentric chromosomes. In the latter, our analysis revealed a clear reduction in recombination near centromeres. Second, recombination rate depended on the chromosome physical size, consistent with an obligate crossover per meiosis irrespective of the chromosome size. These patterns were consistent with those established in a very different sheep population, the Soay (Susan E. Johnston et al. 2016), and indeed when re-analysing the Soay data with the same approach as used in this study, the results showed a striking similarity between recombination rates in the two populations. Hence, our results show that recombination patterns were conserved over many generations and despite the very different evolutionary histories of the two populations and clear differences in the genetic determinism of GRR in males of the two populations. This similarity allowed to combine the two datasets to create more precise male sheep recombination maps than any of the two studies taken independently.
Our historical recombination maps revealed patterns of recombination at the kilobase scale, with small highly recombining intervals interspaced by more wide, low recombining regions. This result was consistent with the presence of recombination hotspots in the highly recombinant intervals. A consequence was that, as observed in other species, the majority of recombination took place in a small portion of the genome: we estimated that 80% of recombination takes place in 40% of the genome. (Kaur and Rockman 2014) suggested to use a Gini coefficient as a measure of the heterogeneity in the distribution of recombination along the genome to facilitate inter-species comparisons. When calculated on the historical recombination data, the Lacaune sheep has a coefficient of 0.52, which is similar to what is observed in Drosophila but lower than that measured in humans or mice. However, the coefficient calculated here is likely an underestimate due to our limited resolution (a few kilobases on the HD SNP array) compared to the typical hotspot width (a few hundred base-pairs). Overall, we identified 50,000 hotspot intervals which was twice the estimated number of hotspots in humans (International HapMap Consortium et al. 2007). This difference can be explained by different non mutually exclusive reasons. First, it is possible that what we detect as crossover hotspots are due to genome assembly errors and we indeed found a significant albeit moderate effect (OR 1.4) of the presence of assembly gaps in an interval on its probability of being called a hotspot. Second, our method to call hotspots could be too liberal. Indeed, a more stringent threshold (FDR=0.1%) would lead to about 25,000 hotspots, which would be similar to what is found in humans. Third, selection has been shown to impact hotspots discoveries although not with the methods we used here (Chan, Jenkins, and Song 2012). Finally, there exists the possibility that historically sheep exhibits more recombination hotspots than humans. In any case, the strong association between meiotic recombination rate and density in historical hotspots showed that our historical recombination maps were generally accurate. We tried to find enrichment in sequence motifs in the detected hotspots or specify their position relative to TSS (data not shown), but with no success mainly due to (i) the relative large hotspots intervals (about 5Kb) compared to typical hotspot motifs and (ii) the quality of the sheep genome assembly which still contains many small gaps that make such analyses difficult. Ultimately these questions would need an improved genome assembly and better resolution of crossover hotspots which should be addressed in the future from LD-based studies on resequencing data.
We combined, using a formal statistical approach, meiotic- and LD-based recombination rate estimates. Using an approach conceptually similar to that of (O’Reilly, Birney, and Balding 2008) led us to assess the impact of selection events on the sheep genome, in particular suggesting the possibility of an effect of diversifying selection at olfactory receptors genes. Based on this comparison, the correlation between historical and meiotic recombination rates was found to be high (r 0.7), but less than could be expected from previous results in humans, where the correlation was 97% on 5 Mb (S. Myers et al. 2006). However, it was closer to that of worms, mice or Drosophila, 69%, 47% and 50% respectively (Rockman and Kruglyak 2009, Brunschwig et al. 2012, Chan et al. 2012). Again, more precise estimates of both meiotic and historical recombination rates could change this number but other causes can be put forward.
A first explanation could come from the fact that the model we used to estimate historical recombination rates is based on the assumption of a constant effective population size, both in the past and along the genome. To allow for varying population size along the genome, we estimated the model in 2Mb intervals but there is still the possibility that varying population size in the past affect our historical recombination rate estimates, as the method has been shown to be somewhat influenced by demography although the identification of crossover hotspots much less so (N. Li and Stephens 2003). Also, as already mentioned above, selection has been shown to have substantial impact on estimation of recombination rates with other approaches (Chan, Jenkins, and Song 2012) although it has not been evaluated for the Li and Stephens model to our knowledge.
Second, our meiotic recombination maps are based on male meioses only while historical recombination rates are averaged over both male and female meioses. The fact that male and female recombination differ substantially, in particular in sheep (Susan E. Johnston et al. 2016), could also explain this relatively lower correlation.
Third, it is also possible that selective pressure due to domestication and later artificial breeding had the impact of modifying extensively LD patterns on the sheep genome, degrading the correlation between the two approaches. Indeed, the historical recombination estimates summarize ancestral recombinations that took place in the past and it is possible that recombination hotspots that were present in an ancestral sheep population are not longer active in today’s Lacaune individuals. This could arise, for example, if domestication led to a reduction in the diversity of hotspots defining genes, such as PRDM9, and hence a reduction in the number of motifs underlying hotspots which would in turn change the distribution of recombination on the genome. This has been shown for example in Humans were patterns of recombination differ between populations due to their different diversity at PRDM9 (Berg et al. 2010, 2011; Baudat et al. 2010). Eventually, such a phenomenon would degrade the correlation between present day recombination (measured by the meiotic recombination rates) and past recombination (measured by historical recombination rates). Further studies on the determinism of hotspots in the sheep, its related genetic factors and their diversity would be needed to elucidate this question.
Despite these different effects, the substantial correlation between meiotic and historical recombination rates motivates the creation of scaled recombination maps that can be useful for interpreting statistical analysis of genomic data. As an illustration of the importance of fine-scale recombination maps for genetic studies, we found an interesting example in a recent study on adaptation of sheep and goat (Kim et al. 2016). In this study, a common signal of selection was found using the iHS statistic (Voight et al. 2006) in these two species (Figure 5 in (Kim et al. 2016)). This signature matches precisely the low recombining regions we identified on chromosome 10. However, the iHS statistic has been shown to be strongly influenced by variation in recombination rates, and in particular to tend to detect low recombining regions as selection signatures (Ferrer-Admetlla et al. 2014; O’Reilly, Birney, and Balding 2008). Precise genetic maps such as the one we provide in this work could thus help in annotating and interpreting such selection signals.
Determinism of Recombination Rate in sheep populations
As mentioned in the introduction two phenotypes have been studies with respect to the recombination process, but only one was studied here, Genome-wide recombination rate (GRR). We found that our data was not sufficient to study the Individual Hotspot Usage, which requires either a larger number of meioses per individual (Ma et al. 2015; Kadri et al. 2016a; Sandor et al. 2012a) or denser genotyping in families (Coop et al. 2008).
Our approach to study the genetic determinism of GRR in the Lacaune population was first to estimate its heritability, using a classical analysis in a large pedigree. This analysis also allowed to extract additive genetic values (EBVs) for the trait in 345 male parents, which we used for a GWAS in a second step. The EBVs are by definition, only determined by genetic factors, as environmental effects on GRR are averaged out. Indeed, we found that the proportion of variance in EBVs explained by genetic factors in the GWAS was essentially one. A consequence was that, although this sample size could be deemed low in current standards, the power of our GWAS was greatly increased by the high precision on the phenotype. We estimated the heritability of GRR at 0.23, which was similar to estimates from studies on the same phenotype in ruminants (e.g. 0.22 in cattle (Sandor et al. 2012a) or 0.12 in male Soay sheep (Susan E. Johnston et al. 2016), but see below for a discussion on the comparison with Soay sheep). We had little information on the environmental factors that could influence recombination rate, but did find a suggestive effect of the month of insemination on GRR, especially we found increased GRR at the month of May. Confirmation and biological interpretation of this result would need dedicated studies, but it was consistent with the fact that fresh (i.e. not frozen) semen is used for insemination in sheep and that the reproduction of this species is seasonal (Rosa and Bryant 2003).
The genetic determinism of GRR discovered in our study closely resembles what has been found in previous studies, especially in mammals. Two major loci and two suggestive ones affected recombination rate in Lacaune. The two main QTLs are common to cattle and Soay sheep. The underlying genes and mutations for these two QTLs are not yet resolved but the fact that the two regions harbour interacting genes (RNF212 and HEI10 (Qiao et al. 2014; Rao et al. 2016)) involved in the maturation of crossovers, make these two genes likely functional candidates. Indeed, these two genes were identified as potential candidates underlying QTLs for GRR in mice (R. J. Wang and Payseur 2017). The third gene identified here, KCNJ15, is a novel candidate, and its role and mechanism of action in the repair of DSBs needs to be confirmed and elucidated. Interestingly, these three genes are linked to the reparation of DSBs and crossover maturation processes. Finally, the fourth candidate FSHR has well documented effects on gametogenesis but has not been linked to recombination previously.
In our study, sixty percent of the additive genetic variance in GRR remained unexplained by large effect QTLs and were due to polygenic effects. This could be interpreted in the light of recent evidence that has shown that other mechanisms, involved in chromosome conformation during meiosis, explain a substantial part of the variation in recombination rate between mouse strains (Baier et al. 2014) and bovids (Ruiz-Herrera et al. 2017). Furthermore the variations at the major mammal recombination loci (RNF212, CPLX1, REC8 or the Human inversion 17q21.31) explain only 3 to 11% (Ritz, Noor, and Singh 2017) of the phenotypic variance among individuals. Elucidating the genetic determinism of these different processes would thus require much larger sample sizes or different experimental approaches (Baier et al. 2014; Ruiz-Herrera et al. 2017).
The combination of datasets from the Lacaune population and one from the recent study of recombination in Soay sheep (Susan E. Johnston et al. 2016) allowed to study the evolution of recombination at relatively short time scales. One of the most striking difference between our two studies is that the two QTLs that were detected in common had no effect in Soay males, whereas they had strong effects in Lacaune males. However, the two populations had very similar polygenic heritability: accounting for the fact that the Lacaune QTLs explain about 40% of the additive genetic variance, we could estimate the polygenic additive genetic variance in Lacaune males at 0.16, very similar to the 0.12 found in Soay males. Combined with our results that the two populations exhibit very similar male recombination maps, both in terms of intensity and genome distribution, the combination of the two studies shows that recombination patterns are conserved between populations under distinct genetic determinism, highlighting the robustness of mechanisms that drive them. Further work is needed to get a more detailed picture of the genetic control of recombination in sheep and will likely require combining multiple inferences from genetics, cytogenetics, molecular biology and bioinformatics analyses.
Acknowledgments
Institut de l’Elevage (JM. Astruc) and breed organizations (Ovitest and Confédération de Roquefort) provided the SNP genotypes and pedigree information. This work was partially funded by the BoDeliRe grant of the INRA Selgen Metaprogram and by Région Midi-Pyrénées. We are thankful to Tom Druet, Laurent Duret, Alain Pinton and Pierre Sourdille for their helpful comments on the manuscript.