Abstract
A major research goal in evolutionary genetics is to uncover loci experiencing adaptation from genomic sequence data. One approach relies on finding ‘selective sweep’ patterns, where segregating adaptive alleles reduce diversity at linked neutral loci. Recent years have seen an expansion in modelling cases of ‘soft’ sweeps, where the common ancestor of derived variants predates the onset of selection. Yet existing theory assumes that populations are entirely outcrossing, and that dominance does not affect sweeps. Here, we develop a model of selective sweeps that considers arbitrary dominance and non-random mating via self-fertilisation. We investigate how these factors, as well as the starting frequency of the derived allele, affect average pairwise diversity, the number of segregating sites, and the site frequency spectrum. With increased self-fertilisation, signatures of both hard and soft sweeps are maintained over a longer map distance, due to a reduced effective recombination rate and faster fixation times of adaptive variants. We also demonstrate that sweeps from standing variation can produce diversity patterns equivalent to hard sweeps. Dominance can affect sweep patterns in outcrossing populations arising from either a single novel mutation, or from recurrent mutation. It has little effect where there is either increased selfing or the derived variant arises from standing variation, since dominance only weakly affects the underlying adaptive allele trajectory. Different dominance values also alter the distribution of singletons (derived alleles present in one sample). We apply models to a sweep signature at the SLC24A5 gene in European humans, demonstrating that it is most consistent with an additive hard sweep. These analyses highlight similarities between certain hard and soft sweep cases, and suggest how best to differentiate between related scenarios. In addition, self-fertilising species can provide clearer signals of soft sweeps than outcrossers, as these signals extend over longer regions of the genome.
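The qualitative claims above about selfing, dominance, and the adaptive allele trajectory can be illustrated with a minimal sketch. It is not the model developed in this paper. It assumes a textbook weak-selection deterministic approximation with genotype fitnesses 1, 1 + hs and 1 + s, Wright's equilibrium inbreeding coefficient F = σ/(2 − σ) for selfing rate σ, and the commonly used rescaling of recombination by (1 − F). All function names and parameter values are hypothetical and chosen only for illustration.

```python
def inbreeding_coefficient(sigma):
    """Equilibrium inbreeding coefficient F for selfing rate sigma (assumed form)."""
    return sigma / (2.0 - sigma)


def effective_recombination(r, sigma):
    """Approximate effective recombination rate, r * (1 - F) (assumed rescaling)."""
    return r * (1.0 - inbreeding_coefficient(sigma))


def sweep_time(s, h, sigma, p0, p_end, dt=0.1):
    """Generations for the deterministic trajectory to rise from p0 to p_end.

    Integrates dp/dt = s p (1 - p) [F + (1 - F)(h + (1 - 2h) p)]
    with a simple Euler scheme; s is the selection coefficient and
    h the dominance coefficient (0 = recessive, 1 = dominant).
    """
    F = inbreeding_coefficient(sigma)
    p, t = p0, 0.0
    while p < p_end:
        dp = s * p * (1.0 - p) * (F + (1.0 - F) * (h + (1.0 - 2.0 * h) * p))
        p += dp * dt
        t += dt
    return t


if __name__ == "__main__":
    s, p0, p_end = 0.05, 0.001, 0.999  # illustrative values only
    for sigma in (0.0, 0.5, 0.9, 1.0):
        times = [round(sweep_time(s, h, sigma, p0, p_end)) for h in (0.1, 0.5, 0.9)]
        print(f"sigma={sigma}: r_eff/r={effective_recombination(1.0, sigma):.3f}, "
              f"sweep times for h=0.1, 0.5, 0.9: {times}")
```

Under these assumptions, the bracketed term approaches 1 as F approaches 1, so the trajectory becomes independent of h under complete selfing, while the effective recombination rate shrinks towards zero. Stochastic effects, standing variation and recurrent mutation are deliberately left out of this sketch.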
Author Summary
Populations adapt by fixing beneficial mutations. As a mutation spreads, it drags linked neutral variation to fixation, reducing diversity around adaptive genes. This footprint is known as a ‘selective sweep’. Adaptive variants can appear from a new mutation onto a single genotype; from recurrent mutation onto different genotypes; or from existing genetic variation. Each of these sources leaves subtly different selective sweep patterns in genetic data, which have been explored under simple biological cases. We present a general model of selective sweeps that includes self-fertilisation (where individuals produce both male and female gametes to fertilise one another), and dominance (where fitness differences exist between one and two gene copies within an individual). Soft sweep patterns are spread out over longer genetic regions in self-fertilising individuals, while dominance mainly affects sweeps in outcrossers from either a single or recurrent mutation. Applying models to a sweep signal associated with human skin pigmentation shows that this mutation was likely introduced into Eurasia from Africa in very small numbers. These models demonstrate to what extent soft sweeps can be detected in genome data, and how self-fertilising organisms can be good study systems for determining the extent of different adaptive modes.