Abstract
Populations of Streptococcus pneumoniae are typically structured into groups of closely related organisms, or lineages. Here, we use a machine learning technique to tease out whether these lineages are maintained by selection or by neutral processes. Our results indicate that lineages of S. pneumoniae evolved through selection on the groESL operon, an essential component of the organism's survival machinery. This operon contains genes encoding chaperone proteins that enable a very wide range of proteins to fold correctly within the physical environment of the nasopharynx; groESL is therefore expected to be in strong epistasis with several other genes. These features of groESL would explain why lineage structure is so stable within S. pneumoniae despite high levels of horizontal gene transfer. S. pneumoniae is also antigenically diverse, exhibiting a variety of distinct capsular serotypes. We show that associations between lineage and capsular serotype may arise through immune selection and direct resource competition, but that these associations are more easily perturbed by external pressures such as vaccination. Overall, our analyses indicate that the evolution of S. pneumoniae can be conceptualized as the rearrangement of modular functional units on several different timescales and under different selection pressures: some patterns locked in early (such as the epistatic interactions between groESL and a constellation of other genes) and preserve the differentiation of lineages, while others (such as the associations between capsular serotype and lineage) remain in continuous flux.