PT - JOURNAL ARTICLE
AU - Marcin J. Skwark
AU - Nicholas J Croucher
AU - Santeri Puranen
AU - Claire Chewapreecha
AU - Maiju Pesonen
AU - Ying ying Xu
AU - Paul Turner
AU - Simon R. Harris
AU - Julian Parkhill
AU - Stephen D. Bentley
AU - Erik Aurell
AU - Jukka Corander
TI - Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis
AID - 10.1101/071696
DP - 2016 Jan 01
TA - bioRxiv
PG - 071696
4099 - http://biorxiv.org/content/early/2016/08/25/071696.short
4100 - http://biorxiv.org/content/early/2016/08/25/071696.full
AB - Recent advances in the scale and diversity of bacterial population genomic datasets now offer the potential to study genome-wide patterns of co-evolution at the resolution of individual bases. The major human pathogen Streptococcus pneumoniae is the first bacterial organism for which sufficiently densely sampled population data have become available for such an analysis. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. Genome data from over three thousand pneumococcal isolates identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, whose sequences are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found that these genes were also coupled to the gene encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical to basic cellular functions, while another distinct component included genes associated with virulence.
These results have the potential to identify both previously unsuspected protein-protein interactions and genes making independent contributions to the same phenotype. This approach greatly enhances the future potential of epistasis analysis for systems biology and can complement genome-wide association studies as a means of formulating hypotheses for experimental work.

Author Summary: Epistatic interactions between DNA polymorphisms are recognized as important drivers of evolution in numerous organisms. The study of epistasis in bacteria has been hampered by the lack of densely sampled population genomic data, of suitable statistical models, and of powerful inference algorithms for extremely high-dimensional parameter spaces. We introduce the first model-based method for genome-wide epistasis analysis and use the largest available bacterial population genomic dataset, for Streptococcus pneumoniae (the pneumococcus), to demonstrate its potential for biological discovery. Our approach reveals interacting networks of resistance, virulence and core machinery genes in the pneumococcus, highlighting putative candidates for novel drug targets. Our method significantly enhances the future potential of epistasis analysis for systems biology and can complement genome-wide association studies as a means of formulating hypotheses for experimental work.