TY - JOUR
T1 - Characterizing and predicting cyanobacterial blooms in an 8-year amplicon sequencing time-course
JF - bioRxiv
DO - 10.1101/058289
SP - 058289
AU - Nicolas Tromas
AU - Nathalie Reik
AU - Larbi Bedrani
AU - Yves Terrat
AU - Pedro Cardoso
AU - David Bird
AU - Charles W. Greer
AU - B. Jesse Shapiro
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/06/10/058289.abstract
N2 - Cyanobacterial blooms occur in lakes worldwide, producing toxins that pose a serious public health threat. Eutrophication caused by human activities and warmer temperatures both contribute to blooms, but it is still difficult to predict precisely when and where blooms will occur. One reason that prediction is so difficult is that blooms can be caused by different species or genera of cyanobacteria, which may interact with other bacteria and respond to a variety of environmental cues. Here we used a deep 16S amplicon sequencing approach to profile the bacterial community in eutrophic Lake Champlain over time, to characterize the composition and repeatability of cyanobacterial blooms, and to determine the potential for blooms to be predicted based on time-course sequence data. Our analysis, based on 143 samples between 2006 and 2013, spans multiple bloom events. We found that the microbial community varies substantially over months and seasons, while remaining stable from year to year. Bloom events significantly alter the bacterial community but do not reduce overall diversity, suggesting that a distinct microbial community – including non-cyanobacteria – prospers during the bloom. Blooms tend to be dominated by one or two genera of cyanobacteria: Microcystis or Dolichospermum. Blooms are thus relatively repeatable at the genus level, but more unpredictable at finer taxonomic scales (97% operational taxonomic units; OTUs). We therefore used probabilistic assemblages of OTUs (rather than individual OTUs) to classify our samples into bloom or non-bloom bins, achieving up to 92% accuracy (86% after excluding cyanobacterial sequences). Finally, using symbolic regression, we were able to predict the start date of a bloom with 78-91% explained variance over tested data (depending on the data used for model training), and found that sequence data was a better predictor than environmental factors.
ER -