RT Journal Article SR Electronic T1 Strong spurious transcription likely a cause of DNA insert bias in typical metagenomic clone libraries JF bioRxiv FD Cold Spring Harbor Laboratory SP 013763 DO 10.1101/013763 A1 Kathy N. Lam A1 Trevor C. Charles YR 2015 UL http://biorxiv.org/content/early/2015/01/15/013763.abstract AB Background: Clone libraries provide researchers with a powerful resource with which to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, as well as allowed the mining of novel enzymes for specific functions of interest. These libraries are often constructed by cloning large inserts (∼30 kb) into a cosmid or fosmid vector. Recently, there have been reports of GC bias in fosmid metagenomic clone libraries, and it was speculated that the bias may be a result of fragmentation and loss of AT-rich sequences during the cloning process. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role in library bias. Results: To explore the possible mechanisms responsible for sequence bias in clone libraries, and in particular whether fragmentation is involved, we constructed a cosmid clone library from a human microbiome sample, and sequenced DNA from three different steps of the library construction process: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final constructed cosmid library, and we provide strong evidence that the sequence bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after the DNA is introduced into E. coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for consensus promoters and found that rpoD/σ70 promoter sequences were underrepresented in the cosmid library.
Furthermore, when we examined the reference genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found that the bias appears to be more closely correlated with the number of rpoD/σ70 consensus sequences in the genome than with simple GC content. Conclusions: The GC bias of metagenomic clone libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter consensus sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ70 consensus-like in E. coli may lead to plasmid instability or loss of insert DNA. Our results suggest that despite widespread use of E. coli to propagate foreign DNA, the effects of in vivo transcriptional activity may be under-appreciated. Further work is required to tease apart the effects of transcription from those of gene product toxicity. Abbreviations: CE, crude extract; SS, size-selected; CL, cosmid library; F, forward reads; R, reverse reads.