Abstract
The vast biochemical repertoire found in microbial communities from a wide range of environments allows the screening and isolation of novel enzymes with improved catalytic features. In this sense, metagenomic approaches have been of high relevance for providing enzymes used in diverse industrial applications. For instance, glycosyl hydrolases, which catalyze the hydrolysis of carbohydrates to sugars, are essential for bioethanol production from renewable resources. In the current study, we focused on prospecting protease and glycosyl hydrolase activities from microbial communities inhabiting a soil sample, using the lacZα-based plasmid pSEVA232 to generate a screenable metagenomic library. For this, we used a functional screen based on skimmed milk agar and a pH indicator dye, as previously reported in the literature. Although we identified nine positive clones in the screenings, subsequent experiments revealed that this phenotype was not due to hydrolytic activity encoded in the metagenomic fragments, but rather to the insertion of small metagenomic DNA fragments in frame within the coding region of the lacZα gene present in the original vector. We conclude that the current method has a high tendency to recover false positive clones when used in combination with a lacZα-based vector. Finally, we discuss the molecular explanation for the recovery of the positive phenotype and highlight the importance of reporting the limitations of metagenomic screening methodologies.
Introduction
Renewable resources, such as plant biomass (essentially lignocellulose), have significant potential for the production of biofuels and other biotechnologically produced industrial chemicals owing to their higher abundance and lower price in comparison to other commercial substrates [1]. However, the physicochemical constraints placed on cellulose and hemicellulose polymers by lignin make the saccharification procedure expensive, owing to a lack of biocatalysts tolerant to process-specific parameters [2,3]. The notable resilience of bacteria to environmental fluctuations and their inherent biochemical diversity permit the screening and isolation of novel enzymes that help to overcome these barriers. Thus, a huge reservoir of genetic resources is held within the genomes of uncultured microorganisms, and metagenomics is one of the key technologies used to access and explore this potential [4–6].
Functional metagenomics aims to recover genes encoding proteins with a valuable biochemical function [5–7]. Genes of interest include, for instance, those encoding enzymes, proteins conferring resistance to diverse physical or chemical stressors, catabolic pathways, or proteins involved in the production of bioactive compounds, to cite some [47]. The functional metagenomic approach offers two different strategies for library generation. First, large-insert libraries, constructed in cosmids or fosmids, allow the stable recovery of large DNA fragments and are suited to sequence homology screening [8]. This strategy also allows the recovery of complete biosynthetic pathways or the functional expression of large multi-enzyme assemblies (as in the case of polyketide synthase or hydrogenase clusters) [9,10]. On the other hand, small-insert expression libraries (i.e. lambda phage vectors and plasmids) are constructed for activity screening of single genes or small operons [8]. In this strategy, strong vector expression signals (e.g. promoter and ribosome binding site) are used to ensure that small DNA fragments (2-10 kb) cloned in the vector have a good chance of being expressed and detected by activity screens [10,11]. At this point, it is worth mentioning that lacZα-based vectors are frequently used in different screenings, with high prevalence in small-insert expression metagenomic libraries [12–17]. In this sense, the blue/white screening inherent to lacZα-based vectors is one of the most common molecular techniques for detecting the successful ligation and subsequent expression of the gene of interest in a vector [18–20].
Metagenomic strategies have been of high relevance for providing enzymes used in manufacturing applications [5,7,21]. The use of enzymes in industry has grown considerably, and a number of different categories of enzymes have been used in a wide variety of applications [22]. For example, proteases, which degrade proteins into amino acids, have been used in detergents and in the pharmaceutical and chemical synthesis industries [23]. Glycosyl hydrolases, which catalyse the hydrolysis of carbohydrates to sugars, have been applied to many processes beyond bioethanol production (i.e. cellulose and hemicellulose conversion to fermentable sugars), being highly relevant in the textile, paper and food production industries [24].
Studies in the literature have reported that both enzymatic activities (protease and glycosyl hydrolase) can be detected in a single pH-based assay using skimmed milk agar (SMA) [25,26]. The authors stated that the use of pH indicator dyes such as phenol red or bromophenol blue increases the sensitivity of the assay, allowing detection of the acidic shift during hydrolysis of lactose by glycosyl hydrolases (detected as a yellow halo) or of casein by proteases (visualized as clear halos) [25,26]. Subsequent experiments are then required to identify the specific enzymatic activity of the recovered clones [25]. Therefore, in the current study we were interested in obtaining protease and glycosyl hydrolase activities from the microbial communities inhabiting a soil sample from a Secondary Atlantic Rain Forest (L. de F. Alves, unpublished results). For this, we implemented a metagenomic approach using a functional screen based on SMA and a pH indicator dye (Figure 1A). The metagenomic library was constructed in Escherichia coli as a host using the broad host-range vector pSEVA232, which is a lacZα-based plasmid [27] (Figure 1B).
By implementing the SMA-phenol red (SMA-PR) screening approach, we obtained nine clones that generated the typical yellow halos indicative of glycosyl hydrolase (GH) production, although no clear halos, indicative of protease activity, were obtained. However, subsequent experiments revealed that this phenotype was not due to exogenous genes providing hydrolytic activity in these clones. Unexpectedly, restriction profile analyses and sequencing of the metagenomic inserts showed that the metagenomic fragments were too small to encode enzymes able to display activity. Further analyses showed that the metagenomic DNA fragments were inserted in frame with the coding region of the lacZ gene present in the original vector (α peptide of the β-galactosidase). We conclude that the current SMA-PR method to obtain proteases and GHs has a high tendency to recover false positive clones when used in combination with a lacZα-based vector. As these vectors are widely used in screenings of small-insert expression libraries, a robust strategy and prior experimental planning are needed to avoid finding and characterizing false positive clones.
Materials and Methods
Bacterial strains, plasmids and general growth conditions
E. coli DH10B (Invitrogen) cells were used for cloning, metagenomic library construction and experimental procedures. E. coli cells were routinely grown at 37°C in Luria–Bertani (LB) medium [20]. When required, kanamycin (50 μg/mL) was added to the medium to ensure plasmid retention. Transformed bacteria were recovered in liquid LB medium for 1 hour at 37°C and 180 r.p.m., followed by plating on LB-agar plates and incubation at 37°C for at least 18 hours. Plasmids used in the present study were pSEVA232, pSEVA242 [27] and pSEVA242 bearing a 1.5 kb insert (this study), corresponding to the endoglucanase cel5A gene from Bacillus subtilis 168 [28].
Nucleic acid techniques
DNA preparation, digestion with restriction enzymes, analysis by agarose gel electrophoresis, isolation of DNA fragments, ligations, and transformations were done by standard procedures [20]. Plasmid DNA was sequenced on both strands using the ABI PRISM Dye Terminator Cycle Sequencing Ready Reaction kit (PerkinElmer) and an ABI PRISM 377 sequencer (PerkinElmer) according to the manufacturer’s instructions.
Screening of GH and protease activities
The metagenomic library used in this study (named LFA-USP3) was generated previously (L. de F. Alves, unpublished results) from a Secondary Atlantic Forest soil sample collected at the University of São Paulo, Ribeirão Preto, Brazil (21°09′58.4″S, 47°51′20.1″W). The library was constructed from the microbial community of a soil bearing a specific tree litter composition (Phytolacca dioica). Metagenomic DNA was cloned into the pSEVA232 vector, a plasmid able to replicate in different Gram-negative bacteria owing to its origin of replication [27]. The LFA-USP3 library comprised about 257 Mb of environmental DNA distributed across approximately 63,000 clones harbouring inserts ranging in size from 1.5 kb to 7.5 kb, with an average size of 4.1 kb. Briefly, total microbial DNA was isolated using the UltraClean Soil DNA Isolation Kit according to the manufacturer’s recommendations. The metagenomic DNA was partially digested with Sau3AI, and fragments ranging from 2 to 7 kb were selected directly from an agarose gel and purified for cloning into the pSEVA232 vector previously digested with BamHI and dephosphorylated. The resulting plasmids were transformed into E. coli DH10B cells by electroporation and the cells were grown on LB-agar plates containing kanamycin (50 μg/mL) for 18 h at 37°C in order to amplify the library. Clones from the library were pooled in LB medium containing 20% (w/v) glycerol for storage at -80°C.
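As a quick consistency check on the library figures given above, the total amount of cloned DNA can be estimated as the number of clones multiplied by the mean insert size. The short Python sketch below does this arithmetic. The function name `library_size_mb` is introduced here only for illustration, and the estimate assumes that every clone carries an insert of the average size.

```python
def library_size_mb(n_clones: int, mean_insert_kb: float) -> float:
    """Estimate total cloned DNA (Mb) as clones x mean insert size."""
    return n_clones * mean_insert_kb / 1000.0  # 1 Mb = 1000 kb


# Values reported for the LFA-USP3 library.
estimate = library_size_mb(63_000, 4.1)
print(f"Estimated library size: {estimate:.1f} Mb")
```

The estimate of roughly 258 Mb is in line with the approximately 257 Mb reported for the library.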
Screening of GH and protease activities was performed according to Jones and collaborators (2007). The library clones were grown on LB-agar plates containing 1% (w/v) skimmed milk, 0.25 mg/mL phenol red and kanamycin (50 μg/mL) for 24 h at 37°C. Colonies surrounded by a yellow halo against a red background were identified as GH-positive clones, and their plasmids were recovered and verified according to their restriction patterns when digested with NdeI and HindIII. The restriction patterns were analysed in 0.8% (w/v) agarose gels.
In silico analysis of DNA inserts and identified protein sequences
Putative ORFs in the small fragment sequences were identified using the ORF Finder program, available online (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The insert amino acid sequences were compared against the NCBI database using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Three-dimensional models of the chimeric LacZα/metagenomic peptides (NS1-NS9) and of the LacZ α-peptide were obtained from the QUARK algorithm server (https://zhanglab.ccmb.med.umich.edu/QUARK/) and images were created with PyMOL (http://www.pymol.org/). Thermodynamic analysis of the mRNA secondary structure of the different small DNA inserts was performed using the NUPACK algorithms (http://www.nupack.org/). The free energy of a given sequence in a given secondary structure was calculated using nearest-neighbor empirical parameters [29–31]. For each construct, the folding energy of the mRNA molecule was calculated from positions -4 to +70 nt relative to the translation start of the lacZ gene, considering previous data [32] and the positions of the DNA inserts (new DNA sequences started at position +53 nt).
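The window used for the folding calculation (positions -4 to +70 relative to the translation start) can be extracted from a transcript with a few lines of Python, as in the minimal sketch below. The function name `initiation_window`, the coordinate convention and the toy transcript are assumptions made here for illustration and are not taken from the study. The folding step itself, which was carried out with NUPACK, is not reproduced.

```python
def initiation_window(mrna: str, start_index: int,
                      upstream: int = 4, downstream: int = 70) -> str:
    """Return the region from -upstream to +downstream around the start codon.

    Assumed convention: `start_index` is the 0-based index of the first
    nucleotide of the start codon, which is position +1; position -1 is the
    nucleotide immediately before it (there is no position 0).
    """
    if start_index < upstream:
        raise ValueError("sequence too short upstream of the start codon")
    if start_index + downstream > len(mrna):
        raise ValueError("sequence too short downstream of the start codon")
    return mrna[start_index - upstream:start_index + downstream]


# Toy transcript (hypothetical, not a sequence from this study).
toy_mrna = "GGAGGAAACAGCU" + "AUG" + "ACC" * 30
window = initiation_window(toy_mrna, start_index=toy_mrna.index("AUG"))
print(len(window), window[:7])
```

Under this convention the window spans 74 nt (4 upstream plus 70 from the start codon onwards), and the resulting string is what would be passed to a folding program.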
Results and discussion
Copy number of plasmids alters β-galactosidase expression and halo detection
Prior to screening for enzymes on the selected SMA-PR medium (Figure 1A), we carried out controls to test the phenotype of clones carrying pSEVA232, the minimal and modular vector used in the construction of the metagenomic library (Figure 1B). For this, we streaked E. coli DH10B cultures carrying pSEVA232, pSEVA242, and pSEVA242 bearing a 1.5 kb insert within the MCS (multiple cloning site) on SMA-PR plates to obtain single colonies. After incubation of the plates for 24 h at 37°C, we observed yellow halos only around colonies of the clones carrying pSEVA242. These results were expected, since pSEVA242 is a high copy number plasmid (Table 1) carrying the β-galactosidase α-fragment in its backbone [27], which guarantees proper expression of the LacZα peptide and subsequent protein complementation. As the SMA-PR medium contains lactose, its hydrolysis by LacZ produces an acidic shift detected as a yellow halo (Figure 1A).
As explained by Padmanabhan and collaborators (2016) [33], the molecular mechanism of blue/white screening (that is, recovery of functional β-galactosidase LacZ) is based on genetic engineering of the lac operon in the E. coli chromosome (coding for the omega peptide with an N-terminal deletion) combined with subunit complementation provided by the cloning vector (coding for the α peptide). In this way, tetramerization to produce a functional LacZ enzyme occurs only if the α peptide, which corresponds to the intact N-terminal portion of the omega peptide, is added in trans [34]. Thus, plasmid pSEVA242 encodes the α peptide of the LacZ protein, which bears an internal MCS, while the chromosome of the host strain (E. coli DH10B) encodes the remaining omega subunit, so that a functional β-galactosidase enzyme forms upon complementation. On the other hand, plasmid pSEVA242 bearing a 1.5 kb insert within the MCS of the lacZα gene did not produce a yellow halo, because the α-fragment was disrupted. Finally, pSEVA232, although also a lacZα-based plasmid, carries a pBBR1 origin of replication, leading to a medium plasmid copy number per cell (Table 1), which does not allow sufficient expression of lacZ to produce the phenotype. This feature was essential for using the broad host-range pSEVA232 vector for library construction.
Screening for proteases and glycosyl hydrolases in SMA-PR may lead to false positives
In order to search for genes coding for proteases and GHs, we screened a metagenomic library hosted in E. coli DH10B, which was previously generated in our laboratory (Figure 1A). The screenings were carried out on SMA-PR medium, composed of LB-agar supplemented with kanamycin (50 μg/mL), skimmed milk and the pH indicator phenol red, which allows GH (yellow halos) and protease (clear halos) activities to be distinguished (Figure 1A). From around 70,000 clones screened, we recovered 280 potential positive clones for GHs, of which just 9 maintained their phenotype when transferred to a new SMA-PR plate (i.e., colonies with yellow halos; Figure 1C). Re-transformed clones were tested for GH activity on SMA-PR plates, and plasmids isolated from the colonies surrounded by yellow halos were digested with HindIII and NdeI, which revealed 6 recombinant plasmids with unique restriction patterns (Figure 2). Surprisingly, restriction profile analyses and sequencing of the metagenomic inserts showed that the metagenomic fragments were too small (between 42 and 173 bp) to encode enzymes able to display activity (Figure 2, Table 2).
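To put the screening numbers above in proportion, the following Python sketch computes the primary hit rate, the fraction of primary hits that kept their phenotype on re-streaking, and the overall frequency of confirmed clones. The variable names and the helper `percent` are introduced here for illustration, and the count of screened clones is the approximate figure given in the text.

```python
screened = 70_000     # clones screened (approximate, as reported)
primary_hits = 280    # potential positives in the first round
confirmed = 9         # clones keeping the yellow halo on a new plate


def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


print(f"primary hit rate:  {percent(primary_hits, screened):.2f}%")
print(f"confirmation rate: {percent(confirmed, primary_hits):.2f}%")
print(f"overall frequency: {percent(confirmed, screened):.4f}%")
```

With these figures, about 0.4% of the screened clones were primary hits, about 3.2% of those were confirmed, and the confirmed clones represent roughly 0.013% of the clones screened.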
In silico analysis of the amino acid sequences (Figures 3 and 4) of the chimeric LacZα fragment/metagenomic peptides resulting from the DNA insertions showed that the DNA fragments were inserted in frame within the coding region of the lacZα gene present in the original vector. Figure 3 shows that complete (DNA inserts NS6, NS7 and NS9) and partial (DNA inserts NS1, NS2 and NS3) recovery of the LacZα-peptide was obtained after in-frame DNA insertion. The N-terminal regions of the chimeric α-fragment/metagenomic peptides were aligned with the LacZα-peptide to look for conserved amino acids along the N-terminal sequence, although no clear tendency was observed (Figure 4). On the other hand, three-dimensional modelling of the chimeric peptides in comparison with the original LacZα-peptide indicated an overall maintenance of structure, which should preserve the activity of the chimeric α peptide when it is added in trans (Figure 5, Figure S1). Taken together, these results indicate that the positive clones resulted from the recovery of functional lacZα polypeptides, revealing a strong limitation of the screening technique used.
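The notion of an in-frame insertion can be made concrete with a small check: an insert preserves the downstream reading frame only if its length is a multiple of three, and the chimeric sequence is translated through only if no premature stop codon arises. The Python sketch below illustrates this under simplifying assumptions. The function names and the toy sequences are hypothetical and are not the NS1-NS9 inserts or the actual lacZα sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq: str) -> list[str]:
    """Split a DNA sequence into complete codons, starting at index 0."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insertion_keeps_frame(cds: str, position: int, insert: str) -> bool:
    """True if `insert`, placed at 0-based `position` in `cds`, keeps the
    downstream reading frame and creates no premature stop codon."""
    if len(insert) % 3 != 0:
        return False  # a length not divisible by 3 shifts the frame
    chimera = cds[:position] + insert + cds[position:]
    internal = split_codons(chimera)[:-1]  # last codon is the natural stop
    return not any(codon in STOP_CODONS for codon in internal)


# Toy coding sequence and inserts (hypothetical, for illustration only).
toy_cds = "ATGGCTGCAGGATCCGAATTCTAA"
print(insertion_keeps_frame(toy_cds, 12, "GATCAAGGGATC"))  # in frame, no stop
print(insertion_keeps_frame(toy_cds, 12, "GATCA"))         # frameshift
print(insertion_keeps_frame(toy_cds, 12, "GATTAGATC"))     # in-frame stop
```

This check only covers the reading frame. Whether a chimeric α peptide still complements the omega fragment depends on its structure and expression level, which a frame check does not address.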
Reduced free energy in mRNA secondary structure could explain increased expression levels in metagenomic clones
In light of the evidence presented above, we hypothesized that the recovery of positive clones with very short DNA fragments was due to the creation of functional lacZα fragments that were either more active than the original polypeptide or expressed at a higher level. To elucidate why 9 clones out of around 70,000 screened (3 of them repeated, which underlines that intrinsic properties of the clones lead to this phenotype) were able to increase the expression of the lacZα gene contained in pSEVA232, we searched the literature for possible explanations. Previous studies have shown that the thermodynamic stability of mRNA secondary structure near the start codon can regulate translation efficiency in E. coli and other organisms, and that translation is more efficient the less stable the secondary structure [32,38,39]. Although codon bias has been related to slowing ribosomal elongation during initiation, leading to increased translational efficiency [40–42], a recent systematic study using >14,000 synthetic reporters in E. coli demonstrated that reduced stability of the RNA structure, and not codon rarity itself, is responsible for the increases in expression [38]. In this sense, the mechanistic explanation is that tightly folded mRNA obstructs translation initiation, thereby reducing protein synthesis [43].
To obtain evidence supporting the hypothesis that the recovery of the nine positive clones was due to higher expression levels of the chimeric lacZα genes with respect to the original from pSEVA232 (which shows no phenotype on SMA-PR), we analyzed the local mRNA secondary structure of the different DNA inserts in comparison to the lacZα gene. Thus, for each construct (NS1-NS9 and lacZ without insert) we computed the predicted minimum free energy (ΔG) associated with the secondary structure of its entire mRNA, or of the 5′-end region of its mRNA (Table 2). The folding energy of the entire mRNA did not appear to be reduced (Table 2). By contrast, the folding energy from positions -4 to +70 nt relative to the translation start showed that, in all the new sequences originated by metagenomic DNA insertion, the stability of the mRNA molecules was lower than that of the original, that is, with less negative ΔG values (Figure 6, Table 2). Kudla and collaborators (2009) obtained similar results with respect to the region used for free energy calculation. In this regard, studies have shown that the region of strongest correlation between folding energy and expression does not overlap with the Shine-Dalgarno sequence [43,44], but rather with the 30-nt ribosome binding site centered around the start codon [32]. Therefore, the results obtained here could explain the identification of the nine clones as positives in the screenings. Consequently, our data are in accordance with previous studies demonstrating that reduced mRNA stability near the translation-initiation site increases protein expression [32,38,39].
Conclusions
In the present study, we used a functional metagenomic approach intended to recover two different types of enzymes in a single assay (i.e., GHs and proteases) using a methodology previously described in the literature [25,26]. For this, we used the vector pSEVA232 for library construction, since it displays unique features, such as being minimalist, synthetic, modular and broad host-range [27]. Plasmid pSEVA232 is a lacZα-based plasmid, like most of the plasmids used in small-insert metagenomic libraries [12–17]. After screening on SMA-PR, we obtained nine clones showing the typical yellow halos indicative of GH production. However, all were false positives, since small DNA fragments were inserted in frame within the lacZα gene present in the original vector. A possible explanation for the positive phenotype of these clones is that the new sequences generated by the metagenomic DNA insertions produced a less stable mRNA molecule at the 5′-end region, a feature that has been associated with increased protein expression [32,38,39].
Considering that activity-driven screenings (metagenomic or of another type, such as in vitro evolution or rational design screens) are time-consuming and use a significant quantity of materials for plate preparation, we believe that methods with a high tendency to recover false positive clones should be avoided. Accordingly, the SMA-PR method, when used in combination with lacZα-based vectors, seems to be extremely inadequate, as it has a high probability of yielding false positive clones. In general, screening strategies using pH-based assays appear not to be sufficiently robust [26]. For instance, we found that microbial secretions, probably derived from bacterial metabolism (such as biogenic amines), might act as alkalinizing agents of the solid medium, interfering with phenotype visualization [45,46]. We observed that within just a few hours of incubation at 37 °C, halos could turn from yellow to red, hindering clone recovery and the reproducibility of the assays. Hence, before embarking on activity-based screening assays comprising thousands or millions of clones, a robust and straightforward strategy should be planned to avoid finding and characterizing false positive clones. Finally, we strongly encourage the scientific community to report biases in widely accepted protocols, as in the case presented here.
Supplementary materials
Acknowledgements
This work was supported by the National Council for Scientific and Technological Development (CNPq 472893/2013-0 and 441833/2014-4) and by the São Paulo Research Foundation (FAPESP, grant numbers 2015/04309-1 and 2012/21922-8). LFA, TCB and CAW are beneficiaries of FAPESP fellowships (numbers 2016/06323-4, 2016/06922-5 and 2016/05472-6, respectively).