ABSTRACT
Background Sequencing-based analyses of low-biomass samples are known to be prone to misinterpretation due to the potential presence of contaminating molecules derived from laboratory reagents and environments. Due to its inherent instability, contamination with RNA is usually considered to be unlikely.
Results Here we report the presence of small RNA (sRNA) contaminants in widely used microRNA extraction kits and means for their depletion. Sequencing of sRNAs extracted from human plasma samples was performed and significant levels of non-human (exogenous) sequences were detected. The source of the most abundant of these sequences could be traced to the microRNA extraction columns by qPCR-based analysis of laboratory reagents. The presence of artefactual sequences originating from the confirmed contaminants was furthermore replicated in a range of published datasets. To avoid artefacts in future experiments, several protocols for the removal of the contaminants were elaborated, minimal amounts of starting material for artefact-free analyses were defined, and the reduction of contaminant levels for identification of bona fide sequences using ‘ultraclean’ extraction kits was confirmed.
Conclusion This is the first report of the presence of RNA molecules as contaminants in laboratory reagents. The described protocols should be applied in the future to avoid confounding sRNA studies.
BACKGROUND
The characterization of different classes of small RNAs (sRNAs) in tissues and bodily fluids holds great promise in understanding human physiology as well as in health-related applications. In blood plasma, microRNAs and other sRNAs are relatively stable, and microRNAs in particular are thought to reflect a system-wide state, making them potential biomarkers for a multitude of human diseases [1]. Different mechanisms of sRNA delivery as a means of long-distance intercellular communication have been recognized in several eukaryotes [2–7]. In addition, inter-individual, inter-species and even inter-kingdom communications via sRNAs have been proposed [8–12], and some cases of microRNA-based control by the host [13,14] or pathogens [15,16] have been demonstrated.
As exogenous RNAs have been detected in the blood plasma of humans and mice [17,18], the potential for exogenous RNA-based signalling in mammals is the subject of significant current debate [19,20]. Diet-derived exogenous microRNAs have been proposed to exert an influence on human physiology [21,22], as have bacterial RNAs, which can be secreted in the protective environment of outer membrane vesicles [23–25]. However, a heated discussion has at the same time been triggered around the genuineness of the observations of these exogenous sRNAs in human blood [26–28] and the possibility of dietary uptake of sRNAs [29–31]. This discussion happens at a time when DNA sequencing-based analyses of low-biomass samples have been recognized to be prone to confounding by contaminants [32]. From initial sample handling [33], to extraction kits [34], to sequencing reagents [35], multiple sources of DNA contamination and artefactual sequencing data have been described.
Here, we report the contamination of widely used silica-based columns for the isolation of micro- and other small RNAs with RNA, which was apparent from sRNA sequencing data and was subsequently validated by qPCR. These artefactual sRNA sequences were also apparent in numerous published datasets. Furthermore, approaches for the depletion of the contaminants from the columns as well as an evaluation of a newer ultra-clean kit are presented, along with the determination of a minimum safe input volume to suppress the signal of the contaminant sequences in RNA sequencing data of human blood plasma samples. The potential presence of bona fide exogenous sRNA species in human plasma is examined. Finally, recommendations for the control and interpretation of sRNA sequencing data from low-biomass samples are provided.
RESULTS
Initial detection of exogenous sRNAs in human blood plasma
sRNA was extracted from 100 μl blood plasma samples of ten healthy individuals using regular RNeasy columns and sequenced (workflow in Figure 1). The read profiles were mined for putative exogenous (non-human) sequences (see Methods). Among the potential exogenous sequences were 19 sequences that occurred with more than 1,000 counts per million (cpm) in all samples. To rule out sequencing errors or contamination during sequencing library preparation, a qPCR approach was developed to assess the presence of non-human sequences in the sRNA preparations from plasma. Six of the 19 highly abundant sRNA sequences from plasma that could not be mapped to the human genome were chosen for validation by qPCR (Table 1).
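The abundance criterion used above (more than 1,000 cpm in every sample) can be illustrated with a minimal sketch. The function names, the data layout (non-human read counts plus the total library size per sample) and the toy counts are assumptions made for illustration; they are not taken from the study's scripts.

```python
def counts_per_million(counts, library_size):
    """Scale raw read counts {sequence: count} to counts per million (cpm)."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {seq: n * 1_000_000 / library_size for seq, n in counts.items()}


def abundant_in_all(libraries, min_cpm=1000.0):
    """Return sequences exceeding min_cpm in every library.

    libraries: list of (non_human_counts, library_size) tuples, one per sample.
    """
    shared = None
    for counts, library_size in libraries:
        cpm = counts_per_million(counts, library_size)
        passing = {seq for seq, value in cpm.items() if value > min_cpm}
        shared = passing if shared is None else shared & passing
    return sorted(shared or [])


# Toy data, invented for illustration only.
toy_libraries = [
    ({"SEQ_A": 5_000, "SEQ_B": 10}, 1_000_000),
    ({"SEQ_A": 2_000, "SEQ_B": 3_000}, 1_000_000),
]
print(abundant_in_all(toy_libraries))  # ['SEQ_A']
```

Normalising to the total library size rather than to the non-human reads alone keeps the values comparable between samples with different proportions of human reads.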
qPCR assays for putative exogenous sRNAs in human blood plasma
Synthetic sRNAs with the putative exogenous sequences found in plasma were poly-adenylated and reverse transcribed to yield cDNA, used for optimisation of PCR primers and conditions (Table 1). All primer sets yielded amplicons with single peaks in melting temperature analysis and efficiency values above 80 %. The optimised qPCR assays were then employed to test for the presence of the highly abundant sRNAs potentially representing exogenous sequences (workflow in Figure 1) in the human plasma samples used for the initial sequencing experiment. The qPCR assays confirmed the presence of these sRNAs in the sRNA preparations used for sequencing (Figure 2A), yielding amplicons with melting temperatures expected from the synthetic sRNAs. To rule out contamination of the water used in the sRNA preparations, a water control was also examined. No amplification was observed in all but one assay, where amplification of a product with a different melting temperature occurred (Figure 2A). Thus, for the assays, contamination of the water could be ruled out.
Non-human sequences derived from column contaminants
To analyse whether the validated non-human sequences occurring in the sRNA extracts of plasma were present in any labware, a series of control experiments was carried out (Additional Figure 1). When nucleic acid- and RNase-free water (QIAGEN) was used as input to the miRNeasy Serum/Plasma kit (QIAGEN) instead of plasma (“mock-extraction”), all tested non-human sequences could be amplified from the mock-extract (Figure 2B). This indicates that one of the components of the extraction kit or labware was contaminated with the non-human sequences. To locate the source of contamination, mock-extractions were performed by omitting single steps of the RNA-isolation protocol except for the elution step. Amplification from the resulting mock-extracts was tested for the most abundant non-human sequence (sRNA 1). In all cases, sRNA 1 could be amplified (data not shown). We therefore carried out a simple experiment, in which nucleic acid- and RNase-free water was passed through an otherwise untreated spin column. From this column eluate, all target sequences could be amplified, in contrast to the nucleic acid- and RNase-free water (Figure 2B). The most abundant non-human sequences in the plasma sequencing experiments were therefore most likely contaminants originating from the untreated RNeasy columns.
Detection of contaminant sequences in public datasets
To assess whether our observation of contaminant sRNAs was also pertinent in other sequencing datasets of low-input samples, the levels of confirmed contaminant sRNA sequences in published datasets [17,18,29,36–53] were assessed. Irrespective of the RNA isolation procedure applied, non-target sequences were detected (making up between 5 and over 99 % of the sequencing libraries for the human samples; Additional Table 1). As shown in Figure 3, the six contaminant sequences which had been confirmed by qPCR were found in all analysed low-biomass samples that were extracted with regular miRNeasy kits, but the sequences were found at lower levels in studies with more biomass input [29,37,39] and hardly ever [40] in studies where samples were extracted using other methods (Additional Table 1). Within each study where the confirmed contaminant sequences were detected, the relative levels of the contaminant sequences were remarkably stable (Additional Figure 2).
Depletion of contaminants from isolation columns
In order to eliminate contamination from the columns to allow their use in studies of environmental samples or potential exogenous sRNAs from human samples, we were interested in the nature of these contaminants. The fact that they can be poly-adenylated by RNA-poly-A-polymerase points to them being RNA. Treatment of the eluate with RNase prior to cDNA preparation also abolished amplification (data not shown), but on-column DNase digest did not reduce their levels (Figure 2C). These findings suggest that the contaminants were RNAs.
Contaminating sequences could potentially be removed from the RNeasy columns using RNase, but as RNases are notoriously difficult to inactivate and RNases remaining on the column would be detrimental to sRNA recovery, an alternative means of removing RNA was deemed desirable. Loading and incubation of RNeasy columns with the oxidant sodium hypochlorite and subsequent washing with RNase-free water to remove traces of the oxidant reduced the amplifiability of unwanted sRNA at least 100-fold (Figure 2D), while retaining the columns’ efficiency to isolate sRNAs from samples applied afterwards. Washing the RNeasy columns with RNase-free water (Figure 2D; contaminant reduction of 80 +/− 10 %, average +/− standard deviation) or treating them with sodium hydroxide (contaminant reduction of 70 +/− 15 %, average +/− standard deviation) did not remove the contaminants completely.
Ultra-clean extraction kits
Recently, RNeasy columns from an ultra-clean production have become available from QIAGEN within the miRNeasy Serum/Plasma Advanced Kit. We compared the levels of the previously analysed contaminant sequences in the flow-through of mock-extractions using 4 batches of ultra-clean RNeasy columns to 2 batches of the regular columns by qPCR. In all cases, marked reductions in the contaminant levels were observed in the clean columns (Figure 4A; 4- to 4,000-fold; median 60-fold). To obtain an overview of potential other contaminants, sRNA sequencing of the mock-extracts from these six batches of spin columns was performed. With regard to the six previously analysed contaminant sequences, the results were similar to those of the qPCR assays (Additional Figure 3). Additionally, for the ultra-clean RNeasy columns, a smaller spectrum of other potential contaminant sequences was observed (Figure 4B&C) and those sequences made up a smaller proportion of the eluate sequences (Figure 4D).
As our initial analyses of plasma samples extracted using regular RNeasy spin columns had revealed contaminant levels of up to 7000 cpm, we were interested to define a safe input amount for human plasma for both column types that would be sufficient to suppress the contaminant signals to below 100 cpm. For this, we performed a titration experiment (Additional Figure 3B), isolating sRNA from a series of different input volumes of the same human plasma sample on four batches of RNeasy columns (2 batches of regular columns, 2 batches of ultra-clean columns) with subsequent sequencing. As expected from reagent contaminants, the observed levels of the contaminant sequences were generally inversely dependent on the plasma input volume (Figure 5A). In addition and in accordance with the earlier mock-extraction results, the levels of contaminant sequences were lower or they were completely absent in the ultra-clean columns (see levels for 100 μl input in Figure 5B). An input volume of 100 μl plasma was sufficient to reduce all contaminant sequences to below 100 cpm when using the ultra-clean spin columns.
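The reasoning behind the titration can be sketched as follows: for each input volume, the highest contaminant level is computed in cpm, and the smallest volume is reported for which this and all larger volumes stay below the limit. This is a minimal sketch under stated assumptions; the function names, the data layout and the toy numbers are hypothetical and do not reproduce the measured values.

```python
def max_contaminant_cpm(contaminant_counts, library_size):
    """Highest cpm value among the contaminant sequences of one library."""
    return max(
        (n * 1_000_000 / library_size for n in contaminant_counts.values()),
        default=0.0,
    )


def minimal_safe_volume(titration, limit_cpm=100.0):
    """Smallest input volume for which it and all larger volumes stay below limit_cpm.

    titration: {input volume in microlitres: (contaminant_counts, library_size)}
    Returns None if even the largest volume exceeds the limit.
    """
    safe = None
    # Walk from the largest to the smallest volume and stop at the first failure.
    for volume in sorted(titration, reverse=True):
        counts, library_size = titration[volume]
        if max_contaminant_cpm(counts, library_size) >= limit_cpm:
            break
        safe = volume
    return safe


# Toy titration series, invented for illustration only.
toy_titration = {
    50: ({"contaminant_1": 900, "contaminant_2": 40}, 1_000_000),
    100: ({"contaminant_1": 80, "contaminant_2": 5}, 1_000_000),
    200: ({"contaminant_1": 20, "contaminant_2": 0}, 1_000_000),
}
print(minimal_safe_volume(toy_titration))
```

Scanning from the largest volume downwards reflects the expectation stated above that contaminant levels rise as the input volume falls, so a single low reading at a small volume is not taken as safe when a larger volume failed.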
Potential plasma-derived exogenous RNAs
Finally, to detect potential exogenous sRNAs, we mined the plasma datasets used in the well-controlled titration experiment for sequences that do not originate from the human genome and were not detected in any of the mock-extracts. On average, 5 % of the sequencing reads of sRNA isolated from plasma did not map to the human genome. 127 sequences which did not map to the human genome assembly hg38 were detected in the majority of the plasma samples and were not represented in the control samples (empty libraries, column eluates or water). Out of these, 3 sequences had low complexity and 81 could be matched to sequences in the NCBI-nr that are not part of the current version of the human genome assembly (hg38) but annotated as human sequences or to sequences from other vertebrates. Of the 43 remaining sequences which matched to bacterial, fungal or plant sequences, 22 matched best to genera which have previously been identified as a source of contaminations of sequencing kits [35]. The remaining 21 sequences displayed very low (up to 47 cpm), yet consistent relative abundances in the 28 replicates of a plasma sample from the one healthy individual. Their potential origins were heterogeneous, including fungi and bacteria, with a notable enrichment in Lactobacillus sequences (Additional Table 2).
DISCUSSION
Several instances of contamination of laboratory reagents with DNA, which can confound the analysis of sequencing data, have been reported in recent years [32,35,54,55]. In contrast, the contamination of reagents with RNA has not yet been reported. Contamination with RNA is usually considered very unlikely, due to the ubiquitous presence of RNases in the environment and RNA’s lower chemical stability due to being prone to hydrolysis, especially at higher pH. However, our results suggest that the detected contaminants were not DNA, but RNA, because treatment with RNase and not DNase could decrease the contaminant load. In addition, the contaminating molecules could not be amplified without poly-adenylation and reverse-transcription. The stability of the contaminants is likely due to the extraction columns being RNase-free and their silica protecting loaded sRNAs from degradation. While the results presented here focused on one manufacturer’s spin column-based extraction kit, for which contaminants were validated, other RNA-stabilizing or extraction reagents may carry RNA contaminations. This is suggested by previously observed significant batch effects of sequencing data derived from samples extracted with a number of different extraction kits [27]. Based on the analysis of the published data sets, where significant numbers of sequences that did not map to the source organism’s genome were found independent of the RNA extraction kit used, the potential contaminants in other extraction kits would have different sequences than the ones confirmed by qPCR here.
The results presented here should help to address the question of whether exogenous sRNA species derived from oral intake [18] or the human microbiome [17,38,56] really occur frequently in human plasma or are merely artefacts [26]. While the limited data from this study (one healthy person) point to very low levels and a small spectrum of potential foreign sRNAs, properly controlled studies using laboratory materials without contaminants on individuals or animals with conditions that limit gastrointestinal barrier function will shed more light on this important research question in the future.
CONCLUSIONS
The reported contaminant sequences can confound studies of organisms whose transcriptomes contain sequences similar to the contaminants. They can also give rise to misinterpretation in studies without a priori knowledge of the present organisms as well as lead to the overestimation of miRNA yields in low-biomass samples. Therefore, based on the present study, care has to be taken when analysing low-input samples, in particular for surveys of environmental or otherwise undefined sources of RNAs. A number of recommendations can be conceived based on the presented data (Figure 6): Extraction columns should be obtained as clean as possible. Simple clean-up procedures can also reduce contaminants. The input mass of sRNA should be as high as possible, e.g. for human plasma, volumes above 100 μl are preferable. Extraction controls should always be sequenced with the study samples. To facilitate library preparation for the extraction controls, spike-in RNAs with defined sequences can be used. They should be applied at concentrations similar to the levels of RNA found in the study samples. As the spike-in signal can drown out the contaminants, spike-in concentrations that are too high must be avoided. Sequences found in the extraction controls should be treated as artefacts and removed from the sequencing data. Independent techniques that are more robust to low input material, such as qPCR or ddPCR, should be applied to both study samples and controls in case of doubt.
METHODS
Blood plasma sampling
Written informed consent was obtained from all blood donors. The sample collection and analysis was approved by the Comité d’Ethique de Recherche (CNER; Reference: 201110/05) and the National Commission for Data Protection in Luxembourg. Blood was collected by venepuncture into EDTA-treated tubes. Plasma was prepared immediately after blood collection by centrifugation (10 min at 1,000 × g) and platelets were depleted by a second centrifugation step (5 min at 10,000 × g). The blood plasma was flash-frozen in liquid nitrogen and stored at −80 °C until extraction.
Use of sRNA isolation columns
Unless stated otherwise, 100 μl blood plasma was lysed using the QIAzol (QIAGEN) lysis reagent prior to binding to the column, as recommended by the manufacturer. RNeasy MinElute spin columns from the miRNeasy Serum/Plasma Kit (QIAGEN) were then loaded, washed and dried, and RNA was eluted as recommended by the manufacturer’s manual. We further tested four batches of ultra-clean RNeasy MinElute columns, which underwent an ultra-clean production process (UCP) to remove potential nucleic acid contaminations, including environmental sRNAs. These columns were treated as recommended in the manual of the miRNeasy Serum/Plasma Advanced Kit (QIAGEN). All eluates were stored at −80 °C until analysis.
For the mock-extractions, ultra-clean or regular RNeasy columns were loaded with the aqueous phase from a QIAzol extraction of nucleic acid- and RNase-free water (QIAGEN) instead of plasma. For mock-extractions with a defined spike-in, the aqueous phase was spiked with synthetic hsa-miR-486-3p RNA (Eurogentec) to yield 40,000 copies per μl eluate. To obtain column eluates, spin columns were not loaded, washed or dried. Instead, 14 μl of RNase-free water (QIAGEN) was applied directly to a new column and centrifuged for 1 min.
To eliminate environmental sRNAs from the regular RNeasy columns, the columns were incubated with 500 μl of a sodium hypochlorite solution (Sigma; diluted in nuclease free water (Invitrogen) to approx. 0.5 %) for 10 min at room temperature. Columns were subsequently washed 10 times with 500 μl nuclease free water (Invitrogen), before use. Similarly, in the attempt to remove sRNAs by application of sodium hydroxide, 500 μl 50 mM NaOH were incubated on the spin columns for 5 min, followed by incubation with 50 mM HCl for 5 min, prior to washing the columns 10 times with 500 μl nuclease-free water (Invitrogen) before use.
Real-time PCR
5 μl of eluted RNA was polyadenylated and reverse-transcribed to cDNA using the qScript microRNA cDNA Synthesis Kit (Quanta BIOSCIENCES). 1 μl of cDNA (except for the initial plasma experiment, where 0.2 μl cDNA was used) was amplified by use of sequence-specific forward primers (see Table 1, obtained from Eurogentec) or the miR486-5p specific assay from Quanta BIOSCIENCES, PerfeCTa Universal PCR Primer and PerfeCTa SYBR Green SuperMix (Quanta BIOSCIENCES) in a total reaction volume of 10 μl. Primers were added at a final concentration of 0.2 μM. Primer design and amplification settings were optimised with respect to reaction efficiency and specificity. Efficiency was calculated using a dilution series covering seven orders of magnitude of template cDNA reverse transcribed from synthetic sRNA. Real-time PCR was performed on a LightCycler® 480 Real-Time PCR System (Roche) including denaturation at 95 °C for 2 min and 40 cycles of 95 °C for 5 sec, 54–60 °C for 15 sec (for annealing temperatures see Table 1), and 72 °C for 15 sec. All reactions were carried out in duplicate. No-template-controls were performed analogously with water as input. Cp values were obtained using the second derivative procedure provided by the LightCycler® 480 Software, Version 1.5. Cp data were analysed using the comparative CT method (ΔΔCT).
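The two calculations named above can be sketched in a few lines. The sketch assumes the standard textbook formulas: efficiency E = 10^(−1/slope) − 1 from the slope of Cp against log10 template amount, and relative quantity 2^(−ΔΔCT), which itself assumes an efficiency close to 100 %. The function names, the choice of reference assay and all numbers are hypothetical; the source does not state which reference was used for the ΔΔCT calculation.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency(log10_template, cp_values):
    """Efficiency from a dilution series; 1.0 corresponds to 100 %."""
    slope = least_squares_slope(log10_template, cp_values)
    return 10 ** (-1.0 / slope) - 1.0


def relative_quantity(cp_target_test, cp_ref_test, cp_target_control, cp_ref_control):
    """Comparative CT method: 2 ** (-ddCT) of the test versus the control condition."""
    delta_test = cp_target_test - cp_ref_test
    delta_control = cp_target_control - cp_ref_control
    return 2 ** -(delta_test - delta_control)


# Toy dilution series over seven tenfold steps (values invented for illustration).
toy_log10_template = [0, -1, -2, -3, -4, -5, -6]
toy_cp = [10 + 3.32 * step for step in range(7)]
print(round(amplification_efficiency(toy_log10_template, toy_cp), 3))

# Toy comparison: the target appears 7 cycles later after a treatment, reference unchanged.
print(relative_quantity(30.0, 20.0, 23.0, 20.0))
```

A slope of about −3.32 cycles per tenfold dilution corresponds to a doubling per cycle, which is why the toy series yields an efficiency close to 1.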
sRNA seq: library preparation and sequencing
sRNA libraries were made using the TruSeq small RNA library preparation kit (Illumina) according to the manufacturer’s instructions, except that the 3’ and 5’ adapters were diluted 1:3 before use. PCR-amplified libraries were size selected using a PippinHT instrument (Sage Science), collecting the range of 121–163 bp. Completed, size-selected libraries were run on a High Sensitivity DNA chip on a 2100 Bioanalyzer (Agilent) to assess library quality. Concentration was determined by qPCR using the NEBNext Library Quant kit (NEB). Libraries were pooled, diluted and sequenced with 75 cycle single-end reads on a NextSeq 500 (Illumina) according to the manufacturer’s instructions. The sequencing reads can be accessed at NCBI’s short read archive via PRJNA419919 (for sample identifiers and accessions see Additional Table 1).
Initial analysis: plasma-derived sRNA sequencing data
For the initial analysis of plasma-derived sRNA sequencing data, FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) was used to determine over-represented primer and adapter sequences, which were subsequently removed using cutadapt (http://dx.doi.org/10.14806/ej.17.1.200). This step was repeated recursively until no over-represented primer or adapter sequences were detected. 5’-Ns were removed using fastx_clipper of the FASTX-toolkit. Trimmed reads were quality-filtered using fastq_quality_filter of the FASTX-toolkit (with −q 30 −p 90; http://hannonlab.cshl.edu/fastx_toolkit). Finally, identical reads were collapsed, retaining the read abundance information using fastx_collapser of the FASTX-toolkit. The collapsed reads were mapped against the human genome (GRCh37), including RefSeq exon junction sequences, as well as prokaryotic, viral, fungal, plant and animal genomes from Genbank [57] and the Human Microbiome Project [58] using Novoalign V2.08.02 (http://www.novocraft.com; Additional Table 3). These organisms were selected based on the presence in the human microbiome, human nutrition and the public availability of the genomes. As reads were commonly mapping to genomic sequences of multiple organisms, and random alignment can easily occur between short sequences and reference genomes, the following approach was taken to refine their taxonomic classification: First, reads were attributed to the human genome if they mapped to it. Secondly, reads mapping to each reference genome were compared to the mapping of a shuffled decoy read set. Based on this, the list of reference genomes was limited to the genomes recruiting at least one read with a minimum length of 25 nt. Loci on non-human genomes were established by the position of the mapping reads. The number of mapping reads per locus was adjusted using a previously established cross-mapping correction [59].
Finally, the sequences of the loci, the number of mapping reads and their potential taxonomy were extracted.
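Two of the steps described above, collapsing identical reads while retaining their abundance and generating a shuffled decoy read set, can be illustrated with a minimal sketch. It is not the FASTX-toolkit implementation, and the way the decoy set was actually shuffled is not specified in the text; shuffling the bases within each read, as done here, is an assumption. Function names and sequences are invented.

```python
import random
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    return Counter(reads).most_common()


def shuffled_decoys(reads, seed=0):
    """Shuffle the bases within each read, preserving length and base composition."""
    rng = random.Random(seed)  # fixed seed so the decoy set is reproducible
    decoys = []
    for read in reads:
        bases = list(read)
        rng.shuffle(bases)
        decoys.append("".join(bases))
    return decoys


# Toy reads, invented for illustration only.
toy_reads = ["ACGTTGCA", "ACGTTGCA", "TTTTGGGG"]
print(collapse_reads(toy_reads))
print(shuffled_decoys(toy_reads))
```

Because the decoys keep the length and base composition of the real reads, the number of decoys that a reference genome recruits gives a rough estimate of how many alignments to that genome would be expected by chance.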
sRNA sequence analysis of controls
For the subsequent analysis of the mock-extractions, column eluates, nucleic acid- and RNase-free water and no-template controls, as well as of the human plasma samples extracted using either regular or ultra-clean RNeasy columns, the trimming and quality check of the reads was done analogously to the description above. Collapsed reads were mapped against the most recent version of the human genome (hg38) either to remove operator-derived sequences or to distinguish the reads mapping to the human genome in the different datasets. Sequencing was performed in two batches, with one batch filling an entire flow cell, and one mixed with other samples. The latter batch of samples was sequenced on the same flow cell as sRNAs extracted from Salmonella typhimurium LT2. To avoid misinterpretations due to multiplexing errors, reads mapping to Salmonella typhimurium LT2 [60] (Genbank accession AE006468) were additionally removed in this batch. To limit the analysis to only frequently occurring sequences and therefore avoid over-interpretation of erroneous sequences, only read sequences that were found at least 30 times in all analysed samples together were retained for further analysis. Public sRNA datasets of low-input samples (see Additional Table 1) were analysed in a fashion analogous to the study’s control and plasma samples. As the published studies consisted of different numbers of samples, no overall threshold was imposed, but to limit the analysis to frequently occurring sequences, singleton reads were removed.
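The pooled abundance threshold described above can be sketched as follows. The function name and data layout are hypothetical. Calling the function with min_total=2 would drop sequences seen only once; treating singletons as sequences with a pooled count of one across a study is an assumption, as the text does not state whether singletons were defined per sample or per study.

```python
from collections import Counter


def filter_by_pooled_count(samples, min_total=30):
    """Keep only sequences whose count summed over all samples reaches min_total.

    samples: list of {sequence: count} dicts, one per library.
    Returns a new list of dicts restricted to the retained sequences.
    """
    pooled = Counter()
    for counts in samples:
        pooled.update(counts)
    keep = {seq for seq, total in pooled.items() if total >= min_total}
    return [
        {seq: n for seq, n in counts.items() if seq in keep}
        for counts in samples
    ]


# Toy data, invented for illustration only.
toy_samples = [{"SEQ_A": 20, "SEQ_B": 1}, {"SEQ_A": 15}]
print(filter_by_pooled_count(toy_samples))
print(filter_by_pooled_count(toy_samples, min_total=2))
```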
To compare the sequencing results to the qPCR-based results and to detect the same sequences in public datasets, reads matching the sequences assayed by qPCR were determined by clustering the trimmed, filtered and collapsed sRNA reads with 100 % sequence identity and 14 nt alignment length with the primer sequences, while allowing the sRNA reads to be longer than the primer sequences, using CD-HIT-EST-2D (parameters -c 1 -n 8 -G 0 -A 14 -S2 40 -g 1 -r 0) [61].
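The matching criterion (100 % identity over at least 14 nt, with reads allowed to be longer than the primers) can be approximated by testing whether a read and a primer share an exact 14-nt substring. The sketch below is a simplified stand-in and not a reproduction of CD-HIT-EST-2D, whose clustering heuristics differ; the function names and sequences are invented for illustration.

```python
def kmers(sequence, k):
    """Set of all substrings of length k (empty if the sequence is shorter than k)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reads_matching_primers(reads, primers, min_overlap=14):
    """Assign reads to primers with which they share an exact stretch of min_overlap nt.

    reads: iterable of read sequences
    primers: {assay name: primer sequence}
    Returns {assay name: [matching reads]}.
    """
    primer_kmers = {name: kmers(seq, min_overlap) for name, seq in primers.items()}
    hits = {name: [] for name in primers}
    for read in reads:
        read_kmers = kmers(read, min_overlap)
        for name, candidate_kmers in primer_kmers.items():
            if read_kmers & candidate_kmers:
                hits[name].append(read)
    return hits


# Toy sequences, invented for illustration only.
toy_primers = {"assay_1": "ACGTACGTTAGCTAGC"}
toy_reads = ["GGACGTACGTTAGCTAGCTT", "TTTTTTTTTTTTTTTTTT"]
print(reads_matching_primers(toy_reads, toy_primers))
```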
To compare the diversity and levels of putative contaminant sequences in the different samples, identical reads derived from all study samples (that did not map to the human genome) were clustered using CD-HIT-EST [61], and a table with the number of reads sequenced for each sample per sequence was created using R v.3.0.2. This table was also used to extract candidate sequences from the study plasma samples that are likely exogenous plasma sRNAs, based on the following criteria: for a sequence to be considered a potential exogenous plasma sRNA, it had to be non-identical to any of the sequences assigned to the confirmed contaminant sequences (Table 1), it had to be absent from at least 90 % of the controls (no-library controls, water and spike-in controls, eluates and mock-extracts) and never detected in any of these controls with 10 or more copies, and it had to be detected by more than 3 reads in more than 7 of the 28 libraries generated from the plasma titration experiment. These thresholds were chosen in order to make the analysis robust against multiplexing errors (e.g. which would result in false-negative identifications if a sequence that is very dominant in a plasma sample is falsely assigned to the control-samples), while at the same time making it sensitive to low-abundant sequences (which would not be detected in every library). To confirm the non-human origin and find potential microbial taxa of origin for these sequences, they were subsequently searched within the NCBI nr database using megablast and blastn web tools, with parameters auto-set for short inputs [62–64]. All sequences with best hits to human sequences or other vertebrates were removed, because they were potentially human. The remaining sequences were matched against a set of genera previously reported [35] to be common sequencing kit contaminants. Sequences with better hits to non-contaminant taxa than contaminant taxa were kept as potential exogenous sequences.
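The count-based selection criteria listed above translate directly into a small filter. The sketch below applies them to one sequence at a time; the function and parameter names are hypothetical, the toy counts are invented, and the original analysis was carried out in R rather than Python. The subsequent BLAST-based taxonomic filtering is not covered.

```python
def is_candidate_exogenous(
    sequence,
    plasma_counts,
    control_counts,
    confirmed_contaminants,
    min_absent_fraction=0.9,
    control_copy_limit=10,
    min_reads=3,
    min_libraries=7,
):
    """Apply the count-based criteria for a potential exogenous plasma sRNA.

    plasma_counts: read counts of the sequence in each plasma library
    control_counts: read counts of the sequence in each control library
    confirmed_contaminants: set of sequences assigned to confirmed contaminants
    """
    if sequence in confirmed_contaminants:
        return False
    # Absent from at least 90 % of the controls.
    absent = sum(1 for n in control_counts if n == 0)
    if absent < min_absent_fraction * len(control_counts):
        return False
    # Never present in a control with 10 or more copies.
    if any(n >= control_copy_limit for n in control_counts):
        return False
    # Detected by more than 3 reads in more than 7 plasma libraries.
    supported = sum(1 for n in plasma_counts if n > min_reads)
    return supported > min_libraries


# Toy counts, invented for illustration only: 28 plasma libraries, 20 controls.
toy_plasma = [4] * 8 + [0] * 20
toy_controls = [0] * 19 + [5]
print(is_candidate_exogenous("SEQ_X", toy_plasma, toy_controls, {"SEQ_C"}))
```

Allowing a small number of low-count detections in the controls, rather than requiring complete absence, mirrors the stated aim of tolerating occasional multiplexing errors.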
Additional Files
The following Additional Files are available online: Additional Figures 1–3; Additional Table 1: list of the generated datasets and analysed published datasets; Additional Table 2: potential exogenous sRNA sequences detected in human plasma after removal of contaminants; Additional Table 3: list of the species whose reference genomes and cDNA collections were used in the initial analysis.
LIST OF ABBREVIATIONS
- qPCR: real-time quantitative polymerase chain reaction
- sRNA: small RNA
DECLARATIONS
Ethics approval and consent to participate
Written informed consent was obtained from all blood donors. The sample collection and analysis was approved by the Comité d’Ethique de Recherche (CNER; Reference: 201110/05) and the National Commission for Data Protection in Luxembourg.
Consent for publication
Written consent for analysis of genetic material and publication was obtained from all blood donors.
Availability of data and materials
The datasets generated and analysed during the current study are available in the NCBI short read archive under BioProject PRJNA419919. Human reads from some datasets generated and analysed during the current study are not publicly available due to privacy concerns, but are available from the corresponding authors on reasonable request. Accessions of publicly available data analysed during the current study are listed in Additional Table 1. Scripts for the analysis of the data from sRNA sequencing of column eluates and the plasma titration experiment are available at https://git.ufz.de/metaOmics/contaminomics.
Competing interests
P.W. has received funding and in-kind contributions toward this work from QIAGEN GmbH, Hilden, Germany. All other authors declare that they have no competing interests.
Funding
This work was supported by the Luxembourg National Research Fund (FNR) through an ATTRACT programme grant (ATTRACT/A09/03), CORE programme grant (CORE/15/BM/10404093) and Proof-of-Concept Programme Grant (PoC/13/02) to P.W., an Aide à la Formation Recherche grant (Ref. no. 1180851) to D.Y., an Aide à la Formation Recherche grant (Ref. no. 5821107) and a CORE grant (CORE14/BM/8066232) to J.V.F., a National Institutes of Health Extracellular RNA Communication Consortium award (1U01HL126496) to D.J.G., and by the University of Luxembourg (ImMicroDyn1). The funding bodies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Authors’ contributions
AH-B designed the experiments, performed experiments and sequencing data analyses, coordinated the study and wrote the manuscript. DY designed and performed the initial sequencing data analyses. AK, JVF and AG performed experiments. AE performed the sRNA sequencing. PM and BBU performed additional computational analyses. CdB obtained donor consents, performed the blood sampling and contributed to the initiation of the study. DJG and PW initiated and supervised the study. DY, AK, AE, JVF, PM and PW contributed to the writing of the manuscript. All authors contributed to the interpretation of the data and read and approved the final manuscript.
Acknowledgements
In silico analyses presented in this paper were carried out using the HPC facilities of the University of Luxembourg [65] whose administrators are acknowledged for excellent support.
Footnotes
present address: Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, United States; email: anubrata{at}mit.edu