ABSTRACT
Virus and host factors contribute to cell-to-cell variation in viral infections and determine the outcome of the overall infection. However, the extent of the variability at the single-cell level and how it affects virus-host interactions at a systems level are not well understood. To characterize the dynamics of viral transcription and host responses, we used single-cell RNA sequencing to quantify, at multiple time points, the host and viral transcriptomes of human A549 cells and primary bronchial epithelial cells infected with influenza A virus. We observed substantial variability of viral transcription between cells, including the accumulation of defective viral genomes (DVGs) that affect viral replication. We show a correlation between DVGs and virus-induced variation of the host transcriptional program, and an association between differential induction of innate immune response genes and attenuated viral transcription in subpopulations of cells. These observations at the single-cell level improve our understanding of the complex virus-host interplay during influenza infection.
AUTHOR SUMMARY
Defective influenza virus particles, which are products of error-prone viral replication, carry incomplete versions of the genome and can interfere with the replication of competent viruses. These defective genomes are thought to modulate disease severity and the pathogenicity of influenza infection. Different defective viral genomes can have different interfering abilities and introduce another source of variation across a heterogeneous cell population. The impact of defective viral genomes on host cell responses cannot be fully resolved at the population level and requires single-cell transcriptional profiling. Here we characterized virus and host transcriptomes in influenza-infected cells, including the transcriptomes of defective viruses that arise during influenza A virus infection. We profiled single-cell transcriptional landscapes over the course of the infection and established an association between defective virus transcription and host responses. We identified dominant defective viral genome species and validated their interfering and immunostimulatory functions in vitro. This study demonstrates the intricate effects of defective viral genomes on host transcriptional responses and highlights the importance of capturing host-virus interactions at the single-cell level.
INTRODUCTION
The productivity of viral replication at the cell population level is determined by cell-to-cell variation in viral infection [1]. The genetically diverse nature of RNA viruses and the heterogeneity of host cell states contribute to this inter-cell variability, which can affect therapeutic applications [2, 3]. Although previous studies of cell-to-cell variation during viral infection have mainly centered on non-segmented viruses such as poliovirus [2, 4], vesicular stomatitis virus (VSV) [5, 6], or dengue (DENV) and Zika (ZIKV) viruses [7], heterogeneity across cells may be further complicated for viruses with a segmented genome, such as influenza A virus (IAV). IAV-infected cells display substantial cell-to-cell variation in the relative abundance of different viral genome segments [1, 8], transcripts [9], and their encoded proteins [10], which can result in non-productive infection in a large fraction of cells.
Defective interfering (DI) particles, which are readily generated during successive high multiplicity of infection (MOI) passages in cell culture [11-14] and are observed in natural infections [15, 16], are likely by-products of an inefficient IAV replication process and have a significant effect on productive infection. DI viruses have incomplete viral genomes with large internal deletions and can interfere with the replication of infectious viruses. Because DIs reduce the yield of infectious progeny and could affect disease outcome, they are of great interest for therapeutic and prophylactic purposes (reviewed in [17]). One specific influenza DI, DI244, was shown to provide effective prophylactic and therapeutic protection against IAV infection in mice [18]. One way influenza DIs are thought to modulate viral infections is through interaction with RIG-I, a cytosolic pathogen-recognition receptor (PRR) essential for interferon (IFN) induction [19]. Besides the enhanced IFN induction observed in vitro and in vivo during infection with DI-rich influenza virus populations, DI viruses have also been proposed to compete with standard virus for cellular resources (reviewed in [17, 20]).
While diverse DI viruses can arise during IAV infection [21], the emergence and accumulation of distinct DIs and other defective viral genomes (DVGs) have not been characterized at single-cell resolution, even though the diversity of the DIs present could contribute to the observed cell-to-cell variation in host transcription.
We probed viral and host transcriptomes simultaneously in the same cells using single-cell RNA-seq to monitor host-virus interactions in cultured cells over the course of the infection. These data established a temporal association between the level of viral transcription and the alteration of the host transcriptome, and allowed us to characterize the diversity and accumulation of DVG transcripts.
RESULTS
Cell-to-cell variation in virus gene expression
To determine how the viral and host cell transcriptional programs relate to each other over the course of an influenza infection, we infected two cell types (the adenocarcinomic human alveolar basal epithelial cell line A549 and primary human bronchial epithelial cells, HBEpC) at high MOI with A/Puerto Rico/8/34 (H1N1) (PR8) and performed single-cell and bulk RNA-seq expression analyses. A high MOI infection ensures that virtually all cells are rapidly infected, promotes the accumulation of DVGs, and consequently enables the characterization of both the host response and DVG diversity. We first calculated the percentage of reads uniquely aligned to viral genes among all mapped reads to obtain the relative abundance of viral transcripts within each cell at each time point. Similar to observations at early stages of low MOI IAV infection [9], the relative abundance of viral transcripts was heterogeneous across cells of both cell types, ranging from 0 to 70% of the total reads in each cell, and it increased over time (Supplementary Fig. 1a). The same trend was seen for segment-specific viral transcripts within individual cells over the course of the infection (Supplementary Fig. 1b).
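The per-cell relative abundance described above is a simple ratio. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-cell gene counts are already available as a dictionary, and the gene labels (`PR8_GENES`) and counts are hypothetical.

```python
from typing import Mapping, Set

# Illustrative labels for the eight PR8 segments (hypothetical identifiers;
# a real count matrix may name viral genes differently).
PR8_GENES: Set[str] = {"PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"}


def viral_percentage(counts: Mapping[str, int], viral_genes: Set[str]) -> float:
    """Percentage of a cell's mapped counts that are assigned to viral genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no mapped reads: report 0 rather than dividing by zero
    viral = sum(n for gene, n in counts.items() if gene in viral_genes)
    return 100.0 * viral / total


# Hypothetical cell with 1,000 mapped counts, 300 of them from NP and M.
cell = {"ACTB": 500, "GAPDH": 200, "NP": 250, "M": 50}
print(viral_percentage(cell, PR8_GENES))  # 30.0
```

Applying the same function per segment (passing a one-element set) gives the segment-specific fractions mentioned for Supplementary Fig. 1b.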
Heterogeneity of defective virus segment expression across the cell population
Since DVGs are known to accumulate in cell culture and can serve as templates for transcription, we characterized their abundance and diversity by examining gap-spanning reads in the sequencing data. Because the large majority of IAV DVGs originate from the polymerase segments (PB2, PB1, and PA) [15], we focused on detecting DVG transcripts derived from those segments. We collected reads with large internal deletions spanning ≥1,000 nucleotides (nt) (Fig. 1a) and identified the junction coordinates of these gap-spanning reads. We observed a diverse pool of DVG transcripts, some of which were also present in the viral genomic segments (vRNA) of the PR8 stock (Supplementary Fig. 2-4). Most of these DVG transcripts are estimated to be between 300 and 1,000 nt long.
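Gap-spanning reads and their junction coordinates can be read off the alignment's CIGAR string. This is a sketch assuming SAM conventions, in which a spliced aligner such as STAR reports a large deletion as an `N` operation (the "MNM" pattern noted in the Methods); the function name and the example reads are hypothetical, and positions are 1-based as in the SAM `POS` field.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference positions.
REF_CONSUMING = set("MDN=X")


def gap_junctions(pos: int, cigar: str, min_gap: int = 1000) -> List[Tuple[int, int]]:
    """Return (last base before gap, first base after gap) for each N-gap >= min_gap.

    `pos` is the 1-based leftmost reference position of the alignment (SAM POS).
    """
    junctions = []
    ref = pos  # reference position of the next base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N" and n >= min_gap:
            junctions.append((ref - 1, ref + n))
        if op in REF_CONSUMING:
            ref += n
    return junctions


# Hypothetical reads: 50 aligned bases, a gap, then 49 aligned bases.
print(gap_junctions(100, "50M1200N49M"))  # [(149, 1350)]
print(gap_junctions(100, "50M500N49M"))   # [] (gap shorter than 1,000 nt)
```

Soft clips (`S`) do not consume the reference, so they shift neither coordinate; the 1,000-nt default mirrors the threshold stated above.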
To quantify the relative abundance of the identified DVGs, we calculated the frequency of each unique DVG transcript type (defined by its 5’ and 3’ gap coordinates) across single cells, as well as the ratio of DVG transcripts to non-gap-spanning, full-length (FL) transcripts derived from the corresponding viral segment (the DVG/FL ratio). For each polymerase segment, the DVG/FL ratio increased significantly between the early (6hpi) and late (24hpi) stages of the infection (p < 2.2 × 10⁻¹⁶) and was highly heterogeneous across single cells from both cell types (Supplementary Fig. 5). Interestingly, the junction sites clustered in specific genomic regions and recurred in a high percentage of cells. We identified two predominant types of DVG transcripts, corresponding to PB2 and PB1 defective segments (Fig. 1b-c and Supplementary Fig. 6-7). These same DVG transcripts were present in the stock, increased in prevalence over the course of the infection, and were conserved across cell types and MOIs (Supplementary Fig. 8-9), indicating likely carryover from the stock virus rather than de novo formation in each new cell type. For PA, one DVG type present in the stock was found in A549 and MDCK cells at different MOIs but not in HBEpC cells, and its prevalence was much lower than that of the PB2 and PB1 DVGs (Supplementary Fig. 10-11). The dominant DVG PB2 and PB1 transcripts derived from the corresponding defective segments in both the stock and the infected cells showed stable relative abundance across individual cells over the course of the infection, suggesting that the defective viral segments persist.
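The two summaries above (per-cell DVG/FL ratio and the fraction of cells carrying each DVG type) can be sketched as follows. Assumptions: each record stands for one deduplicated transcript molecule as a hypothetical `(cell, segment, junction)` tuple, with `junction = None` for ungapped (FL) molecules; cells without FL molecules are skipped because their ratio is undefined; barcodes and coordinates are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Junction = Tuple[int, int]
# Hypothetical per-molecule record: (cell barcode, segment, junction or None).
Record = Tuple[str, str, Optional[Junction]]


def dvg_fl_ratios(records: Iterable[Record], segment: str) -> Dict[str, float]:
    """DVG/FL ratio per cell for one segment; cells with no FL molecules are skipped."""
    dvg: Counter = Counter()
    fl: Counter = Counter()
    for cell, seg, junction in records:
        if seg != segment:
            continue
        if junction is None:
            fl[cell] += 1
        else:
            dvg[cell] += 1
    return {cell: dvg[cell] / fl[cell] for cell in fl}


def dvg_type_prevalence(records: Iterable[Record], segment: str) -> Dict[Junction, float]:
    """Fraction of cells with any molecule from `segment` that carry each junction type."""
    cells_with_segment: Set[str] = set()
    cells_by_type: Dict[Junction, Set[str]] = defaultdict(set)
    for cell, seg, junction in records:
        if seg != segment:
            continue
        cells_with_segment.add(cell)
        if junction is not None:
            cells_by_type[junction].add(cell)
    n = len(cells_with_segment)
    return {j: len(cells) / n for j, cells in cells_by_type.items()}


# Hypothetical molecules from three cells (coordinates are illustrative only).
records = [
    ("c1", "PB2", None), ("c1", "PB2", None), ("c1", "PB2", (260, 2010)),
    ("c2", "PB2", None), ("c2", "PB2", (260, 2010)), ("c2", "PB2", (300, 1900)),
    ("c3", "PB1", None),
]
print(dvg_fl_ratios(records, "PB2"))        # {'c1': 0.5, 'c2': 2.0}
print(dvg_type_prevalence(records, "PB2"))  # {(260, 2010): 1.0, (300, 1900): 0.5}
```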
To determine whether any of the inferred defective polymerase segments could give rise to defective interfering particles (DIs), we tested their interfering potential experimentally. We generated clonal PR8-DI162 and PR8-DI222 viruses carrying PB2 segments with the same deletion sites as the two dominant PB2-DVG species, and co-infected A549 cells with each DI virus together with PR8 wild-type (WT) virus. Both PR8-DI162 and PR8-DI222 inhibited the productivity of WT virus, comparable to the interfering ability of PR8-DI244, a known effective DI [18] (Fig. 1d). This confirmed the interfering ability of the two types of defective PB2 segments identified in the single-cell data.
Differential expression of host genes linked to defective viral genomes
Viral infection can trigger massive changes in the host transcriptional program, including activation of the interferon (IFN) response. Since IFN induction is also subject to cell-to-cell variation [22], we first evaluated the expression frequency (the fraction of cells with detectable expression) of type I (IFN beta) and type III (IFN lambda) IFNs to determine the extent of this variation during infection. While these genes were significantly differentially expressed over the course of the infection at the population level, as measured by bulk RNA-seq (Fig. 2a), IFN expression was detected in fewer than 3% of cells until the late stage of infection, although this proportion increased over time (Fig. 2b and Fig. 2c). The same expression profile was observed for a number of IFN-stimulated genes (ISGs), such as RSAD2, CXCL10, GBP4, GBP5, IDO1, and CH25H in A549 cells, and IFI44L, CMPK2, IFIT1, BST2, OASL, and XAF1 in HBEpC (Fig. 2). In silico pooling of the single-cell data to mimic a population-level measurement yielded profiles resembling the bulk RNA-seq data, arguing against a substantial technical limitation of single-cell RNA-seq as the explanation for the low detection frequency (Supplementary Fig. 12).
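Both the per-gene detection frequency and the in silico pooling can be expressed in a few lines of NumPy. This is a sketch under assumptions: the text does not state how pooled profiles were normalized or compared with bulk data, so counts-per-million scaling and a Pearson correlation on `log1p` values are illustrative choices, and the count matrix is hypothetical.

```python
import numpy as np


def detection_frequency(counts: np.ndarray) -> np.ndarray:
    """Fraction of cells (rows) with a nonzero count for each gene (column)."""
    return (counts > 0).mean(axis=0)


def pseudobulk_cpm(counts: np.ndarray) -> np.ndarray:
    """Sum counts over all cells and rescale to counts per million (CPM)."""
    summed = counts.sum(axis=0).astype(float)
    return summed / summed.sum() * 1e6


def agreement_with_bulk(counts: np.ndarray, bulk_cpm: np.ndarray) -> float:
    """Pearson correlation of log1p(CPM) between pseudobulk and bulk profiles."""
    pseudo = np.log1p(pseudobulk_cpm(counts))
    return float(np.corrcoef(pseudo, np.log1p(bulk_cpm))[0, 1])


# Hypothetical UMI counts: 4 cells x 3 genes (column 0 mimics a rarely detected IFN).
counts = np.array([[0, 10, 5], [0, 20, 0], [3, 30, 0], [0, 40, 5]])
print(detection_frequency(counts).tolist())  # [0.25, 1.0, 0.5]
```

The point of the comparison is that a gene detected in few cells can still be prominent once counts are pooled, which is how a bulk signal and sparse single-cell detection can coexist.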
To evaluate cell-to-cell variation in the host response, as manifested by alterations in the host transcriptome following IAV infection, we performed unsupervised clustering of cells based on their host transcriptional profiles. We then annotated each cell in each subpopulation with its relative abundance of viral transcripts and its DVG/FL ratio. Viral transcription had significant effects on the host transcriptome: cells with a high level of viral transcription formed a separate cluster by 12hpi in both cell types (Fig. 3a and Fig. 3b), while the other cells clustered primarily according to their cell-cycle stage (Supplementary Fig. 13a and Supplementary Fig. 13b). Consistent with previous reports of G0/G1 cell-cycle arrest [23, 24], the most drastic alteration in cell-cycle distribution was observed in A549 cells at 24hpi, when host gene expression varied along a gradient of relative viral transcript abundance (Fig. 3a).
To characterize the host genes driving the changes associated with viral transcription, we tested for over-representation of Gene Ontology (GO) terms among genes differentially expressed in each subpopulation of A549 and HBEpC cells compared to the rest of the cells at 24hpi. In two subpopulations of A549 cells with a high relative abundance of viral transcripts (clusters 0 and 1 in Fig. 4a), genes involved in transcription, RNA processing, translation, SRP-dependent co-translational protein targeting to the membrane, mitochondrial electron transport, and ATP synthesis were enriched (Fig. 4a). These GO terms were also associated with genes highly expressed in the HBEpC cells with the highest level of viral transcription at 24hpi (cluster 3 in Fig. 4b; p < 2.2 × 10⁻¹⁶ compared to the other clusters). In contrast, genes involved in antiviral responses, such as the type I IFN signaling pathway and the negative regulation of viral genome replication, were over-expressed in the other two subpopulations of A549 cells, which had a similar or a severely reduced relative abundance of viral transcripts (clusters 3 and 2, respectively, in Fig. 4a). However, some antiviral genes were differentially induced in cells with different levels of viral transcription. For example, type I and III IFNs and a subset of ISGs were over-expressed in cluster 3, whereas another subset of ISGs was over-expressed in cluster 2. In HBEpC cells, antiviral responses such as the type I IFN signaling pathway were observed primarily in one cluster (cluster 2 in Fig. 4b) that had a low level of viral transcription (Fig. 3b at 24hpi) and consisted mostly of cells in the G0/G1 phase (Supplementary Fig. 13b at 24hpi).
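This section does not name the statistical test behind the GO over-representation analysis. A one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is a common choice, and the sketch below assumes it; multiple-testing correction is omitted, and the gene counts in the example are hypothetical.

```python
from math import comb


def go_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric.

    N: background genes tested; K: background genes annotated to the GO term;
    n: differentially expressed (DE) genes; k: DE genes annotated to the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


# Hypothetical cluster: 10,000 background genes, 100 annotated to a term,
# 200 DE genes, 12 of which carry the annotation (about 2 expected by chance).
print(go_enrichment_p(12, 10_000, 100, 200))
```

Working with exact integers via `math.comb` avoids underflow in the individual terms; only the final ratio is converted to a float.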
Since DVGs are known to play an important role in IFN induction and in the inhibition of viral replication (reviewed in [20]), we asked whether the accumulation of DVGs was associated with attenuated viral transcription and with the induction of innate immune antiviral response genes. We observed an inverse relationship between the relative abundance of viral transcripts and the accumulation of defective PB2 and PB1 segments at the mid (12hpi) and late (24hpi) stages of infection (Fig. 5a-f). The clusters of cells with the highest level of viral transcription (cluster 1 of A549 cells at 12hpi in Fig. 5a and cluster 3 of HBEpC cells at 24hpi in Fig. 5b) had the lowest PB2 and PB1 DVG/FL ratios (Fig. 5c-d; p < 2.2 × 10⁻¹⁶ comparing PB2 and PB1 DVG/FL ratios in cluster 1 of A549 cells or cluster 3 of HBEpC cells against the other clusters). Similarly, in A549 cells at 24hpi, the cells with the lowest level of viral transcription (cluster 2 in Fig. 5e) had elevated PB2 DVG/FL ratios (cluster 2 in Fig. 5f; p < 2.2 × 10⁻¹⁶ compared to clusters 0 and 1) and the highest PB1 DVG/FL ratios (p = 1.943 × 10⁻⁹ comparing cluster 2 against the other clusters). Moreover, innate immune response genes were highly expressed in cells with an elevated relative abundance of both DVG PB2 and PB1 transcripts (cluster 2 of A549 cells at 24hpi in Fig. 5f) or of DVG PB2 transcripts alone (cluster 3 in Fig. 5f; p = 4.036 × 10⁻¹¹ compared to clusters 0 and 1). This suggests an association between DVG accumulation and strong stimulation of the innate immune response relative to the rest of the cells in the same population, in which secreted IFNs are accessible to all cells. Notably, DVG PB2 transcripts carrying the same deletion sites as PR8-DI222 were highly abundant in the cluster with the lowest level of viral transcription (cluster 2 of A549 cells at 24hpi in Fig. 5g; p < 2.2 × 10⁻¹⁶ compared to the other clusters), whereas DVG PB2 transcripts corresponding to PR8-DI162 were highly abundant in both clusters 2 and 3 (Fig. 5g; p = 2.067 × 10⁻¹⁰ and p = 0.001241, respectively, compared to clusters 0 and 1), suggesting that different DVG species may differ in their ability to induce innate immune response genes. We did not detect a difference in the relative abundance of defective PA segments among cells with different levels of viral transcription (Supplementary Fig. 14).
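The between-cluster comparisons above report p-values without naming the test in this section. A two-sided Wilcoxon rank-sum (Mann-Whitney U) test is one common nonparametric choice for per-cell ratios. The sketch below assumes that test and uses the normal approximation with a tie correction and no continuity correction, so its values will not exactly match exact or continuity-corrected implementations. The cluster values are hypothetical.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value.

    Normal approximation with tie correction and no continuity correction,
    so results differ from exact tests for small samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled: List[Tuple[float, int]] = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_sum = 0  # sum of t**3 - t over groups of t tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # shared average rank (1-based)
        t = j - i + 1
        tie_sum += t**3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical per-cell PB2 DVG/FL ratios in one cluster versus the others.
cluster_2 = [0.40, 0.55, 0.61, 0.72, 0.80, 0.95]
other = [0.05, 0.10, 0.12, 0.20, 0.22, 0.31, 0.35]
print(rank_sum_p(cluster_2, other))
```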
To further understand how transcriptional activity is coordinated in clusters of cells exposed to different levels of viral stimuli, including DVGs, we performed gene co-expression network analyses using multiscale embedded gene co-expression network analysis (MEGENA) [25]. This was done independently within each cluster of A549 and HBEpC cells collected at 24hpi to identify modules of co-expressed genes representing coherent functional pathways. The relevance of each module to viral infection, in terms of the host response and viral replication, was evaluated by its enrichment for differentially expressed disease gene (DEDG) signatures relative to mock-infected cells and by its correlation with the relative abundance of viral transcripts and with DVG/FL ratios. Consistent with the GO over-representation results (Fig. 4), processes related to the host cell cycle and the viral life cycle, especially the late stage of viral replication (including protein synthesis and transport), were enriched in the top-ranked modules identified in the clusters of A549 cells with a high level of viral transcription (Fig. 6a, Supplementary Fig. 15a and 15b). Conversely, the host innate immune response (e.g., the type I IFN signaling pathway) was primarily associated with several top-ranked modules in cells with the lowest level of viral transcription (cluster 2 of A549 cells at 24hpi in Fig. 6b) and in high IFN-producing cells with a high level of viral transcription (cluster 3 of A549 cells at 24hpi in Fig. 6a). For example, module M100 in cluster 2 of A549 cells had key regulators of the innate antiviral response that were significantly up-regulated compared to mock-infected cells, including ISG20, RNF213, IFI35, STAT1, and RBCK1 (Fig. 6b), and it was significantly negatively correlated with the level of viral transcription (ρ = -0.26, p = 2.9 × 10⁻⁵). Module M3 in cluster 3 of A549 cells had key regulators including IFNL1 and some ISGs (e.g., OASL and ISG15) (Fig. 6a). A top-ranked module associated with the type I IFN signaling pathway was also observed in cluster 1 of A549 cells, which had a high level of viral transcription (M13 in Supplementary Fig. 15b), and it shared a common key regulator, ISG15, with the other IFN signaling modules detected in clusters 2 and 3 (Supplementary Fig. 15c and Fig. 6a). In contrast, a top-ranked module (M234 in Supplementary Fig. 15c) enriched for the chemokine-mediated signaling pathway, particularly the regulation of granulocyte (e.g., neutrophil) chemotaxis, was observed in another cluster of A549 cells with a high level of viral transcription (cluster 0). In HBEpC cells, top-ranked modules enriched for type I IFN signaling genes were detected in all four clusters, regardless of their level of viral transcription (Fig. 6c and Supplementary Fig. 16). The modules in the three HBEpC clusters with a low level of viral transcription (clusters 0, 1, and 2 in Fig. 6c and Supplementary Fig. 16), including module M227 in cluster 2, which correlated positively with the level of viral transcription (ρ = 0.10, p = 0.021) (Fig. 6c), also shared the key regulator ISG15 observed in A549 cells.
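The module-level correlations above are reported as ρ, which suggests a rank correlation, but this section specifies neither the correlation method nor how a module is summarized per cell. The sketch assumes Spearman's ρ between a hypothetical per-cell module score (for example, the mean expression of the module's genes) and the percentage of viral reads. It does not reimplement MEGENA, and the values are made up.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / sqrt(sxx * syy)


# Hypothetical per-cell values: a module score and the percentage of viral reads.
module_score = [2.1, 1.8, 1.5, 0.9, 0.7]
viral_pct = [5.0, 12.0, 20.0, 35.0, 60.0]
print(spearman_rho(module_score, viral_pct))  # -1.0
```

Ranks make the statistic insensitive to the heavy right tail of per-cell viral read fractions, one plausible reason to prefer a rank correlation here.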
Given the presence of PB2 DVGs in cluster 2 of A549 cells, and the association in that cluster between module M100 and the over-expression of ISGs relative to both the rest of the cells in the same population and mock-infected cells, we experimentally tested the immunostimulatory impact of PB2 DVGs on host cells. We infected A549 cells with each of the three PR8-DIs (PR8-DI162, -DI222, or -DI244) or with WT virus, and quantified the expression of ISGs that were significantly up-regulated in cluster 2 of A549 cells, including IFI35, IFI27, and MX1, as well as type I and III IFNs, which were significantly up-regulated in cluster 3 of A549 cells (Supplementary Fig. 17a and 17b). At 24hpi, ISG induction was higher in DI-infected cells than in WT-infected cells, whereas induction of IFN beta (IFNB) and IFN lambda (IFNL) was generally higher in WT-infected cells (Supplementary Fig. 17c). Together, these data indicate that PB2 DVGs have a strong immunostimulatory effect on host cells, manifested by a significant induction of ISGs at the late stage of infection.
DISCUSSION
The goal of this study was to quantitatively and qualitatively characterize the viral and host factors that contribute to the cell-to-cell transcriptional variation observed during IAV infection. We identified DVGs as potentially important factors in the temporal variation of the host cell transcriptome. The substantial cell-to-cell variation in viral replication and host response that we detected highlights the potential of single-cell virology to provide novel insight into complex virus-host interactions.
The inherent stochasticity of molecular processes, especially during the initial stages of infection, can contribute to cell-to-cell variation in viral replication [1]. Evidence from studies of low MOI infections with different virus strains and cell lines suggests that this intrinsic stochasticity can affect viral replication and result in the loss of viral genome segments, transcripts, or proteins [1, 8-10]. Similarly, at different time points of our high MOI infection, we observed cell-to-cell variation in viral transcription, including segment-specific transcription. Although our data do not reveal the underlying mechanisms, possible explanations discussed in [10] include (i) the absence of one or more viral genome segments in infecting virions, (ii) a deficiency in intracellular trafficking of incoming viral ribonucleoprotein complexes (vRNPs), (iii) random degradation of vRNAs prior to transcription, and (iv) mutations that decrease the stability of viral mRNAs.
Another source of variability in viral replication is the emergence and accumulation of various vRNA species synthesized by the error-prone viral polymerase. Indeed, a recent study of low MOI infection showed that viral genetic diversity, including amino acid mutations in the NS1 and PB1 proteins, absence of the NS gene, and an internal deletion in the PB1 gene, contributes to cell-to-cell variation in the innate immune response [26]. Although the 3’ sequencing strategy employed in our study prevents the identification of mutations in full-length segments, our experimental setup provides a unique opportunity to evaluate DVG diversity by characterizing DVG transcripts, since high MOI infection promotes the accumulation of DVGs. Late stages of infection also emphasize the impact of accumulated DVGs on the host response. We identified a diverse pool of DVG transcripts derived from the three polymerase segments, including several dominant species. Notably, all the dominant DVG PB2 and PB1 transcripts identified in the infected cells were also present in the virus stock, and the dominant DVG PB1 transcripts were detected in an increasing proportion of cells over the course of infection. This suggests potential transmission of DI viruses carrying these DVGs, although we cannot rule out that these hotspot regions are simply the most prone to polymerase skipping.
The accumulation of DVGs can affect viral replication by directly competing with full-length genomes and by modulating host innate immune responses. The immunomodulatory effect of DVGs has been attributed to efficient activation of the IFN induction cascade and antiviral immunity, a mechanism well known for non-segmented negative-sense RNA viruses such as the murine parainfluenza virus Sendai (SeV) (reviewed in [20]). Consistent with previous reports, we detected enrichment of highly expressed genes involved in the IFN response in one subpopulation of cells with significantly attenuated viral transcription; in this cluster, DVG transcripts also accumulated to significantly higher levels. The co-expression network analysis allowed us to identify modules of co-expressed host genes, correlate them directly with viral factors such as the level of viral transcription and DVG accumulation, and thereby identify groups of innate response genes that are potentially differentially expressed under the influence of these viral factors. The association between DVGs and ISG induction was further supported by our DI and WT infection assays, in which we detected significantly higher expression of some ISGs in DI-infected cells, and of IFNs in WT-infected cells, at the late stage of infection. That some ISGs were highly expressed in cells with either a high level of viral transcription or a high level of DVGs suggests that viral factors other than DVG accumulation may also play an important role in triggering the innate immune response. However, unlike SeV DVGs, which form “copy-back” structures with complementary termini generated by “copying back” the authentic 5’ terminus at the 3’ terminus [27, 28], IAV DVGs share identical termini with the full-length genomes.
In copy-back SeV DVGs, a stretch of dsRNA adjacent to a 5’-triphosphate serves as an effective RIG-I ligand [20]; however, the mechanism by which IAV DVGs stimulate immune responses more effectively than full-length genomes remains elusive [20]. One hypothesis is that the unencapsidated replication products of small DVGs could activate RIG-I if they reach the cytoplasm [20], given in vitro and in vivo observations that short influenza RNA templates can be replicated in the absence of nucleoprotein (NP) [29]. In addition, the interfering and immunomodulatory abilities of DVGs could be attributed to DVG-encoded proteins that interact directly with the mitochondrial antiviral-signaling protein (MAVS) and act independently of RIG-I in IFN induction, as previously reported [30].
Our approach enabled us to distinguish DVG species accumulated in individual cells and to associate the enrichment of certain DVG species with attenuated viral transcription and the induction of innate immune responses at single-cell resolution. The accumulation of one dominant type of PB2-DVG (PR8-DI222) in one subpopulation of cells was associated with a strong host innate immune response, manifested by over-expression of some ISGs, although other non-dominant defective PB2-DVGs, DVGs derived from the other polymerase segments, and certain mutations may also have contributed to stimulating host responses. Given the significantly attenuated viral transcription in the subpopulation of cells in which PR8-DI222 was enriched, differences in the ability to induce innate immune responses may explain the distinct effects of the DVG species. However, further investigation is necessary to establish causal relationships and identify the underlying mechanism. Our characterization of DVGs and the corresponding host responses reveals complex virus-host interactions and an underappreciated level of immunomodulation by DVGs at the single-cell level. The complex gene expression pattern underlying the host innate immune response demonstrates the intricate effects of various viral factors, including, but probably not limited to, the accumulation of DVGs, on shaping infection outcome at the single-cell level.
MATERIALS AND METHODS
Cells and virus
Human lung adenocarcinoma epithelial A549 cells (ATCC, Virginia, USA) were maintained in Kaighn’s modified Ham’s F-12 medium (F-12K) supplemented with 10% fetal bovine serum (FBS). Primary human bronchial epithelial cells (HBEpC) (PromoCell, Heidelberg, Germany) were maintained in PromoCell airway epithelial cell growth medium with SupplementMix (PromoCell). Madin-Darby canine kidney (MDCK) cells were maintained in minimum essential medium (MEM) supplemented with 5% FBS. Human embryonic kidney epithelial 293T cells (ATCC, Virginia, USA) were maintained in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% FBS. PB2 protein-expressing modified MDCK (AX4/PB2) cells [31] were maintained in MEM supplemented with 5% newborn calf serum (NBCS), 2 µg/ml puromycin, and 1 µg/ml blasticidin. 293T and AX4/PB2 cells were co-cultured in DMEM supplemented with 10% FBS. Viral strain A/Puerto Rico/8/34 (H1N1) was plaque purified and propagated in MDCK cells. Viral titers were determined by plaque assay in MDCK cells, and sequences were confirmed by Illumina MiSeq sequencing.
Infection assays
Subconfluent monolayers of A549 cells in T-25 flasks were washed with A549 infection medium (F-12K supplemented with 0.15% bovine serum albumin (BSA) fraction V and 1% antibiotic-antimycotic) and infected with influenza virus at a multiplicity of infection (MOI) of 5 in 0.5 mL of pre-cooled A549 infection medium. The flasks were incubated at 4°C for 30 minutes with agitation every 5-10 minutes, after which 4.5 mL of pre-warmed A549 infection medium was added and the flasks were transferred to a 37°C, 5% CO2 incubator for further incubation. HBEpC cells were infected similarly, except that the HBEpC infection medium was PromoCell airway epithelial cell growth medium. The inoculum was back-titrated to confirm that the desired MOI was used. Virus-infected cells were collected at 6, 12, and 24 hours post infection (hpi), and mock-infected cells were collected at 12 hpi. Cells were extensively washed before re-suspension in PBS containing 0.04% BSA. A proportion of the cells from one of the two duplicate flasks per time point was subjected to 10X Genomics Chromium single-cell library preparation. The remaining cells in the two flasks were used for conventional bulk RNA-seq library preparation. Notably, because the single-cell library preparation failed for the PR8-infected HBEpC cells collected at 12hpi, we replaced this sample by repeating the same infection assay with HBEpC cells.
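The inoculum for a target MOI, and the expectation that "virtually all cells" are infected at high MOI, can be checked with a short calculation. This sketch assumes the standard Poisson model of infectious-unit distribution. The Methods give neither the cell number per flask nor the stock titer, so the 1 × 10⁶ cells and 1 × 10⁸ PFU/mL below are hypothetical.

```python
from math import exp


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells receiving >= 1 infectious unit (Poisson model)."""
    return 1.0 - exp(-moi)


def inoculum_volume_ml(moi: float, n_cells: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock that delivers moi * n_cells plaque-forming units."""
    return moi * n_cells / titer_pfu_per_ml


print(round(fraction_infected(5), 4))  # 0.9933
# Hypothetical flask of 1e6 cells and a stock titer of 1e8 PFU/mL:
print(inoculum_volume_ml(5, 1e6, 1e8))  # 0.05
```

Under this model, an MOI of 5 leaves about 0.7% of cells uninfected. Back-titration of the inoculum, as described above, checks the numerator of this calculation.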
Sample preparation, library construction, and sequencing
10X Genomics single-cell library preparation with Chromium 3’ v2 chemistry was performed following the manufacturer’s protocol. Between 3,900 and 7,500 cells were used as input for each single-cell preparation, and a median of ∼3,353 cells (range 2,075-5,254) was obtained after sequencing, as described below. Sequencing was performed on the HiSeq 2500 in HighOutput mode (v2) with one library per lane, following the manufacturer’s recommended sequencing configuration (i.e., paired-end read 1: 27 bp, read 2: 99 bp, and i7 index: 8 bp). An average of 59,200 reads per cell was obtained. For bulk mRNA sequencing, total RNA from two replicate flasks per time point was extracted using the RNeasy Mini Kit (Qiagen, Hilden, Germany) with on-column DNA digestion using RNase-free DNase (Qiagen). Conventional bulk RNA-seq libraries were prepared using the NEBNext Ultra II RNA library prep kit for Illumina following poly(A) mRNA enrichment. Libraries were multiplexed and sequenced on an Illumina NextSeq 500 in HighOutput 2×75 bp mode (v3). Viral genomic RNA (vRNA) of the virus stocks used in the infection assay was amplified using multi-segment reverse-transcription PCR (M-RTPCR) [32]. The M-RTPCR product was subjected to Illumina library preparation with the Nextera DNA library prep kit and sequenced on the HiSeq 2500 in Rapid 2×150 bp mode (v2).
Upstream computational analyses of single-cell RNA-seq data
The Cell Ranger (v2.1.0) single-cell software suite (10X Genomics) was applied to the single-cell RNA-seq data. Reads were aligned with STAR (v2.5.3a) [33] to a concatenated human (hg19) and influenza A virus (A/Puerto Rico/8/34) reference, and genes were counted using the processed UMIs for each cell. Viral reads whose CIGAR strings contained a gap (i.e., “MNM”) were not considered during gene counting. Where the UMI sequences of these reads differed from those of the ungapped reads, their UMI counts were added to the expression matrix afterwards. Because we performed 3’-end sequencing, the alternatively spliced transcripts of the M and NS segments (encoding M1/M2 and NS1/NEP, respectively) could not be distinguished in our analyses. Transcripts from the same segment share an identical 3’ end.
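The handling of gapped viral reads can be sketched as follows. This is a hypothetical illustration. It assumes reads have already been extracted as (cell barcode, segment, UMI, CIGAR) tuples. It flags single-gap alignments of the form M-N-M (optionally soft-clipped). For each cell and segment, it then counts the UMIs seen only in gapped reads, which are the counts that would be added back to the expression matrix. It does not reproduce Cell Ranger's own counting rules.

```python
import re
from collections import defaultdict

# Single-gap alignment such as "30M1500N68M", optionally soft-clipped at either end.
GAPPED_CIGAR = re.compile(r"^(?:\d+S)?\d+M(\d+)N\d+M(?:\d+S)?$")

def gap_length(cigar):
    """Skipped length (N) of a single-gap CIGAR string, or 0 if not gapped."""
    match = GAPPED_CIGAR.match(cigar)
    return int(match.group(1)) if match else 0

def extra_gapped_umis(records):
    """Count, per (cell, segment), UMIs from gapped reads absent from ungapped reads.

    records: iterable of (cell_barcode, segment, umi, cigar) tuples (hypothetical input).
    """
    ungapped = defaultdict(set)
    gapped = defaultdict(set)
    for cell, segment, umi, cigar in records:
        target = gapped if gap_length(cigar) > 0 else ungapped
        target[(cell, segment)].add(umi)
    return {key: len(umis - ungapped[key]) for key, umis in gapped.items()}
```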
To identify and exclude low-quality cells from downstream analyses, the following metrics were calculated for each sample. Cells that failed any of these criteria were filtered out: 1) cells with fewer than 2,000 detected genes; 2) cells with an alignment rate below the mean minus 3 standard deviations; 3) cells whose log10-transformed number of reads was more than 3 standard deviations below or above the mean; 4) cells whose log10-transformed number of UMIs was more than 3 standard deviations below or above the mean; 5) cells whose percentage of reads aligned to mitochondrial genes was more than 3 standard deviations below or above the mean; and 6) mock-infected cells in which any viral read was detected. The size of the total mRNA pool (i.e., total reads) was one of these criteria. In addition, we checked the distribution of the size of the host mRNA pool (i.e., host reads) in individual cells. Recent single-cell studies of virus infection [9, 34] have noted that virus-induced host shutoff of gene expression [35] is likely mediated mainly by host mRNA degradation. However, it remains unclear how viral transcription affects the size of the total mRNA pool. In our study, we examined cells with substantially smaller or larger host mRNA pools (log10-transformed host read counts more than 3 standard deviations below or above the mean). Virtually all of them were already excluded by the six criteria above, so we filtered cells using these six criteria only. Approximately 120-250 cells were removed from each sample, with a median of 164 cells per sample. We further removed genes detected in fewer than three cells. After initial cell and gene quality control, most samples retained approximately 3,000-3,500 cells for downstream analysis.
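The six per-sample filters can be expressed as a boolean mask over cells. In this minimal sketch, the per-cell metrics are assumed to be precomputed array-like inputs. The standard deviation is the sample SD (ddof=1), which is an assumption because the text does not specify it. Criterion 2 is applied as a lower bound only, as described. Function names are illustrative.

```python
import numpy as np

def within_sd(values, n_sd=3.0, lower_only=False):
    """Mask of values within n_sd sample SDs of the mean (or above the lower bound only)."""
    x = np.asarray(values, dtype=float)
    mu, sd = x.mean(), x.std(ddof=1)  # ddof=1 is an assumption
    keep = x >= mu - n_sd * sd
    if not lower_only:
        keep &= x <= mu + n_sd * sd
    return keep

def qc_mask(n_genes, align_rate, n_reads, n_umi, pct_mito, n_viral, is_mock):
    """Boolean mask of cells in one sample that pass all six QC criteria."""
    keep = np.asarray(n_genes) >= 2000                      # 1) >= 2,000 detected genes
    keep &= within_sd(align_rate, lower_only=True)          # 2) alignment rate lower bound
    keep &= within_sd(np.log10(n_reads))                    # 3) log10 reads within +/- 3 SD
    keep &= within_sd(np.log10(n_umi))                      # 4) log10 UMIs within +/- 3 SD
    keep &= within_sd(pct_mito)                             # 5) % mitochondrial within +/- 3 SD
    keep &= ~(np.asarray(is_mock) & (np.asarray(n_viral) >= 1))  # 6) mock cells with viral reads
    return keep
```

The host-read check described in the text is deliberately not part of the mask, consistent with the decision not to add it as a seventh filter.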
Host gene expression levels were normalized to the size of each cell's host mRNA pool.
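Here is a minimal sketch of normalizing host genes by the host mRNA pool size, so that viral transcripts do not inflate the per-cell denominator. The log1p transform and the scale factor of 10,000 follow Seurat's default LogNormalize settings. Both are assumptions here, because the text states only that normalization used the host mRNA pool size.

```python
import numpy as np

def normalize_host(counts, is_viral, scale_factor=1e4):
    """Log-normalize host genes by each cell's host (non-viral) UMI total.

    counts: genes x cells UMI matrix; is_viral: boolean mask over genes.
    Returns log1p(count / host_total * scale_factor) for host genes only.
    """
    counts = np.asarray(counts, dtype=float)
    host = counts[~np.asarray(is_viral, dtype=bool)]
    host_total = host.sum(axis=0, keepdims=True)  # per-cell host mRNA pool size
    host_total[host_total == 0] = 1.0             # guard against empty cells
    return np.log1p(host / host_total * scale_factor)
```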
Cell clustering with Seurat
Unsupervised cell clustering was performed to identify subtle changes in population structure after infection. The data from each time point were clustered separately, following the procedures of the Seurat package (v2.1.0) [36] and using the normalized, scaled, and centered host gene expression matrix. Briefly, the highly variable genes with average expression < 4 and dispersion > 1 were used as input for principal component analysis (PCA). Statistically significant and biologically meaningful PCs were determined by the built-in jackstraw and elbow analyses and by manual exploration. These PCs were retained for visualization by t-distributed stochastic neighbor embedding (t-SNE) and for subsequent clustering by a shared nearest neighbor (SNN) graph-based approach. The initially identified clusters were validated using the “ValidateClusters” function in Seurat. This function builds a support vector machine (SVM) classifier on the significant PCs and applies an accuracy cutoff of 0.9 and a minimal connectivity threshold of 0.001. To identify markers differentially expressed among clusters, the “FindMarkers” function in Seurat was run with four test options: “bimod” [37], “poisson”, “negbinom”, and “MAST” (v1.4.1) [38]. Only genes that showed a log-scale fold difference of at least 0.25 between the two groups and were expressed in at least 25% of the cells in either group were tested. Significantly differentially expressed genes for each cluster (i.e., the markers) were identified using an adjusted p-value cutoff of 0.05, and only markers identified by all four methods were retained. GO term over-representation analysis of the up-regulated markers was performed with the online service DAVID [39, 40] using a Benjamini p-value cutoff of 0.05. To determine the cell cycle stage of individual cells, cell-cycle scoring and assignment were performed with the Seurat “CellCycleScoring” function based on the expression of canonical markers [41].
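The marker-calling logic combines a gene pre-filter with a consensus across four tests. The sketch below approximates the FindMarkers pre-filter. Each group's average is the log of the mean of expm1-transformed values, and detection is the fraction of cells with non-zero expression. The sketch then intersects the genes that pass the adjusted p-value cutoff in every test. It is an illustration under these assumptions, not Seurat's code, and Seurat's exact comparison operators may differ slightly. The gene names in the test are hypothetical inputs.

```python
import numpy as np

def seurat_like_prefilter(x1, x2, logfc=0.25, min_pct=0.25):
    """Approximate FindMarkers pre-filter for one gene.

    x1, x2: log-normalized expression of the gene in group-1 and group-2 cells.
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    # Average in non-log space, then return to log space.
    mean1 = np.log(np.mean(np.expm1(x1)) + 1)
    mean2 = np.log(np.mean(np.expm1(x2)) + 1)
    pct1, pct2 = np.mean(x1 > 0), np.mean(x2 > 0)
    return bool(abs(mean1 - mean2) >= logfc and max(pct1, pct2) >= min_pct)

def consensus_markers(padj_by_test, alpha=0.05):
    """Genes with adjusted p < alpha under every test.

    padj_by_test: {test_name: {gene: adjusted_p}}, e.g. bimod, poisson, negbinom, MAST.
    """
    sig_sets = [
        {gene for gene, p in results.items() if p < alpha}
        for results in padj_by_test.values()
    ]
    return set.intersection(*sig_sets) if sig_sets else set()
```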
Computational analyses of bulk RNA-seq and virus stock sequencing data
The raw bulk RNA-seq and virus stock sequencing data were first processed with Trimmomatic (v0.36) [42] to remove adaptors and trim low-quality bases. Trimmed bulk RNA-seq reads with a minimum length of 36 bases were aligned with STAR (v2.5.3a) [33], using default parameters, to the concatenated human (hg19) and influenza A/Puerto Rico/8/34 (H1N1) reference. These reads were then counted with featureCounts [43] from the Subread package (v1.5.1) [44]. Reads from the virus stock sequencing dataset were aligned to the influenza A/Puerto Rico/8/34 (H1N1) reference with STAR (v2.5.3a) [33]. Differential expression analysis was performed with DESeq2 (v1.18.1) [45] and edgeR (v3.20.5) [46, 47] on both the bulk RNA-seq data and the merged single-cell RNA-seq data. Host genes with an adjusted p-value < 0.05 were considered significantly differentially expressed at each time point.
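By default, both DESeq2 and edgeR report Benjamini-Hochberg (BH) adjusted p-values, and the 0.05 cutoff applies to these values. For reference, here is a self-contained sketch of the BH step-up adjustment, which enforces monotonicity and caps values at 1. It is a generic implementation, not code from either package.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Cumulative minimum from the largest p-value downwards keeps adjustments monotone.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted
```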
Deletion junction identification, filtering, and quantification
We collected reads that aligned to both ends of the polymerase segments and contained large internal deletions (i.e., ≥ 1000 nucleotides, with each aligned portion at least 10 nucleotides long). After UMI de-duplication, reads with junction coordinates within a 10-nucleotide window were grouped together. Junctions observed fewer than 10 times in a sample were excluded from downstream analyses. To quantitatively compare gap-spanning reads for each segment across cells and samples, we normalized the number of gap-spanning reads to the total number of non-gap-spanning reads aligned to a 100-nt region centered on the coverage peak at the 3’ end (hereafter the DVG/FL ratio). Cells with low infection levels typically have poor coverage of the viral segments of interest, especially cells harvested early in infection or inoculated at a low MOI. In these cells, the DVG/FL ratio calculated as described above is often inflated. It cannot be calculated at all when every viral read from a given segment is gap-spanning or when no viral reads are present. To mitigate this effect, we set these values to 0 in the datasets, including the dataset collected from HBEpC cells at 6 hpi.
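The junction grouping, read-count filter, and DVG/FL ratio can be sketched as follows. This is one possible reading of the procedure, and the function names are hypothetical. Junctions are assumed to be UMI-deduplicated (start, end) pairs for one segment in one sample. A junction joins the first existing group whose seed lies within 10 nt at both coordinates, because the text does not specify the exact grouping rule. The ratio is set to 0 only when the full-length denominator is zero, since the text gives no numeric threshold for "low infection".

```python
from collections import Counter

def cluster_junctions(junctions, window=10):
    """Greedily group (start, end) junctions whose coordinates lie within `window` nt.

    Each group is keyed by its first (seed) junction; returns {seed: read_count}.
    """
    groups = Counter()
    seeds = []
    for start, end in sorted(junctions):
        for s, e in seeds:
            if abs(start - s) <= window and abs(end - e) <= window:
                groups[(s, e)] += 1
                break
        else:  # no existing seed is close enough: start a new group
            seeds.append((start, end))
            groups[(start, end)] += 1
    return dict(groups)

def filter_junctions(groups, min_reads=10):
    """Drop junction groups supported by fewer than `min_reads` reads."""
    return {j: n for j, n in groups.items() if n >= min_reads}

def dvg_fl_ratio(n_gap_spanning, n_fl_peak):
    """Gap-spanning reads over non-gap-spanning reads in the 3' peak window.

    Returns 0 when the denominator is 0, following the choice to zero out
    ratios that cannot be calculated.
    """
    return n_gap_spanning / n_fl_peak if n_fl_peak > 0 else 0.0
```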
Generation of PR8-derived DI162 and DI222 viruses
To generate the PR8 PB2-DI162, -DI222, and -DI244 reverse genetics plasmids, we ordered gBlocks Gene Fragments (Integrated DNA Technologies, California, USA) based on the corresponding defective PB2 genomic sequences. The DI162 and DI222 sequences were identified in this study (PR8-DI162 and PR8-DI222), and the DI244 sequence was previously reported [18]. The fragments were cloned into the pBZ61A18 reverse genetics vector as previously described [48]. To rescue the PR8 DI viruses, each sequence-confirmed PB2-DI plasmid was co-transfected into 293T-AX4/PB2 co-cultured cells using Lipofectamine™ 3000 Transfection Reagent (Invitrogen, California, USA). Each transfection also included the seven reverse genetics plasmids for PR8 PB1, PA, HA, NP, NA, M, and NS and a PB2 protein-expression plasmid. The supernatant was collected on day 2 post-transfection and passaged twice in the AX4/PB2 cell line, which expresses the PB2 protein in trans [31]. Viral titers were determined by TCID50 assay in AX4/PB2 cells.
Interference test for PR8-DI162 and PR8-DI222 viruses and validation of their immunostimulatory effects
The PR8-DI162 and PR8-DI222 viruses carry the deletions identified in the two dominant PB2-DVG species in the single-cell dataset. To test their interfering ability, we quantified their inhibitory effects on wild-type PR8 virus replication in cultured cells. Subconfluent monolayers of A549 cells in 12-well plates were washed with infection media (MEM supplemented with 0.15% BSA fraction V, 1% antibiotic-antimycotic, and 1 µg/mL TPCK-treated trypsin). Each well was then co-infected with one DI virus at an MOI of 5 and PR8 virus at an MOI of 0.005 in 1 mL infection media. Plates were transferred to a 37°C, 5% CO2 incubator. Supernatants were collected at 2 hours and at 1, 2, and 3 days post infection. Wild-type PR8 yield was determined by TCID50 assay in MDCK cells. MDCK cells do not support replication of the DI viruses, because these viruses do not encode a functional PB2 protein.
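The text does not state how TCID50 titers were computed. The Reed-Muench method is one common choice, and Spearman-Kärber is another. As an assumption-labeled sketch, the function below returns the 50% endpoint as a log10 dilution, computed from well counts across a serial dilution series. The titer per inoculum volume is then 10 raised to the negative endpoint. Interference could then be expressed as the log10 difference between wild-type PR8 titers with and without DI co-infection.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, total):
    """Reed-Muench 50% endpoint, returned as a log10 dilution.

    log10_dilutions: e.g. [-1, -2, ..., -8], ordered from least to most dilute.
    infected, total: positive wells and inoculated wells at each dilution.
    """
    n = len(infected)
    uninfected = [t - i for i, t in zip(infected, total)]
    # Infected wells accumulate toward the least dilute end,
    # uninfected wells toward the most dilute end.
    cum_inf = [sum(infected[k:]) for k in range(n)]
    cum_uninf = [sum(uninfected[:k + 1]) for k in range(n)]
    pct = [100.0 * a / (a + b) for a, b in zip(cum_inf, cum_uninf)]
    for k in range(n - 1):
        if pct[k] >= 50.0 > pct[k + 1]:
            proportional_distance = (pct[k] - 50.0) / (pct[k] - pct[k + 1])
            step = log10_dilutions[k + 1] - log10_dilutions[k]  # negative, e.g. -1
            return log10_dilutions[k] + proportional_distance * step
    raise ValueError("50% endpoint is not bracketed by the dilution series")
```

For instance, suppose four wells are inoculated per ten-fold dilution and 4, 4, 3, 1, and 0 wells are positive from 10^-1 to 10^-5. The endpoint is then 10^-3.5, which corresponds to 10^3.5 TCID50 per inoculum volume.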
To determine whether the DI viruses stimulate higher expression of selected ISGs identified in the network analysis, subconfluent monolayers of A549 cells in 12-well plates were infected in triplicate at an MOI of 10. Each well received one DI virus (PR8-DI162, -DI222, or -DI244) or WT virus. Total RNA from cells collected at 24 hpi was extracted using the RNeasy Mini Kit (Qiagen, Hilden, Germany). A total of 100 ng of RNA was then used as the template for reverse transcription with the SuperScript IV system (Invitrogen, California, USA) and the Oligo d(T)20 primer. qPCR was performed in triplicate using PowerUp SYBR Green (Applied Biosystems, Massachusetts, USA) with the following primers: IFI35: Hs.PT.58.38490206 (IDT, Iowa, USA); IFI27: IFI27fw (5’-GCCTCTGGCTCTGCCGTAGTT-3’) and IFI27rev (5’-ATGGAGGACGAGGCGATTCC-3’) [49]; MX1: MX1fw (5’-CTTTCCAGTCCAGCTCGGCA-3’) and MX1rev (5’-AGCTGCTGGCCGTACGTCTG-3’) [49]; IFNB: IFNBfw (5’-TCTGGCACAACAGGTAGTAGGC-3’) and IFNBrev (5’-GAGAAGCACAACAGGAGAGCAA-3’) [50]; IFNL: IFNLfw (5’-GCCCCCAAAAAGGAGTCCG-3’) and IFNLrev (5’-AGGTTCCCATCGGCCACATA-3’) [50]; β-actin (ACTB): ACTBfw (5’-GAACGGTGAAGGTGACAG-3’) and ACTBrev (5’-TTTAGGATGGCAAGGGACT-3’) [51]. For all targets, the qPCR cycling parameters followed the manufacturer’s recommendations: 95°C for 10 min, then 45 cycles of 95°C for 15 s, 57°C for 15 s, and 60°C for 60 s. Fold changes in target gene expression were calculated with normalization to ACTB.
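The relative quantification step can be sketched with the 2^-ΔΔCt method, assuming approximately 100% amplification efficiency for all primer pairs. The text names ACTB as the reference gene but does not name the calculation method or the calibrator group. Both are therefore assumptions here, and the calibrator group is left to the caller.

```python
import statistics

def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency).

    Each argument is a list of technical-replicate Ct values; their means are used.
    The reference gene (e.g. ACTB) and calibrator group are chosen by the caller.
    """
    dct_treated = statistics.mean(ct_target_treated) - statistics.mean(ct_ref_treated)
    dct_control = statistics.mean(ct_target_control) - statistics.mean(ct_ref_control)
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)
```

As an illustration, suppose the mean Ct values are 22 (target) and 15 (ACTB) in the test group and 25 and 15 in the calibrator. Then ΔΔCt = -3 and the fold change is 2^3 = 8.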
Gene co-expression network analysis
Multiscale Embedded Gene Co-Expression Network Analysis (MEGENA) [25] was performed to identify host modules of highly co-expressed genes during influenza infection. The MEGENA workflow comprises four major steps: 1) Fast Planar Filtered Network construction (FPFNC), 2) Multiscale Clustering Analysis (MCA), 3) Multiscale Hub Analysis (MHA), and 4) Cluster-Trait Association Analysis (CTA). A perturbation-based FDR cutoff of 0.05 was used. The total relevance of each module to influenza infection was calculated by summarizing its combined enrichment for the differentially expressed disease gene (DEDG) signatures: $G_j = \prod_i g_{ji}$. Here $g_{ji}$ is the relevance of module $j$ to signature $i$, defined as $g_{ji} = \left(\max_j(r_{ji}) + 1 - r_{ji}\right) / \sum_j r_{ji}$, and $r_{ji}$ is the rank order of the significance of the overlap between module $j$ and signature $i$. DEDGs are the genes differentially expressed, as determined by t-test, between virus-infected cells of a particular cluster and the bulk of mock-infected cells. Only genes that showed a log-scale fold difference of at least 0.25 between the two groups and were expressed in at least 25% of the cells in either group were tested. Significantly differentially expressed genes were identified using an adjusted p-value cutoff of 0.05.
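The module relevance score can be computed directly from the rank matrix. In this sketch, `ranks[j, i]` holds $r_{ji}$ (1 = most significant overlap). The sketch computes $g_{ji}$ column-wise as defined above and takes $G_j$ as the product over signatures. The array layout is an assumption for illustration, not MEGENA's internal representation.

```python
import numpy as np

def module_relevance(ranks):
    """Combined relevance G_j of each module from per-signature rank orders.

    ranks: modules x signatures array, ranks[j, i] = r_ji (1 = most significant).
    g_ji = (max_j r_ji + 1 - r_ji) / sum_j r_ji;  G_j = prod_i g_ji.
    """
    r = np.asarray(ranks, dtype=float)
    g = (r.max(axis=0, keepdims=True) + 1.0 - r) / r.sum(axis=0, keepdims=True)
    return g.prod(axis=1)
```

Consider two modules and two signatures ranked [[1, 1], [2, 2]]. The first module scores 4/9 and the second 1/9, so consistently top-ranked modules dominate.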
Identification of enriched pathways, key regulators in host modules, and the viral transcript abundances and DVGs associated with host modules
To functionally annotate the gene signatures and gene modules identified in this study, we performed enrichment analysis against two kinds of gene sets. The first were established pathways and signatures, including gene ontology (GO) categories and MSigDB. The second were subject area-specific gene sets, including influenza host factors, Inflammasome, Interferome, and InnateDB. The hub genes in each subnetwork were identified using the Fisher’s inverse chi-square approach implemented in MEGENA, with a Bonferroni-corrected p-value < 0.05 as the threshold for significant hubs.
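The hub statistics rely on Fisher's inverse chi-square method for combining p-values. The generic method is sketched below. Under the null, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, and its survival function has a closed form for even degrees of freedom. A Bonferroni correction across genes follows. The text does not describe which per-gene p-values MEGENA combines (for example, across network scales), so the input structure here is hypothetical.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function for even df (2k): e^(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def significant_hubs(pvalues_by_gene, alpha=0.05):
    """Genes whose Bonferroni-corrected combined p-value is below alpha."""
    n_tests = len(pvalues_by_gene)
    combined = {g: fisher_combined_pvalue(ps) for g, ps in pvalues_by_gene.items()}
    return {g for g, p in combined.items() if min(1.0, p * n_tests) < alpha}
```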
The viral transcript abundances and DVGs associated with each host module were identified by Spearman correlation. The correlation was calculated between the first principal component of gene expression in the module and each trait. The traits were the relative abundance of each viral transcript, the relative abundance of total viral transcripts, and the DVG/FL ratio. Significantly associated traits were identified using a Benjamini-Hochberg FDR-corrected p-value cutoff of 0.05.
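The module-trait association can be sketched in two steps. First, compute a module "eigengene", the first principal component score of the module's genes across cells. Second, correlate it with a trait using Spearman's rank correlation. This NumPy-only sketch centers genes without scaling and orients the arbitrary PCA sign toward mean module expression; both choices are assumptions. In practice, the correlation p-values (e.g., from `scipy.stats.spearmanr`) would then be Benjamini-Hochberg adjusted across module-trait pairs.

```python
import numpy as np

def module_pc1(expr):
    """First principal component score per cell for one module.

    expr: cells x genes expression matrix. Genes are centered (not scaled);
    the PC1 sign is flipped, if needed, to correlate positively with mean
    module expression, since PCA signs are otherwise arbitrary.
    """
    x = np.asarray(expr, dtype=float)
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    if np.corrcoef(pc1, x.mean(axis=1))[0, 1] < 0:
        pc1 = -pc1
    return pc1

def average_ranks(values):
    """1-based ranks in which tied values share their average rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(v.size)
    i = 0
    while i < v.size:
        j = i
        while j + 1 < v.size and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
```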
STATISTICAL ANALYSIS
The statistical significance of changes in the relative abundance of viral and DVG transcripts among three or more groups of cells was first assessed with the one- or two-tailed Kruskal-Wallis rank sum test. Pairwise comparisons then used one-tailed Wilcoxon rank sum tests. The statistical significance of expression fold changes in the qPCR validation assay was determined using the two-tailed Student’s t-test. A p-value ≤ 0.05 was considered statistically significant.
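For reference, the Kruskal-Wallis H statistic used for the multi-group comparison can be computed from pooled average ranks with the standard tie correction, as sketched below. The sketch computes only the omnibus statistic. The p-value comes from comparing H with a chi-square distribution with (number of groups - 1) degrees of freedom, as `scipy.stats.kruskal` does, for example.

```python
import numpy as np

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = data.size
    order = np.argsort(data, kind="mergesort")
    sorted_data = data[order]
    ranks = np.empty(n)
    tie_term = 0.0
    i = 0
    while i < n:  # assign average ranks to runs of tied values
        j = i
        while j + 1 < n and sorted_data[j + 1] == sorted_data[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    # Sum of squared rank sums divided by group size, in concatenation order.
    rank_term, start = 0.0, 0
    for g in groups:
        size = len(g)
        rank_term += ranks[start:start + size].sum() ** 2 / size
        start += size
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n))
```

For three groups [1, 2, 3], [4, 5, 6], and [7, 8, 9] (no ties), H = 7.2.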
CODE AVAILABILITY
The code used to generate all the results is available on Github (https://github.com/GhedinLab/Single-Cell-IAV-infection-in-monolayer).
DATA AVAILABILITY
Sequencing data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) repository under accession code GSE118773 (currently a private record).
AUTHOR CONTRIBUTIONS
All authors read and approved the manuscript. E.G. conceived and designed the experiments, supervised research, and wrote the manuscript. B. Zhou conceived and designed the experiments, supervised research, performed the infection assays, and wrote the manuscript. C.W. performed the infection assays and bulk RNA-seq library preparation, analyzed the data, and wrote the manuscript. C.V.F. performed the network analysis and wrote the manuscript. T.C. performed the DI virus generation, interference, and validation assays and wrote the manuscript. A.G. performed the library preparation for the virus stock. M.W. and B. Zhang contributed to data analyses. W.H., M.S., and R.S. performed the single-cell library preparation.
ACKNOWLEDGEMENTS
We thank members of the Ghedin laboratory for feedback and discussion. We thank Tara Rock, Nicholas Rouilard, Olivia Micci-Smith, and Mohammed Khalfan of the Genomics Core Facility at the Center for Genomics and Systems Biology, New York University, for sequencing. We thank Yoshihiro Kawaoka for providing the AX4/PB2 cells, and Peter Palese and Adolfo Garcia-Sastre for providing the PR8 reverse genetics plasmids. This work was supported by NIAID/NIH U01 AI111598. This work was also supported in part through the NYU IT High Performance Computing resources, services, and staff expertise.