Abstract
Viral infection can dramatically alter a cell’s transcriptome. However, these changes have mostly been studied by bulk measurements on many cells. Here we use single-cell mRNA sequencing to examine the transcriptional consequences of Influenza virus infection. We 1nd extremely wide cell-to-cell variation in production of viral gene transcripts – viral transcripts compose less than a percent of total mRNA in many infected cells, but a few cells derive over half their mRNA from virus. Some infected cells fail to express at least one viral gene, and this gene absence partially explains variation in viral transcriptional load. Despite variation in total viral load, the relative abundances of viral mRNAs are fairly consistent across infected cells. Activation of innate immune pathways is rare, but some cellular genes co-vary in abundance with the amount of viral mRNA. Overall, our results highlight the complexity of viral infection at the level of single cells.
Introduction
Viruses can cause massive and rapid changes in a cell’s transcriptome as they churn out viral mRNAs and hijack cellular machinery. For instance, cells infected with Influenza virus at high multiplicity of infection (MOI) express an average of 50,000 to 100,000 viral mRNAs per cell, corresponding to 5 to 25% of all cellular mRNA (Hatada et al., 1989). Infection can also trigger innate-immune sensors that induce the expression of cellular anti-viral genes (Killip et al., 2015; Iwasaki and Pillai, 2014; Crotta et al., 2013). This anti-viral response is another prominent transcriptional signature of high-MOI Influenza virus infection in bulk cells (Geiss et al., 2002).
However, initiation of an actual Influenza infection typically involves just a few virions infecting a few cells (Varble et al., 2014; Poon et al., 2016; Leonard et al., 2017; McCrone et al., 2017). The dynamics of viral infection in these individual cells may not mirror bulk measurements made on many cells infected at high MOI. Over 70 years ago, Max Delbruck showed that there was a ~100-fold range in the number of progeny virions produced per cell by clonal bacteria infected with clonal bacteriophage (Delbruck, 1945). Subsequent work has shown similar heterogeneity during infection with other viruses (Zhu et al., 2009; Schulte and Andino, 2014; Combe et al., 2015; Akpinar et al., 2016), including Influenza virus (Heldt et al., 2015).
In the case of Influenza virus infection, targeted measurements of specific proteins or RNAs have shed light on some factors that contribute to cell-to-cell heterogeneity. The Influenza virus genome consists of eight negative-sense RNA segments, and many infected cells fail to express one more of these RNAs (Heldt et al., 2015; Dou et al., 2017) or their encoded proteins (Brooke et al., 2013). In addition, activation of innate-immune responses is inherently stochastic (Shalek et al., 2013, 2014; Bhushal et al., 2017; Hagai et al., 2017), and only some Influenza-infected cells express anti-viral interferon genes (Perez-Cidoncha et al., 2014; Killip et al., 2017). However, the extent of cell-to-cell variation in these and other host and viral factors remains unclear, as does the association among them in individual infected cells.
Here we use single-cell mRNA sequencing to quantify the levels of all cellular and viral mRNAs in cells infected with Influenza virus at low MOI. We 1nd extremely large variation in the amount of viral mRNA expressed in individual cells. Both co-infection and activation of innate-immune pathways are rare in our low-MOI infections, and do not appear to be the major drivers of cell-to-cell heterogeneity in viral transcriptional load. Individual infected cells often fail to express specific viral genes, and such gene absence explains some but certainly not all of the cell-to-cell heterogeneity. A variety of cellular genes, including ones involved in the oxidative-stress response, co-vary with viral transcriptional load. Overall, our work demonstrates remarkable heterogeneity in the transcriptional outcome of Influenza virus infection among nominally identical cells infected with a relatively pure population of virions.
Results
Strategy to measure mRNA in single virus-infected cells
We performed single-cell mRNA sequencing using a droplet-based system that physically isolates individual cells prior to reverse transcription (Zheng et al., 2017; Macosko et al., 2015; Klein et al., 2015). Each droplet contains primers with a unique cell barcode that tags all mRNAs from that droplet during reverse-transcription. Each primer also contains a unique molecular identifier (UMI) that is appended to each mRNA molecule during reverse transcription. The 3’ ends of the mRNAs are sequenced and mapped to the human and Influenza virus transcriptomes to determine transcript identities. This information is combined with that provided by the UMIs and cell barcodes to quantify the number of molecules of each mRNA species that have been captured for each cell.
Infected cells will express viral as well as cellular mRNAs – however the cell barcodes and UMIs cannot distinguish whether a cell was initially infected by one or multiple viral particles. We therefore engineered an Influenza virus (strain A/WSN/1933) that additionally carried viral barcodes consisting of synonymous mutations near the 3’ end of each transcript (Figure 1A). Critically, these synonymous mutations did not greatly impact viral growth kinetics (Figure 1B). We infected A549 human lung carcinoma cells with an equal mix of the wild-type and synonymously barcoded viruses. Cells infected by a single virion will exclusively express mRNAs from either wild-type or synonymously barcoded virus, whereas cells that are co-infected with multiple virions will often express mRNAs from both the wild-type and synonymously barcoded viruses (Figure 1C).
We took care to generate stocks of virus that were relatively “pure” of defective particles. Stocks of viruses typically contain an array of biologically active viral particles, some of which are defective for replication owing to mutations or deletions in essential viral genes (von Magnus, 1954; Huang et al., 1970; Brooke, 2014; Fonville et al., 2015; Lauring and Andino, 2010; Dimmock et al., 2014; Saira et al., 2013). These defective particles become prevalent when a virus is grown at high MOI, where complementation permits the growth of otherwise deleterious genotypes. To minimize the levels of defective particles, we propagated our viral stocks at low MOI for a relatively brief period of time (Xue et al., 2016). We validated that our stocks exhibited greater purity of infectious particles than a stock propagated at high MOI by verifying that they had a higher ratio of infectious particles to virion RNA (Figure 2A) and to particles capable of inducing expression of a single viral protein (Figure 2B). In addition, viral stocks with many defective particles are more immunostimulatory (Tapia et al., 2013; Lopez, 2014). We confirmed that our viral stocks induced less interferon than a stock propagated at higher MOI (Figure 2C).
Single cells show an extremely wide range of expression of viral mRNA
We infected A549 cells at low MOI with a mixture of the wild-type and synonymously barcoded viruses, and collected cells for sequencing at 6, 8, and 10 hours post-infection, including a replicate a 8-hours. We replaced the infection inoculum with fresh media at one-hour post-infection, thereby ensuring that most infection was initiated during a narrow time window. The exception was the replicate 8-hour sample, for which we did not perform this media change and instead left the cells in the original infection inoculum. We recovered between 3,000 and 4,000 cells for each sample (Figure 3A). As expected for a low-MOI infection, most cells expressed little or no viral mRNA (Figure 3B). Also as expected, the amount of viral mRNA per cell among infected cells increased over time (Figure 3B). But what was most notable was how widely the number of viral mRNA molecules varied among infected cells. While the fraction of mRNA derived from virus was <0.1% for most cells, viral mRNA constituted half the transcriptome in a few cells at 8 and 10 hours (Figure 3B).
A complicating factor is that uninfected cells could have small amounts of viral mRNA due to leakage of transcripts from lysed cells. It is therefore important to establish a threshold for identifying truly infected cells. We can do this by taking advantage of the fact that roughly half the infecting virions bear synonymous barcodes. Reads derived from lysed cells will be drawn from both wild-type and synonymously barcoded viral transcripts. However, most cells are infected by at most one virion, and so the reads from truly infected cells will usually derive almost entirely from one of the two viral variants. Figure 4A shows the fraction of viral reads in individual cells from each viral variant, and Figure 4B indicates the fraction of viral reads from the most abundant variant in that cell. Most cells with large amounts of viral mRNA have viral transcripts exclusively derived from one viral variant – indicating non-random partitioning as expected from viral infection. However, cells with a small amount of viral mRNA often have viral transcripts from both variants, as expected from the random partitioning associated with simple mRNA leakage. Finally, a few cells with large amounts of viral mRNA have viral transcripts from both variants, reflecting co-infection.
We determined the threshold amount of viral mRNA per cell at which the barcode partitioning clearly resulted from infection rather than leakage (Figure 4C), and used this threshold to annotate cells that we were confident were truly infected. We also annotated as co-infected cells above this threshold that had mRNA from both viral variants. Figure 4D shows the number of cells annotated as infected and co-infected for each sample – these cells are just a small fraction of the number of cells with any viral read. These annotation thresholds are conservative, and will miss some true low-level infections, as well as any co-infections with the same viral variant. However, it is important that the analyses below are restricted to cells that are truly infected with virus, so we accepted the loss of some low-level infections in order to avoid false positives. Because most cells are not infected, we subsampled the uninfected cells to the numbers shown in Figure 4D to balance the proportions of infected and uninfected cells for all subsequent analyses.
Strikingly, the extreme variation in the number of viral transcripts per cell remains even after we apply these rigorous criteria for annotating infected cells (Figure 4E). The fraction of viral mRNA per infected cell follows a roughly exponential distribution, with many cells having few viral transcripts and a few cells having many. At 6 and 8 hours <10% of infected cells are responsible for over half the viral transcripts, while at 10 hours ≈15% of infected cells produce over half the viral transcripts (Figure 4-Figure supplement 2). Notably, co-infected cells do not express dramatically more viral mRNA than other cells (Figure 4), suggesting that the initial infectious dose is not the major driver of cell-to-cell variation in our low-MOI infections.
Absence of viral genes partially explains cell-to-cell variability in viral load
The influenza genome is segmented, and cells can fail to express a viral mRNA if the encoding gene segment is not packaged in the infecting virion or fails to initiate transcription after infection. Indeed, several groups have reported that the majority of infected cells fail to express at least one viral gene (Brooke et al., 2013; Heldt et al., 2015; Dou et al., 2017). We wondered if the absence of specific viral genes might be associated with reduced amounts of viral mRNA within single infected cells. In particular, transcription of Influenza virus mRNAs is performed by the viral ribonucleoprotein (RNP) complex, which consists of the three proteins that encode the tripartite polymerase (PB2, PB1, and PA) as well as nucleoprotein (NP) (Huang et al., 1990). Each viral gene segment is associated with one RNP in incoming infecting virions, but secondary transcription by newly synthesized RNPs requires the presence of the viral genes encoding each of the four RNP proteins (Vreede et al., 2004; Eisfeld et al., 2015). This secondary transcription is a major source of viral mRNAs, as evidenced by the fact that blocking synthesis of the RNP proteins reduces the amount of viral mRNA by several orders of magnitude in bulk cells (Figure 5-Figure supplement 1).
We examined the total amount of viral mRNA versus the expression of each viral gene (Figure 5A, Figure 5-Figure supplement 2). Cells that lack an RNP gene never derive more than a few percent of their mRNAs from virus, confirming the expected result that all four RNP genes are essential for high levels of viral transcription. However, we observe cells that lack each of the other non-RNP genes but still derive ≈40% of their mRNAs from virus, suggesting that none of the other genes are important for high levels of viral transcription. These results are statistically supported by Figure 5B, which shows that absence of any RNP gene but not any other viral gene is associated with reduced amounts of viral mRNA. However, gene absence clearly does not explain all of the variability in viral gene expression, since even cells expressing all viral genes exhibit a very wide distribution in the amount of viral mRNA that they express. Specifically, at both 8 and 10 hours, the amount of viral mRNA in individual cells expressing all eight viral genes still ranges from <1% to >50% (Figure 5A, Figure 5-Figure supplement 2).
We also quantified the fraction of infected cells that completely failed to express a given gene. We limited this analysis to examining the presence / absence of the non-RNP genes in cells expressing all four RNP genes, since we might fail to detect viral transcripts that are actually present at low levels in RNP-deficient cells due to the lower viral burden in these cells. At the 8- and 10-hour time points, between 5% and 17% of cells fail to express any one of the four non-RNP genes (Figure 5C). The absence of a given gene appears to be an independent event, as the probability of observing all four non-RNP genes in a cell is well predicted by simply multiplying the probabilities of observing each gene individually (Figure 5C). If we extrapolate the frequencies at which cells lack non-RNP genes to the RNP genes, then we would predict that 40-50% of infected cells express mRNAs from all eight genes. This estimate of the frequency at which infected cells express mRNAs from all eight gene segments is slightly higher than previous estimates of 13% (Brooke et al., 2013) and 20% (Dou et al., 2017). At least one difference is that Brooke et al. (2013) stained for proteins whereas we examined the expression of mRNAs – it is likely that some cells contain mutated viral genes that fail to produce stable protein even when mRNA is expressed.
The relative amounts of different viral mRNAs are more consistent across cells
The results above show that the amount of viral mRNA in infected cells varies over several orders of magnitude. Does the relative expression of viral genes exhibit similar cell-to-cell variability? To address this question, we focused on cells that derived >5% of their mRNA from virus, since estimates of relative viral gene expression will be less noisy in cells with more viral mRNAs.
In contrast to the extreme variability in the total viral mRNA per cell, the fraction of this mRNA derived from each gene is much more consistent across cells (Figure 6A). Total viral mRNA varies by orders of magnitude, but the fraction from any given viral gene is fairly tightly clustered around the median value for all cells (Figure 6B). The relative levels of each viral mRNA in our cells are similar to prior bulk measurements made by Northern blots (Hatada et al., 1989), which also found an expression hierarchy of M > NS ≫ NP > NA > HA ≫ PB2 ~ PB1 ~ PA. The cell-to-cell consistency in the relative expression of different viral genes is even tighter if we limit the analysis only to cells that express all eight viral genes (Figure 6C,D). Therefore, with the exception of complete gene absence, the factors that drive the dramatic cell-to-cell variability in the amount of viral mRNA have roughly similar effects on all viral genes in a given cell. This 1nding is consistent with prior work showing positive correlations among the abundance of several viral genome segments in individual cells (Heldt et al., 2015). It is also reminiscent of the observation that total cellular mRNA varies among single cells as a function of factors such as volume that affect all mRNA species concordantly (Padovan-Merhar et al., 2015).
Co-infection can provide infected cells with the full complement of viral genes
Our sequencing enables us to identify the rare cells that were co-infected with both wild-type and synonymously barcoded viral variants. Overall, we captured 10 such co-infected cells that had >5% of their mRNA derived from virus (Figure 7). Seven of these 10 cells expressed all eight viral genes. Remarkably, the majority (4 of 7) of these cells would not have expressed all the viral genes in the absence of co-infection, since they have at least one gene exclusively derived from each viral variant. For instance, the cell with 11.2% of its mRNA from virus in the upper right of Figure 7 expresses M only from the wildtype viral variant, and NP and HA only from the synonymously barcoded variant. Our data therefore provide direct single-cell support for the idea that co-infection can rescue missing viral genes (Brooke et al., 2013, 2014; Fonville et al., 2015; Aguilera et al., 2017).
Another observation from Figure 7 is that co-infected cells usually express roughly equal amounts of transcripts from each of the two viral variants. This observation is consistent with the 1nding by Dou et al. (2017) and Huang et al. (2008) that the temporal window for co-infection is short – if both viral variants infect a cell at about the same time, then neither will have a headstart and so each will have a roughly equal opportunity to transcribe its genes.
To support this idea with a larger dataset albeit at lower resolution, we generated a virus in which the HA coding sequence was replaced by GFP. We then co-infected cells with a mix of wildtype and ΔHA-GFP virus and used 2ow cytometry to score cells for the presence of HA only (infection by wildtype virus), GFP only (infection by ΔHA-GFP virus), or both (co-infection) as shown in Figure 7 Figure supplement 1. As in our single-cell sequencing data, we found that expression of HA and GFP were highly correlated, indicating that co-infected cells typically expressed roughly equal amounts of transcript from each viral variant.
Activation of the interferon response is rare in single infected cells
Because our sequencing captured all polyadenylated transcripts, we can examine whether there are prominent changes in the host-cell transcriptome in sub-populations of infected cells. Influenza virus infection can trigger innate-immune sensors that lead to the transcriptional induction of type I and III interferons, and subsequently of anti-viral interferon-stimulated genes (Killip et al., 2015; Iwasaki and Pillai, 2014; Crotta et al., 2013). However, activation of the interferon response is stochastic and bi-modal at the level of single cells (Chen et al., 2010; Shalek et al., 2013, 2014; Perez-Cidoncha et al., 2014; Bhushal et al., 2017; Hagai et al., 2017). We therefore hypothesized that we might see two sub-populations of infected cells: one in which the interferon response inhibited viral transcription, and another in which the virus was able to express high levels of its mRNA by evading or blocking this response.
To examine whether there were distinct sub-populations of virus-infected cells, we used a semi-supervised t-SNE approach (Van der Maaten and Hinton, 2008) to cluster cells by genes that co-varied with viral infection status. As shown in Figure 8A,B, this approach effectively grouped cells by the amount of viral mRNA that they expressed. Sample-to-sample variation was regressed away during the clustering, as cells did not obviously group by time-point, with expected exception that the uninfected and 6-hour samples had few cells in the region of the plot corresponding to large amounts of viral mRNA (Figure 8C).
But to our surprise, we did not see a prominent clustering of infected cells into sub-populations as expected if the interferon response was strongly activated in some cells. To investigate further, we annotated each cell by the total number of type I and III interferon transcripts detected. Remarkably, only a single cell expressed detectable interferon (Figure 8D). We also examined interferon-stimulated genes, which are induced by autocrine and paracrine interferon signaling. Figure 8E shows expression of one such gene, IFIT1 (Fensterl and Sen, 2011). As with interferon itself, expression of IFIT1 was rare and most prominent in the single interferon-positive cell, presumably due to the higher eZciency of autocrine versus paracrine signaling. Notably, interferon and interferon-stimulated genes were also relatively ineffective at blocking viral transcription in the single cell in which they were potently induced, since >10% of the mRNA in this cell was derived from virus (Figure 8A,B,D,E).
We posited that the paucity of interferon induction might be due to the activity of Influenza virus’s major interferon antagonist, the NS1 protein (García-Sastre et al., 1998; Hale et al., 2008). We therefore identified cells that expressed substantial amounts of viral mRNA but lacked the NS gene (Figure 8F). Consistent with the idea that NS1 is important for suppressing interferon, the one interferon-positive cell lacked detectable expression of the NS gene. But other cells that lacked NS expression still failed to induce a detectable interferon response, despite often having a substantial amount of their mRNA derived from virus (Figure 8). This result is in line with other work showing that NS1-deficient Influenza virus does not deterministically induce interferon (Killip et al., 2017; Kallfass et al., 2013). Therefore, many individual infected cells fail to activate innate-immune responses even when the virus lacks its major interferon antagonist.
Some host genes co-vary with viral gene expression
We examined whether any host genes were differentially expressed in cells with more viral mRNA. We restricted this analysis to infected cells with all eight viral genes in order to focus on cellular genes that were associated with viral mRNA burden independent of effects due to the presence or absence of particular viral transcripts. We identified 45 cellular genes that co-varied with viral gene expression at a false discovery rate of 0.1 (Figure 9).
Many of the genes with increased expression in cells with more viral mRNA are known or suspected to be regulated by the Nrf2 master regulator in response to oxidative stress. These genes produce proteins that are involved in detoxification of reactive oxygen species or resultant products, the management of misfolded proteins, the electron transport chain, or a general stress response (Figure 9-Figure supplement 1). We additionally see reduced expression of the nitric oxide synthase interacting protein (NOSIP). Transient oxidative stress is known to occur during viral infection, and may act in a proviral fashion via MAPK activation driving vRNP export (Amatore et al., 2014). The antioxidant response is thought to be largely antiviral, potentially through inhibition of MAPK activity (Lin et al., 2016; Sgarbanti et al., 2014). Our data do not reveal whether the expression of genes involved in the response to oxidative stress are a cause or a symptom of higher levels of viral mRNA, and further investigation of this topic is an interesting area for future work.
Discussion
We have quantified the total transcriptome composition of single cells infected with Influenza virus. While we observe a general increase in the amount of viral mRNA over time as expected from bulk measurements (Hatada et al., 1989; Shapiro et al., 1987), there is wide variation in viral gene expression among individual infected cells.
The most obvious form of heterogeneity is the complete failure of some infected cells to express one or more viral genes, which we estimate occurs in about half the infected cells in our experiments. The absence of some viral genes in some infected cells has been noted previously (Brooke et al., 2013; Heldt et al., 2015; Dou et al., 2017), and our work provides a holistic view by quantifying the total viral transcriptional load as a function of the level of each mRNA. We 1nd that cells lacking expression of any of the four genes that encode the viral RNP express much less total viral mRNA, consistent with prior bulk studies (Vreede et al., 2004; Eisfeld et al., 2015). Interestingly, the reason some cells fail to express some viral genes remains unclear. The prototypical Influenza virion packages one copy of each of the eight gene segments (Noda et al., 2006; Hutchinson et al., 2010), but some virions surely package fewer (Brooke et al., 2014). However, it is also possible that much of the viral gene absence is due to stochastic loss of viral RNPs after infection but prior to the initiation of viral transcription in the nucleus.
The absence of viral genes only partially explains the cell-to-cell variation in amount of viral mRNA, which still varies from <1% to >50% among cells expressing all the viral genes. It is likely that other viral genetic factors explain some of this remaining heterogeneity. The 3’-end sequencing strategy used in our experiments detects the presence of a viral gene, but does not identify whether that gene contains a mutation that might hinder viral replication. However, viral mutations are also unlikely to explain all the observed heterogeneity, since current consensus estimates of Influenza virus’s mutation rate suggest that the typical virion in a stock such as the one used in our experiment should contain less than one mutation per genome (Parvin et al., 1986; Suárez et al., 1992; Suárez-López and Ortín, 1994; Nobusawa and Sato, 2006; Bloom, 2014; Pauly et al., 2017).
The rest of the heterogeneity must be due to some combination of cellular factors and inherent stochasticity. Some features of the cellular transcriptome co-vary with the amount of Influenza mRNA. In particular, the viral load in individual cells is associated with the expression of genes involved in response to cellular stresses, including oxidative stress. It will be interesting to determine if these cellular transcriptional signatures are simply a consequence of the stress imposed by viral replication, or if their stronger activation in some cells is a causative factor that promotes viral transcription. However, it also would not be surprising if a substantial amount of the cell-to-cell heterogeneity cannot be ascribed to pre-existing features of either the viral genome or cellular state. Apparently stochastic heterogeneity is a common feature of many processes at a single-cell level (Cai et al., 2006; Raj et al., 2006; Buganim et al., 2012; Shalek et al., 2013; Avraham et al., 2015) – especially when those processes are initiated by very small numbers of initial molecules (Elowitz et al., 2002), as is the case for low-MOI viral infection.
Our data do suggest that the factors driving the heterogeneity in viral transcriptional load exert relatively concordant effects on all viral genes in a given cell. Specifically, despite the extreme heterogeneity in total viral mRNA per cell, the relative levels of the viral mRNAs are reasonably consistent across cells, and generally reflective of classical bulk measurements (Hatada et al., 1989). Therefore, despite the stochasticity inherent in initiating transcription and replication of each gene from a single copy carried by the incoming virion, as long as a gene is not completely lost then the virus possesses mechanisms to control its relative expression (Shapiro et al., 1987; Hatada et al., 1989; Perez et al., 2010; Heldt et al., 2012; Chua et al., 2013).
One factor that surprisingly does not appreciably contribute to the heterogeneity in our experiments is activation of innate-immune interferon pathways. Only one of the hundreds of virus-infected cells expresses any detectable interferon, despite the fact that a number of cells fail to express the Influenza-virus interferon antagonist NS1. It is known that interferon activation is stochastic at the level of single cells in response to both synthetic ligands (Shalek et al., 2013, 2014; Bhushal et al., 2017; Hagai et al., 2017) and actual infection (Rand et al., 2012; Perez-Cidoncha et al., 2006; Avraham et al., 2015; Killip et al., 2017). But interferon expression is a prominent transcriptional signature of high-MOI Influenza virus infection of bulk cells, including in the epithelial cell line and at the time-points used in our experiments (Geiss et al., 2002; Sutejo et al., 2012). So it is notable how rarely single cells express interferon. Interferon expression would surely be more common at later times or with a viral stock passaged at higher MOI, since paracrine interferon signaling (Crotta et al., 2013) and accumulation of defective viral particles enhance innate-immune detection (Tapia et al., 2013; Lopez, 2014). However, the early events of physiological Influenza infection involve just a few virions (Varble et al., 2014; McCrone et al., 2017), and so it is interesting to speculate whether rare events such as interferon activation during the 1rst few cycles of viral replication could contribute to heterogeneity in the eventual outcome of infection.
Overall, our work shows the power and importance of charactering cellular infection at the level of single cells (Avraham et al., 2015). The dynamics of viral infection in any given cell is shaped by the genetic composition of the incoming virion, the host-cell state, the bi-modality of innateimmune activation, and the inherent stochasticity of molecular processes initiated by a single copy of each viral gene. We have shown how the confluence of these factors leads to extreme cell-to-cell heterogeneity in the transcriptional outcome of Influenza virus infection. Further deconstruction of the contributions of each factor will enable a deeper understanding of how the bulk features of infection emerge from the processes occurring within individual virus-infected cells.
Methods and Materials
Cell lines and viruses
The following cell lines were used in this study: the human lung epithelial carcinoma line A549 (ATCC CCL-185), the MDCK-SIAT1 variant of the Madin Darby canine kidney cell line overexpressing human SIAT1 (Sigma-Aldrich 05071502), and the human embryonic kidney cell line 293T (ATCC CRL-3216). All cells were maintained in D10 media (DMEM supplemented with 10% heat-inactivated fetal bovine serum, 2 mM L-glutamine, 100 U of penicillin/ml, and 100 μg of streptomycin/ml) at 37 °C at 5 % CO2.
Wildtype A/WSN/1933 (H1N1) Influenza virus was generated by reverse genetics using the plasmids pHW181-PB2, pHW182-PB1, pHW183-PA, pHW184-HA, pHW185-NP, pHW186-NA, pHW187-M, and pHW188-NS (Hoffmann et al., 2000). The sequences of the viral RNAs encoded in these plasmids are in Figure 1-source data 1. Reverse-genetics plasmids encoding the synonymously barcoded WSN virus were created by using site-directed mutagenesis to introduce two synonymous mutations near the 3’ end of the mRNA for each viral gene. The sequences of the synonymously barcoded viral RNAs are in Figure 1-source data 1.
To generate viruses from these plasmids, we transfected an equimolar mix of all eight plasmids into cocultures of 293T and MDCK-SIAT1 cells seeded at a ratio of 8:1. At 24 hours post-transfection, we changed media from D10 to Influenza growth media (Opti-MEM supplemented with 0.01% heatinactivated FBS, 0.3% BSA, 100 U of penicillin/ml, 100 μg of streptomycin/ml, and 100 μg of calcium chloride/ml). At 48 hours post-transfection we harvested the virus-containing supernatant, pelletted cellular material by centrifugation at 300 x g’s for 4 minutes, and stored aliquots of the clarified viral supernatant at -80 °C. We then titered thawed aliquots of viral by TCID50 on MDCK-SIAT1 cells, computing titers via the formula of Reed and Muench (1938). To generate our “high-purity” stocks of viruses for the single-cell sequencing experiments, we then infected MDCK-SIAT1 cells at an MOI of 0.01, and let the virus replicate for 36 hours prior to harvesting aliquots that were again clarified by low-speed centrifugation, aliquoted, stored at -80 °C, and titered by TCID50. The high-MOI passage (high-defective particle) stock used in Figure 2 was generated by instead passaging in MDCK-SIAT1 cells twice at an MOI of 1 for 48 hours.
For the experiments in Figure 7-Figure supplement 1, we created a virus that carried an HA gene segment in which GFP replaced most of the HA coding sequence, following a scheme 1rst described by Marsh et al. (2007). Briefly, we created a plasmid encoding a viral RNA with GFP in place of the HA coding sequence in the context of the pHH21 (Neumann et al., 1999) reverse-genetics plasmid, removing potential start codons upstream of the GFP (see Figure 7-source data 2 for the sequence of the viral RNA). We then generated GFP-carrying virus by reverse-genetics in cells constitutively expressing HA (Doud and Bloom, 2016). To obtain suZcient titers, this HA-eGFP virus was expanded for 44 rather than 36 hours after initiating infection at an MOI of 0.01.
qPCR
For the qPCR in Figure 2 and Figure 5-Figure supplement 1, A549 cells were seeded at 3×105 cells per well in a 6-well tissue culture plate in D10 the day prior to infection. On the day of infection, a single well was trypsinized and the cells were counted in order to determine the appropriate amount of virus to use to achieve the intended MOI. Immediately before infection, D10 was replaced with Influenza growth media. For cells incubated with cyclohexamide, the compound was added to a 1nal concentration of 50 μg/ml at the time of infection – previously confirmed to be suZcient to halt viral protein production (Killip et al., 2014). RNA was purified using the QIAGEN RNeasy plus mini kit following manufacturer’s instructions. cDNA was synthesized using an oligoDT primer and the SuperScriptTM III 1rst-strand synthesis supermix from ThermoFisher using the manufacturer’s protocol. Transcript abundance was measured using SYBRTM green PCR master mix, using a combined anneal/extension step of 60 °C for one minute with the following primers: HA: 5’-GGCCCAACCACACATTCAAC-3’, 5’-GCTCATCACTGCTAGACGGG-3’, IFNB1: 5’-AAACTCATGAGCAGTCTGCA-3’, 5’-AGGAGATCTTCAGTTTCGGAGG-3’, L32: 5’-AGCTCCCAAAAATAGACGCAC-3’, 5’-TTCATAGCAGTAGGCACAAAGG-3’. Biological triplicates were performed for all samples.
For the measurements of viral genomic HA content in Figure 2A, vRNA was harvested from 80 μl of viral supernatant by the addition of 600 μl of RLT plus before proceeding with the standard QIAGEN RNeasy Plus Mini kit protocol. The cDNA was generated using SuperScriptTM III 1rst-strand synthesis supermix using the manufacturer’s protocol, and using the universal vRNA primers of Hoffmann et al. (2001) with the modifications described in Xue et al. (2017). The qPCR was then performed as for mRNA measurements. A standard curve was generated from three independent dilutions of the HA-encoding reverse genetics plasmid. All vRNA values represent three independent RNA extractions with two replicate qPCR measurements.
Flow cytometry titering and analyses
To determine viral titers in terms of HA-expressing units and for the 2ow cytometry in Figure 7-Figure supplement 1, A549 cells were seeded in a 6-well plate and infected as described above for the qPCR analyses. Cells were harvested by trypsinization, resuspended in phosphate-buffered saline supplemented with 2% heat-inactivated FBS, and stained with 10 μg/ml of H17-L19, a mouse monoclonal antibody confirmed to bind to WSN HA in a prior study (Doud et al., 2017). After washing in PBS plus 2% FBS, the cells were stained with a goat anti-mouse IgG antibody conjugated to APC. Cells were then washed, 1xed in 1% formaldehyde, and washed further before a 1nal resuspension and analysis. We then determined the fraction of cells that were HA positive and calculated the HA-expressing units.
For Figure 7-Figure supplement 1, after gating to exclude multiplets in FlowJo, data were extracted using the R package 2owCore (Le Meur et al., 2007) and analyzed using a custom Python script. Guassian kernel density estimates were obtained using the scipy stats package method, guassian_kde, using automatic bandwidth determination (van der Walt et al., 2017).
Infections for single-cell mRNA sequencing
Single-cell sequencing libraries were generated using the 10x Chromium Single Cell 3’ platform (Zheng et al., 2017) using the V1 reagents.
All time points except for the 8-hour replicate were prepared on the same day. For the infections, A549 cells were seeded in a 6-well plate, with two wells per time point. A single well of cells was trypsinized and counted prior to initiation of the experiment for the purposes of calculating MOI. Wild-type and synonymously barcoded virus were mixed to an estimated ratio of 1:1 based on prior, exploratory, single-cell analyses (data not shown). At the initiation of our experiment, the wells for all time points were changed from D10 to Influenza growth media. Cells were then infected with 0.3 HA-expressing units of virus per cell (a determined by 2ow cytometry). The infections were performed in order of time point: 1rst the 10-hour time point, then the 8-hour, and then the 6-hour time point. At one hour after infection, the media for each time point was changed to fresh Influenza growth media. Note that the HA-expressing units were calculated without this additional washing step, and so likely represent an overestimate of our 1nal infectious dose (consistent with the fact that fewer than 30% of cells appear infected in the single-cell sequencing data). All cells were then harvested for single-cell analysis concurrently – ensuring all had spent equivalent time in changed media. For our replicate 8-hour time point, cells were infected as above except that the cells were infected at 0.1 HA-expressing units of virus per cell but no wash step was performed, and the sample was prepared on a different day. After harvest, cells were counted using disposable hemocytometers and diluted to equivalent concentrations with an intended capture of 3000 cells/sample following the manufacturer’s provided by 10x Genomics for the Chromium Single Cell platform. All subsequent steps through library preparation followed the manufacturer’s protocol. Samples were sequenced on an Illumina HiSeq. The deep sequencing data are available on the Sequence Read Archive under accession [Upload in progress.]
Computational analysis of single-cell mRNA sequencing data
Jupyter notebooks that perform all of the computational analyses are available in Supplementary file 1.
Briefly, the raw deep sequencing data were processed using the 10X Genomics software package CellRanger (version 2.0.0). The reads were aligned to a concatenation of the human and influenza virus transcriptomes. The human transcriptome was generated by 1ltering genome assembly GRCh38 for protein coding genes defined in the GTF File GRCh38.87. The Influenza virus transcriptome was generated from the reverse-complement of the wildtype WSN viral RNA sequences as encoded in the reverse-genetics plasmids (Figure 1-source data 1). CellRanger calls cells based on the number of observed cell barcodes, and creates a cell-gene matrix. We used custom Python code to annotate the cells in this matrix by the number of viral reads that could be assigned to the wildtype and synonymously barcoded virus. Only about half of the viral reads overlapped the barcoded regions of the genes (Figure 1A) and could therefore be assigned to a viral barcode (Figure 4-Figures supplement 1). So for calculations of the number of reads in a cell derived from each viral barcode for each viral gene, the total number of detected molecules of that gene are multiplied by the fraction of those molecules with assignable barcodes that are assigned to that barcode. This annotated cell-gene matrix is in Supplementary File fl. A Jupyter notebook that performs these analyses is in Supplementary file 1.
The annotated cell-gene matrix was analyzed in R, primarily using the Monocle package (version 2.4.0) (Qiu et al., 2017; Trapnell et al., 2014). A Jupyter notebook that performs these analyses is in Supplementary File 1. For each sample, cell barcodes that had >2.5-fold fewer or more UMI counts mapping to cellular transcripts than the sample mean were excluded from downstream analyses (see red vertical lines in Figure 3B).
In order to determine an appropriate cutoff for how many reads a cell needed to contain in order to be classified as infected, we calculated the mean viral barcode purity across all cells that contained at least a given fraction of viral mRNA and had multiple viral reads that could be assigned a barcode (Figure 4B,C). We then determined the threshold fraction of viral mRNA at which the mean purity no longer increased as a function of the amount of viral mRNA. This threshold represents the point at which we have effectively eliminated cells that have low barcode purity simply due to lysis-acquired reads sampled randomly from both viral barcodes. The 10-hour sample has the most viral mRNA, and so is expected to have the most lysis-associated influenza reads. We therefore determined the threshold using just this sample, and then applied this threshold to all samples. The threshold is shown in Figure 4C. This procedure is expected to be conservative, and may miss some truly infected cells with very low amounts of viral mRNA. For subsequent analyses, we retained all infected cells and a subsample of uninfected cells (the greater of 50 or the number of infected cells for that sample). The rationale for subsampling the uninfected cell is that the vast majority of cells are uninfected, and we did not want these cells to completely dominate the downstream analyses. Cells were classified as co-infected if both viral variants had an RNA level that exceeded the threshold, and if the minor variant contributed at least 5% of the viral mRNA.
For the semi-supervised t-SNE clustering, we used Monocle’s cell hierarchy function to bin cells into those with no viral mRNA, <2% viral mRNA, between 2% and 20% viral mRNA, and >20%. Candidate marker genes for t-SNE dimensionality reduction were then determined using the Monocle function markerDiffTable, excluding the effects of sample variation and the number of genes identified in a given cell, using a q-value cutoff of 0.01. The specificity of these markers was determined using the function calculateMarkerSpecificity – the top 40 markers were retained, and used to place populations in a two-dimensional plane based on tSNE dimensionality reduction.
For the analyses of cellular genes that differed in expression as a function of the amount of viral mRNA, we only considered cells that expressed all 8 viral mRNAs to avoid effects driven simply by viral gene absence. We also only considered cellular genes in the differential gene analysis, since viral gene expression will tautologically co-vary with the amount of viral mRNA. Additionally, because influenza virus has the capacity to degrade or prevent the synthesis of host mRNAs (Bercovich-Kinori et al., 2016) and contributes significantly to the total number UMIs in some cells, we calculate size factors (a scalar value representing eZciency of UMI capture) based on cellular transcripts alone. Finally, we assigned all cells a ceiling fraction of mRNA from virus of 25% so that a few extremely high-expressing cells did not dominate. Cellular genes with expression that co-varied with the fraction of viral mRNAs in a cell were then determined using the Monocle differentialGeneTest, after removing variance explained by sample to sample variation. Figure 9 shows all genes that were significantly associated with the fraction of mRNA from virus at a false discovery rate of 0.1.
Acknowledgments
We thank Xiaojie Qiu for advice about use of the Monocle software package, David Bacsik and Robert Bradley for comments on the manuscript, and the Fred Hutch Genomics Core for performing the Illumina deep sequencing. This work was supported by NIH grant R01GM102198 (to JDB), NIH grant R01AI127893 (to JDB), NIH grant DP2HD088158 (to CT), a Burroughs Wellcome Young Investigator in the Pathogenesis of Infectious Diseases grant (to JDB), and a W. M. Keck Foundation grant (to CT). ABR is supported by a Damon Runyon Fellowship. JDB is partly supported by a Faculty Scholars Award from the Howard Hughes Medical Institute and the Simons Foundation. CT is partly supported by an Alfred P. Sloan Foundation Research Fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.