Abstract
In embryonic development, cells must differentiate through stereotypical sequences of intermediate states to generate mature states of a particular fate. By contrast, direct programming can generate similar fates through alternative routes, by directly expressing terminal transcription factors. Yet the cell state transitions defining these new routes are unclear. We applied single-cell RNA sequencing to compare two mouse motor neuron differentiation protocols: a standard protocol approximating the embryonic lineage, and a direct programming method. Both undergo similar early neural commitment. Then, rather than transitioning through spinal intermediates like the standard protocol, the direct programming path diverges into a novel transitional state. This state has specific and abnormal gene expression. It opens a ‘loop’ or ‘worm hole’ in gene expression that converges separately onto the final motor neuron state of the standard path. Despite their different developmental histories, motor neurons from both protocols structurally, functionally, and transcriptionally resemble motor neurons from embryos.
Introduction
Embryonic development proceeds through defined intermediate states, such as germ layer intermediates, and lineage-specific progenitors. Intermediates bifurcate into multiple states over time, and specialize their behaviors, ultimately producing a lineage tree that defines each mature cell type by a particular sequence of intermediates. This was first appreciated through classical lineage tracing and cell ablation studies. These studies showed that specifically labeled intermediate states generate stereotyped sets of downstream cell types, and that these downstream cell types fail to form if an intermediate that is upstream in their lineage is ablated1,2. Furthermore, embryos in general do not produce a mature cell type through multiple differentiation paths.
In contrast to this rigid and hierarchical process, recent protocols that experimentally directly program cell fate suggest that the exact sequence of intermediates defining a lineage may be more flexible3-11. These studies reveal that mature cell states can be reached through paths that do not involve activation of the intermediate progenitor genes that are essential in embryos. Mouse embryonic stem cells (mESCs), for example, can be converted into motor neurons (MNs) by a process that involves overexpression of three transcription factors, Ngn2+Isl1+Lhx33,12, and that never expresses the neural progenitor transcription factors Sox1 and Olig23. mESCs can also be driven rapidly into a terminal muscle phenotype without normal upregulation of intermediate genes such as Pax7 and Myf5, through a combination of cell-cycle inhibition and MyoD overexpression4. This plasticity of differentiation extends further, to the interconversion of mature cell states. Fibroblasts can be converted into mature neuron phenotypes6,11, including MNs5, seemingly without completely dedifferentiating and retracing the embryonic lineage, as indicated by lack of expression of specific core pluripotency (Oct4, Sox2 and Nanog) and neural progenitor genes (Nestin)5.
Although these direct programming (DP) experiments imply the existence of differentiation paths that differ from those in embryos, much of what actually occurs in these new programs remains mysterious. Does DP bypass normal intermediates by short-circuiting the natural lineage, or does it transition through alternative intermediates (Fig. 1a)? Does it diverge only briefly to bypass specific early or late states, or does it utilize an entirely distinct path (Fig. 1b)? And can DP converge fully to the same final state that is produced in embryos despite taking an alternative path (Fig. 1c)? These questions have been challenging to answer in part due to the high degree of heterogeneity in direct programming experiments, where unbiased bulk measurements of global gene expression obscure changes, and also because marker genes allowing the isolation of new but potentially important DP-specific intermediates are not known in advance. Here we aimed to overcome these issues by applying single cell RNA sequencing to compare the gene expression trajectories of DP and growth factor guided differentiation of mESCs over time into MNs. Our core research questions are summarized in Figure 1.
Results
Dissection of two MN differentiation protocols using InDrops single cell RNA sequencing
We compared two in vitro differentiation protocols that convert mESCs into MNs. Spinal MNs were chosen for study because protocols for generating them have been highly optimized. The first, standard protocol (SP) is a widely used method that attempts to recapitulate the known embryonic intermediates through sequential exposure to developmental signals (Fgfs, Retinoic Acid, and Sonic hedgehog) (Fig. 1A)13,14. It provides an approximation of the lineage through which motor neurons develop in the embryo. The second, direct programming (DP) protocol involves driving the expression of transcription factors (Ngn2+Isl1+Lhx3)3,12 that characterize the mature motor neuron state, while at the same time favoring and stabilizing the G1 state by incubating cells in a growth-factor-free medium4.
We used single-cell RNA sequencing (InDrops)15 to track the differentiation trajectories of both protocols over time (Figs. 1B and 1C). Single-cell data has emerged as a powerful way to trace differentiation processes, particularly in populations that are not pure and that contain rare intermediates11,16-20. We profiled a total of 4,590 cells sampled from early (day 4/5) and late (day 11/12) timepoints for each protocol, and also used our previously published data from 975 mES cells15. To visualize the single cell data and identify cell states we applied t-distributed stochastic neighbor embedding (tSNE) to reduce dimensionality21,22, defined cell states using an unsupervised density gradient clustering approach, and then found specific marker genes with known annotations to reveal the identity of each state (Fig. 2B - 2E; Supp. Figs. S1 and S2; Supplementary methods). For each protocol, the dominant feature was a continuous gene expression trajectory sweeping across the 2D plot. These trajectories correlate with chronology: they begin with mESCs, pass through neural progenitor states and terminate in mature MN states. In both protocols we also observed a mixture of off-target differentiation byproducts from all three germ layers.
Our single cell data allowed us to define the efficiency of MN production for each method. For DP, MN production was observed as early as day 4 (19%), and increased over time to 66% by day 11 (Fig. 2F). A minority of off-target neuron subtypes, glia, and miscellaneous cells were also identified. In the SP, 23% and 9% of the population resembled MNs at days 5 and 12, respectively (Fig. 2G). This lower efficiency was accompanied by a far larger fraction of off-target products, including oligodendrocytes (7.2%), astrocytes (8.6%), muscle (30.6%), and stroma (8.9%), which together accounted for 55.3% of the population by day 12.
The DP differentiation trajectory lacks intermediates expressing Olig2 and Nkx6-1
What are the differentiation paths taken by each protocol? In the SP differentiation path, cells transit through seven states (Fig. 2C and 2E). These state transitions parallel patterning events in the embryo13,23,24: cells first commit to the neural lineage (Sox1+/Sox3+), then are posteriorized (Hoxb8+/Hoxd4+), ventralized (Nkx6-1+/Olig2+), enter the committed MN progenitor state (Mnx1+), and then mature into a neuronal phenotype (Tubb3+/Map2+). This is not surprising, as the growth factor cocktail defining this method was designed to reflect the signaling events taking place in the embryo. By contrast, we found that the path produced by DP was condensed relative to the SP path (Fig. 2B and 2D), consisting of only four states as opposed to seven. After neural commitment (Sox1+), cells immediately began expressing committed MN markers (Mnx1+/Tubb3+), seemingly without the typical spinal embryonic intermediates (Olig2-/Nkx6-1-). A lack of Olig2 expression during DP has been observed previously3, and our results confirm at the single cell level that intermediates expressing Olig2 and Nkx6-1 appear entirely absent. Olig2 is necessary for MN development in embryos1, indicating that DP drives differentiation from ESCs into MNs through a new route.
We confirmed that our inferred dynamics from snapshot single-cell data correspond to the actual underlying differentiation dynamics by performing a dense qPCR time course for a panel of MN genes (Supp. Fig. S3). These bulk measurements confirmed that, for DP, committed MN markers are upregulated immediately following early neural progenitor genes in real time.
The DP and SP trajectories bifurcate after early neural commitment and converge separately to the MN state
Since the DP omits spinal embryonic intermediates characteristic of the SP path, there must be one of two possible trajectories. Either DP must discontinuously transition from an early neural progenitor into a MN, or it must transit through alternate intermediate state(s). To determine which of these possibilities was the case, we employed a data visualization technique called SPRING25 to directly compare the topology of both paths. While tSNE is a powerful method for identifying discrete cell states, SPRING provides a complementary description emphasizing continuum gene expression topologies. SPRING builds a k-nearest-neighbor graph over cells in high-dimensional gene expression space, and then renders an interactive 2D visualization of the cell graph using a force directed layout. This representation revealed that the DP and SP trajectories overlap during early neural commitment, but that they then bifurcate and transit distinct paths that converge independently to the same MN state (Fig. 3A). The dynamics of gene expression over these trajectories resembled the behavior inferred using tSNE, with DP omitting intermediate progenitor genes following its bifurcation from the SP path (Fig. 3B).
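The graph-building step described above can be illustrated with a minimal sketch. It covers only the k-nearest-neighbor graph, not the force-directed layout or the interactive rendering, and it is not the SPRING implementation itself. The Euclidean distance metric, the function name `knn_graph`, and the toy two-gene expression vectors are assumptions made for illustration.

```python
import math

def knn_graph(cells, k):
    """Map each cell index to the indices of its k nearest neighbors.

    Distances are Euclidean here, which is an assumption; the metric used
    by the actual tool is not stated in this section.
    """
    graph = {}
    for i, cell_i in enumerate(cells):
        distances = []
        for j, cell_j in enumerate(cells):
            if i != j:
                distances.append((math.dist(cell_i, cell_j), j))
        distances.sort()
        graph[i] = [j for _, j in distances[:k]]
    return graph

# Toy expression vectors (invented values): two well-separated groups of cells.
cells = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # cells 0-2
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # cells 3-5
]
graph = knn_graph(cells, k=2)
for cell, neighbors in graph.items():
    print(cell, sorted(neighbors))
```

In this toy case every cell is linked only to members of its own group, so a layout of the graph would show two disconnected components. A continuous trajectory would instead appear as a chain of overlapping neighborhoods.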
The bifurcation and subsequent convergence of the two differentiation paths can also be appreciated by two other complementary analyses. Pairwise cosine similarities between the cell states from both trajectories (Fig. 3C; Supplementary methods) indicate similarities between the early states (ESC and NP; cosine similarity > 0.64) and late states (LMN; cosine similarity = 0.55), but not the intermediate states (PNP, PVNP, and MNP; cosine similarity < −0.28, −0.09, and 0.06 respectively). We also assigned every individual cell along the DP path to its most similar cluster in the SP path using a maximum likelihood method (Fig. 3D; Supplementary methods). This showed that it was virtually impossible to find a single cell resembling the SP intermediate progenitors in the DP approach. Similarity was again seen only at the early and late states.
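Both comparisons can be sketched in a few lines. The first function computes the cosine similarity between two state profiles. The second assigns a cell to the reference cluster under which its counts are most likely. The multinomial likelihood model, the function names, and all numbers below are assumptions for illustration; the method actually used is described in the Supplementary methods and may differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two expression profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def log_likelihood(counts, gene_frequencies):
    """Multinomial log-likelihood of a cell's counts, up to an additive constant.

    gene_frequencies must be strictly positive and sum to 1 (assumed model).
    """
    return sum(c * math.log(p) for c, p in zip(counts, gene_frequencies))

def assign_to_cluster(counts, cluster_profiles):
    """Return the name of the cluster that maximizes the likelihood of the cell."""
    return max(cluster_profiles,
               key=lambda name: log_likelihood(counts, cluster_profiles[name]))

# Hypothetical three-gene frequency profiles for two reference clusters.
cluster_profiles = {
    "NP":  [0.7, 0.2, 0.1],
    "MNP": [0.1, 0.2, 0.7],
}
cell_counts = [8, 1, 1]  # invented transcript counts for one cell
assigned = assign_to_cluster(cell_counts, cluster_profiles)
similarity = cosine_similarity(cluster_profiles["NP"], cluster_profiles["MNP"])
print(assigned, round(similarity, 2))
```

Cosine similarity can only be negative when profiles contain negative entries, so negative values such as those reported above would require profiles that have been centered or otherwise normalized beforehand. That preprocessing step is not shown here.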
DP transitions through an abnormal intermediate state with forebrain gene expression
The bifurcation of the SP and DP trajectories leads to different intermediate cell states in each case. A total of 26 transcription factors (TFs) are differentially expressed between the DP and SP intermediate states (Fig. 4A). A majority of these (61%) were involved in an anterior-posterior positional gene expression axis. The SP intermediates were enriched more than 6-fold for nine posterior and spinal TFs including Olig2, Nkx6-1, Lhx3, and six Hox genes with a corrected p-value < 0.001. Each of these TFs is expressed in embryonic MNs. By contrast, the DP intermediates were enriched for seven forebrain TFs including Otx2, Otx1, Crx, Six1, Dmrta2, Zic1, and Zic3 at the same stringency, despite the absence of MNs in the forebrain of embryos. Anterior gene expression was previously observed through bulk measurements of DP3, and our results reveal that it occurs within a specific subpopulation of cells in the process of differentiating into MNs. We validated these expression differences by isolating intermediate populations of each differentiation path using a Mnx1::GFP reporter cell line, since Mnx1 expression is localized precisely to the distinct intermediate populations of each path (Fig. 3B). Bulk comparisons of these two populations confirmed the enrichment of forebrain TFs, and depletion of spinal progenitor and positional genes in the DP intermediates with just one exception – Zic1 was enriched in DP by our single cell comparison but in SP by microarray (Fig. 4A).
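The selection criteria above (a fold-enrichment cutoff combined with a corrected p-value cutoff) can be illustrated with a short sketch. The correction method is not stated in this section, so the Benjamini-Hochberg procedure used here is an assumption. The gene labels, fold changes, and raw p-values are invented, and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_enriched(fold_changes, pvalues, min_fold=6.0, max_adjusted_p=0.001):
    """Indices of genes passing both the fold-change and adjusted p-value cutoffs."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, (fold, p) in enumerate(zip(fold_changes, adjusted))
            if fold > min_fold and p < max_adjusted_p]

# Invented example values; the labels are placeholders, not real genes.
genes = ["tf_a", "tf_b", "tf_c", "tf_d"]
fold_changes = [8.0, 7.5, 2.0, 9.0]
raw_pvalues = [1e-5, 2e-2, 1e-6, 1e-4]

adjusted = benjamini_hochberg(raw_pvalues)
selected = [genes[i] for i in select_enriched(fold_changes, raw_pvalues)]
print(selected)
```

Here `tf_b` fails on significance and `tf_c` fails on fold change, so only the genes that satisfy both criteria are retained.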
The abnormal positional gene expression signature that characterizes the DP intermediate state appears transient. Forebrain gene expression is upregulated along the DP differentiation path as cells exit the early NP state into the EMN intermediate state (Fig. 4B). This transition is also accompanied by the downregulation of proliferation-associated genes (Fig. 4B; Supp. Fig. S4). By the time cells exit the EMN state and transition into the more differentiated LMN state, they downregulate forebrain genes and replace this abnormal positional signature with a spinal Hox expression signature characteristic of normal MNs (Fig. 4B). Thus cells converge to the MN state in positional as well as neuronal identity gene expression in the final stages of DP.
Both DP and SP approach a transcriptional state similar to bona-fide MNs in embryos
Given that the two protocols induce distinct – and in the case of DP, unnatural – differentiation paths, we were curious how their final products compared with primary MNs. We harvested MNs from the embryo of a Mnx1::GFP reporter mouse and performed inDrops measurements on 874 E13.5 MNs after FACS purification. Though the majority of Hb9+ sorted cells were MNs (73.8%), this population also contained glia (20.1%), fibroblast-like cells (1.8%), and immune-type cells (1.2%; Fig. 5A; Supp. Fig. S5). Using only the cells identified as bona-fide MNs, we probed the similarity of the in vitro derived cells to the primary MNs, using three measures: global similarity of the transcriptomes (cosine similarity); co-clustering frequency; and differential gene expression analysis. In both paths, neurons become more similar to primary MNs over time (Fig. 5B). The clusters most highly correlated with primary MNs were the LMN state from the DP protocol (cosine similarity = 0.62), and the LMN state from the SP (cosine similarity = 0.37). Notably, only 2.7% of output cells from the SP were in the LMN state, compared to 62.7% for DP. Thus, the ratio of the efficiency of forming LMNs by the DP protocol to the SP protocol is 23 fold, seven-fold higher than what we calculated based on a comparison of marker genes alone. At the level of single cells, co-clustering of the different experiments showed that 95% of DP MNs robustly co-cluster with primary MNs compared with 26% for SP derived MNs (Supp. Fig. S6). However, differential gene expression analysis revealed that neither protocol perfectly recapitulates the gene expression profile of MNs isolated from the mouse. Both protocols showed a depletion of the most posterior Hox genes, perhaps indicating an anterior spinal cord identity of in vitro MNs, and a small enrichment of several genes related to microtubule function and cell cycle exit that may indicate subtle differences in neuronal maturation (Supp. Fig. S7).
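The efficiency ratios quoted above follow from simple arithmetic on the reported percentages. The sketch below reproduces them; treating the "seven-fold" figure as the ratio of marker-based MN fractions at the late timepoints (66% for DP at day 11 versus 9% for SP at day 12) is one reading of the text, not something the text states explicitly. Variable names are illustrative.

```python
# Fractions of output cells in the LMN state (percent), as reported in the text.
dp_lmn_percent = 62.7
sp_lmn_percent = 2.7
lmn_ratio = dp_lmn_percent / sp_lmn_percent

# Marker-based MN fractions at the late timepoints (percent), as reported earlier.
# Using these for the "seven-fold" comparison is an assumption.
dp_marker_percent = 66.0
sp_marker_percent = 9.0
marker_ratio = dp_marker_percent / sp_marker_percent

print(f"LMN-based efficiency ratio:    {lmn_ratio:.1f}")
print(f"Marker-based efficiency ratio: {marker_ratio:.1f}")
```

Under this reading, the similarity-based definition of a motor neuron gives a DP-to-SP efficiency ratio roughly three times larger than the marker-based definition does.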
DP MNs have structural and functional properties of true MNs
Having established that MNs derived via both gene expression trajectories reach roughly the same MN transcriptional state, we wished to validate that their function and structural organization were also independent of their distinct developmental histories. The SP has been characterized extensively as giving rise to functional MNs13, so here we examined structural and functional characteristics following DP. We confirmed that selected protein content matches the mRNA markers by immunostaining for Tubb3, Map2, VAChT, Isl1, and Hb9 (Fig. 5C). Tubb3 and Map2 were present, and VAChT was seen at discrete puncta on the axons (suggesting localization to acetylcholine secretory vesicles). The TFs Isl1 and Hb9 were localized to the nucleus. Finally, the GFP from the Mnx1::GFP reporter was activated and expressed in the cytoplasm. To test the functional properties of the DP MNs, we performed whole-cell patch clamp recordings. Depolarization induced single or multiple action potentials in current-clamp experiments (Fig. 5D). Depolarizing voltage steps induced fast inward currents and slow outward currents characteristic of sodium and potassium channels, respectively (Fig. 5E). Exposure to 500 nM tetrodotoxin (TTX) blocked the inward current, indicating sodium channel involvement. We then tested whether our DP neurons would respond to neurotransmitters that act on MNs (Fig. 5F). Exposure of the neurons to AMPA, kainate, GABA, and glycine (100 μM each) induced in each case inward currents similar to those seen in primary embryonic MNs. To see if the DP neurons could also form neuromuscular junctions, we co-cultured the neurons with differentiated C2C12 skeletal muscle myotubes for 7 days. We observed clustering of acetylcholine receptors on the C2C12 myotubes near contact points with the DP neurons, visualized by staining with alpha-bungarotoxin (α-BGT), which binds to acetylcholine receptors (Fig. 5G). We then observed regular contractions of some C2C12 myotubes that began after several days in co-culture (Fig. 5G, Supp. Video 1). These contractions could be stopped by the addition of 300 μM tubocurarine (curare), an antagonist of acetylcholine receptors, indicating that the contractions were induced by acetylcholine release from the MNs. Similarly, we noticed that the DP MNs could induce contractions in DP muscle myotubes that we previously generated with MyoD (Supp. Video 2)4. These results confirm that DP MNs have the expected functional properties of bona-fide MNs.
Discussion
The results we have described provide evidence at the single cell level that differentiation can proceed by multiple routes yet converge onto similar transcriptional states. We show that while the SP drives cells to retrace the embryonic lineage, DP induces cells to differentiate through a dramatically different path. The DP path bypasses multiple intermediate progenitor states that evolved in the embryo, and yet still converges to the same discrete and recognizable MN phenotype. This convergence occurs from an abnormal intermediate state, and does not appear to involve a shared set of terminal cell state transitions; it is highly orthogonal. Moreover, as cells converge they not only establish gene expression related to MN functions, but also correct positional gene expression defects (exchanging forebrain for spinal gene expression) in the absence of external signals. Relative to our initial research questions (Fig. 1), we conclude that DP of mESCs into MNs occurs via a late bypass that involves alternative intermediate states not seen in the embryo, and that this new route converges near perfectly to the same final state. Convergence into a MN therefore does not appear to depend rigidly on the precise history of intermediate states through which cells differentiate.
This ‘history independence’ of the final state is consistent with a dynamical view of gene regulation in which cell states correspond to ‘attractor basins’, i.e. stable states of gene expression that are robust to modest perturbations. If attractor basins do not exist, the precision of the observed overlap between DP and SP MNs would require a special coincidence, like finding a needle in a high dimensional haystack. The concept of cell states behaving as attractors has been proposed previously to explain several properties of blood cell types26-28. There are at least two important corollaries of this behavior applying in development. From a practical perspective, it is a common concern that DP methods may generate cell types with subtle defects due to their unusual developmental histories9. Attractors would be robust to this vulnerability and indeed our results show that it is not necessary to recreate the precise sequences of steps taken in embryos to generate bona-fide MNs. It could also hint at a mechanism that might help animal body plans evolve flexibly. Specifically, by decoupling the identity of mature cell state attractors from their developmental histories evolution would be able to act on each independently. In principle this could contribute to evolvability by allowing mature cell states to be transposed onto new lineages in new body locations.
The mechanisms that define the MN attractor basin and allow the artificial DP trajectory to converge onto the correct final state are largely unclear. The MN state is thought to be stabilized by a network of self-reinforcing TFs24, involving Mnr2, Mnx1, Lim3, Isl1, Isl2, Lhx3, Ngn2, Myt1l, Nefl, and Nefm. DP aims to kick-start this network by activating a subset of important components. Yet, far from immediately activating this network, our data show that DP initially drives cells to differentiate into an early NP state through the same pathway as the SP trajectory, seemingly oblivious to the DP TFs, and then even activates non-MN genes in the transitional state. Understanding why the activation of the MN program lags behind TF induction may provide important clues into how the DP factors act. One possible source of the lag is that activating a complete neuronal program requires first activating additional core TFs (so-called ‘feed-forward’ circuitry). Indeed, recent studies have shown that Ebf and Onecut are activated by Ngn2 (one of the DP TFs), and that both are required to subsequently direct binding of Isl1 and Lhx3 (the other two DP TFs) to MN target genes across the genome during DP12,29. A second possible source of lag is that extracellular signaling provides inputs that immediately affect cell state, but take time to sensitize cells to the DP factors. For example, signaling changes might activate DP TFs through post-translational modifications, by activating co-factors, or by inducing chromatin state changes. We have indeed observed that MNs are not generated if DP TFs are induced in cells cultured in pluripotency media, indicating a requirement for changes in signaling (data not shown). Conversely, when mES cells are transferred to minimal media without inducing DP factors, they acquire a forebrain neural progenitor identity by default30-32. This suggests that the early dynamics and abnormal forebrain / MN expression of the DP transitional state might in fact be driven by the signaling environment and not the DP TFs. These alternatives suggest future experiments to better resolve the mechanisms driving DP, by re-mapping the trajectory induced during DP after changing signaling conditions or the choice of DP TFs.
As a methodology, DP is significantly more efficient than the SP without loss of quality in the MN populations produced (Fig. 2F-G; Fig. 5). The high efficiency of DP likely derives from both its more uniform experimental conditions and its more direct differentiation path. Experimentally, DP relies on 2D rather than 3D tissue culture (as in the SP), minimizing uncontrolled cell-cell communication; forces every individual cell to express MN TFs from a genetically integrated construct, increasing uniformity; and employs a defined medium without growth factors that may minimize proliferation of progenitor states. The more direct differentiation path induced by DP should also itself increase MN conversion efficiency by minimizing error propagation through chained opportunities for off-target fate choices. During sequences of intermediate cell state transitions, each transition can have competing off-target fates. Thus, differentiation processes that involve many sequential intermediate transitions suffer from multiplicative efficiency losses. Indeed, the longer sequence of intermediate states in the SP generates a far larger fraction of off-target populations that increases with time, suggesting a progressive loss of efficiency. Targeting terminal attractor basins through the shortest possible differentiation paths may prove to be a generally effective strategy to generate desired cell states.
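The multiplicative-loss argument can be made concrete with a small worked example. The per-transition success rate of 0.8 is a hypothetical value chosen for illustration, not a measurement. The transition counts assume that a path through seven states involves six transitions and a path through four states involves three.

```python
import math

def path_efficiency(step_efficiencies):
    """Overall on-target yield when each transition independently keeps a fraction of cells."""
    return math.prod(step_efficiencies)

per_step = 0.8  # hypothetical fraction of cells making the on-target choice at each transition

long_path = path_efficiency([per_step] * 6)   # seven states, six transitions (SP-like)
short_path = path_efficiency([per_step] * 3)  # four states, three transitions (DP-like)

print(f"six transitions:   {long_path:.3f}")
print(f"three transitions: {short_path:.3f}")
```

With identical per-transition fidelity, halving the number of transitions roughly doubles the final yield in this example (about 0.26 versus 0.51), which is the sense in which shorter paths limit error propagation. The independence of transitions is itself a simplifying assumption.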
Competing interests
V.L. and M.K. are cofounders of StemCellerant, LLC. A.K. and M.K. are cofounders of 1CellBio, Inc.
Author contributions
All authors helped design the study and its experimental questions. V.L. established the MN differentiation protocols and performed their molecular characterization. J.B. performed the single-cell data collection and analysis. S.L. performed the electrophysiology recordings. M.K., A.K. and C.W. assisted with analysis and interpretation of experimental results. All authors contributed to the writing of the manuscript and preparation of figures.
Acknowledgements
We are grateful to Esteban Mazzoni for providing the ES cell line containing the transcription factor cassette used in the DP experiments presented here.