Trajectory-based differential expression analysis for single-cell sequencing data

Koen Van den Berge; Hector Roux de Bézieux; Kelly Street; Wouter Saelens; Robrecht Cannoodt; Yvan Saeys; Sandrine Dudoit; Lieven Clement

doi:10.1101/623397

Abstract

Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression levels during biological processes such as the cell cycle, cell type differentiation, and cellular activation. Downstream of trajectory inference, it is vital to discover genes that are associated with the lineages in the trajectory to illuminate the underlying biological processes. Furthermore, genes that are differentially expressed between developmental/activational lineages might be highly relevant to further unravel the system under study. Current data analysis procedures, however, typically cluster cells and assess differential expression between the clusters, which fails to exploit the continuous resolution provided by trajectory inference to its full potential. The few available non-cluster-based methods only assess broad differences in gene expression between lineages, hence failing to pinpoint the exact types of divergence. We introduce a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of (i) within-lineage differential expression by detecting associations between gene expression and pseudotime over an entire lineage or by comparing gene expression between points/regions within the lineage and (ii) between-lineage differential expression by comparing gene expression between lineages over the entire lineages or at specific points/regions. By incorporating observation-level weights, the model can additionally account for zero inflation, commonly observed in single-cell RNA-seq data from full-length protocols. We evaluate the method on simulated and real datasets from droplet-based and full-length protocols, and show that the flexible inference framework is capable of yielding biological insights through a clear interpretation of the data.

Single-cell RNA sequencing (scRNA-seq) has revolutionized modern biology by allowing researchers to profile transcript abundance at the resolution of single cells. This has opened new avenues to study cellular pathways during cell cycle, cell type differentiation, or cellular activation. Indeed, scRNA-seq can provide a snapshot of the transcriptomes of thousands of single cells in a cell population, which are each at distinct points of the dynamic process under study. This wealth of transcriptional information, however, presents many data analysis challenges. Until recently, statistical and computational efforts have focused mostly on trajectory inference (TI) methods, which aim to first allocate cells to lineages and then order them based on pseudotimes within these lineages. A wide range of TI methods have been proposed, 45 of which are extensively benchmarked in Saelens et al. [2019]. Note that we use the term trajectory to refer to the collection of lineages for the process under study.

Most TI methods share a common workflow: dimensionality reduction followed by inference of lineages and pseudotimes in the reduced-dimensional space [Cannoodt et al., 2016]. While early methods were limited to inferring trajectories comprised of a single linear lineage, recent developments have allowed the inference of trajectories that might bifurcate multiple times and consist of several smooth lineages, or that might have cyclic patterns [Street et al., 2018, Lönnberg et al., 2017, Qiu et al., 2017]. These advances in TI methods enable researchers to study dynamic biological processes, such as complex differentiation patterns from a progenitor population to multiple differentiated cellular states [Byrnes et al., 2018, Herring et al., 2018], and have the promise to provide transcriptome-wide insights into these processes.

Unfortunately, statistical inference methods are lacking to identify genes associated with lineage differentiation and to unravel how their corresponding transcriptional profiles are driving the dynamic processes under study. Indeed, differential expression (DE) analysis of individual genes along lineages is often performed on discrete groups of cells in the developmental pathway, e.g., by comparing clusters of cells along the trajectory or clusters of differentiated cell types. Such discrete DE approaches do not exploit the continuous expression resolution that can be obtained from the pseudotemporal ordering of cells along lineages provided by TI methods. Moreover, comparing cell clusters within or between lineages can obscure interpretation: it is often unclear which clusters should be compared, how to properly combine the results of several pairwise cluster comparisons, or how to account for the fact that not all these comparisons are independent of each other. Inevitably, the number of cluster comparisons also increases rapidly with the number of lineages of interest, leading to multiple testing issues at the gene level [Van den Berge et al., 2017] and further decreasing the reproducibility of scRNA-seq DE results.

A few methods have been published with the aim of improving trajectory-based differential expression analysis by modeling gene expression as a smooth function of pseudotime along lineages. GPfates [Lönnberg et al., 2017] relies on a mixture of overlapping Gaussian processes [Lázaro-Gredilla et al., 2012], where each component of the mixture model represents a different lineage. For each gene, the method tests whether a model with a bifurcation significantly increases the likelihood of the data as compared to a model without a bifurcation, essentially testing whether gene expression is differentially associated with the two lineages. Similarly, the BEAM approach in Monocle 2 [Qiu et al., 2017] allows users to test whether differences in gene expression are associated with particular branching events on the trajectory. Both methods improve upon discrete cluster-based approaches by (1) exploiting the continuous expression resolution along the trajectory and (2) comparing lineages using a single test based on entire gene expression profiles. However, they both lack interpretability, as they cannot pinpoint the regions of the gene expression profiles that are responsible for the differences in expression between lineages. Moreover, the GPfates model is restricted to trajectories consisting of just one bifurcation, essentially precluding its application to biological systems with more than two lineages (i.e., a multifurcation or more than one bifurcation). BEAM is restricted to the few dimensionality reduction methods that are implemented in the Monocle software, namely, independent component analysis (ICA), DDRTree [Qiu et al., 2017], and uniform manifold approximation and projection (UMAP) [McInnes et al., 2018]. Hence, novel methods to infer differences in gene expression patterns within or between transcriptional lineages with complex branching patterns are vital to further advance the field.

In this manuscript, we introduce tradeSeq, a method and software package for trajectory-based differential expression analysis for sequencing data. tradeSeq provides a flexible framework that can be used downstream of any dimensionality reduction and TI methods. Unlike previously proposed approaches, tradeSeq provides several tests that each identify a distinct type of differential expression pattern along a lineage or between lineages, leading to clear interpretation of the results. In practice, tradeSeq infers smooth functions for the gene expression measures along pseudotime for each lineage using generalized additive models and tests biologically meaningful hypotheses based on parameters of these smoothers. By allowing cell-level weights for each individual count in the gene-by-cell expression matrix, tradeSeq can handle zero inflation, which is essential for dealing with dropouts in full-length scRNA-seq protocols [Van den Berge et al., 2018]. As it is agnostic to the dimensionality reduction and TI methods, the approach scales from simple to complex trajectories with multiple bifurcations: tradeSeq only requires the original expression count matrix of the individual cells, estimated pseudotimes, and a hard or soft assignment (weights) of the cells to the lineages to infer the lineage-specific smoothers. For within-lineage differential expression, tradeSeq provides both global tests to screen for genes with overall DE along a lineage, as well as specific tests to pinpoint relevant variation in gene expression profiles within the lineage. Likewise, for between-lineage comparisons, tradeSeq provides both global tests to compare expression patterns between entire lineages (useful for initial screening of interesting genes), as well as specific tests that allow researchers to pinpoint relevant differences in expression profiles between lineages. 
If multiple hypotheses are assessed for each gene, one can build upon our stageR package [Van den Berge et al., 2017] to conduct an omnibus test (e.g., there are no differences in expression profiles across multiple lineages) prior to post hoc tests that identify the relevant specific differences (e.g., all pairwise comparisons between lineages). We benchmark our method against current state-of-the-art methods using simulated datasets (with cyclic, bifurcating, and multifurcating trajectories) and demonstrate its functionality and versatility on two real datasets. These case studies highlight the enhanced interpretability of tradeSeq’s results, which lead to improved understanding of the underlying biology.

Methods

In this section, we first present a negative binomial generalized additive model for expression measures along a trajectory. Building on this model, we then describe a general and flexible framework for identifying genes that are differentially expressed either within or between lineages of a given trajectory.

Negative binomial generalized additive models

We build on the generalized additive model (GAM) methodology to model gene expression profiles as non-linear functions of pseudotime for the different lineages in a complex trajectory. In our GAM framework, each lineage is represented by a separate cubic smoothing spline, i.e., a linear combination of cubic basis functions of pseudotime. The flexibility of GAM also allows us to easily adjust for other covariates or confounders such as treatment and batch. The discrete nature and the over-dispersion of read counts is addressed by modeling the expression measures Y_gi, for a given gene g ∈ {1,…, G} across cells i ∈ {1,…, n}, using a negative binomial (NB) distribution with cell and gene-specific means µ_gi and gene-specific dispersion parameters φ_g. Hence, we propose the following gene-wise negative binomial generalized additive model (NB-GAM)

$$
\begin{aligned}
Y_{gi} &\sim \mathrm{NB}(\mu_{gi}, \phi_g), \\
\log(\mu_{gi}) &= \eta_{gi}, \\
\eta_{gi} &= \sum_{l=1}^{L} s_{gl}(T_{li})\, Z_{li} + U_i \alpha_g + \log(N_i),
\end{aligned}
\tag{1}
$$

where the mean µ_gi of the NB distribution is linked to the additive predictor η_gi using a logarithmic link function. The gene-wise additive predictor consists of lineage-specific smoothing splines s_gl, that are functions of pseudotime T_li, for lineages l ∈ {1,…, L}. The binary matrix Z = (Z_li ∈ {0, 1}: l ∈ {1,…, L}, i ∈ {1,…, n}) assigns every cell to a particular lineage based on user-supplied weights (e.g., from slingshot [Street et al., 2018] or GPfates [Lönnberg et al., 2017], see details in Supplementary Methods). We let ℒ_l = {i: Z_li = 1} denote the set of cells assigned to lineage l. In addition, we allow the inclusion of p known cell-level covariates (e.g., batch, age, or gender), represented by an n × p matrix U, with i^th row U_i corresponding to the i^th cell, and a regression parameter α_g of dimension p × 1. Differences in sequencing depth or capture efficiency between cells are accounted for by cell-specific offsets N_i.
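To make the structure of the additive predictor concrete, the following minimal Python sketch combines, for one gene and one cell, the smoother value of the assigned lineage, the covariate effect, and the log offset, and then applies the inverse of the logarithmic link. The function name `additive_predictor` and all numeric values are hypothetical and chosen for illustration only; tradeSeq itself is an R package and this is not its implementation.

```python
import math

def additive_predictor(smoother_values, z, u, alpha, offset):
    """Return eta_gi for one gene g and one cell i.

    smoother_values[l] : s_gl(T_li), lineage-l smoother at the cell's pseudotime
    z[l]               : Z_li in {0, 1}, lineage assignment of the cell
    u, alpha           : covariate row U_i and regression parameters alpha_g
    offset             : N_i, the cell-specific offset
    """
    lineage_term = sum(s * zl for s, zl in zip(smoother_values, z))
    covariate_term = sum(ui * a for ui, a in zip(u, alpha))
    return lineage_term + covariate_term + math.log(offset)

# Hypothetical cell assigned to the second of L = 2 lineages, one covariate.
eta = additive_predictor(smoother_values=[0.3, 1.2], z=[0, 1],
                         u=[1.0], alpha=[-0.2], offset=1000.0)
mu = math.exp(eta)  # inverse of the logarithmic link gives the NB mean
```

Because Z_li is binary, only the smoother of the lineage to which the cell is assigned contributes to the predictor.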

The smoothing spline s_gl, for a given gene g and lineage l, can be represented as a linear combination of K cubic basis functions,

$$
s_{gl}(t) = \sum_{k=1}^{K} b_k(t)\, \beta_{glk},
$$

where the cubic basis functions b_k(t) are enforced to be the same for all genes and lineages. In our implementation, we set K = 10. Thus, for each gene and each lineage in the trajectory, we estimate K = 10 regression coefficients β_glk. The number of parameters L × K + p + 1 in the gene-wise model is therefore typically much lower than the number of cells n in the dataset. In practice, we found the results to be robust to the choice of K. Indeed, the proportion of deviance explained by the model is not altered for any given choice of K between 6 and 14 (Supplementary Figure S1).
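The representation of a smoother as a linear combination of shared basis functions can be sketched as follows. The truncated-power cubic basis used here is a stand-in chosen for brevity and is an assumption of this example, not the spline basis constructed by mgcv; the knot locations and coefficients are likewise hypothetical.

```python
def cubic_basis(t, knots):
    """Evaluate a stand-in cubic basis at pseudotime t.

    Returns K = 4 + len(knots) values: 1, t, t^2, t^3, and one truncated
    cubic per interior knot. Illustrative assumption only.
    """
    return [1.0, t, t ** 2, t ** 3] + [max(t - k, 0.0) ** 3 for k in knots]

def smoother(t, beta, knots):
    """s_gl(t) = sum_k b_k(t) * beta_glk for one gene and one lineage."""
    return sum(b * coef for b, coef in zip(cubic_basis(t, knots), beta))

knots = [0.2, 0.35, 0.5, 0.65, 0.8, 0.9]   # 6 hypothetical interior knots
K = len(cubic_basis(0.0, knots))           # K = 10 basis functions
beta = [0.5, 1.0] + [0.0] * (K - 2)        # hypothetical coefficients

# Parameter count of the gene-wise model as stated in the text: L * K + p + 1.
L, p = 3, 2
n_parameters = L * K + p + 1
```

With three lineages and two covariates, the gene-wise model has 33 parameters, which is small relative to the thousands of cells in a typical dataset.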

The NB-GAM is fitted gene by gene using the fitGAM function from the tradeSeq package, which relies on the mgcv package in R. We build upon recent developments that allow the joint estimation of the NB regression parameters in µ_gi and of the dispersion parameter φ_g [Wood et al., 2016]. In order to control the smoothness of the spline, some coefficients β_glk are shrunken by subtracting a penalty λ_g β_g^T S β_g from the log-likelihood function, where β_g denotes the concatenation of the L K-dimensional column vectors β_gl of lineage-specific smoother coefficients and S is an (LK) × (LK) diagonal matrix that indicates which coefficients in β_g are to be penalized. The magnitude of penalization is controlled by the smoothing parameter λ_g, which is selected using generalized cross-validation [Wood, 2017]. Note that we enforce identical basis functions between lineages, i.e., b_k does not depend on l, as well as identical smoothing parameter λ_g, in order to ensure that the smoothers are comparable across lineages.
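The penalized objective can be illustrated with a short Python sketch. It assumes the NB parameterization with variance µ + φµ² and a quadratic penalty of the form λ β^T S β with diagonal 0/1 matrix S; both are assumptions of this example, and the actual fitting and smoothing parameter selection are performed by mgcv, not by this code.

```python
import math

def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under an NB distribution with mean mu and
    dispersion phi (assumed variance: mu + phi * mu^2)."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def penalized_log_likelihood(counts, means, phi, beta, penalized, lam):
    """NB log-likelihood minus lam * beta^T S beta.

    penalized[j] is the j-th diagonal entry of S (1 if beta[j] is penalized,
    0 otherwise); means[i] is the fitted mean mu_gi of cell i.
    """
    loglik = sum(nb_log_pmf(y, mu, phi) for y, mu in zip(counts, means))
    penalty = lam * sum(s * b * b for s, b in zip(penalized, beta))
    return loglik - penalty

# Hypothetical example: one cell with a zero count, two coefficients,
# only the second of which is penalized.
value = penalized_log_likelihood([0], [1.0], 1.0, [1.0, 2.0], [0, 1], 0.5)
```

Larger values of the smoothing parameter shrink the penalized coefficients more strongly and therefore yield smoother fitted curves.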

Importantly, the model of Equation (1) can accommodate zero-inflated counts typical for full-length scRNA-seq protocols by using observation-level (i.e., cell-level) weights obtained from the zero-inflated negative binomial (ZINB) approach of Van den Berge et al. [2018] and Risso et al. [2018a].
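As an illustration of such observation-level weights, the sketch below computes the posterior probability that a count was generated by the NB component of a ZINB distribution rather than by the zero-inflation component. The function names and the parameter values are hypothetical, and the NB parameterization (variance µ + φµ²) is an assumption of this example; in practice the ZINB parameters are estimated by dedicated software.

```python
import math

def nb_pmf(y, mu, phi):
    """NB probability of count y with mean mu and dispersion phi."""
    r = 1.0 / phi
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_weight(y, mu, phi, pi):
    """Posterior probability that count y comes from the NB component of a
    zero-inflated NB with zero-inflation probability pi."""
    nb_part = (1.0 - pi) * nb_pmf(y, mu, phi)
    zero_part = pi if y == 0 else 0.0   # excess zeros can only produce y = 0
    return nb_part / (zero_part + nb_part)

w_zero = zinb_weight(0, mu=2.0, phi=0.5, pi=0.3)      # down-weighted zero
w_positive = zinb_weight(3, mu=2.0, phi=0.5, pi=0.3)  # positive counts keep weight 1
```

Positive counts always receive weight 1, whereas zeros are down-weighted according to how likely they are to be excess zeros.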

Statistical inference

We propose a general and flexible testing framework for (linear combinations of) the parameters β_g, which allows us to pinpoint specific types of differences in gene expression both within and between lineages, see Figure 1 for an overview. We first present the general approach and then detail the implementation and interpretation of specific DE tests.

Figure 1: Overview of tradeSeq functionality.

For each gene, we start with a scatterplot of expression measures vs. pseudotimes, where each lineage is represented by a different color (top left). An NB-GAM is fitted using the fitGAM function. The locations of the knots for the splines are displayed with gray dashed lines. The NB-GAM can then be used to perform a variety of tests of differential expression within or between lineages, as well as for clustering of the expression profiles of DE genes. In the table, we assume that the earlyDETest is used to assess differences in expression patterns early in the lineage, e.g., with option knots = c(1, 2).

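All proposed DE procedures involve testing null hypotheses of the form H₀: C^T β_g = 0 using Wald test statistics

$$
W_g = \hat{\beta}_g^T C \left( C^T \hat{\Sigma}_{\hat{\beta}_g} C \right)^{-1} C^T \hat{\beta}_g,
$$

where $\hat{\beta}_g$ denotes an estimator of β_g, $\hat{\Sigma}_{\hat{\beta}_g}$ represents an estimator of the covariance matrix of $\hat{\beta}_g$, and C is an (LK) × C matrix representing the C contrasts of interest for the DE test.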

For each gene, we compute p-values based on the nominal chi-squared asymptotic null distribution of the Wald statistics (with degrees of freedom equal to the column-rank of C). Rather than attaching strong probabilistic interpretations to the p-values (which, as in most RNA-seq applications, would involve a variety of hard-to-verify assumptions and would not necessarily add much value to the analysis), we view the p-values simply as useful numerical summaries for ranking the genes for further inspection.
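The Wald statistic and its nominal chi-squared p-value can be sketched in plain Python as follows. The helper names (`solve`, `chi2_sf`, `wald_test`) are hypothetical, the contrast matrix is assumed to have full column rank so that the degrees of freedom equal its number of columns, and the numeric inputs are invented for illustration.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-half), 1.0              # Q(1, x/2)
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5   # Q(1/2, x/2)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a exp(-z) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        if half > 0:
            q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def wald_test(beta_hat, sigma_hat, contrasts):
    """Return (W, p) with W = (C^T b)^T (C^T Sigma C)^{-1} (C^T b).

    contrasts is C stored as a list of rows (one row per coefficient).
    """
    n_rows, n_cols = len(contrasts), len(contrasts[0])
    ct_beta = [sum(contrasts[i][j] * beta_hat[i] for i in range(n_rows))
               for j in range(n_cols)]
    sigma_c = [[sum(sigma_hat[i][k] * contrasts[k][j] for k in range(n_rows))
                for j in range(n_cols)] for i in range(n_rows)]
    ct_sigma_c = [[sum(contrasts[i][u] * sigma_c[i][v] for i in range(n_rows))
                   for v in range(n_cols)] for u in range(n_cols)]
    w = sum(v * s for v, s in zip(ct_beta, solve(ct_sigma_c, ct_beta)))
    return w, chi2_sf(w, n_cols)

# Hypothetical example: two coefficients, one contrast beta_1 - beta_2.
w, p = wald_test([1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [-1.0]])
```

This direct inversion assumes that C^T Σ C is non-singular; contrast sets with many columns may require a more robust approach.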

Within-lineage comparisons

`associationTest`

A logical first question is whether a gene’s expression is associated with pseudotime along a given lineage, i.e., whether the smoother is flat or significantly varying along pseudotime. To address this question, the associationTest tests the null hypothesis that all smoother coefficients within the lineage are equal, i.e., H₀: β_glk = β_glk′ for all k ≠ k′ ∈ {1,…, K}. This null hypothesis can be encoded in several ways; here, we chose the contrast matrix C to be an LK × L(K − 1) matrix, where each column corresponds to a contrast between two consecutive β_glk and β_gl(k+1) and where we have K − 1 contrasts per lineage for a total of L(K − 1) contrasts.
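The contrast matrix described above can be built with a few lines of Python. The function names are hypothetical and the example uses small, invented values of L and K; it only illustrates the layout of the matrix, not the tradeSeq implementation.

```python
def association_contrasts(n_lineages, n_knots):
    """Build the (L*K) x (L*(K-1)) matrix of consecutive-coefficient contrasts."""
    L, K = n_lineages, n_knots
    C = [[0.0] * (L * (K - 1)) for _ in range(L * K)]
    for l in range(L):
        for k in range(K - 1):
            col = l * (K - 1) + k
            C[l * K + k][col] = 1.0        # + beta_{g,l,k}
            C[l * K + k + 1][col] = -1.0   # - beta_{g,l,k+1}
    return C

def contrast_values(C, beta):
    """Compute C^T beta for a coefficient vector of length L*K."""
    return [sum(C[i][j] * beta[i] for i in range(len(C)))
            for j in range(len(C[0]))]

C = association_contrasts(n_lineages=2, n_knots=3)
flat = contrast_values(C, [2.0, 2.0, 2.0, 5.0, 5.0, 5.0])     # all zeros
varying = contrast_values(C, [1.0, 2.0, 2.0, 5.0, 5.0, 5.0])  # first entry non-zero
```

All contrasts vanish exactly when the coefficients are equal within each lineage, which corresponds to the null hypothesis of the test.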

`startVsEndTest`

By default, the startVsEndTest compares mean expression at the progenitor state (i.e., the start of the lineage) to mean expression at the differentiated state (i.e., the end of the lineage). The test is implemented using a Wald test statistic, as described above. Specifically, C is an (LK) × L matrix, whose entry in row k + (l − 1)K and column l encodes the contrast for lineage l and knot k and is defined by b_k(T_l,max) − b_k(T_l,min), where T_l,max and T_l,min denote, respectively, the maximum and minimum pseudotime across all cells assigned to lineage l. Other entries of C are set to zero. Therefore, the l^th element of the vector C^T β_g is

$$
\sum_{k=1}^{K} \beta_{glk} \left( b_k(T_{l,\max}) - b_k(T_{l,\min}) \right) = s_{gl}(T_{l,\max}) - s_{gl}(T_{l,\min}),
$$

which contrasts mean expression at the beginning and at the end of the lineage. Note that contrasting the start and endpoints of a lineage is a special case of a more general capability of tradeSeq to compare the average expression between any two regions of a given lineage. As such, this test can be considered a generalization of cluster-based discrete DE within a lineage (e.g., Risso et al. [2018b]).
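The start-versus-end contrast can be sketched as follows. The truncated-power cubic basis is a stand-in assumed for this example (it is not the basis used by mgcv), and the function names, knot, and coefficients are hypothetical.

```python
def cubic_basis(t, knots):
    """Stand-in cubic basis: 1, t, t^2, t^3 plus one truncated cubic per knot."""
    return [1.0, t, t ** 2, t ** 3] + [max(t - k, 0.0) ** 3 for k in knots]

def start_vs_end_contrasts(t_min, t_max, knots):
    """Build the (L*K) x L matrix with entries b_k(T_l,max) - b_k(T_l,min).

    t_min[l] and t_max[l] are the pseudotime range of lineage l.
    """
    L, K = len(t_min), 4 + len(knots)
    C = [[0.0] * L for _ in range(L * K)]
    for l in range(L):
        b_end = cubic_basis(t_max[l], knots)
        b_start = cubic_basis(t_min[l], knots)
        for k in range(K):
            C[l * K + k][l] = b_end[k] - b_start[k]   # all other entries stay 0
    return C

def contrast_values(C, beta):
    """Compute C^T beta."""
    return [sum(C[i][j] * beta[i] for i in range(len(C)))
            for j in range(len(C[0]))]

C = start_vs_end_contrasts(t_min=[0.0, 0.0], t_max=[1.0, 2.0], knots=[0.5])
beta = [1.0, 2.0, 0.0, 0.0, 0.0,   # lineage 1: s(t) = 1 + 2t
        0.0, 0.0, 0.0, 0.0, 1.0]   # lineage 2: s(t) = max(t - 0.5, 0)^3
differences = contrast_values(C, beta)   # s(T_max) - s(T_min) per lineage
```

Each element of the result equals the difference between the fitted smoother at the end and at the start of the corresponding lineage.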

Between-lineage comparisons

`diffEndTest`

The diffEndTest compares average expression at the differentiated states of multiple lineages, i.e., it compares the endpoints of different lineage-specific smoothers. It can be viewed as an analog of discrete DE for the differentiated cell types. The test is implemented using a Wald test statistic, as described above, where C is an (LK) × L(L − 1)/2 matrix. Each column of C encodes a pairwise contrast between the endpoints of two lineages, such that the corresponding element of C^T β_g is

$$
\sum_{k=1}^{K} \left( \beta_{g l_1 k}\, b_k(T_{l_1,\max}) - \beta_{g l_2 k}\, b_k(T_{l_2,\max}) \right)
$$

for lineages l₁ and l₂.
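The pairwise endpoint contrasts can be illustrated in Python as follows. The basis values at the lineage endpoints are supplied directly as hypothetical numbers, and the function names are invented for this sketch.

```python
from itertools import combinations

def diff_end_contrasts(end_basis):
    """Build the (L*K) x L(L-1)/2 matrix of pairwise endpoint contrasts.

    end_basis[l][k] holds b_k(T_l,max), the basis evaluated at the end of
    lineage l. Returns the matrix and the list of lineage pairs (l1, l2).
    """
    L, K = len(end_basis), len(end_basis[0])
    pairs = list(combinations(range(L), 2))
    C = [[0.0] * len(pairs) for _ in range(L * K)]
    for col, (l1, l2) in enumerate(pairs):
        for k in range(K):
            C[l1 * K + k][col] = end_basis[l1][k]
            C[l2 * K + k][col] = -end_basis[l2][k]
    return C, pairs

def contrast_values(C, beta):
    """Compute C^T beta."""
    return [sum(C[i][j] * beta[i] for i in range(len(C)))
            for j in range(len(C[0]))]

# Hypothetical example with L = 3 lineages and K = 2 basis functions.
end_basis = [[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
C, pairs = diff_end_contrasts(end_basis)
beta = [0.5, 1.0, 0.5, 1.0, 0.5, 1.0]
differences = contrast_values(C, beta)   # endpoint differences per pair
```

With three lineages there are three pairwise comparisons, one per column of the contrast matrix.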

`patternTest`

This test compares the expression patterns along pseudotime between lineages by contrasting a fixed set of equally-spaced pseudotimes (M = 100 by default). First selecting the pseudotimes and subsequently comparing their expression levels between lineages allows for comparisons between smoothers of different lengths. Specifically, for lineage l, let P_lm denote the m^th equally-spaced pseudotime between T_l,min and T_l,max. The contrast of M points corresponds to testing the null hypothesis that a gene has the same expression pattern along pseudotime across the lineages under comparison, while normalizing for the length of the lineages. The test is implemented using a Wald test statistic, as described above, where C is an (LK) × L(L − 1)M/2 matrix. Each column of C encodes a pairwise comparison between two pseudotimes of two different lineages, such that the corresponding element of C^T β_g is

$$
\sum_{k=1}^{K} \left( \beta_{g l_1 k}\, b_k(P_{l_1 m}) - \beta_{g l_2 k}\, b_k(P_{l_2 m}) \right)
$$

for lineages l₁ and l₂ and m ∈ {1,…, M}. The test is implemented through the eigendecomposition of the estimated variance-covariance matrix of the contrasts to avoid singularity problems [Smyth, 2004] (see Supplementary Methods). It should be noted that this test is a global test, able to identify both differences in patterns of expression as well as genes with similar patterns but different mean expression across the pseudotime range. It is therefore most useful as a screening test to identify any form of differential expression between the lineages.
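The construction of the comparison grid can be sketched as follows. The sketch assumes that the M equally-spaced pseudotimes include both endpoints of the lineage, which is an assumption of this example; function names and pseudotime ranges are hypothetical.

```python
def equally_spaced_pseudotimes(t_min, t_max, n_points=100):
    """Return n_points equally-spaced pseudotimes from t_min to t_max
    (endpoints included, an assumption of this sketch)."""
    step = (t_max - t_min) / (n_points - 1)
    return [t_min + m * step for m in range(n_points)]

def n_pattern_contrasts(n_lineages, n_points=100):
    """Number of columns of C: L(L-1)/2 lineage pairs times M points."""
    return n_lineages * (n_lineages - 1) // 2 * n_points

# Two hypothetical lineages of different lengths are compared at the same
# relative positions along their own pseudotime range.
grid_a = equally_spaced_pseudotimes(0.0, 10.0)
grid_b = equally_spaced_pseudotimes(0.0, 20.0)
n_columns = n_pattern_contrasts(n_lineages=3)
```

Because each lineage receives its own grid, the m^th point of a short lineage is compared to the m^th point of a long lineage, which is how the comparison normalizes for lineage length.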

`earlyDETest`

The earlyDETest aims to identify genes that are driving the differentiation around the branching. It is similar to the patternTest, in that it also compares the expression patterns along pseudotime between lineages by contrasting a fixed set of equally-spaced pseudotimes (M = 100 by default). However, instead of using points distributed from the beginning T_l,min to the end T_l,max of the lineages as in the patternTest, it relies on points over a shorter range of time. In the current implementation, this range is delimited by the pseudotimes of two user-defined knots. The knots should be chosen to span the branching event (or any event of interest) and do not need to be consecutive.

Global testing

While the statistical tests introduced above can assess DE within one lineage or between a pair of lineages, one may want to investigate multiple (i.e., more than two) lineages. For example, if a trajectory consists of three lineages, one may wish to test the global null hypothesis that, for each of the three lineages, there is no association between gene expression and pseudotime using the associationTest. The null hypothesis that would be tested can be expressed as H₀: β_glk = β_glk′ for all l and all k ≠ k′, i.e., within each of the three lineages, all K regression coefficients are equal. We refer to such a test as a “global test”. The tradeSeq package provides functionality for global testing for each of the within and between-lineage tests described above. For within-lineage tests, the user can specify whether the test should be done for each lineage individually or at the global level (i.e., for all lineages). For between-lineage tests, the user can specify whether only a pair of lineages should be assessed for DE or all pairwise comparisons should be performed.

Stage-wise testing

For the olfactory epithelium case study [Fletcher et al., 2017] detailed below, we apply stage-wise testing, as implemented in stageR [Heller et al., 2009, Van den Berge et al., 2017], to assess DE between lineages using multiple tests for each gene. Stage-wise testing aims to control the overall FDR (OFDR) [Heller et al., 2009], i.e., the expected proportion of genes with at least one falsely rejected null hypothesis among all genes declared DE. In our case, the OFDR can be interpreted as a gene-level FDR [Van den Berge et al., 2017]. Stage-wise testing is performed in two stages, a screening and a confirmation stage. At the screening stage, each gene is screened by performing a global test across all null hypotheses of interest, essentially testing whether at least one of these hypotheses can be rejected. At that stage, the FDR is controlled across genes at level α_I. At the confirmation stage, each specific hypothesis is assessed, but only for the genes that have passed the screening stage. For each gene, the family-wise error rate (FWER) is controlled across hypotheses at level α_II = R α_I / G, where R denotes the number of genes that had their global null hypothesis rejected at the screening stage and G the total number of genes assessed. Heller et al. [2009] proved that this procedure controls the overall FDR at level α_I. It should be noted that, while the stage-wise testing paradigm theoretically controls the OFDR (given underlying assumptions are satisfied), the resulting p-values might still be too liberal since the same data are used for trajectory inference and differential expression. As mentioned before, we use p-values simply as numerical summaries for ranking the genes for further inspection.
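The two stages can be sketched in Python as follows. The screening stage uses the Benjamini-Hochberg procedure; for the confirmation stage, a plain Bonferroni correction at the adjusted level R α_I / G is used as a simple FWER-controlling stand-in, which is an assumption of this example and not necessarily the procedure applied by stageR. Function names and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values."""
    G = len(pvalues)
    order = sorted(range(G), key=lambda i: pvalues[i])
    adjusted = [0.0] * G
    running_min = 1.0
    for rank in range(G, 0, -1):          # from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * G / rank)
        adjusted[i] = running_min
    return adjusted

def stagewise(screen_p, confirm_p, alpha_i=0.05):
    """Two-stage testing.

    screen_p[g]  : p-value of the global (screening) test for gene g
    confirm_p[g] : p-values of the specific hypotheses for gene g
    """
    G = len(screen_p)
    adjusted = bh_adjust(screen_p)
    passed = [g for g in range(G) if adjusted[g] <= alpha_i]
    alpha_ii = len(passed) * alpha_i / G   # adjusted level R * alpha_I / G
    rejected = {}
    for g in passed:
        n_hyp = len(confirm_p[g])
        # Bonferroni across the hypotheses of gene g (stand-in for FWER control)
        rejected[g] = [p <= alpha_ii / n_hyp for p in confirm_p[g]]
    return passed, alpha_ii, rejected

screen_p = [0.001, 0.02, 0.5, 0.8]
confirm_p = [[0.001, 0.2], [0.02, 0.01], [0.6, 0.7], [0.9, 0.8]]
passed, alpha_ii, rejected = stagewise(screen_p, confirm_p)
```

Only genes that pass the screening stage are examined further, and the confirmation level shrinks when few genes pass.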

Clustering gene expression patterns

The NB-GAM can also be used to cluster genes according to their expression patterns, as shown in Figure 1. Specifically, for each gene, we extract a number of fitted values for each lineage (100 by default). We then use resampling-based sequential ensemble clustering, as implemented in RSEC [Risso et al., 2018b], to perform the clustering based on the first ten principal components of the standardized fitted values matrix (i.e., the fitted values are standardized to have zero mean and unit variance across cells for each gene). This clustering approach is implemented in the tradeSeq package (clusterExpressionPatterns function) for downstream analysis facilitating the interpretation of DE genes.

Implementation

The above DE tests are implemented in the open-source R package tradeSeq, available on our GitHub repository (https://github.com/statOmics/tradeSeq) and to be submitted to the Bioconductor Project (http://www.bioconductor.org). We provide an extensive vignette along with the package, as well as a cheat sheet describing the different types of DE patterns detected with each test.

Methods comparison

slingshot is a fast and robust method for TI that was shown to be among the top performing methods in a recent large-scale benchmarking study [Saelens et al., 2019]. Hence, we evaluate tradeSeq downstream of a slingshot analysis, which can work with any dimensionality reduction and clustering methods. slingshot builds a cluster-based minimum spanning tree (MST) to infer the global lineage topology and make an initial assignment of cells to lineages. This structure is then smoothed by fitting simultaneous principal curves, which refine the assignments of cells to lineages. This process results in lineage-specific pseudotimes and weights of assignment for each cell.

GPfates [Lönnberg et al., 2017] is a Python package that adopts Gaussian processes in reduced dimension to infer trajectories. Dimensionality reduction is performed using Gaussian process latent variable models (GPLVM) [Lawrence, 2003]. GPfates is able to identify bifurcation points and assess how well a bifurcation fits the expression pattern for every gene, i.e., whether the patterns of gene expression are different between the lineages. This allows us to compare a slingshot + tradeSeq analysis with a GPfates analysis. In addition, we also evaluate a tradeSeq analysis downstream of TI with GPfates, since GPfates also calculates posterior probabilities that each cell belongs to a particular lineage. We then compare the complete GPfates (TI and DE) analysis to a GPfates + tradeSeq analysis.

Monocle 2 [Qiu et al., 2017] applies reverse graph embedding to infer trajectories and yields a principal graph that is allowed to branch. It provides a similar approach as tradeSeq with the branch expression analysis modeling (BEAM) method. It assumes a gene-wise negative binomial model for gene expression, where the mean is expressed in terms of lineage-dependent smooth functions of pseudotime, i.e., for a cell i assigned to lineage l,

$$
\begin{aligned}
Y_{gi} &\sim \mathrm{NB}(\mu_{gi}, \phi_g), \\
\log(\mu_{gi}) &= \beta_{0gl} + s_{gl}(T_{li}).
\end{aligned}
$$

In this model, the lineage-specific intercepts β_0gl account for mean differences in expression between lineages, while the lineage-specific smoothers s_gl(t) model the expression change along pseudotime. To test for lineage-dependent expression, the full model is compared to a null model of the form

$$
\log(\mu_{gi}) = \beta_{0g} + s_{g}(T_{li})
$$

using a likelihood ratio test. Thus, BEAM tests whether the smooth functions of gene expression along pseudotime are different between lineages. Importantly, the BEAM method does not allow the inclusion of other covariates such as batch, and it is restricted to the dimensionality reduction methods that are implemented in the software package. Additionally, it only provides a screening test (like the patternTest in tradeSeq), as it only allows testing for any difference in the expression profiles between lineages and does not specify the exact type of divergence.

An alpha release for Monocle 3 is available online (downloaded August 30, 2018 from the Monocle GitHub repository) which, unlike Monocle 2, performs uniform manifold approximation and projection [McInnes et al., 2018] dimensionality reduction upstream of the trajectory inference. Additionally, Monocle 3 implements the Moran’s I test to discover genes whose expression is significantly associated with pseudotime; a functionality that is unavailable in Monocle 2.

edgeR [McCarthy et al., 2012] is a discrete differential expression method, where the groups under comparison must be defined a priori. It is therefore useful for assessing DE between, for example, annotated clusters or different treatment groups. For such comparisons, edgeR is a powerful method with high sensitivity. However, it is limited to discrete DE and cannot be applied when interested in continuous DE, e.g., assessing differences in expression patterns along pseudotime.

Simulation study

The simulation study evaluates methods that (differentially) associate gene expression with pseudotime for three different trajectory topologies, i.e., a cyclic, a bifurcating, and a multifurcating trajectory. As independent evaluation, we use the extensive simulation framework that previously served for benchmarking trajectory inference methods in Saelens et al. [2019]. Interested readers should refer to the original publication for details on the data simulation procedure. Dataset characteristics are listed in Table 1.

Table 1: Overview of simulated datasets.

Each dataset is simulated using a framework from the dynverse toolbox and is characterized by the topology of the trajectory, as well as the number of cells and genes. Low-dimensional representations of representative datasets can be found in Figure 2. Note that the cyclic datasets have some variation in the numbers of genes and cells and in the amount of differential expression, which is inherent to the dyngen simulation framework.

For each of the cyclic and bifurcating topologies, we generate and analyze 10 datasets. Since the multifurcating topology is very variable across simulations due to its flexible definition, its analysis requires substantial supervision. Therefore, we analyze only one representative multifurcating dataset.

Prior to trajectory inference, the simulated counts are normalized using full-quantile normalization [Bolstad et al., 2003, Bullard et al., 2010]. For TI with slingshot, we apply principal component analysis (PCA) dimensionality reduction to the normalized counts and k-means clustering in PCA space. For the bifurcating and multifurcating trajectories, the start and end clusters of the true trajectory are provided to slingshot to aid it in inferring the trajectory. For the edgeR analysis, we assess DE between the end clusters that are also provided to slingshot. The BEAM method can only test one bifurcation point at a time. For the multifurcating dataset, we therefore assess both branching points separately and aggregate the p-values using Fisher’s method [Fisher, 2006]. For the tradeSeq and edgeR analyses of the multifurcating dataset, we perform global tests across all three lineages.
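Fisher’s method for aggregating p-values can be sketched in a few lines of Python. The statistic −2 Σ log p follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis when the k p-values are independent; the function name and the example p-values are hypothetical.

```python
import math

def fisher_combine(pvalues):
    """Combine k independent p-values with Fisher's method.

    Returns the statistic -2 * sum(log p) and its p-value under a
    chi-squared distribution with 2k degrees of freedom.
    """
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    # For even degrees of freedom 2k the upper tail has a closed form:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total

# Hypothetical p-values of one gene at two branching points.
statistic, combined_p = fisher_combine([0.05, 0.05])
```

Two moderately small p-values combine into a smaller aggregated p-value, reflecting consistent evidence at both branching points.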

We assess performance based on scatterplots of the true positive rate (TPR) vs. the false discovery proportion (FDP), according to the following definitions

$$
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FDP} = \frac{FP}{\max(1, FP + TP)},
$$

where FN, FP, and TP denote, respectively, the numbers of false negatives, false positives, and true positives. FDP-TPR curves are calculated and plotted with the Bioconductor R package iCOBRA [Soneson and Robinson, 2016].
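These performance measures can be computed from the set of truly DE genes and the set of genes called DE, as in the following minimal Python sketch. The function name and gene labels are hypothetical; the actual curves in this study are produced with iCOBRA.

```python
def tpr_fdp(truly_de, called_de):
    """Return (TPR, FDP) for sets of truly DE genes and genes called DE."""
    tp = len(truly_de & called_de)   # true positives
    fp = len(called_de - truly_de)   # false positives
    fn = len(truly_de - called_de)   # false negatives
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    fdp = fp / max(1, fp + tp)       # max(1, .) avoids division by zero
    return tpr, fdp

truly_de = {"g1", "g2", "g3", "g4"}
called_de = {"g1", "g2", "g5"}
tpr, fdp = tpr_fdp(truly_de, called_de)   # TPR = 2/4, FDP = 1/3
```

Evaluating these quantities over a range of significance cut-offs traces out an FDP-TPR curve.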

Case studies

Mouse bone marrow dataset

We use as first case study the mouse haematopoiesis scRNA-seq dataset of Paul et al. [2015]. Two small cell clusters corresponding to the dendritic and eosinophil cell types were removed from the trajectory inference and downstream DE analysis, since these are outlying cell types that do not seem to belong to any particular lineage (Supplementary Figure S2).

tradeSeq downstream of slingshot is compared to the BEAM approach from Monocle 2. Since BEAM is restricted to the dimensionality reduction methods implemented in the package, we use independent components analysis (ICA) for both slingshot and Monocle 2 in this comparison. For Monocle 2, we specify the argument num_paths = 2 to aid it in inferring two lineages.

Subsequently, we demonstrate a tradeSeq analysis downstream of slingshot by performing dimensionality reduction using UMAP [McInnes et al., 2018], following the data processing pipeline described in the Monocle 3 vignette, since this better reflects the biology of the experiment.

In this case study, we show how one can perform multiple tests to identify genes with distinct types of behavior, specifically, genes that are deemed DE for one test (test 1) but not another (test 2). Let W_τg denote the test statistic for gene g in test τ ∈ {1, 2} and R_τg denote the rank (in terms of ordering from low to high) of W_τg among all G test statistics associated with the G genes. Then, define a score for each gene g as S_g = (R_1g)² + (G − R_2g)². Genes with high scores are genes which are expected to be DE for test 1 but not DE for test 2 and vice versa. This is used to identify genes that are DE with the patternTest (test 1) but not the diffEndTest (test 2), i.e., genes that are transiently DE between lineages. Note that the procedure only provides a ranking of the genes and not an evaluation of statistical significance.

Mouse olfactory epithelium dataset

The olfactory epithelium (OE) dataset from Fletcher et al. [2017] is our second case study. We use the lineages discovered in the original manuscript. In brief, counts are normalized using full-quantile normalization [Bolstad et al., 2003, Bullard et al., 2010] followed by regression-based adjustment for quality control variables [Fletcher et al., 2017]. Dimensionality reduction is performed through PCA on the normalized log-transformed counts that are offset by 1 to avoid taking the log of zero, i.e., log(y + 1). Clustering is performed through k-means on the first 50 principal components by varying the number of clusters k ∈{4,…, 15}; stable clusters are derived using clusterExperiment [Risso et al., 2018b], yielding a final repertoire of 13 cell clusters. Next, slingshot is used to infer trajectories with the initial cluster chosen by known marker genes of horizontal basal cells (HBC), an adult stem cell population. A double bifurcation is discovered, with the first giving rise to sustentacular cells and two more lineages that split into microvillous cells and olfactory sensory neurons. The data were downloaded from GEO with accession number GSE95601.

Results

In this section, we first evaluate tradeSeq on several simulated datasets with trajectories that span different topologies. Next, we demonstrate how tradeSeq can improve biological interpretation of trajectory inference results by applying it to two real datasets, a scRNA-seq dataset for mouse bone marrow [Paul et al., 2015] and a SMART-Seq2 dataset for the mouse olfactory epithelium [Fletcher et al., 2017].