Abstract
Resting-state functional connectivity is a powerful tool for studying human functional brain networks. Temporal fluctuations in functional connectivity, i.e., dynamic functional connectivity (dFC), are thought to reflect dynamic changes in brain organization and non-stationary switching of discrete brain states. However, recent studies have suggested that dFC might be attributed to sampling variability of static FC. Despite this controversy, a detailed exposition of stationarity and statistical testing of dFC is lacking in the literature. This article seeks an in-depth exploration of these statistical issues at a level appealing to both neuroscientists and statisticians.
We first review the statistical notion of stationarity, emphasizing its reliance on ensemble statistics, which contrasts with the fact that all FC measures depend on sample statistics. One implication is that stationarity does not imply the absence of brain states. We then expound the assumptions underlying two frameworks - phase randomization (PR) and autoregressive randomization (ARR) - widely used to generate null data for statistical testing of dFC. It turns out that both PR and ARR rely on assumptions of stationarity, linearity and Gaussianity. Therefore statistical rejection does not necessarily imply non-stationarity, but can also be due to nonlinearity or non-Gaussianity. We further show that a common form of ARR (bivariate ARR) is susceptible to false positives compared with PR and an adapted version of ARR (multivariate ARR).
Application of PR and multivariate ARR to Human Connectome Project data suggests that the stationary, linear, Gaussian null hypothesis cannot be rejected for most participants. However, failure to reject the null hypothesis does not imply that static FC can fully explain dFC. After all, AR models are dynamical FC models in the sense that they encode linear dynamic interactions beyond static FC. We find that AR models could explain temporal FC fluctuations significantly better than static FC models. We also find that AR models explain temporal FC fluctuations significantly better than a popular model assuming discrete brain states, suggesting the lack of discrete states (as measured by resting-state fMRI). Overall, our results suggest that AR models are not only useful as a means for generating null data, but may be a powerful tool for exploring the dynamical properties of resting-state functional connectivity. We also discuss how apparent contradictions in the growing dFC literature might be reconciled.
1. Introduction
The human brain exhibits complex spatiotemporal patterns of activity fluctuations even during the resting state (Greicius et al., 2003; Damoiseaux et al., 2006; Smith et al., 2013b). Characterizing the structure of these fluctuations is commonly done via functional connectivity (FC) analyses of resting-state fMRI (rs-fMRI) data (Van Den Heuvel and Pol, 2010; Buckner et al., 2013). The most common FC measure is the Pearson correlation between brain regional time courses (Biswal et al., 1995; Vincent et al., 2006; Dosenbach et al., 2007; Buckner et al., 2009; Zalesky et al., 2010; Power et al., 2011; Yeo et al., 2011, 2014; Margulies et al., 2016), although other measures, such as partial correlation (Fransson and Marrelec, 2008; Spreng et al., 2013) or mutual information (Tsai et al., 1999; Tedeschi et al., 2005; Chai et al., 2009), have been utilized. These FC measures are static in the sense that they are invariant to temporal re-ordering of fMRI time points, thus ignoring temporal information that might be present in fMRI data.
In contrast, recent work on dynamic functional connectivity (dFC) suggests that there might be important information beyond static FC, e.g., in the temporal fluctuations of FC or in models taking into account the temporal ordering of fMRI time series (see Hutchison et al. (2013a); Calhoun et al. (2014); Preti et al. (2016) for recent reviews). To interrogate dFC, sliding window correlation (SWC) is by far the most common method (Sakoğlu et al., 2010; Handwerker et al., 2012; Hutchison et al., 2013b; Allen et al., 2014; Leonardi et al., 2014; Liégeois et al., 2016; Wang et al., 2016), although many alternative approaches have been proposed (Majeed et al., 2011; Smith et al., 2012; Lindquist et al., 2014; Karahanoğlu and Van De Ville, 2015; Shine et al., 2015).
To assess the statistical significance of dFC, randomization frameworks are typically used to generate null data. Null hypothesis testing can then be performed by comparing statistics from the original data against those from the null data. The two most popular randomization frameworks are autoregressive randomization (ARR) (Chang and Glover, 2010; Zalesky et al., 2014) and phase randomization (PR) (Handwerker et al., 2012; Allen et al., 2014; Hindriks et al., 2016). While most papers reported the rejection of the null model (Chang and Glover, 2010; Handwerker et al., 2012; Zalesky et al., 2014), recent studies have suggested difficulties in rejecting the null model, especially in single subject data (Hindriks et al., 2016; Laumann et al., 2016).
Observed dFC has also been interpreted by many authors as evidence of non-stationary switching of discrete brain states (Allen et al., 2014; Hansen et al., 2015). These states have been associated with mental disorders (Damaraju et al., 2014; Rashid et al., 2014; Su et al., 2016; Du et al., 2016), as well as intra-individual and inter-individual variation in vigilance, consciousness and executive function (Barttfeld et al., 2015; Nomi et al., 2017; Shine et al., 2016; Wang et al., 2016). In contrast, some have suggested that the brain (as measured by rs-fMRI) might not be undergoing sharp transitions between discrete states (Leonardi et al., 2014) or that dFC fluctuations might largely reflect sampling variability (Laumann et al., 2016).
Contributing to the possible confusion in the literature is the loose use of the term “stationarity” (e.g., Hutchison et al., 2013a; Allen et al., 2014; Zalesky and Breakspear, 2015; Preti et al., 2016). For example, Hutchison and colleagues (2013a) equated static FC and dFC analyses with assumptions of stationarity and non-stationarity respectively. However, the very same review cautioned that a stationary process can exhibit temporal fluctuations in an FC metric, such as SWC (Hutchison et al., 2013a). Since null data generation frameworks (e.g., PR and ARR) were developed based on strict statistical definitions of stationarity (Tucker et al., 1984; Efron and Tibshirani, 1986), the loose usage of statistical terminology impedes our understanding of dFC. To the best of our knowledge, issues of stationarity and assumptions of null data generation frameworks (PR and ARR) are often briefly mentioned, but not discussed in detail in the literature. By exploring these issues in depth, this commentary leads to several surprising conclusions.
The commentary begins by reviewing random processes, weak-sense stationarity and how fMRI can be conceptualized as a random process (Section 2). We then show that a two-state hidden Markov model (HMM) process is actually stationary, suggesting that stationarity does not imply the absence of brain states (Section 3). In the following section, assumptions behind PR and ARR are discussed, revealing that both PR and ARR generate null data that are linear, stationary and Gaussian. Therefore rejection of the PR and ARR null models does not imply non-stationarity. Importantly, AR models encode dynamical interactions between brain regions, above and beyond static FC (Section 4). Experiments on the Human Connectome Project data suggest that the PR and ARR null models cannot be rejected for most low motion participants, and that bivariate ARR (a common variant of ARR) can yield false positives (Section 5). Furthermore, multivariate AR models replicate the rich dynamics of SWC significantly better than just models of static FC, as well as commonly used HMM-type models that explicitly encode discrete brain states (Section 6). We conclude with a discussion of how these results can be reconciled with the growing literature on dFC (Section 7).
While our experiments focused on SWC, almost all the issues we discuss apply to other dFC methods. In addition, it is worth distinguishing dFC (second order statistics) from dynamic fMRI activity level (first order statistics). From the earliest days of resting-state fMRI, the question of dynamic fMRI activity level during resting-state and its relationship with behavior has been of great interest (Fox et al., 2006, 2007; Kucyi et al., 2016). Many of the issues that we raised in this commentary also apply to the study of dynamic activity level. Therefore we will point out relevant lessons to dynamic activity level as and when they arise.
2. Interpreting fMRI as a random process
In this section, we first review random variables, random processes and weak-sense stationarity (WSS). We then distinguish between sample statistics and ensemble statistics. Finally, we formalize fMRI as a random process and explain why testing dFC within a formal statistical framework is non-trivial.
2.1. Random variables
A random variable is a quantity that is uncertain (Prince, 2012). It may be the outcome of an experiment (e.g., tossing a coin) or real world measurement (e.g., measuring the temperature of a room). If we observe a random variable multiple times, we will get different values. Some values occur more frequently than others; this variation in frequencies is encoded by the probability distribution of the random variable. Multiple observations of a random variable are referred to as realizations (or samples) of the random variable.
The mean or expectation of a random variable X is denoted as E(X). We can think of E(X) as the average value of X over many (infinite) realizations of X. Similarly, the variance of a random variable X is denoted as Var(X) = E[(X − E(X))²]. We can think of Var(X) as the average square deviation of X from its mean over many (infinite) realizations of X.
In the case of two random variables X and Y, we can characterize their linear relationship with the covariance Cov(X, Y) = E[(X − E(X))(Y − E(Y))]. We can think of Cov(X, Y) as averaging (X − E(X))(Y − E(Y)) over many (infinite) realizations of X and Y. The covariance measures how much X and Y co-vary about their respective means. If the covariance is positive, then for a particular realization of X and Y, if X is higher than its mean, then Y tends to be higher than its mean. If the covariance is negative, then for a particular realization of X and Y, if X is higher than its mean, then Y tends to be lower than its mean. We note that Cov(X, X) = Var(X).
2.2. Random processes and weak-sense stationarity (WSS)
A random process is an infinite collection of random variables, and is especially useful for the analysis of time series. For example, suppose we randomly pick a thermometer from a store (with many thermometers) to measure the temperature of a particular room. Let Ut be the thermometer measurement at time t. Then Ut is a random process. Figure 1A illustrates three realizations of the random process, where each realization (blue, red or green) corresponds to a different thermometer. Here we assume that the room temperature is constant at 20°C, and that each thermometer is identical and incurs an independent (zero-mean unit-variance Gaussian) measurement noise at each time, i.e., p(Ut) ∼ 𝓝(20, 1).
The expectation of a random process Xt at time t is denoted as E(Xt). We can think of E(Xt) as averaging Xt across infinite realizations of the random process at time t. For the toy example Ut (Figure 1A), averaging the temperature measurements across many thermometers at a particular time t converges to the true temperature of the room, and so E(Ut) = 20 for all time t. Similarly, we can think of Var(Xt) = E[(Xt − E(Xt))²] as averaging the square deviation of Xt at time t from its mean E(Xt) over infinite realizations of the random process. For the toy example Ut (Figure 1A), Var(Ut) = 1 for all time t.
Finally, the auto-covariance Cov(Xn, Xm) = E[(Xn − E(Xn))(Xm − E(Xm))] measures the co-variation of Xn and Xm about their respective means at times n and m. For example, if the auto-covariance is positive, then for a particular realization of Xt, if Xn is higher than its mean E(Xn), then Xm tends to be higher than its mean E(Xm). Conversely, if the auto-covariance is negative, then for a particular realization of Xt, if Xn is higher than its mean E(Xn), then Xm tends to be lower than its mean E(Xm). We note that Cov(Xn, Xn) = Var(Xn). For the toy example Ut (Figure 1A), Cov(Un, Um) = 0 for two different time points n and m since we assume the thermometer noise is independent at each time point.
We can now define WSS as follows (Papoulis, 2002):
A random process Xt is WSS if its mean E(Xt) is constant for all time t and its auto-covariance Cov(Xn, Xm) depends only on the time interval τ = n − m, i.e., Cov(Xn, Xm) = R(n − m) = R(τ).
Since Var(Xn) = Cov(Xn, Xn), this implies that Var(Xn) = R(0) for a WSS process. In other words, the variance Var(Xt) of a WSS process is constant over time. While there are other forms of stationarity (e.g., strict-sense stationarity), we will only focus on WSS in this paper and will use the phrase “stationarity” and “WSS” interchangeably.
The toy example Ut (Figure 1A) is WSS because E(Ut) = 20 is a constant, and Cov(Un, Um) can be written as R(n − m), where R(0) = 1 (when n = m) and R(n − m) = 0 for n − m ≠ 0. In contrast, suppose the thermostat of the room was changed at time t0, so that the room temperature increased from 20°C to 23°C (Figure 1B). Then the resulting random process Vt is non-stationary because E(Vt) = 20 for t < t0 and E(Vt) = 23 for t > t0.
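The two toy examples can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical helper `simulate_thermometers` (neither is part of the original analysis); the change point t0 = 50 and the number of realizations are arbitrary choices. Ensemble statistics are estimated by averaging across realizations at fixed time points.

```python
import numpy as np

def simulate_thermometers(n_real, n_time, t0=None, seed=0):
    """Simulate thermometer readings; rows are realizations, columns are time points.

    The room is at 20 degrees; if t0 is given, it jumps to 23 degrees from t0 onward.
    Each reading adds independent zero-mean, unit-variance Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    temperature = np.full(n_time, 20.0)
    if t0 is not None:
        temperature[t0:] = 23.0
    return temperature + rng.standard_normal((n_real, n_time))

def ensemble_autocov(X, n, m):
    """Estimate Cov(X_n, X_m) by averaging across realizations (rows)."""
    mean = X.mean(axis=0)
    return np.mean((X[:, n] - mean[n]) * (X[:, m] - mean[m]))

U = simulate_thermometers(20000, 100)          # constant temperature (process U)
V = simulate_thermometers(20000, 100, t0=50)   # thermostat changed at t0 = 50 (process V)

print(U.mean(axis=0)[[10, 90]])   # ensemble means of U: both close to 20
print(V.mean(axis=0)[[10, 90]])   # ensemble means of V: close to 20 and 23
print(ensemble_autocov(U, 10, 10), ensemble_autocov(U, 10, 20))  # close to 1 and 0
```

The ensemble mean of U is flat over time, whereas that of V jumps at t0, which is what makes V non-stationary.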
2.3. Ensemble statistics versus sample statistics
It is worth emphasizing that the mean E(Xt), variance Var(Xt) and auto-covariance Cov(Xn, Xm) of a random process are defined across an infinite number of realizations, rather than within a single realization. To illustrate this point, suppose at time points t1 and t2, we average across multiple realizations of the random process Ut resulting in MU(t1) and MU(t2) (Figure 1A). We can think of MU(t1) and MU(t2) as estimates of E(Ut1) and E(Ut2). Indeed, as the number of realizations increases, MU(t1) and MU(t2) will converge to E(Ut1) and E(Ut2) respectively. This convergence holds not just for WSS processes, but all random processes. In the toy example Vt (Figure 1B), MV(t1) converges to E(Vt1) = 20, while MV(t2) converges to E(Vt2) = 23. We refer to the computation of statistics across realizations as ensemble statistics.
In contrast, sample statistics are computed within a single realization. For example, we can average each realization of the random process Ut resulting in M(U1), M(U2) and M(U3) (blue, green and red in Figure 1A). In the case of the random process Ut (Figure 1A), as the number of time points for each realization increases, the sample statistics converge to the ensemble statistics. More specifically, M(U1), M(U2) and M(U3) converge to E(Ut) = 20. However, sample statistics do not converge to ensemble statistics for non-stationary processes. In the toy example Vt (Figure 1B), the sample statistics M(V1), M(V2) and M(V3) (blue, green and red in Figure 1B) converge to 21.5.
Therefore ensemble statistics are generally not equivalent to sample statistics in the case of non-stationary processes. Based on the toy example Ut (Figure 1A), one might be tempted to conclude that ensemble and sample statistics are equivalent in WSS processes. However, this turns out not to be true. To illustrate this, let us again assume the room temperature is constant at 20°C. However, each thermometer now incurs an independent non-zero-mean unit-variance Gaussian noise at each time, i.e., p(Wt) ∼ 𝓝(20 + b, 1), where the bias b ∼ 𝓝(0, 1) is different for each thermometer (but held constant within a realization). Three realizations of the random process Wt are illustrated in Figure 1C. The random process Wt is WSS with E(Wt) = 20 (because the bias b is zero-mean), Var(Wt) = 2 constant over time, and Cov(Wn, Wm) = 1 for all n ≠ m (the bias is shared across time points within a realization, so it contributes a covariance equal to its variance, which does not depend on n or m). The ensemble means MW(t1) and MW(t2) still converge to E(Wt) = 20. However, the sample means M(W1), M(W2) and M(W3) now converge to 17, 20 and 23 respectively because we have assumed b = −3, 0, 3 for the blue, green and red realizations in Figure 1C. We now define ergodicity as follows (Papoulis, 2002):
A random process Xt is ergodic if its ensemble statistics and sample statistics converge to the same values. An ergodic process is WSS.
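The gap between ensemble and sample means in the biased-thermometer example can be illustrated with a short simulation. This is a minimal sketch assuming NumPy; the numbers of realizations and time points are arbitrary, and the biases are drawn at random rather than fixed to −3, 0 and 3 as in Figure 1C.

```python
import numpy as np

rng = np.random.default_rng(1)
n_real, n_time = 2000, 1000

# Biased thermometers: one bias per realization, held constant over time.
bias = rng.standard_normal((n_real, 1))
W = 20.0 + bias + rng.standard_normal((n_real, n_time))

ensemble_mean = W.mean(axis=0)   # one value per time point, averaged across thermometers
sample_mean = W.mean(axis=1)     # one value per thermometer, averaged across time

print(ensemble_mean[:3])         # all close to 20
print(sample_mean[:3])           # close to 20 + bias of each thermometer
print(20.0 + bias[:3, 0])
```

Each sample mean settles near 20 plus the bias of its own thermometer, so it does not converge to the ensemble mean of 20 no matter how long the realization is: the process is WSS but not ergodic.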
The distinction between ensemble statistics and sample statistics becomes important as we conceptualize fMRI as a random process in the next section.
2.4. Interpreting fMRI as a random process
In the previous examples of random processes (Figure 1), each realization consisted of one uncertain quantity (temperature) at each time point. However, a random process Xt can also be multivariate, i.e., each realization consists of a vector at each time point. fMRI data typically consists of N time series, where N is the number of voxels or regions of interest (ROIs). Therefore the fMRI data can be thought of as a multivariate random process with an N × 1 vector of measurements at each time point (i.e., TR).
For a multivariate random process, E(Xt) is now an N × 1 vector equivalent to averaging Xt across infinite realizations of the random process at time t. The N × N auto-covariance matrix Cov(Xn, Xm) = E[(Xn − E(Xn))(Xm − E(Xm))ᵀ] measures the co-variation of Xn and Xm about their respective vectorial means at times n and m. For example, if the entry in the i-th row and j-th column of the auto-covariance matrix is positive, then for a particular realization of Xt, if the i-th element of Xn is higher than its mean (the i-th element of E(Xn)), then the j-th element of Xm tends to be higher than its mean (the j-th element of E(Xm)). For a WSS process, the auto-covariance Cov(Xn, Xn) = E[(Xn − E(Xn))(Xn − E(Xn))ᵀ] is constant over time.
In the case of fMRI, one could potentially interpret Cov(Xn, Xn) as the ensemble N × N (un-normalized) functional connectivity matrix among all brain regions at time n, which might therefore be relevant for the topic of dFC. However, difficulties arise because the auto-covariance and WSS are based on ensemble statistics, and therefore require multiple realizations of a random process to estimate. While most researchers can probably agree that the fMRI data of a single subject can be considered a single realization of a random process, what constitutes multiple realizations is more ambiguous. Most neuroscientists would probably balk at conceptualizing the fMRI data of each subject (of a multi-subject dataset) as a single realization of the same random process. Therefore in most dFC papers (Chang and Glover, 2010; Handwerker et al., 2012; Zalesky et al., 2014; Hindriks et al., 2016), the fMRI data of different subjects are treated as single realizations of different random processes. As such, both the estimation of ensemble statistics and hypothesis testing only have access to a single realization of a random process (i.e., they rely on sample statistics). Yet, for sample statistics to converge to ensemble statistics (previous section), fMRI must be ergodic, which in turn implies WSS. This creates conceptual and practical issues for dFC analysis that will be the focus for the remainder of this paper.
3. Stationarity does not imply the absence of brain states
The previous section suggests the existence of conceptual and practical issues when studying dFC. This section focuses on the conceptual issue of whether non-WSS and dFC are equivalent. In the literature, it is often implicitly assumed that WSS implies the lack of fluctuations in FC (e.g., as measured by SWC) or FC states (e.g., Allen et al., 2014). However, we now show that WSS does not imply the lack of FC fluctuations or FC states.
Consider a toy “brain” with two regions whose signals correspond to a bivariate random process Xt containing two brain states S1 and S2. If the brain is in state S1 at time t, then
\[ X_t \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix} \right), \]
i.e., the two brain regions are functionally connected with r = 0.9. If the brain is in state S2 at time t, then
\[ X_t \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & -0.2 \\ -0.2 & 1 \end{bmatrix} \right), \]
i.e., the two regions are anticorrelated with r = −0.2. Finally, let the probability of transitioning between the two brain states be given by the following transition probability matrix:
\[ \begin{bmatrix} 0.99 & 0.01 \\ 0.01 & 0.99 \end{bmatrix}, \]
i.e., from time t to t + 1, there is a 0.99 probability of remaining in the same state and a 0.01 probability of switching state. This random process is known as a hidden Markov model (HMM) (Baum and Petrie, 1966).
Three realizations of this process are shown in Figure 2. The blue and green time courses correspond to the signals of the two brain regions. The sliding window correlations (SWC) between the two regions (red lines in Figure 2) exhibit huge fluctuations with correlations close to 0.9 and −0.2 in states S1 and S2 respectively.
Most neuroscientists would probably agree that this toy brain exhibits brain states and dFC. However, this toy brain is also WSS because the ensemble mean E(Xt) is constant over time, and the ensemble auto-covariance Cov(Xn, Xm) is only a function of the interval n − m. More importantly, the ensemble (unnormalized) functional connectivity matrix Cov(Xn, Xn) in this toy brain is constant over time, despite the presence of brain states and dFC (Figure 2). This situation arises because dFC and brain states exist within a single realization, where the FC at a given time depends on the underlying brain state, which can change over time in a single realization. However, these dynamics are averaged out (across realizations) when considering ensemble notions like WSS. Therefore WSS does not imply the absence of brain states or fluctuations in FC.
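The contrast between the ensemble and single-realization views of the toy brain can be reproduced with a short simulation. This is a minimal sketch assuming NumPy; the zero means, unit variances, the equal-probability initial state, the window width of 30 and the function names are illustrative assumptions rather than details taken from the original experiments.

```python
import numpy as np

# State-specific covariance matrices (zero mean and unit variance are assumptions).
COV = np.array([[[1.0, 0.9], [0.9, 1.0]],      # state S1: r = 0.9
                [[1.0, -0.2], [-0.2, 1.0]]])   # state S2: r = -0.2
CHOL = np.linalg.cholesky(COV)                 # Cholesky factor of each state covariance

def simulate_toy_brain(n_time, rng, p_stay=0.99):
    """One realization of the two-region, two-state HMM."""
    switch = rng.random(n_time - 1) > p_stay   # True where the state flips
    first = rng.integers(2)                    # assumed: both states equally likely at t = 0
    states = (first + np.concatenate(([0], np.cumsum(switch)))) % 2
    z = rng.standard_normal((n_time, 2))
    x = np.einsum("tij,tj->ti", CHOL[states], z)   # x_t = L_{state_t} z_t
    return x, states

def sliding_window_corr(x, width):
    """Pearson correlation between the two columns of x in each sliding window."""
    return np.array([np.corrcoef(x[t:t + width].T)[0, 1]
                     for t in range(len(x) - width + 1)])

rng = np.random.default_rng(0)

# Ensemble view: covariance between the two regions at each time, across realizations.
ensemble = np.stack([simulate_toy_brain(300, rng)[0] for _ in range(5000)])
ensemble_cov = np.mean(ensemble[:, :, 0] * ensemble[:, :, 1], axis=0)

# Single-realization view: SWC within one long realization.
x, states = simulate_toy_brain(5000, rng)
swc = sliding_window_corr(x, width=30)

print(ensemble_cov.min(), ensemble_cov.max())   # both near 0.35 at every time point
print(swc.min(), swc.max())                     # wide range within one realization
```

Under these assumptions the ensemble covariance stays near 0.35 (the average of 0.9 and −0.2, since the symmetric transition matrix visits both states equally often), while the SWC of a single realization swings widely as the hidden state changes.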
It is also worth noting that non-WSS does not necessarily imply the presence of dFC either. If the (unnormalized) functional connectivity matrix Cov(Xn, Xn) varies as a function of time, then fMRI is non-WSS and the brain exhibits true fluctuations in FC. However, non-WSS can arise from just a non-stationary mean E(Xt). In other words, the first order statistics (mean) can be non-stationary, while the second order statistics (variance and covariance) remain stationary. For example, the random process Vt (Figure 1B) is non-stationary because its mean E(Vt) is non-constant over time, but its variance Var(Vt) is actually constant over time. Therefore, even if fMRI is shown to be non-WSS, this might be due to non-stationary spontaneous activity level and/or non-stationary FC. In other words, non-WSS does not necessarily imply real fluctuations in FC.
While we focus on dFC (second order statistics) in this paper, many of the same issues also apply to the study of dynamic activity level (first order statistics). For example, one could modify the previous HMM example (Figure 2) so that the two states have different ensemble means, but the same ensemble covariance matrix. In this case, the resulting toy brain will be WSS, while still exhibiting real dynamic spontaneous activity (but not dFC).¹
Despite the caveat that non-WSS does not necessarily imply real fluctuations in FC, establishing non-stationarity of fMRI is still a useful step towards establishing dFC. Therefore in the next section, we leave behind the conceptual issues raised in this section, and dive into the statistical testing of non-stationarity when only a single realization of the random process is available.
4. Stationarity cannot be tested alone
Statistical testing of FC non-stationarity is difficult for two reasons. First, observed dFC values (e.g., using SWC) are only estimates of true values (Hindriks et al., 2016; Laumann et al., 2016). As such, observed dFC fluctuations might simply correspond to sampling variability or measurement noise. Second, as explained in previous sections, the notion of stationarity is based on ensemble statistics defined across infinite realizations of a random process. Therefore observing fluctuations in a single realization of fMRI cannot be directly interpreted as evidence for non-stationarity but requires further statistical testing.
This section seeks to provide insights into common approaches for statistical testing of FC stationarity. A statistical test for non-stationarity requires defining a test statistic and a procedure to generate null data preserving certain properties of the single fMRI realization. The test statistic computed from real data is then compared against the null distribution of test statistics computed from the null data. A significant deviation of the real test statistic from the null distribution of test statistics would result in the rejection of the null hypothesis.
The test statistic should ideally reflect the null hypothesis being tested. For example, if one is interested in whether there are “real” fluctuations in FC between two brain regions, then an intuitive test statistic might be the variance of the SWC (Hindriks et al., 2016):
\[ \kappa = \frac{1}{T-1} \sum_{t=1}^{T} \left( \mathrm{SWC}(t) - \mu \right)^2, \tag{1} \]
where SWC(t) is the SWC between the two brain regions at time t, T is the number of SWC time points and µ is the mean of the SWC time series. A higher κ (relative to a properly generated null distribution) indicates stronger evidence of dFC.
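The variance-of-SWC statistic and its comparison against a null distribution can be sketched as follows. This is a minimal illustration assuming NumPy; the function names are hypothetical, the unbiased variance (`ddof=1`) is an assumption about the normalization, and the "+1" correction in the empirical p-value is one common convention rather than necessarily the one used in the cited studies.

```python
import numpy as np

def sliding_window_corr(x, y, width):
    """Pearson correlation between x and y within each sliding window."""
    return np.array([np.corrcoef(x[t:t + width], y[t:t + width])[0, 1]
                     for t in range(len(x) - width + 1)])

def swc_variance(x, y, width):
    """Test statistic kappa: variance of the SWC time series (ddof=1 is an assumption)."""
    return sliding_window_corr(x, y, width).var(ddof=1)

def empirical_p_value(stat_observed, stats_null):
    """One-sided p-value: fraction of null statistics at least as large as the observed one.

    The +1 terms (assumed convention) count the observed statistic as part of the null set.
    """
    stats_null = np.asarray(stats_null)
    return (1 + np.sum(stats_null >= stat_observed)) / (1 + stats_null.size)
```

In use, `swc_variance` would be computed once on the real pair of time courses and once on each null pair, and the resulting values passed to `empirical_p_value`.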
While there have been a wide variety of test statistics and dFC measures proposed in the literature (Sakoğlu et al., 2010; Chang and Glover, 2010; Zalesky et al., 2014; Hindriks et al., 2016), there has been significantly less discussion about the assumptions behind procedures for generating null data. Such assumptions are highly important because if one manages to generate WSS null data that also preserve all other properties (e.g., possible nonlinearity or non-Gaussianity) of the original data, then a rejection of the null hypothesis would imply non-WSS. However, as will be seen, the two main approaches – autoregressive randomization (ARR) and phase randomization (PR) – actually generate linear, WSS, Gaussian data. Therefore a rejection of the null hypothesis only implies that the signal is non-linear, non-WSS, non-Gaussian or any combination of the above (Schreiber and Schmitz, 2000).
We begin by discussing properties of the fMRI time series we hope to preserve in the null data, followed by explaining how ARR and PR preserve these properties, and the relationships between the two approaches. Finally, we illustrate with examples of how the null hypothesis can be rejected even though the underlying data is WSS.
4.1. Properties to preserve in null data
To test if observed fluctuations in FC (e.g., SWC) can be completely explained by static FC (e.g., Pearson correlation), the null data should retain the static FC observed in real data. To preserve static FC, we could simply generate null data by permuting the temporal ordering of fMRI time courses. However, this procedure destroys the auto-correlational structure inherent in fMRI, and therefore the null hypothesis will be easily rejected. In other words, procedures for generating null data should also preserve the auto-correlational structure of fMRI data in addition to static FC.
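The effect of temporal permutation can be seen in a small simulation. This is a minimal sketch assuming NumPy and a hypothetical auto-correlated bivariate signal (a first-order AR process with coefficient 0.8, chosen only for illustration): permuting time points leaves the Pearson correlation between regions unchanged but removes the lag-1 auto-correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

# Hypothetical auto-correlated bivariate signal: first-order AR process with correlated noise.
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=T)
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + noise[t]

x_perm = x[rng.permutation(T)]   # the same temporal re-ordering is applied to both regions

def static_fc(data):
    """Pearson correlation between the two columns (regions)."""
    return np.corrcoef(data.T)[0, 1]

def lag1_autocorr(v):
    """Correlation between a time course and itself shifted by one time point."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]

print(static_fc(x), static_fc(x_perm))                        # identical up to rounding
print(lag1_autocorr(x[:, 0]), lag1_autocorr(x_perm[:, 0]))    # strong vs. near zero
```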
Let us define these auto-correlations more precisely. Suppose we observe fMRI time courses from N ROIs (or voxels) of length T. Let xt be the N × 1 vector of fMRI data at time t after each time course has been demeaned. We define the auto-covariance sequence to be the following N × N matrices:
\[ R_l = \frac{1}{T-l} \sum_{t=l+1}^{T} x_t x'_{t-l}, \qquad 0 \le l \le T-1, \tag{2} \]
where ′ denotes transpose. We note that the diagonal elements of Rl encode the auto-covariance of individual time courses, while the off-diagonal terms of Rl encode what is usually referred to as the cross-covariance between pairs of time courses.
The auto-covariance sequence {Rl} measures the co-variance of fMRI data l time points apart. Since the auto-covariances are computed from a single realization of fMRI data, the Rl are considered sample statistics. R0 is the (un-normalized) functional connectivity matrix typically computed in the literature. If fMRI data is ergodic (and hence WSS), then R0 would be equal to the ensemble (un-normalized) functional connectivity matrix among all brain regions (for sufficiently large T). As will be seen, R0 encodes static properties of fMRI, while the higher order auto-covariances R1, …, RT−1 encode the dynamic properties of fMRI. We will now examine two frameworks for generating null data that preserve auto-covariances of the original data.
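A sample auto-covariance sequence of this kind can be computed as in the sketch below. It assumes NumPy, a hypothetical function name `sample_autocov`, a T × N data layout, a 1/(T − l) normalization and the ordering "later time point times earlier time point transposed"; the exact convention used in the original analysis may differ.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """Sample auto-covariance matrices R_0, ..., R_max_lag of a T x N data array.

    R[l] averages x[t + l] x[t]^T over all valid t (normalization 1 / (T - l) is assumed).
    """
    x = x - x.mean(axis=0)          # demean each time course
    T = x.shape[0]
    return np.stack([x[l:].T @ x[:T - l] / (T - l) for l in range(max_lag + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 3))  # hypothetical data: 3 regions of white noise
R = sample_autocov(x, max_lag=5)

print(R.shape)                      # one N x N matrix per lag
print(np.round(R[0], 2))            # lag 0: the (un-normalized) static FC matrix
```

The lag-0 matrix is the usual covariance matrix of the time courses; the diagonal of each higher-lag matrix holds the auto-covariance of individual time courses and the off-diagonal entries hold the cross-covariances.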
4.2. Autoregressive randomization (ARR)
The ARR framework has been utilized by the statistics and physics community for decades (Efron and Tibshirani, 1986) and adopted by seminal papers in the dFC literature (Chang and Glover, 2010; Zalesky et al., 2014). Suppose we have fMRI time courses from N brain regions (or voxels). Each fMRI time course is assumed to be demeaned.² ARR assumes that the fMRI data at time t is a linear combination of the fMRI data from the previous p time points:
\[ x_t = \sum_{l=1}^{p} A_l x_{t-l} + \epsilon_t, \tag{3} \]
where p ≥ 1, xt is the N × 1 vector of fMRI data at time t, ∊t ∼ 𝓝(0, Σ) corresponds to independent zero-mean Gaussian noise,³ and Al is an N × N matrix encoding the linear dependencies between time t and time t − l. Eq. (3) is known as a p-th order Gaussian autoregressive (AR) model.
ARR proceeds by first estimating the AR model parameters (Σ, A1, …, Ap) from the fMRI data (details in Appendix A1). Each null fMRI time series is initialized by randomly selecting p consecutive time points of the original data, and then repeatedly applying the AR model (Eq. (3)) until null data of length T are generated.
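The two ARR steps (parameter estimation, then null data generation) can be sketched as follows. This is a minimal illustration assuming NumPy and ordinary least-squares estimation, which may differ from the estimator detailed in Appendix A1; the function names `fit_ar` and `generate_arr_null` and the simulated two-region data are hypothetical.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of a p-th order AR model to demeaned T x N data.

    Returns A with shape (p, N, N), where A[l] multiplies x[t - 1 - l], and the
    N x N residual covariance Sigma. (Least squares is an assumed estimator.)
    """
    T, N = x.shape
    Z = np.hstack([x[p - l:T - l] for l in range(1, p + 1)])   # lagged regressors
    Y = x[p:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)                  # shape (N * p, N)
    resid = Y - Z @ B
    Sigma = resid.T @ resid / (T - p)
    A = np.stack([B[l * N:(l + 1) * N].T for l in range(p)])
    return A, Sigma

def generate_arr_null(x, A, Sigma, rng):
    """Generate one null data set with the same shape as x from the fitted AR model."""
    T, N = x.shape
    p = A.shape[0]
    start = rng.integers(0, T - p + 1)          # random block of p consecutive time points
    null = np.empty((T, N))
    null[:p] = x[start:start + p]
    eps = rng.multivariate_normal(np.zeros(N), Sigma, size=T)
    for t in range(p, T):
        null[t] = sum(A[l] @ null[t - 1 - l] for l in range(p)) + eps[t]
    return null

# Hypothetical data: a known first-order AR process with two regions.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
x = np.zeros((5000, 2))
for t in range(1, 5000):
    x[t] = A_true @ x[t - 1] + rng.standard_normal(2)
x -= x.mean(axis=0)

A_hat, Sigma_hat = fit_ar(x, p=1)
null = generate_arr_null(x, A_hat, Sigma_hat, rng)
print(np.round(A_hat[0], 2))        # close to A_true
```

Fitting all N regions jointly, as here, corresponds to a multivariate AR model; fitting each pair of regions separately would correspond to the bivariate variant.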
Suppose the estimated AR model is stable. Then the AR model corresponds to a linear WSS Gaussian process whose auto-covariance sequence R0, …, Rp matches that of the original data.
The preservation of the first p+1 auto-covariances of the original data is a consequence of the Yule-Walker equations (Yule, 1927; Walker, 1931). Further details are found in Appendix A2. One consequence of the above result is the need to verify that the estimated AR model parameters correspond to a stable AR model (see Appendix A2).
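One standard way to verify stability, sketched here under the assumption that the AR coefficients are stored as an array of shape (p, N, N), is to check that all eigenvalues of the companion matrix lie strictly inside the unit circle. The function name `is_stable` is hypothetical, and this check is not necessarily the procedure described in Appendix A2.

```python
import numpy as np

def is_stable(A):
    """Return True if the AR model with coefficients A (shape p x N x N) is stable.

    The model is stable when every eigenvalue of its companion matrix has modulus < 1.
    """
    p, N, _ = A.shape
    companion = np.zeros((N * p, N * p))
    companion[:N, :] = np.hstack(list(A))             # top block row: [A_1, ..., A_p]
    if p > 1:
        companion[N:, :-N] = np.eye(N * (p - 1))      # identity blocks shift the lags
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)

print(is_stable(np.array([[[0.5]]])))                 # scalar AR(1) with coefficient 0.5
print(is_stable(np.array([[[1.1]]])))                 # scalar AR(1) with coefficient 1.1
```

For a scalar first-order model the criterion reduces to requiring the coefficient to be smaller than 1 in absolute value.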
Furthermore, any Gaussian linear process can be approximated arbitrarily well by an AR model⁴ (El-Shaarawi and Piegorsch, 2013). Therefore, significant deviation from ARR null data (i.e., null hypothesis rejected) might be due to the fMRI data being non-linear, non-Gaussian, non-WSS or any combination of the above.
The matching of the higher order auto-covariances R1, …, Rp arises from the linear dynamical interactions between brain regions (Al in Eq. (3)). Therefore the higher order auto-covariances encode the dynamic properties of functional connectivity beyond the static FC encoded by R0.
4.3. Phase randomization (PR)
The PR framework for generating null data has been utilized in the physics community for decades (Tucker et al., 1984; Osborne et al., 1986; Theiler et al., 1992; Prichard and Theiler, 1994). It has been applied to fMRI in several important dFC papers (Allen et al., 2014; Hindriks et al., 2016). We again assume without loss of generality that each fMRI time course has been demeaned. The PR procedure generates null data by performing the Discrete Fourier Transform (DFT) of each time course, adding a uniformly distributed random phase to each frequency, and then performing the inverse DFT. Importantly, the random phases are generated independently for each frequency, but are the same across brain regions. Details of this procedure are found in Appendix A3.
PR generates data corresponding to a linear WSS Gaussian process whose auto-covariance sequence R0, …, RT−1 matches that of the original data.
Proof of WSS is found in Appendix A4. The preservation of all auto-covariances of the original data is a consequence of the Wiener-Khintchine theorem (Wiener, 1930; Khintchine, 1934; Prichard and Theiler, 1994; Weisstein, 2016); see further elaborations in Appendix A5. An important consequence of the above result is that a rejection of the null hypothesis could be due to the fMRI data being non-linear, non-Gaussian, non-WSS or any combination of the above.
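The PR procedure can be sketched as follows. This is a minimal illustration assuming NumPy and a T × N data layout; the function name `phase_randomize` is hypothetical, and details such as the handling of the zero-frequency and Nyquist terms are implementation assumptions that may differ from Appendix A3. The final lines check preservation of the lag-0 covariance and of a circularly shifted (lag-3) covariance.

```python
import numpy as np

def phase_randomize(x, rng):
    """Phase-randomized surrogate of demeaned T x N data.

    A random phase is drawn independently for each frequency and shared by all columns,
    so power spectra and cross-spectra between regions are left unchanged.
    """
    T, N = x.shape
    X = np.fft.rfft(x, axis=0)                        # non-negative frequencies only
    phases = rng.uniform(0.0, 2.0 * np.pi, size=X.shape[0])
    phases[0] = 0.0                                   # keep the zero-frequency term real
    if T % 2 == 0:
        phases[-1] = 0.0                              # keep the Nyquist term real for even T
    X_null = X * np.exp(1j * phases)[:, None]
    return np.fft.irfft(X_null, n=T, axis=0)          # real-valued by construction

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3)).cumsum(axis=0)      # hypothetical auto-correlated data
x -= x.mean(axis=0)
null = phase_randomize(x, rng)

print(np.allclose(null.T @ null, x.T @ x))            # lag-0 covariance is preserved
print(np.allclose(np.roll(null, 3, axis=0).T @ null,
                  np.roll(x, 3, axis=0).T @ x))       # circular lag-3 covariance is preserved
```

Using the real-input FFT pair keeps the surrogate real-valued without having to enforce conjugate symmetry of the phases by hand.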
The relationship between ARR and PR is summarized in Figure 3. ARR preserves the first p + 1 auto-covariances of the original data, while PR preserves the entire auto-covariance sequence. Therefore, if the original data is not auto-correlated beyond p time points (i.e., Rl = 0 for l greater than p), then a p-th order ARR would be theoretically equivalent to PR, except for implementation details. Differences arising from implementation details should not be downplayed. For example, estimating the parameters of a p-th order AR model requires the original data to be at least of length T = p (N + 1), where N is the number of brain regions. The implication is that a (T − 1)-th order ARR cannot be performed even if it is theoretically equivalent to PR. Another difference is that PR null data is only Gaussian for sufficiently long T (Tucker et al., 1984), while ARR is not subject to this constraint. On the other hand, ARR preserves the auto-covariance sequence only for sufficiently long T, while PR preserves the entire auto-covariance sequence for any T. Yet another difference is that PR can only generate null data of the same length as the original data, while ARR can generate null data of arbitrary length, although in the case of dFC, we are typically interested in generating null data of the same length as the original data. As will be seen in the next sections, both approaches appear to yield similar conclusions in dFC analyses despite the practical differences.
4.4. WSS data can be rejected by ARR and PR
To demonstrate that rejection of the null hypothesis with ARR and PR null data does not imply non-stationarity, we consider the toy brain in the previous section (Figure 2). Recall that the toy brain is an HMM with two brain regions and two brain states, but is WSS.
Figure 4 shows a single realization of this toy brain (Figure 4A), and corresponding first order ARR (Figure 4B) and PR (Figure 4C) null data. The blue and green time courses correspond to the signals of the two brain regions. The PR and ARR null data successfully replicate the auto-covariance sequence of the original time series. For example, the static functional connectivity (Pearson correlation) between the time courses of the two brain regions is equal to 0.35 in the original, ARR and PR data.
On the other hand, the SWC between the two regions in the original data (red line in Figure 4A) exhibits huge fluctuations, with correlations close to 0.9 and 0.2 in states S1 and S2 respectively. However, there is little variation in SWC for the null data (red lines in Figures 4B and 4C). Using the κ statistic (Eq. (1)), the null hypothesis is easily rejected.
This result is indeed good news because the implication is that using current methodologies, the null hypothesis can be rejected for a WSS brain with states assuming sufficient statistical power (e.g., the states have sufficiently distinct connectivity patterns and the sliding window is shorter than the average dwell time of a brain state, etc). However, the bad news is that a rejection of the null hypothesis does not imply the existence of brain states because the rejection might simply be due to a non-Gaussian process with no brain states. These considerations are unfortunately moot because experiments with real data (next section) suggest that the stationary linear Gaussian model cannot be rejected for most low motion Human Connectome Project (HCP) participants.
5. Stationary linear Gaussian model cannot be rejected for most low motion subjects
In this section, we show that for most low-motion Human Connectome Project (HCP) participants, the stationary linear Gaussian model cannot be rejected. In addition, we show that one form of ARR used in the literature might result in false positives and should be utilized with care.
5.1. HCP data and SWC computation
We considered ICA-FIX fMRI data from the HCP S900 data release in fsLR surface space (Glasser et al., 2013; Smith et al., 2013a; Van Essen et al., 2013). Since motion can potentially introduce false positives in dFC (Laumann et al., 2016), our analyses were restricted to participants whose maximum framewise root mean square (FRMS) motion was less than 0.2mm and maximum DVARS was smaller than 75 (FRMS values were obtained from Movement_RelativeRMS.txt provided by HCP). Among the four fMRI runs available for each HCP participant, the second run (REST1_RL) yielded the most participants (116) who survived these criteria and was therefore considered (the remaining runs were ignored). Of the 116 remaining participants (or runs), the 100 participants with the smallest average FRMS were selected. Among these 100 low motion participants, average FRMS of the second run ranged from 0.051mm to 0.073mm.
For each participant, the fMRI signal was averaged within each of 114 cortical ROIs (Yeo et al., 2011, 2015; Krienen et al., 2016), resulting in a 114 × 1200 matrix of fMRI data per participant. Following Zalesky et al. (2014), SWC was computed using a window size of 83 frames (59.76s), consistent with window sizes recommended in the literature (Leonardi and Van De Ville, 2015; Liégeois et al., 2016).
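As an illustration of the SWC computation, the sketch below slides a window of 83 frames over an N × T data matrix and computes a Pearson correlation matrix in each window. The function name, the rectangular (untapered) window and the step of one frame are assumptions made for illustration. Note that 59.76s over 83 frames implies 0.72s per frame.

```python
import numpy as np

def sliding_window_correlation(x, window=83):
    """Return an array of shape (T - window + 1, N, N) holding the Pearson
    correlation matrix of each window of N x T data (window step of one frame)."""
    n, t = x.shape
    return np.stack([np.corrcoef(x[:, s:s + window])
                     for s in range(t - window + 1)])

def swc_variance(swc):
    """Variance over windows of each pairwise SWC time series (N x N matrix)."""
    return swc.var(axis=0)
```

For a 114 × 1200 matrix this yields 1118 windows, each with 114 × 113 / 2 = 6441 unique ROI pairs.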
5.2. PR, multivariate ARR and bivariate ARR
For each participant, null fMRI data was generated using ARR (section 4.2) and PR (section 4.3). For the ARR procedure, the most common variant in the literature (Chang and Glover, 2010; Zalesky et al., 2014) involves estimating, for each pair of brain regions, a 2 × 2 Al matrix (Eq. (3)) for each temporal lag l (even though there are 114 ROIs). In other words, the resulting null time courses are generated for each pair of brain regions separately. We refer to this procedure as bivariate ARR. In contrast, multivariate ARR estimates a single 114 × 114 Al matrix (Eq. (3)) for each lag l. For both multivariate ARR and bivariate ARR, an AR order of p = 1 was utilized. A larger order p did not affect our conclusions (see additional control analyses in section 5.6). For each procedure and each participant, 1999 null datasets were generated6.
5.3. Most pairs of brain regions exhibit stationary, linear and Gaussian dynamics
We first tested whether there exist “dynamic” connections in the human brain, defined as ROI pairs exhibiting greater SWC variance (Eq. (1)) than those from null data. To this end, for each participant and ROI pair, the observed SWC variance (computed from real data) was compared against the null distribution of SWC variance generated from the 1999 null datasets, resulting in one p value for each ROI pair7. Within each participant, multiple comparisons were corrected by applying a false discovery rate (FDR) of q < 0.05 to the 6441 p values.
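The testing step can be sketched as follows. The one-sided empirical p value (counting the observed statistic among the null values) and the Benjamini-Hochberg procedure are assumptions made for illustration, since the text does not specify which p value formula or FDR procedure was used; the function names are likewise hypothetical.

```python
import numpy as np

def empirical_p_value(observed, null_values):
    """One-sided p value: fraction of statistics (null plus observed) that are
    at least as large as the observed statistic."""
    null_values = np.asarray(null_values)
    return (1 + np.sum(null_values >= observed)) / (1 + null_values.size)

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of p values declared significant at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        cutoff = np.max(np.nonzero(passed)[0])  # largest rank passing the threshold
        significant[order[:cutoff + 1]] = True
    return significant

n_pairs = 114 * 113 // 2  # number of unique ROI pairs among 114 ROIs
```

Under this formula, 1999 null datasets give p values in multiples of 1/2000, with a smallest attainable value of 0.0005.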
Figure 5 illustrates the number of significant ROI pairs across the 100 participants. For both multivariate ARR and PR, 57% of the participants have 0 significant edges. On average (across 100 participants), 36.8 and 34.2 edges were significant with multivariate ARR and PR respectively. Therefore a stationary linear Gaussian model was able to reproduce the SWC fluctuations of more than 99.4% of ROI pairs.
On the other hand, for bivariate ARR, 21% of the participants have 0 significant edges. On average (across 100 participants), 306.7 edges were significant. In other words, bivariate ARR tends to be less strict in terms of rejecting the null hypothesis. We will return to this point in Section 5.5.
5.4. For almost all low-motion HCP subjects, coherent brain dynamics are stationary, linear and Gaussian
Existence of coherent SWC fluctuations was tested using the approach of the pioneering dFC paper (Zalesky et al., 2014). More specifically, this approach involves computing the SWC time series for all ROI pairs and then selecting the top 100 most dynamic SWC time series as measured by the SWC variance (Eq. (1)). The percentage variance explained by the top principal component of these 100 SWC time series was utilized as a test statistic (Zalesky et al., 2014). A high percentage variance would imply the existence of coherent SWC fluctuations across the 100 pairs of brain regions. The percentage variance computed from real data was compared against the null distribution from 1999 null datasets, resulting in one p value for each participant. Multiple comparisons across participants were corrected using an FDR of q < 0.05.
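A minimal sketch of this test statistic is given below, assuming the SWC time series are stored as an array with one row per ROI pair and one column per window. The function name and the use of the singular value decomposition to obtain the principal components are illustrative choices.

```python
import numpy as np

def coherence_statistic(swc_series, n_top=100):
    """Percentage of variance explained by the first principal component of the
    n_top SWC time series with the largest variance.

    swc_series: array of shape (n_pairs, n_windows)."""
    variances = swc_series.var(axis=1)
    top = swc_series[np.argsort(variances)[::-1][:n_top]]
    centered = top - top.mean(axis=1, keepdims=True)  # remove each series' mean
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return 100.0 * singular_values[0] ** 2 / np.sum(singular_values ** 2)
```

Perfectly coherent series (scaled copies of one another) give a value of 100, whereas unrelated series give a much lower value.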
Figure 6 illustrates data from a representative HCP subject, as well as a single representative null dataset from each of PR, first order multivariate ARR and eleventh order bivariate ARR. The order of the bivariate ARR was chosen to match Zalesky et al. (2014). The top 100 most dynamic SWC time series (blue lines in Figure 6) exhibited massive fluctuations in the representative participant (Figure 6A) and corresponding null data (Figure 6B-6D). In the representative participant (Figure 6A), PR (Figure 6B), multivariate ARR (Figure 6C) and bivariate ARR (Figure 6D), the first principal component of the 100 SWC time series (red line in Figure 6) accounted for 62%, 58%, 66% and 11% of the variance respectively.
On average (across 100 participants), the first principal component explained 49%, 46%, 45% and 10% variance in real data, PR, multivariate ARR and bivariate ARR respectively (Figure S1). For PR and multivariate ARR null data, the null hypothesis was rejected for only one participant. Therefore the stationary linear Gaussian model reproduces coherent SWC fluctuations for 99% of the low motion HCP participants. For bivariate ARR, the null hypothesis was rejected for all the participants. This discrepancy is discussed in the following section.
5.5. Why bivariate ARR might generate false positives
The previous two sections suggest that bivariate ARR commonly used in the literature (Chang and Glover, 2010; Zalesky et al., 2014) might be susceptible to false positives. To understand why this might occur, let us consider the toy example illustrated in Figure 7. In this toy example, there are three brain regions X, Y and Z, whose signals follow a first order AR model (Figure 7A). With the multivariate ARR procedure, the AR parameters were estimated using the time series from all three brain regions. The estimated AR parameters (Figure 7B) were the same as the true parameters (Figure 7A). As illustrated in Figure 7B, there is no arrow directly connecting brain regions Y and Z. In other words, brain regions Y and Z only influence each other via brain region X.
On the other hand, the bivariate ARR procedure estimates AR parameters separately for brain regions X and Y, brain regions X and Z, and brain regions Y and Z. The estimated parameters (Figure 7C) are generally different from the true AR parameters (Figure 7A). More specifically, brain regions Y and Z exert direct influence on each other (which is non-existent under the true model).
When generating null data using multivariate ARR, the time course at brain region X is generated by taking into account the influence of both brain regions Y and Z (Figure 7B). However, in the bivariate ARR procedure, the time course at brain region X is generated by taking into account the influence of only brain region Y (left panel of Figure 7C) or brain region Z (center panel of Figure 7C), but not both. Therefore bivariate ARR neglects influence among all brain regions, thus resulting in null data that are less “dynamic”.
Furthermore, since bivariate ARR estimates the AR parameters for each pair of brain regions separately, any coherence among pairs of brain regions is destroyed. Consequently, the false positive situation appeared more severe when evaluating the existence of coherent brain dynamics (section 5.4) than when evaluating the existence of “dynamic” connections (section 5.3).
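The effect described in this section can be reproduced numerically. The sketch below simulates a three-region first order AR model in which Y and Z interact only through X, then fits the AR matrix either jointly or for the (Y, Z) pair alone. The AR coefficients are hypothetical values chosen for illustration and are not those of Figure 7.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares estimate of A in x[:, t] = A @ x[:, t-1] + e[:, t]."""
    past, present = x[:, :-1], x[:, 1:]
    return np.linalg.lstsq(past.T, present.T, rcond=None)[0].T

rng = np.random.default_rng(0)
# Hypothetical true model; rows and columns are ordered X, Y, Z.
# The zero entries mean that Y and Z do not influence each other directly.
A_true = np.array([[0.4, 0.3, 0.3],
                   [0.4, 0.2, 0.0],
                   [0.4, 0.0, 0.2]])
T = 20000
x = np.zeros((3, T))
for t in range(1, T):
    x[:, t] = A_true @ x[:, t - 1] + rng.standard_normal(3)

A_multivariate = fit_ar1(x)         # fitted on X, Y and Z jointly
A_bivariate_yz = fit_ar1(x[1:, :])  # fitted on Y and Z only
```

In this simulation, the Y-Z entries of `A_multivariate` are close to zero, whereas the off-diagonal entries of `A_bivariate_yz` are clearly positive: the pairwise fit attributes the shared influence of X to a direct Y-Z interaction.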
5.6. Control analyses
To ensure the results are robust to the particular choice of parameters, the sliding window size was varied from 20 frames to 100 frames, and the AR model order p was varied from 1 to 8. When evaluating coherence of whole brain dynamics, the number of most dynamic connections was also varied from 20 to 200. Since thresholding by DVARS might artificially exclude “real” dynamics, we also considered all the HCP subjects, rather than just the top 100 participants with least motion and DVARS. None of these changes significantly affected the results. Surprisingly, AR models of orders ranging from 1 to 8 explained similar variations in SWC fluctuations (Figure S2). However, an AR model of order 0 (i.e., only preserving R0 or static FC) could not explain SWC fluctuations (Figure S2). Since our results might be sensitive to the choice of test statistic, the nonlinear statistic utilized in Zalesky et al. (2014) was also considered. We found that it was even more difficult to reject the null hypothesis using this nonlinear statistic.
Finally, some authors have suggested that regressing mean grayordinate signal (akin to global signal regression) in addition to ICA-FIX might be necessary to remove global noise artifacts (Burgess et al., 2016; Siegel et al., 2016). When mean grayordinate signal was regressed, the SWC fluctuations were even less statistically significant. For example, when evaluating the existence of coherent brain dynamics, the null hypothesis was not rejected for any of the 100 participants (Figure S3).
6. Stationary linear Gaussian models explain SWC fluctuations better than HMM
Section 5 suggests that distinguishing the linear stationary Gaussian model from real fMRI data is difficult (at least for the statistics tested). However, failure to reject the null hypothesis could be due to a lack of statistical power, rather than the null hypothesis being true. It is entirely possible that HMM-type models (Allen et al., 2014; Wang et al., 2016) might generate null data that fit observed fluctuations in SWC better than the linear stationary Gaussian model.
To test this possibility, Figure 8A shows the T × T functional connectivity dynamics (FCD) matrix of a representative HCP participant, where the entry in the i-th row and j-th column of the FCD matrix corresponds to the correlation between the SWC of time points i and j (Hansen et al., 2015). The presence of relatively large off-diagonal entries (yellow in Figure 8A) suggests the presence of recurring SWC patterns.
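The FCD matrix can be sketched as follows, assuming the SWC matrices are stored as an array of shape (number of windows, N, N). The function name and the use of the upper-triangular entries of each SWC matrix as the connectivity pattern are assumptions for illustration.

```python
import numpy as np

def fcd_matrix(swc):
    """Correlate the SWC pattern of every window with that of every other window.

    swc: array of shape (n_windows, N, N); returns (n_windows, n_windows)."""
    n = swc.shape[1]
    rows, cols = np.triu_indices(n, k=1)  # unique ROI pairs only
    patterns = swc[:, rows, cols]         # shape (n_windows, n_pairs)
    return np.corrcoef(patterns)
```

Large off-diagonal values far from the main diagonal then indicate that a similar SWC pattern recurs at distant time points.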
The PR (Figure 8B) and first-order multivariate ARR (Figure 8C) null data were able to replicate the rich dynamics of the empirical FCD matrix (Figure 8A), while first-order bivariate ARR null data (Figure 8D) exhibited significantly weaker recurring SWC patterns. By contrast, the 3-state HMM null data8 (Figure 8E) exhibited recurring SWC patterns with much sharper transitions than real data (Figure 8A).
To quantify these differences, the FCD matrices of the four null data generation approaches (Figure 8B-E) were compared with the empirical FCD matrix (Figure 8A) using the Kolmogorov-Smirnov statistic (Figure S4). Both the PR null data and (first-order) multivariate ARR null data fitted the empirical FCD matrix better than the HMM null data (two-sample t-test p < 1e-32 and p < 1e-30 respectively).
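For reference, the two-sample Kolmogorov-Smirnov statistic is the largest absolute difference between two empirical cumulative distribution functions. The sketch below applies it to the off-diagonal entries of two FCD matrices; which entries were actually compared is not stated in the text, so the choice of the upper triangle (and the function names) is an assumption.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: maximum absolute difference
    between the empirical CDFs of samples a and b."""
    a, b = np.sort(np.ravel(a)), np.sort(np.ravel(b))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def fcd_distance(fcd_1, fcd_2):
    """KS statistic between the upper-triangular entries of two FCD matrices."""
    upper_1 = fcd_1[np.triu_indices(fcd_1.shape[0], k=1)]
    upper_2 = fcd_2[np.triu_indices(fcd_2.shape[0], k=1)]
    return ks_statistic(upper_1, upper_2)
```

A smaller value indicates that the null FCD entries are distributed more like the empirical ones.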
11 states were necessary for the HMM model to perform as well as first-order multivariate ARR (Figure S4). However, visual inspection of the FCD matrix (Figure S5A) suggests that the 11-state HMM still generated SWC patterns with sharper transitions than real data. The results were replicated (Figure S6) in the one subject for whom the stationary, linear, Gaussian null hypothesis was rejected in Section 5.4. For completeness, Figures S4 and S5B show the FCD results replicated with the Laumann null data generation approach (Laumann et al., 2016).
7. Discussion
In this commentary, we seek to improve our understanding of observed fluctuations in resting-state FC (e.g., SWC) widely reported in the literature. These fluctuations have often been interpreted as dynamic changes in inter-regional functional interactions, and non-stationary switching of discrete brain states (e.g. Allen et al., 2014). However, several recent papers have questioned these interpretations, especially in the case of single subject fMRI data (Hindriks et al., 2016; Laumann et al., 2016).
7.1. Linking Stationarity, dFC and brain states
By reviewing the conceptualization of fMRI as a random process (Section 2), we highlight that many statistical notions, such as ensemble auto-covariance and WSS, are reliant on ensemble statistics. Ensemble statistics are defined by taking into account an infinite number of realizations of a random process. However, the fMRI data of multiple participants are often considered as single realizations of different random processes. Therefore all FC measures in the literature are actually based on sample statistics (i.e., statistics based on one realization), but not ensemble statistics. It is possible that in the case of fMRI, ensemble statistics are equal to sample statistics. However, in this scenario, fMRI would be ergodic, which would in turn imply that fMRI is WSS.
Because the definition of WSS involves ensemble statistics, it is possible to come up with a toy brain with discrete brain states (i.e., HMM process) that is both WSS and ergodic (Section 3). Given that a WSS process can exhibit sharp transitions in SWC (Figure 2), this suggests that observed fluctuations in functional connectivity are not necessarily evidence of a non-stationary system.
The loose use of the term “non-stationarity” in the literature is not merely a linguistic issue, but can lead to potential confusion because current dFC statistical testing approaches rely on frameworks from the physics and statistics communities utilizing strict statistical notions, including stationarity (e.g. Schreiber and Schmitz, 2000). This motivates our detailed exploration of the assumptions behind the popular ARR and PR frameworks for generating null data for hypothesis testing of dFC.
It is possible that the dFC community might not be referring to “non-stationarity” in the statistical sense. However, the widely used null hypothesis testing frameworks (ARR and PR) do rely on traditional statistical notions of stationarity. Therefore it is important for the community to articulate the exact statistical notions (e.g., piecewise stationarity; Nason (2013)) that might encode the intuitive notion of “non-stationarity” mentioned in the dFC literature. Having the exact statistical notions will lead to better null hypothesis testing frameworks.
7.2. Preserving auto-covariance beyond static FC
Our review of ARR and PR frameworks (Section 4) shows that both approaches retain the 0-th sample auto-covariance (R0 in Eq. (2)) of the original fMRI data. R0 can be interpreted as static FC (being an unnormalized variant of Pearson’s correlation). Therefore R0 is an important quantity to preserve in null data since the dFC researcher is presumably interested in showing that dFC cannot be completely explained by static FC.
Preserving R0 (static FC) can be easily achieved by permuting the temporal ordering of fMRI time points. However, such a procedure ignores the well-known auto-correlation in the original fMRI data, which will lead to false positives in null hypothesis testing. Both PR and ARR seek to preserve auto-correlations in addition to R0. More specifically, PR preserves the entire sample auto-covariance sequence (R0, …, RT−1 in Eq. (2)) of the original fMRI data, while ARR of order p preserves only the first p + 1 terms of the auto-covariance sequence (R0, …, Rp). For example, ARR of order 1 preserves R0 and R1.
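The point about temporal permutation can be checked numerically. In the sketch below, the helper `autocov` uses a 1/T normalization, which is an assumption since Eq. (2) may be normalized differently; the simulated data are two independent, strongly auto-correlated time courses chosen purely for illustration.

```python
import numpy as np

def autocov(x, lag):
    """Sample auto-covariance matrix of demeaned N x T data at the given lag
    (1/T normalization assumed)."""
    t = x.shape[1]
    return x[:, lag:] @ x[:, :t - lag].T / t

rng = np.random.default_rng(0)
T = 5000
x = np.zeros((2, T))
for k in range(1, T):
    x[:, k] = 0.8 * x[:, k - 1] + rng.standard_normal(2)
x -= x.mean(axis=1, keepdims=True)

shuffled = x[:, rng.permutation(T)]  # permute the temporal ordering

# Lag-1 auto-correlation of the first time course, before and after shuffling
r1_original = autocov(x, 1)[0, 0] / autocov(x, 0)[0, 0]
r1_shuffled = autocov(shuffled, 1)[0, 0] / autocov(shuffled, 0)[0, 0]
```

Here R0 is identical for `x` and `shuffled`, while the lag-1 auto-correlation drops from roughly 0.8 to roughly 0, which is exactly the temporal structure that PR and ARR are designed to keep.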
Since PR preserves higher order auto-covariances than ARR, it might be theoretically advantageous. However, in our experiments, PR and multivariate ARR of order 1 were able to explain observed SWC in real fMRI data equally well (Figures 5, 6 and 8). On the other hand, ARR of order 0 (i.e., only R0 or static FC is preserved) does not explain SWC fluctuations at all (Figure S2). The implication is that fluctuations in SWC can largely be explained by taking into account auto-covariances of lag 0 (i.e., static FC or R0) and lag 1 (i.e., R1) from the original fMRI data.
7.3. Null data generation in the literature
Understanding why ARR and PR are able to preserve sample auto-covariances of the original data (Figure 3) is useful for interpreting other null data generation approaches in the literature. For example, Allen et al. (2014) apply the PR framework directly to the sliding window FC time series, and not to the fMRI time series. Therefore in the case where the same random phase sequence was added to all dFC phase spectra (i.e., SR1 in Allen et al., 2014), the null dFC time series have the same auto-covariance sequence as the original dFC time courses, but not the auto-covariance of the original fMRI time courses. Therefore the tested null hypothesis is similar to, but not the same as, that of other papers that applied PR to the fMRI time series (Handwerker et al., 2012; Hindriks et al., 2016).
More recently, Laumann et al. (2016) proposed a procedure to generate null data that matches the static FC (R0) and the power spectral density of the original fMRI data (averaged across ROIs). Given the deep relationship between the auto-covariance sequence and cross-spectral density of the original fMRI data (Appendix A5), preserving the average power spectral density retains some (but not all) information of the original fMRI auto-covariance structure beyond static FC. The advantage of the Laumann (and PR) approaches over multivariate ARR is that the number of fitted parameters does not increase quadratically with the number of ROIs. This allows the generation of null data with a large number of ROIs for which multivariate AR model parameters cannot be estimated (see Appendix A2). However, because the Laumann approach preserves fewer properties of the original data than PR or multivariate ARR, one might expect the Laumann null data to be less close to real data than PR or multivariate ARR. Indeed, this expectation is empirically confirmed in Figures S4 and S5B. On the other hand, these results also suggest that compared with bivariate ARR, the Laumann procedure generated null data that were more similar to real data.
ARR has also been widely used in neuroimaging applications, mostly following the bivariate variant proposed by Chang and Glover (2010). Our results suggest that bivariate ARR neglects higher-order interactions (Figure 7), resulting in wrongly estimated AR model parameters, potentially leading to false positives. The issue with wrongly estimated AR model parameters is reminiscent of Friston’s criticism of functional connectivity (Friston, 2011), where two regions A and B might be functionally connected because of mutual effective connectivity with an intermediary region C. However, it should be noted that AR models are not effective connectivity models because of the lack of hemodynamic modeling (Friston, 2009).
The false positive rate associated with bivariate ARR is less serious when investigating the existence of individual “dynamic” edges (Figure 5), but extreme when investigating the coherence of SWC across dynamic pairs of brain regions. More specifically, when investigating SWC coherence, the null hypotheses for all participants were rejected using bivariate ARR, compared with only one participant for multivariate ARR or PR (Figures 6 and S1).
7.4. The stationary, linear, Gaussian null hypothesis
Our review suggests that PR and ARR generate null data that are linear, WSS and Gaussian (Section 4). Conversely, any linear Gaussian process can be arbitrarily well approximated by an AR model of sufficiently large order p. Together, this implies that if the original time series significantly differ from ARR or PR null data, then the original data is non-Gaussian or nonlinear or non-stationary. For example, the null hypothesis was easily rejected for the WSS two-state toy brain (Figure 4) due to nonlinearity and non-Gaussianity, rather than non-stationarity.
Nevertheless, this ambiguity is less of an issue because our results also suggest that the stationary linear Gaussian null hypothesis cannot be rejected for most low-motion HCP participants (Section 5). On average (across participants), the stationary linear Gaussian null hypothesis can only be rejected for 0.6% of brain region pairs (Figure 5) when using PR and multivariate ARR. When studying dFC coherence, the stationary linear Gaussian null hypothesis was only rejected for 1% of the participants (Figures 6, S1 and S2).
The difficulty in rejecting the null hypothesis is somewhat surprising given that the brain is a complex organ possessing nonlinear neuronal dynamics (e.g., Hodgkin and Huxley, 1952; Valdes et al., 1999; Deco et al., 2008; Stephan et al., 2008; Deco et al., 2011). However, our results are consistent with previous literature reporting difficulties in rejecting the null model, especially in single subject fMRI data (Hindriks et al., 2016; Laumann et al., 2016). A close look at the dFC literature suggests similar difficulties in seminal dFC papers. For example, dFC states were found using null data generated by adding the same random phase sequence to all dFC phase spectra during PR (i.e., SR1 in Allen et al., 2014). Similarly, Zalesky and colleagues (Zalesky et al., 2014) reported that on average across subjects, the null hypothesis was rejected for only 4% of edges when using bivariate ARR, which is highly consistent with our bivariate ARR results, where 5% of edges were rejected. Therefore, the dFC literature is consistent in finding only small deviations from the stationary, linear, Gaussian null hypothesis. Much of the controversy might be due to differences in the interpretation, i.e., viewing the glass as half-full or half-empty.
For the small minority of edges or participants whose null hypothesis is rejected, it is unclear whether this deviation is due to non-stationarity, non-linearity or non-Gaussianity. It is also unclear whether these deviations are due to artifacts (e.g., respiration; Laumann et al., 2016; Power et al., 2017) or biologically meaningful. One approach to demonstrating biological relevance is by association with behavior or disease. While there are many studies linking dFC measures (e.g., SWC, dwell time of brain states, etc) with behavior and diseases (Damaraju et al., 2014; Barttfeld et al., 2015; Su et al., 2016; Du et al., 2016; Nomi et al., 2017; Shine et al., 2016; Wang et al., 2016), there are far fewer studies explicitly demonstrating that dFC measures are able to explain behavioral measures or disease status above and beyond static FC (e.g. Rashid et al., 2014). Moreover, to the best of our knowledge, no studies have shown dFC-behavioral associations above and beyond the stationary, linear and Gaussian model. Therefore, for studies demonstrating that their dFC measures are more strongly associated with behavior than static FC (e.g. Rashid et al., 2014), the improvement might potentially be explained by the AR model, which encodes both static FC (R0) and linear dynamical interactions between brain regions (R1, etc).
7.5. Does dynamic functional connectivity exist?
Does stationarity, linearity and Gaussianity of fMRI time series imply that dFC is spurious? Obviously, if dFC is strictly defined as non-WSS (Section 2.2), then stationarity does imply the lack of dFC. However, Section 3 suggests that such a definition of dFC would exclude a class of signals (e.g., HMM) that most neuroscientists would think of as encoding dFC. Therefore alternative definitions of dFC should be considered.
If dFC is thought of as corresponding to the brain sharply switching between discrete states with distinct FC patterns, then our results suggest a lack of evidence in resting-state fMRI. The presence of HMM-type states could potentially lead to the rejection of the stationary linear Gaussian null hypothesis (Section 3). However, the null hypothesis was not rejected for most low motion HCP participants (Section 5). This non-rejection could be due to a lack of statistical power. Therefore we tested whether an HMM-type model explicitly encoding the presence of states would fit SWC fluctuations better than AR (or PR) models. We found that ARR and PR reproduced the gentle fluctuations of recurring SWC patterns in real data, whereas sharp transitions were observed in the HMM (Figure 8). This result was replicated (Figure S6) in the one subject for whom the stationary linear Gaussian model was rejected (Section 5.4). Altogether, this suggests the lack of discrete brain states, which is consistent with some proposals that SWC might be better explained by a mixture of states (Leonardi et al., 2014).
If dFC is thought of as the existence of FC information beyond static FC, then our results do support the existence of dFC because multivariate AR models explained SWC fluctuations significantly better than just static FC (Figure S2). Indeed, AR models are often considered models of linear dynamical systems (e.g. Casti, 1986; Gajic, 2003). By encoding linear dynamical interactions between brain regions (A1 in Eq. (3)), the first order AR model captures both static FC (i.e., R0 in Eq. (2)) and temporal FC structure (i.e., R1 in Eq. (2)). It is worth reminding the readers that the diagonal elements of R1 encode the auto-correlation within individual brain regions, while the off-diagonal terms encode lagged cross-covariance between brain regions. Since the Laumann null data generation approach (Laumann et al., 2016) explicitly preserves static FC and temporal autocorrelation, but did not explain SWC as well as first order AR models (Figure S5), this suggests the importance of the off-diagonal entries of R1, i.e., lagged cross-covariance between brain regions. The importance of such resting-state lagged cross-covariances has been described in humans (Mitra et al., 2014; Raatikainen et al., 2017) using fMRI and in animals using a variety of electrophysiological techniques (Mohajerani et al., 2013; Stroh et al., 2013; Matsui et al., 2016); for review, see Mitra and Raichle (2016).
Finally, if dFC refers to the presence of biological information in FC fluctuations, then recent evidence suggests that FC fluctuations within a single fMRI session can be linked to varying levels of vigilance (Wang et al., 2016). Others have shown that fluctuations in activity level might also be associated with arousal (Chang et al., 2016) or attention (Kucyi et al., 2016). However, it is currently unclear whether behavioral fluctuations might be more readily explained by fluctuating activity level (first order statistics) or fluctuating FC (second order statistics). To see that stationary, linear, Gaussian fMRI does not necessarily contradict the previously mentioned studies, consider the following toy example. Suppose Alice and Bob play a game, where Alice tosses a fair coin at each round of the game. Every time the coin toss results in a head, Alice pays Bob a dollar. Every time the coin toss results in a tail, Bob pays Alice a dollar. We can see that the coin tosses do not possess any interesting dynamics: the coin tossing is a stationary process and temporally independent. However, the outcomes of the coin toss are still financially (behaviorally) relevant.
7.6. Future directions
Rather than arguing about the existence of dFC, it might be more useful to re-frame this debate in terms of adequate models of fMRI time series: (i) what kind of model reproduces properties of fMRI (and dFC) time series and (ii) what types of FC fluctuations (shapes, timescales) are expected in these models? This framing sidesteps the question of whether dFC exists, but instead relies on mathematical models, from which principled predictions and interpretations might provide further insights into human brain organization.
Given the ability of AR (and PR) models to generate realistic SWC, rather than treating them as just null models, AR models could themselves be utilized to provide insights into the brain. Since more complex models are harder to interpret, they should not be preferred unless existing models cannot fit some important aspect of fMRI data. For example, since models of static FC cannot explain SWC dynamics very well (Figure S4), it is clear that a first order multivariate AR model is preferable for a researcher interested in dFC. The next step would be to show that the additional dynamics modeled by the AR model (above and beyond static FC) are functionally meaningful, such as by association with behavior or disease.
Since this commentary only focuses on SWC and several statistics (Sections 5.3, 5.4 and 6), it is possible that stationary, linear, Gaussian models are unable to explain other unexamined aspects of fMRI dynamics captured by other statistics (Maiwald et al., 2008; Griffa et al., 2017). In these cases, more complex generative models, such as those involving mixture of states (Leonardi et al., 2014) or wavelets (Breakspear et al., 2004; Van De Ville et al., 2004), might be necessary. However, more complex and biophysically realistic models might not necessarily explain fMRI dynamics better. For example, the use of the FCD matrices (Figure 8) to visualize the rich SWC dynamics was pioneered by Hansen et al. (2015), who demonstrated that nonlinear biophysical (neural mass) models could replicate some of the rich SWC dynamics, while a linear stochastic model was not able to. The linear stochastic model utilized by Hansen and colleagues is essentially the same as the first order AR model utilized here. However, the linear stochastic model was utilized to model neural dynamics, rather than fMRI data directly. The “output” of the linear neural model was then fed into a biophysical hemodynamic response model to generate fMRI data. Importantly, the interactions between brain regions were set to be the diffusion connectivity matrix, rather than fitted to real fMRI data. Consequently, the linear stochastic model (Hansen et al., 2015) did not exhibit the rich dynamics observed in this commentary (Figure 8). Because of the difficulties in estimating the parameters of the nonlinear neural mass models, it is unclear whether an optimized neural mass model might be able to explain FC fluctuations better than a first order multivariate AR model.
A recurring criticism of AR modeling of fMRI time series is that the model parameters cannot be interpreted as effective connectivity because spatial variability of the hemodynamic response function is usually not taken into account (e.g. Friston, 2009). Hence other generative models such as dynamic causal modelling should be preferred if one wants to study effective connectivity (Friston et al., 2003). However, this does not preclude the use of AR models as a diagnostic tool encoding functional connectivity information beyond static FC (Rogers et al., 2010). An obvious challenge will then be to extract the most relevant information from these models (Liégeois et al., 2015; Ting et al., 2016).
8. Conclusion
In this commentary, we explore statistical notions relevant to the study of dynamic functional connectivity. We demonstrate the existence of a stationary process exhibiting discrete states, suggesting that stationarity does not imply the absence of brain states. Our review of two popular null data generation frameworks (PR and ARR) suggests that rejection of the null hypothesis indicates non-stationarity, nonlinearity or non-Gaussianity. We show that the stationary, linear, Gaussian null hypothesis cannot be rejected for most HCP participants during the resting state. Furthermore, AR models explain real fMRI data better than both static FC models and a popular approach that explicitly models brain states. Overall, the results suggest a lack of evidence for discrete brain states (as measured by fMRI SWC), as well as the existence of FC information beyond static FC. Therefore dFC is not necessarily spurious because AR models are themselves linear dynamical models, encoding temporal auto-covariance above and beyond static FC. Given the ability of AR models to generate realistic fMRI data, AR models might be well suited for exploring the dynamical properties of fMRI. Finally, our results do not contradict recent studies showing that temporal fluctuations in functional connectivity or activity level can be behaviorally meaningful. The code used for this work is available at GITHUB LINK TO BE ADDED.
Acknowledgements
The authors thank Andrew Zalesky for important perspectives and feedback on this work. This work was supported by Singapore MOE Tier 2 (MOE2014-T2-2-016), NUS Strategic Research (DPRT/944/09/14), NUS SOM Aspiration Fund (R185000271720), Singapore NMRC (CB RG/0088/2015), NUS YIA, and Singapore NRF fellowship (NRF-NRFF2017-06). RL is also supported by a Wallonie-Bruxelles International-WORLD Excellence Fellowship. Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.
Appendix A1. Details of ARR procedure and linear Gaussianity
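Let $x_t$ be the $N \times 1$ vector of fMRI data at time $t$ after each time course has been demeaned. There are many approaches to estimate the $p$-th order AR model parameters $(\Sigma, A_1, \dots, A_p)$ from the data $x_t$. A common approach is as follows.
Estimate $A_1, \dots, A_p$ by minimizing the following least-squares cost function:
$$\sum_{t=p+1}^{T} \Big\| x_t - \sum_{l=1}^{p} A_l x_{t-l} \Big\|^2, \tag{A1}$$
where $T$ is the number of time points. Let $A = [A_1, \dots, A_p]$ (i.e., $A$ is an $N \times Np$ matrix). Then the optimum of the above criterion corresponds to
$$\hat{A} = X Z' (Z Z')^{-1}, \tag{A2}$$
where $'$ indicates transpose, $X = [x_{p+1}, \dots, x_T]$ is an $N \times (T-p)$ matrix and $Z = [z_{p+1}, \dots, z_T]$ is an $Np \times (T-p)$ matrix whose columns $z_t = [x_{t-1}', \dots, x_{t-p}']'$ stack the $p$ time points preceding $x_t$. The estimation procedure therefore requires $ZZ'$ (in Eq. (A2)) to be full rank, which implies that $T \geq (N+1)p$. For example, if one utilized a parcellation with 400 ROIs, then one would need at least 401 × 3 = 1203 time points for a 3rd order AR model.
After estimating $\hat{A} = [\hat{A}_1, \dots, \hat{A}_p]$, the residual (error) is defined as follows:
$$\delta_t = x_t - \sum_{l=1}^{p} \hat{A}_l x_{t-l}.$$
The covariance matrix $\Sigma$ can then be estimated via the empirical covariance matrix of the residuals:
$$\hat{\Sigma} = \frac{1}{T-p} \sum_{t=p+1}^{T} \delta_t \delta_t'.$$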
Once (Σ, A1, …, Ap) are estimated, each ARR null dataset is initialized by randomly selecting p consecutive time points of the original data, and then repeatedly applying the AR model (Eq. (3)) until null data of length T are generated. By virtue of the AR model (Eq. (3)), the resulting null data are linear Gaussian if the resulting system is stable (see Appendix A2).
In the fMRI literature, it is common to generate null data where the noise terms $\epsilon_t$ in Eq. (3) are not drawn from a Gaussian distribution, but obtained by sampling from the residuals $\delta_t$ of the identified AR model (Chang and Glover, 2010). In practice, the residuals might not follow a Gaussian distribution (Lutkepohl, 2005), and so the resulting ARR null data would only be linear but not Gaussian. However, in our experiments (not shown), this difference does not have any practical effect on our conclusions.
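As a rough illustration of the steps in this appendix, the following Python/NumPy sketch estimates the AR parameters by least squares (Eqs. (A1) and (A2)) and generates one ARR null dataset, with either Gaussian noise or resampled residuals. It is a minimal sketch under stated assumptions rather than the code used for this work: the function names (`fit_ar`, `generate_arr_null`), the 1/(T − p) normalization of the residual covariance and the simulated two-region example are all illustrative choices.

```python
import numpy as np


def fit_ar(x, p):
    """Least-squares fit of a p-th order multivariate AR model.

    x : array of shape (N, T) holding demeaned time courses (one row per region).
    Returns A_hat (N, N*p), Sigma_hat (N, N) and the residuals (N, T - p).
    """
    N, T = x.shape
    X = x[:, p:]                                     # targets x_t
    # The column of Z for time t stacks x_{t-1}, ..., x_{t-p}.
    Z = np.vstack([x[:, p - l:T - l] for l in range(1, p + 1)])
    # A_hat = X Z' (Z Z')^{-1}, computed with a linear solve instead of an inverse.
    A_hat = np.linalg.solve(Z @ Z.T, Z @ X.T).T
    resid = X - A_hat @ Z
    Sigma_hat = resid @ resid.T / (T - p)            # assumed normalization
    return A_hat, Sigma_hat, resid


def generate_arr_null(x, p, rng, gaussian_noise=True):
    """Generate one ARR null dataset with the same shape as x."""
    N, T = x.shape
    A_hat, Sigma_hat, resid = fit_ar(x, p)
    A_blocks = [A_hat[:, l * N:(l + 1) * N] for l in range(p)]
    if gaussian_noise:
        noise = rng.multivariate_normal(np.zeros(N), Sigma_hat, size=T).T
    else:
        # Alternative: resample the estimated residuals (linear, not Gaussian).
        noise = resid[:, rng.integers(0, resid.shape[1], size=T)]
    null = np.empty((N, T))
    # Initialize with p consecutive time points drawn at random from the data.
    start = rng.integers(0, T - p + 1)
    null[:, :p] = x[:, start:start + p]
    for t in range(p, T):
        null[:, t] = sum(A_blocks[l] @ null[:, t - 1 - l] for l in range(p)) + noise[:, t]
    return null


# Hypothetical example: two regions simulated from a known, stable AR(1) model.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
T = 5000
x = np.zeros((2, T))
for t in range(1, T):
    x[:, t] = A_true @ x[:, t - 1] + rng.standard_normal(2)
x -= x.mean(axis=1, keepdims=True)

A_hat, Sigma_hat, _ = fit_ar(x, p=1)
null = generate_arr_null(x, p=1, rng=rng)
print(np.round(A_hat, 2))
print(null.shape)
```

Setting `gaussian_noise=False` switches to the residual resampling variant described above.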
A2. WSS, stability and auto-covariance of ARR null data
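The AR random process (Eq. (3)) is WSS if and only if the AR model is stable (Lutkepohl, 2005; Zivot and Wang, 2006; Pfaff, 2008). Stability can be assessed by ensuring that the $Np \times Np$ matrix
$$\begin{bmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix},$$
where $I$ is the $N \times N$ identity matrix, has eigenvalues with magnitude strictly smaller than one.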
Therefore ARR null data is WSS assuming that the estimated AR parameters correspond to those of a stable AR model. This stability condition should therefore be checked when estimating the AR model parameters (Eq. (A1)). Since the original fMRI data is stable (i.e., the fMRI measurements do not diverge to infinity), the estimated AR model should be stable as long as the estimation procedure is reliable. In our experience with fMRI data, this is indeed the case if the AR order p is not too close to its maximal value of $p_{\max} = T/(N+1)$ (Appendix A1). If the order p is too close to $p_{\max}$, then the estimation procedure (Eq. (A1)) might require the inversion of an almost singular matrix, which might lead to an unstable AR model.
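The stability check can be sketched as follows in Python/NumPy. The helper names `companion_matrix` and `is_stable` are hypothetical, and the example coefficients are arbitrary values chosen for illustration. The sketch builds the block companion matrix from the AR matrices and tests whether all of its eigenvalues lie strictly inside the unit circle.

```python
import numpy as np


def companion_matrix(A_blocks):
    """Build the Np x Np companion matrix from the AR matrices A_1, ..., A_p."""
    p = len(A_blocks)
    N = A_blocks[0].shape[0]
    top = np.hstack(A_blocks)                        # [A_1 ... A_p]
    if p == 1:
        return top
    # Identity blocks shift the lagged states down by one position.
    bottom = np.hstack([np.eye(N * (p - 1)), np.zeros((N * (p - 1), N))])
    return np.vstack([top, bottom])


def is_stable(A_blocks):
    """True if every eigenvalue of the companion matrix has magnitude < 1."""
    eigvals = np.linalg.eigvals(companion_matrix(A_blocks))
    return bool(np.max(np.abs(eigvals)) < 1.0)


# Hypothetical first-order examples with two regions.
print(is_stable([np.array([[0.5, 0.2], [0.0, 0.3]])]))
print(is_stable([np.array([[1.1, 0.0], [0.0, 0.2]])]))
# Hypothetical second-order example with a single region.
print(is_stable([np.array([[0.5]]), np.array([[0.3]])]))
```

In practice the list of matrices would come from the estimated parameters, so that an unstable fit can be detected before any null data are generated.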
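We now turn our attention to the relationship between the AR model parameters $(\Sigma, A_1, \dots, A_p)$ (Eq. (3)) and auto-covariance sequence $R_0, \dots, R_p$ (Eq. (2)), which is governed by the Yule-Walker equations (Yule, 1927; Walker, 1931) assuming infinitely long time courses:
$$R_l = \sum_{k=1}^{p} A_k R_{l-k} \quad (l = 1, \dots, p), \qquad R_0 = \sum_{k=1}^{p} A_k R_k' + \Sigma, \tag{A8}$$
where $R_{-l} = R_l'$.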
Eq. (A8) is invertible for sufficiently large T, and so the AR model parameters are completely determined by the auto-covariance sequence of the original data (Stoica and Moses, 2005) and vice versa (e.g., Pollock, 2011, Chapter 13). Consequently, the AR null data generated by this framework share the first p + 1 auto-covariances (Eq. (2)) computed from the original data.
It is also worth noting that for sufficiently large T, the AR parameters obtained by inverting Eq. (A8) are mathematically and practically equivalent to those obtained from Appendix A1. For small T, AR model parameters estimated from Eq. (A8) are guaranteed to correspond to a stable system, while AR model parameters estimated from Eq. (A2) are not (Stoica and Nehorai, 1987). Therefore one should always check the stability of the AR system when using the least squares approach (Appendix A1).
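For a first order model, the Yule-Walker relationship reduces to $A_1 = R_1 R_0^{-1}$ and $\Sigma = R_0 - A_1 R_1'$, with $R_l$ the auto-covariance between time points $t$ and $t - l$. The following Python/NumPy sketch compares this estimate with the least-squares estimate on simulated data. It is an illustration under our own assumptions: the helper names, the 1/T normalization of the sample auto-covariances and the simulated two-region AR(1) process are not taken from the analyses in this commentary.

```python
import numpy as np


def sample_autocov(x, lag):
    """Sample auto-covariance R_l = (1/T) sum_t x_t x_{t-l}' for demeaned x (N, T)."""
    T = x.shape[1]
    return x[:, lag:] @ x[:, :T - lag].T / T


def yule_walker_ar1(x):
    """First-order AR parameters obtained from R_0 and R_1."""
    R0 = sample_autocov(x, 0)
    R1 = sample_autocov(x, 1)
    A1 = np.linalg.solve(R0, R1.T).T                 # A1 = R1 R0^{-1} (R0 is symmetric)
    Sigma = R0 - A1 @ R1.T
    return A1, Sigma


def least_squares_ar1(x):
    """First-order AR matrix obtained by least squares."""
    X, Z = x[:, 1:], x[:, :-1]
    return np.linalg.solve(Z @ Z.T, Z @ X.T).T


# Hypothetical example: simulate a stable two-region AR(1) process.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
T = 5000
x = np.zeros((2, T))
for t in range(1, T):
    x[:, t] = A_true @ x[:, t - 1] + rng.standard_normal(2)
x -= x.mean(axis=1, keepdims=True)

A_yw, Sigma_yw = yule_walker_ar1(x)
A_ls = least_squares_ar1(x)
print(np.round(A_yw, 3))
print(np.round(A_ls, 3))
```

With this many time points the two estimates differ only by edge effects of order 1/T, in line with the equivalence for sufficiently large T noted above.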
A3. Details of the PR procedure
This appendix elaborates the PR procedure. Let $x_n$ denote the time course of the n-th brain region, while $x_n(t)$ denotes the t-th time point of the n-th brain region (where the first time point corresponds to t = 0). Without loss of generality, we assume that the time courses have been demeaned. PR proceeds as follows:
The Discrete Fourier Transform (DFT) of each time course $x_n$ is computed:
$$X_n(k) = \sum_{t=0}^{T-1} x_n(t)\, e^{-i 2\pi k t / T} = a_{n,k}\, e^{i\theta_{n,k}}, \qquad k = 0, \dots, T-1,$$
where k indexes frequency, and $a_{n,k}$ and $\theta_{n,k}$ are the amplitude and phase of the k-th frequency component of the DFT. Since the input signal is real, $a_{n,k} = a_{n,T-k}$ and $\theta_{n,k} = -\theta_{n,T-k}$. Because the signal has been demeaned, $a_{n,0} = 0$.
A random phase $\phi_k$ is then added to the DFT coefficients of each brain region n:
$$\tilde{X}_n(k) = a_{n,k}\, e^{i(\theta_{n,k} + \phi_k)}, \tag{A10}$$
where $\phi_k$ is drawn from a uniform distribution on the $[0, 2\pi]$ interval. Importantly, $\phi_k$ is the same for all brain regions and independently sampled for frequencies $k = 0, \dots, \lceil T/2 \rceil$ (although we note that $\phi_0$ is useless because $a_{n,0} = 0$). For $k > \lceil T/2 \rceil$ ($\lceil \cdot \rceil$ denotes the ceiling function), $\phi_k = -\phi_{T-k}$ because of the need to ensure the null data remains real (rather than complex-valued).
The inverse DFT is then performed for each brain region n resulting in the PR null data:
$$\tilde{x}_n(t) = \frac{1}{T} \sum_{k=0}^{T-1} \tilde{X}_n(k)\, e^{i 2\pi k t / T}. \tag{A12}$$
Because $a_{n,k} = a_{n,T-k}$, $\theta_{n,k} = -\theta_{n,T-k}$ and $\phi_k = -\phi_{T-k}$ for $k > \lceil T/2 \rceil$, the null data (Eq. (A12)) can be written in the following form:
$$\tilde{x}_n(t) = \frac{2}{T} \sum_{k=1}^{\lceil T/2 \rceil} a_{n,k} \cos\Big( \frac{2\pi k t}{T} + \theta_{n,k} + \phi_k \Big). \tag{A13}$$
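The three steps of the PR procedure can be sketched in Python/NumPy as shown below. This is a minimal illustration under stated assumptions rather than the implementation used for this work: the function name `phase_randomize` is hypothetical, the real-input FFT routines are used so that conjugate symmetry is imposed automatically, and for even T the phase of the highest (Nyquist) frequency is left unchanged so that the null data remain real. The mean is added back at the end, which does not affect the covariance structure.

```python
import numpy as np


def phase_randomize(x, rng):
    """Generate one PR null dataset from x of shape (N, T)."""
    N, T = x.shape
    mean = x.mean(axis=1, keepdims=True)
    # Step 1: DFT of each demeaned time course (non-negative frequencies only).
    X = np.fft.rfft(x - mean, axis=1)
    # Step 2: one random phase per frequency, shared by all brain regions.
    phi = rng.uniform(0.0, 2.0 * np.pi, size=X.shape[1])
    phi[0] = 0.0                                     # k = 0 carries no signal after demeaning
    if T % 2 == 0:
        phi[-1] = 0.0                                # assumption: keep the Nyquist term unchanged
    X_null = X * np.exp(1j * phi)
    # Step 3: inverse DFT; irfft enforces conjugate symmetry, so the output is real.
    return np.fft.irfft(X_null, n=T, axis=1) + mean


# Hypothetical example: three correlated time courses of length 200.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.5, 1.0]])
x = mixing @ rng.standard_normal((3, 200))
null = phase_randomize(x, rng)

print(np.round(np.cov(x), 3))
print(np.round(np.cov(null), 3))
```

Because the same phase is added to every region, the two printed covariance matrices agree up to floating-point error even though the time courses themselves differ.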
A4. PR null data are WSS
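To show that the PR null data is WSS, we show that the ensemble mean and ensemble auto-covariance do not depend on the time t. Throughout, $a_{n,k}$ and $\theta_{n,k}$ denote the DFT amplitude and phase of brain region n at frequency k, and $\phi_k$ denotes the random phase (Appendix A3).
First, by applying expectation to Eq. (A13) and via the linearity of expectation, the ensemble mean is equal to:
$$E[\tilde{x}_n(t)] = \frac{2}{T} \sum_{k=1}^{\lceil T/2 \rceil} a_{n,k}\, E\Big[ \cos\Big( \frac{2\pi k t}{T} + \theta_{n,k} + \phi_k \Big) \Big] = 0,$$
where the second equality arises because $\phi_k$ is uniformly distributed on $[0, 2\pi]$, so that the cosine integrates to zero over a full period. Therefore the ensemble mean does not depend on the time t.
Let $\mathrm{Cov}(\tilde{x}_n(t), \tilde{x}_m(t-l))$ be the ensemble auto-covariance between the signal of brain region n at time t and the signal of brain region m at time t − l. Because the ensemble mean is equal to 0 for all time points, therefore
$$\mathrm{Cov}(\tilde{x}_n(t), \tilde{x}_m(t-l)) = E[\tilde{x}_n(t)\, \tilde{x}_m(t-l)] = A + B,$$
where A corresponds to the products of cosines with different frequencies and B corresponds to the products of cosines with the same frequencies. More specifically, writing $c_{n,k}(t) = \cos(2\pi k t / T + \theta_{n,k} + \phi_k)$,
$$A = \frac{4}{T^2} \sum_{k_1 \neq k_2} a_{n,k_1} a_{m,k_2}\, E[c_{n,k_1}(t)\, c_{m,k_2}(t-l)] = \frac{4}{T^2} \sum_{k_1 \neq k_2} a_{n,k_1} a_{m,k_2}\, E[c_{n,k_1}(t)]\, E[c_{m,k_2}(t-l)] = 0,$$
where the second equality is true due to the fact that the random phases $\phi_{k_1}$ and $\phi_{k_2}$ are independently sampled, and the third equality is obtained from straightforward integration (just like the ensemble mean). On the other hand,
$$B = \frac{4}{T^2} \sum_{k} a_{n,k} a_{m,k}\, E[c_{n,k}(t)\, c_{m,k}(t-l)] = \frac{2}{T^2} \sum_{k} a_{n,k} a_{m,k} \cos\Big( \frac{2\pi k l}{T} + \theta_{n,k} - \theta_{m,k} \Big),$$
where the second equality follows from the identity $\cos\alpha \cos\beta = \frac{1}{2}[\cos(\alpha - \beta) + \cos(\alpha + \beta)]$: the term in $\alpha + \beta$ contains $2\phi_k$ and averages to zero, while the term in $\alpha - \beta$ does not depend on $\phi_k$.
Therefore
$$\mathrm{Cov}(\tilde{x}_n(t), \tilde{x}_m(t-l)) = \frac{2}{T^2} \sum_{k} a_{n,k} a_{m,k} \cos\Big( \frac{2\pi k l}{T} + \theta_{n,k} - \theta_{m,k} \Big), \tag{A22}$$
which does not depend on time t and only depends on the interval l between the two time points t and t − l. Therefore PR null data are WSS. Furthermore, one can also verify that the sample mean is equal to 0 and the sample auto-covariance sequence $R_l$ is equal to the ensemble auto-covariances (Eq. (A22)). Therefore PR null data are also ergodic.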
A5. PR null data preserve auto-covariance sequences
In this appendix, we explain why PR null data preserve auto-covariance sequences (Eq. (2)) of the original data. This property arises from the Wiener-Khintchine theorem, first formulated in the univariate case by Wiener (1930) and Khintchine (1934), and then reported in the multivariate case as an extension of the Wiener-Khintchine theorem (Prichard and Theiler, 1994), or cross-correlation theorem (Weisstein, 2016).
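Given two time courses $x_n$ and $x_m$ from brain regions n and m, their (sample) cross spectral density (CSD) is defined as
$$S_{n,m}(k) = \frac{1}{T}\, F(x_n)(k)\, F(x_m)^*(k),$$
where F(·) is the DFT, k indexes frequency, and $^*$ is the complex conjugate. Let $R_l^{n,m}$ correspond to the n-th row and m-th column of the auto-covariance matrix $R_l$ defined in Eq. (2). Then according to the multivariate Wiener-Khintchine theorem:
$$S_{n,m}(k) = F(R^{n,m})(k), \tag{A23}$$
where $R^{n,m}$ denotes the sequence $R_l^{n,m}$ viewed as a function of the lag l.
In other words, the (sample) CSD and the auto-covariance sequence of a multivariate random process encode the same information about the data.
Let the PR null time courses for brain regions n and m be denoted as $\tilde{x}_n$ and $\tilde{x}_m$. Their sample CSD corresponds to
$$\tilde{S}_{n,m}(k) = \frac{1}{T}\, F(\tilde{x}_n)(k)\, F(\tilde{x}_m)^*(k) = \frac{1}{T}\, a_{n,k} e^{i(\theta_{n,k} + \phi_k)}\, a_{m,k} e^{-i(\theta_{m,k} + \phi_k)} = \frac{1}{T}\, a_{n,k} a_{m,k}\, e^{i(\theta_{n,k} - \theta_{m,k})} = S_{n,m}(k),$$
where $a_{n,k}$ and $\theta_{n,k}$ are the DFT amplitude and phase of brain region n at frequency k, the second equality is obtained by plugging in Equation (A10), and the random phase $\phi_k$ is the same for all brain regions and therefore cancels out in the third equality. Since the sample CSD between brain regions n and m is the same in the PR null data as in the original data, their auto-covariance sequence is also the same according to the Wiener-Khintchine theorem (Eq. (A23)).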
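The preservation property can also be checked numerically. The following Python/NumPy sketch is an illustration under our own assumptions: the helper names are hypothetical, the CSD is normalized by 1/T, and the sample auto-covariance is computed in its circular (periodic) form, for which the DFT relationship holds exactly. For the usual non-circular sample auto-covariance the agreement is only approximate.

```python
import numpy as np


def phase_randomize(x, rng):
    """PR null data for demeaned x of shape (N, T), same random phase for all regions."""
    T = x.shape[1]
    X = np.fft.rfft(x, axis=1)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=X.shape[1])
    phi[0] = 0.0
    if T % 2 == 0:
        phi[-1] = 0.0                                # assumption: Nyquist term left unchanged
    return np.fft.irfft(X * np.exp(1j * phi), n=T, axis=1)


def sample_csd(x):
    """Sample CSD S[n, m, k] = F(x_n)(k) * conj(F(x_m)(k)) / T."""
    T = x.shape[1]
    X = np.fft.fft(x, axis=1)
    return X[:, None, :] * np.conj(X[None, :, :]) / T


def circular_autocov(x, lag):
    """Circular sample auto-covariance R_l[n, m] = (1/T) sum_t x_n(t) x_m(t - l)."""
    T = x.shape[1]
    return x @ np.roll(x, lag, axis=1).T / T


# Hypothetical example: three correlated, demeaned time courses of length 128.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.5, 1.0]])
x = mixing @ rng.standard_normal((3, 128))
x -= x.mean(axis=1, keepdims=True)
null = phase_randomize(x, rng)

csd_preserved = np.allclose(sample_csd(null), sample_csd(x))
autocov_preserved = all(
    np.allclose(circular_autocov(null, lag), circular_autocov(x, lag))
    for lag in range(10)
)
print(csd_preserved, autocov_preserved)
```

The same helpers can be used to confirm the Wiener-Khintchine relationship itself, by taking the DFT of the circular auto-covariance over all lags and comparing it with the sample CSD.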
Footnotes
1 Contact: R.Liegeois{at}nus.edu.sg (Raphaël Liégeois)
1 One could also modify the HMM so that the toy brain will be WSS, but exhibiting real dynamic spontaneous activity and dynamic FC.
2 The mean can always be added back to the null data, hence there is no loss of generality in assuming that the original time courses were demeaned.
3 Σ might not be a diagonal matrix, i.e., the noise does not need to be independent across brain regions.
4 In statistical parlance, AR processes are dense in the class of linear Gaussian processes.
5 Values were obtained from Movement_RelativeRMS.txt provided by HCP.
6 There were 1999 (and not 2000) null datasets because the real data is also counted as a dataset when computing p values, so that a p value of 0 is impossible.
7 Since there were 114 ROIs, each null dataset generated 6441 (= 114(114 − 1)/2) null values, which were pooled across the cortex into a single, highly-resolved null distribution (Zalesky et al., 2014). In other words, for a given participant, all the ROI pairs shared the same null distribution.
8 Following popular approaches (Allen et al., 2014; Wang et al., 2016), SWC of the representative participant were clustered into two states using k-means. Probabilities of transitioning between states were estimated based on the state assignment by k-means. Timepoints assigned to the same state were utilized to estimate the mean and covariance matrix of a multivariate Gaussian distribution using maximum likelihood. The estimated model parameters were then used to generate null data. It is worth noting that the 3-state HMM has roughly the same number of parameters as a first-order multivariate AR model.