Abstract
Quantifying the brain’s effective connectivity offers a unique window onto the causal architecture coupling the different regions of the brain. Here, we advocate a new, data-driven measure of directed (or effective) brain connectivity based on the recently developed information flow rate coefficient. The concept of the information flow rate is founded on the theory of stochastic dynamical systems and its derivation is based on first principles; unlike various commonly used linear and nonlinear correlations and empirical directional coefficients, the information flow rate can measure causal relations between time series with minimal assumptions. We apply the information flow rate to electroencephalography (EEG) signals in adolescent males to map out the directed, causal, spatial interactions between brain regions during resting-state conditions. To our knowledge, this is the first study of effective connectivity in the adolescent brain. Our analysis reveals that adolescents show a pattern of information flow that is strongly left lateralized and consists of short- and medium-range bidirectional interactions across the frontal-central-temporal regions. These results suggest an intermediate state of brain maturation in adolescence.
Introduction
The brain is a complex entity comprising widely distributed but highly interconnected regions, the dynamic interplay of which is essential for brain function. Establishing how activity is coordinated across these regions to give rise to organized (higher order) brain functions ranks as one of the key challenges in neuroscience. Various measures of brain connectivity are in use for this purpose as discussed in (Friston, 1994; Horwitz, 2003; Sporns, 2011; Rubinov and Sporns, 2010; Friston, 2011; Cohen, 2014) and references therein. Structural measures are based on confirmed anatomical connections between brain regions. Functional measures involve dynamically changing, linear or nonlinear, non-directional coefficients of statistical dependence (e.g., correlation, covariance, phase-locking values, coherence) that may appear between structurally unconnected regions. Effective brain connectivity measures capture directionally dependent interactions between different brain regions and aim to identify causal mechanisms in neural processing. In the following, we use the terms “effective” and “directed” connectivity interchangeably. We refer readers to Sakkalis (2011) and Bastos and Schoffelen (2015) for recent reviews of functional and effective connectivity measures in the brain. Herein, we investigate effective connectivity patterns as revealed by electroencephalography (EEG) recordings (Van de Ville et al., 2010) of scalp electromagnetic fields following source-space reconstruction.
The multichannel EEG signals, which are thought to reflect activity in the underlying brain regions, offer a convenient window into the temporal dynamics of the corresponding brain-scale neuronal networks. EEG studies have been extensively used to infer the nature of the functional connectivity — i.e., the linear or nonlinear statistical interdependence between the electrical activity in the different brain regions (Stam and Van Straaten, 2012) — during resting state or during task-related activities. In this paper, we focus our attention on the former.
The resting-state or persistent background activity, previously dismissed as background noise, has been shown to comprise coherent patterns of functional connectivity and appears to play a critical role in mediating complex functions such as memory, language, speech and emotional states (Raichle et al., 2001; Raichle and Mintun, 2006). There has been considerable progress in mapping out the key resting-state functional brain networks as well as tracking how they change over development. These functional connectivity studies indicate that the resting-state brain networks are sparsely connected in childhood (Fair et al., 2008) and evolve towards increased connectivity in adolescence (Smit et al., 2012). However, a more complete description remains elusive. For one, very little is known about how information flows within these networks, and how these flow patterns change with maturation.
Several different approaches are in use for quantifying the brain’s effective connectivity. Structural approaches such as Structural Equation Modeling (SEM) (McIntosh and Gonzalez-Lima, 1994) and Dynamic Causal Modeling (Friston et al., 2003) involve a neuroanatomical model of the brain and a connectivity model. Other measures are data-driven and involve a statistical model, such as Granger-causality-based methods (Kamiński et al., 2001; Hesse et al., 2003; Roebroeck et al., 2005; Ding et al., 2006; Bressler and Seth, 2011; Seth et al., 2015). A different data-driven approach involves information theoretic measures, like transfer entropy (Schreiber, 2000; Vicente et al., 2011) and partial directed coherence (Baccalá and Sameshima, 2001). Each approach has its advantages and disadvantages [see (Lindquist, 2008; Liu and Aviyente, 2012) and the Discussion section below] in terms of the assumptions involved and the computational effort required. All methods currently in use rest on assumptions whose validity has not been fully tested; this leaves room for, and motivates, the introduction of new measures of directed connectivity (Lindquist, 2008).
We have two goals in this paper. Our first goal is to advocate a new measure of data-driven effective brain connectivity by applying the novel concept of information flow rate to EEG signals. This goal is motivated by the need to define measures of connectivity that are based on fewer or more suitable model assumptions than commonly used methods (Lindquist, 2008). The information flow rate has several desirable properties (as summarized below and elaborated in the Discussion section) which give it unique advantages for connectivity analysis compared with standard methods. To the best of our knowledge, our study is the first to apply the information flow rate to neuroscience data. Our second goal is to analyze EEG resting-state data from a group of healthy adolescents using the information flow rate, in order to identify connectivity patterns in the adolescent brain. Only one prior study that focuses on this age group is available in the literature, and the connectivity analysis in that study is carried out in sensor space (Marshall et al., 2014).
The information flow rate was developed by Liang using the concept of information entropy and the theory of dynamical systems (Liang, 2008, 2013b, 2014, 2015) and based on earlier work with Kleeman (Liang and Kleeman, 2005). While the initial formulation of the information flow rate was derived for two-dimensional (bivariate) systems, Liang (2016, 2018) recently showed that the formulation also holds for N-dimensional systems. The Liang-Kleeman coefficient can measure the transfer of information between time series at different locations and thus between different brain regions. Unlike empirical measures of causality, e.g., transfer entropy and Granger causality, the information flow rate is derived from general, first-principles equations for the time evolution of stochastic dynamical systems (Liang, 2016, 2018). Owing to its definition, which involves only the time series and their temporal derivatives (or their finite-difference approximations for discretely sampled systems), the information flow rate has computational advantages over other entropy-based measures such as transfer entropy, that require the estimation of additional information (e.g., conditional probabilities) from the data. In addition, the information flow rate concept does not require stationarity (Liang, 2015) or a specific model structure, and can also be applied to deterministic nonlinear systems (Liang, 2016). These are important advantages, since the EEG signals exhibit non-stationary features evidenced in transitions between quasi-stationary periods and nonlinear dynamic behavior (Blanco et al., 1995; Kaplan et al., 2005; Klonowski, 2009).
Results
We set out to investigate patterns of resting-state effective connectivity in the brain of adolescent males, using source-reconstructed EEG signals (see Materials and methods). Our analysis of connectivity is based on the Liang-Kleeman information flow rate described in Box 1. The information flow rate measures the effect of a time series i, called the transmitter, on a different time series j, called the receiver. The indices i and j correspond to different brain source locations. In particular, we use a normalized version of the information flow rate which is better suited for ranking pair-wise information flow rates for an ensemble of Ns time series based on their relative impact on the receiver time series (see Materials and methods for details). Herein Ns = 15 is the number of source locations obtained by source-space reconstruction. The brief comment in Box 2 provides an intuitive understanding of causal relations in terms of the information flow rate.
The Liang-Kleeman information flow rate
Let p1, …, pNs denote a collection of Ns time series at different brain source locations indexed by i = 1, …, Ns. Herein, the term “time series” implies EEG-derived series of current dipole moments.
The Liang-Kleeman coefficient Ti→j measures the rate of information flow from the time series i to the time series j (where j ≠ i). Ti→j can be expressed in terms of sample statistics as follows (Liang, 2014):

$$T_{i \to j} \;=\; \frac{\sigma_{\dot{p}_j}}{\sigma_{j}}\, \frac{\rho_{ij}\left(r_{i,\mathrm{d}j} \,-\, \rho_{ij}\, r_{j,\mathrm{d}j}\right)}{1 - \rho_{ij}^{2}}. \tag{1}$$
In the above, the linear (Pearson) sample correlation coefficient between the time series pi and pj is defined by

$$\rho_{ij} = \frac{C_{ij}}{\sigma_i\, \sigma_j},$$

where $C_{ij}$ is the sample cross-covariance of the series pi and pj, and $\sigma_i = \sqrt{C_{ii}}$ is the sample standard deviation of the series pi (i = 1, …, Ns). Both ρij and Cij (often used to measure functional connectivity) are non-directional and symmetric under the index interchange i ⇄ j. The sample cross-covariance is defined by

$$C_{ij} = \overline{\left(p_i - \overline{p_i}\right)\left(p_j - \overline{p_j}\right)},$$

where the “overline” denotes the sample time average, i.e., $\overline{p_i} = \frac{1}{N}\sum_{t=1}^{N} p_i(t)$, with N being the number of samples. If i = j the above equation returns the variance of pi, i.e., $C_{ii} = \sigma_i^2$.
The cross-correlation coefficients $r_{i,\mathrm{d}j}$, where i, j = 1, …, Ns, in Equation 1 involve the time series pi and the temporal derivative of the time series pj. These coefficients are expressed in terms of the respective covariances as follows:

$$r_{i,\mathrm{d}j} = \frac{C_{i,\mathrm{d}j}}{\sigma_i\, \sigma_{\dot{p}_j}},$$

where $C_{i,\mathrm{d}j}$ is the sample covariance of the time series pi and the first derivative, $\dot{p}_j$, of the series pj. Due to the discrete nature of sampling, the first derivative is unknown a priori. Hence, a finite-difference approximation based on the Euler forward scheme, with a time step equal to kΔt, is used, i.e.,

$$\dot{p}_j(t) \approx \frac{p_j(t + k\,\Delta t) - p_j(t)}{k\,\Delta t}.$$
The differencing orders k = 1 and k = 2 are the two most common choices (Liang, 2013a) which we also consider herein.
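For illustration, the bivariate estimator sketched above can be written in a few lines of Python. The following is a minimal sketch based on Liang’s (2014) covariance-form estimator (equivalent to Equation 1); the function name and interface are our own illustrative choices, not code from this study.

```python
import numpy as np

def liang_flow_rate(p_i, p_j, dt=1.0, k=2):
    """Sample estimate of the Liang (2014) information flow rate T_{i->j}
    from a transmitter series p_i to a receiver series p_j.

    The derivative of p_j is approximated by an Euler forward difference
    with time step k*dt, as described in Box 1.
    """
    p_i = np.asarray(p_i, dtype=float)
    p_j = np.asarray(p_j, dtype=float)
    # Forward-difference approximation of dp_j/dt with step k*dt
    dp_j = (p_j[k:] - p_j[:-k]) / (k * dt)
    # Truncate the series so that all samples are aligned in time
    p_i, p_j = p_i[:-k], p_j[:-k]
    # Sample (co)variances, i.e., time averages of centered products
    C_ii = np.mean((p_i - p_i.mean()) ** 2)
    C_jj = np.mean((p_j - p_j.mean()) ** 2)
    C_ij = np.mean((p_i - p_i.mean()) * (p_j - p_j.mean()))
    C_idj = np.mean((p_i - p_i.mean()) * (dp_j - dp_j.mean()))
    C_jdj = np.mean((p_j - p_j.mean()) * (dp_j - dp_j.mean()))
    # Bivariate maximum-likelihood estimator of T_{i->j} (Liang, 2014)
    num = C_jj * C_ij * C_idj - C_ij ** 2 * C_jdj
    den = C_jj ** 2 * C_ii - C_jj * C_ij ** 2
    return num / den
```

For a unidirectionally coupled pair of series, this estimator returns a clearly nonzero rate in the driving direction and a value near zero in the reverse direction.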
Herein we refer to pi as the transmitter series and to pj as the receiver series with respect to Ti→j. We adopt the term transmitter instead of “source” for the series that “sends” information in order to avoid confusion, since all the time series represent current dipole moments obtained from scalp EEG by means of source reconstruction.
Causality and the Liang-Kleeman coefficient
Consider two time series pi and pj where i, j = 1,…,Ns and j ≠ i. According to the Liang-Kleeman formalism which is based on the notion of information entropy, the series pj has a causal effect on pi if the rate of change of pi depends on pj. Conversely, pi has a causal effect on pj if the rate of change of pj depends on pi. Hence, the following four possibilities arise:
Neither pi influences pj, nor pj influences pi: Ti→j = Tj→i = 0.
Only pi influences pj, but pj does not influence pi: Ti→j ≠ 0, Tj→i = 0.
Only pj influences pi, but pi does not influence pj: Ti→j = 0, Tj→i ≠ 0.
Both pi and pj influence each other: Ti→j ≠ 0, Tj→i ≠ 0.
If i = j, the coefficient Ti→i does not have a physical meaning and is thus undefined.
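The four cases above amount to a simple decision rule. The sketch below is purely illustrative: estimated rates are never exactly zero in practice, so the tolerance `eps` is a hypothetical stand-in for a proper significance test (cf. Materials and methods).

```python
def classify_causal_relation(T_ij, T_ji, eps=1e-3):
    """Map a pair of estimated Liang-Kleeman rates onto the four causal cases.

    eps is an illustrative tolerance below which an estimated rate is
    treated as zero; it stands in for a formal significance test.
    """
    i_drives_j = abs(T_ij) > eps
    j_drives_i = abs(T_ji) > eps
    if i_drives_j and j_drives_i:
        return "mutual influence"
    if i_drives_j:
        return "i drives j"
    if j_drives_i:
        return "j drives i"
    return "no causal link"
```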
To quantify effective brain connectivity, we use the normalized information flow rate τi→j (defined by Equation 9 in Materials and methods). The time series that we analyze involve the magnitudes of fifteen current dipole moments per individual. These are obtained by means of source reconstruction of scalp EEG signals as described in Materials and methods. We focus on the normalized inter-dipole information flow rate τi→j instead of the non-normalized Ti→j, because we aim to capture interactions between brain regions that significantly affect the receiver region (denoted by the index j). The advantage of τi→j is its ability to measure the relative importance of causal relations (Liang, 2015).
We use the second-neighbor differencing scheme (i.e., k = 2, see Box 1) to calculate the information flow rates as suggested by Liang (2014). We further comment on this choice in Materials and methods (section on the impact of differencing scheme).
Our analysis focuses on the mean information flow rate calculated over all the individuals in the study cohort, but we also explore variations of connectivity between individuals.
Brain connectivity based on mean information flow rate
To study the information flow across brain regions we want to characterize connections that exhibit significant levels of activity (as measured by the information flow rate) over all the individuals. We do this using the ensemble mean τ̄i→j of the normalized information flow rate i → j, evaluated over the cohort of L = 32 individuals:

$$\bar{\tau}_{i \to j} = \frac{1}{L} \sum_{l=1}^{L} \tau^{(l)}_{i \to j},$$

where $\tau^{(l)}_{i \to j}$ denotes the normalized information flow rate of the l-th individual.
The top panel in Figure 1 displays the patterns of the mean information flow rate τ̄i→j. The values of τ̄i→j for all transmitter i and receiver j source locations are arranged in an Ns × Ns matrix that represents all possible (i.e., 210) connections between sources. The number of possible connections is Ns × (Ns − 1), where Ns = 15 is the number of source dipoles. The value of a grid cell (L1, L2), determined by the label L1 on the vertical axis and the label L2 on the horizontal axis, represents information flow from dipole L1 to dipole L2. The matrix cells are colored according to the value of τ̄i→j: the values increase as the color changes from blue to red. The cells along the main diagonal are not colored, indicating that the information flow rate i → j is only defined if i ≠ j. The color pattern (thus, also the matrix) is asymmetric about the main diagonal. This asymmetry reflects the directionality of the information flow rate, i.e., the fact that τi→j is in general different from τj→i.
A relevant question for interpreting the results is how many of the 210 connections (represented by the off-diagonal matrix cells) shown in Figure 1 are important. As we discuss in Materials and methods, it can be shown by permutation testing that the vast majority of the connections for all the individuals are statistically significant even at the p = 0.001 level. However, very low values of information flow rate, albeit statistically significant, imply that the relative impact of the transmitter series on the receiver is not neurologically important. On the other hand, there is no golden rule for selecting a threshold value above which connections are considered important (Cohen, 2014). Hereafter, we will consider that a connection i → j between two dipoles is active in the ensemble sense if the magnitude of the ensemble-mean normalized information flow rate |τ̄i→j| exceeds the arbitrary threshold of τc = 0.05. This means that the entropic rate of change at the receiver j due to its interaction with the transmitter located at i is at least 5% of the total rate of entropy change at j. (We further comment on the selection τc = 0.05 in connection with Figure 3 below.) The bottom panel of Figure 1 shows the mean information flow rate for the connections that are active in the ensemble sense, i.e., only those with |τ̄i→j| > τc. As evidenced in this plot, 92 out of the 210 inter-dipole pairs are connected on average.
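The ensemble-level thresholding step just described can be sketched as follows. The matrix layout and function name are our own illustrative choices.

```python
import numpy as np

def ensemble_active_connections(tau_mean, tau_c=0.05):
    """Return (transmitter, receiver) index pairs whose ensemble-mean
    normalized flow rate exceeds the threshold tau_c in magnitude.

    tau_mean: (Ns, Ns) matrix of mean tau_{i->j}; the diagonal is ignored
    since the flow rate is undefined for i = j.
    """
    n_s = tau_mean.shape[0]
    return [(i, j)
            for i in range(n_s) for j in range(n_s)
            if i != j and abs(tau_mean[i, j]) > tau_c]
```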
The top thirty (30) active connections, ranked on the basis of τ̄i→j, are listed in Table 1 and displayed by arrows on an axial-view schematic in Figure 2. All thirty connections correspond to positive values of τ̄i→j between 0.116 (highest) and 0.072 (lowest); all exceed the threshold τc = 0.05. Evaluating the thirty top connections (cf. Figure 2), the overall information flow pattern is predominantly left lateralized and consists of mostly short- and medium-range bidirectional connections linking the frontal, central and temporal regions of the brain. The possible neurological insights derived from Table 1 and Figure 2 are developed in the Discussion section.
The last column of Table 1 displays the polarization P(τi→j) of the information flow rate. This ensemble measure is given by the average sign of τi→j expressed as a percentage, i.e.,

$$P(\tau_{i \to j}) = \frac{100\%}{L} \sum_{l=1}^{L} \operatorname{sgn}\!\left(\tau^{(l)}_{i \to j}\right),$$

where sgn(⋅) is the sign function defined by sgn(x) = 1 if x > 0, sgn(x) = −1 if x < 0, and sgn(x) = 0 if x = 0. The polarization is close to ±100% if the sign of τi→j is typically the same for all the individuals (if all the signs are the same, the magnitude of the polarization is 100%). In Table 1 the polarization varies between ≈ 88% and 100%, and it is less than 100% for only eight connections. This means that for the vast majority of connections, the variations between individuals affect the magnitude but not the sign of the normalized information flow rate.
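As a quick check on the definition, the polarization is a one-line computation: for example, 30 positive and 2 negative values out of L = 32 individuals give (30 − 2)/32 = 87.5%, consistent with the ≈ 88% lower end reported in Table 1. The function name below is our own.

```python
import numpy as np

def polarization(tau_per_individual):
    """Polarization of a connection: the average sign of tau_{i->j}
    across individuals, expressed as a percentage."""
    return 100.0 * np.mean(np.sign(tau_per_individual))
```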
A different way to view the relation between the ensemble mean τ̄i→j and the information flow rates of individuals is by counting for how many individuals each connection is active. Hereafter, we will consider that a connection i → j between two dipoles is individually active if the magnitude of the normalized information flow rate |τi→j| exceeds the threshold τc, i.e., if the fraction of the receiver’s total entropy rate due to its interaction with the transmitter is at least 5%. We use the same threshold for individually active connections as for the ensemble mean of the information flow rate, although the two thresholds need not coincide in general.
We define the frequency of activity, ni→j(τc), for the connection i → j as the number of individuals in the study cohort for which the specific connection is active. Hence,

$$n_{i \to j}(\tau_c) = \sum_{l=1}^{L} \theta\!\left(\left|\tau^{(l)}_{i \to j}\right| - \tau_c\right), \tag{7}$$

where θ(⋅) is the unit step function, i.e., θ(x) = 1 for x ≥ 0 and θ(x) = 0 for x < 0. The frequency of activity is evaluated over all the individuals and takes integer values ni→j(τc) ∈ {0, 1, …, L}, where L = 32 is the number of individuals in the cohort. The frequency of activity depends on τc: higher values of τc imply a smaller number of active connections.
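Equation 7 translates directly into code; note the θ(0) = 1 convention, under which a connection whose magnitude equals τc exactly is counted as active. The function name is our own illustrative choice.

```python
import numpy as np

def frequency_of_activity(tau_per_individual, tau_c=0.05):
    """Number of individuals for whom |tau_{i->j}| - tau_c >= 0,
    i.e., for whom the connection i->j is individually active."""
    tau = np.asarray(tau_per_individual, dtype=float)
    return int(np.sum(np.abs(tau) - tau_c >= 0.0))
```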
In Figure 3, we explore the correlation between the ensemble mean τ̄i→j and the number of individually active connections ni→j(τc). The scatter plot shows an almost linear dependence between the number of individually active connections and the respective value of τ̄i→j at lower values of τ̄i→j, and it appears to level off at higher values of τ̄i→j. At the same time, the scatter also increases towards these higher values. We can also use this plot as a guide for selecting a suitable threshold for ensemble-based connectivity analysis, since it reveals how the threshold imposed on τ̄i→j (shown as a vertical red line in Figure 3) affects the number of active connections, i.e., the number of markers to the right of the vertical line at τc. However, note that the frequency of the connections in individuals (i.e., the values on the vertical axis) will change if a different threshold is used to estimate individual activity. Essentially, the plot would need to be redrawn for different values of the individual threshold τc.
Information flow rate patterns per individual
To study the information flow across brain regions in individuals, we focus on the individually active source dipole pairs. As stated above, these are dipole pairs with τi→j whose magnitude (absolute value) exceeds the threshold τc = 0.05. We use the criterion |τi→j| > τc instead of τi→j > τc since there are a few pairs (nine out of a total of 6720) with values of τi→j < −0.05.
For each of the 32 individuals in the study, we calculate 210 values of normalized inter-dipole information flow rates τi→j. To calculate τi→j, we use all the time points in the EEG time series. The τi→j values over all individuals range from −0.0794 to 0.3568. The matrix of the τi→j values for each individual is depicted in Figures 4–7. Each plot corresponds to a single individual and shows an Ns × Ns square grid that represents all the possible connections between sources. The value of each grid cell in Figure 1 is equal to the average (evaluated over all the individuals) of the values of the respective grid cells in Figures 4–7.
These plots display the values of τi→j for all source dipole pairs, regardless of whether the connections are active with respect to the threshold τc or not. All the plots use a unified colormap based on the full τi→j range, i.e., [−0.08, 0.36] calculated over all dipole pairs and individuals. We note that the τi→j values are directional. For example, in the second plot (top right) of Figure 4, the cell labeled (TAL, CL) — near the bottom right of the grid — is colored red, which reflects a large value of τi→j, while the cell marked by (CL, TAL) — above the main diagonal of the grid — has a much lower τi→j. This indicates that the information flow from TAL has much higher impact on CL than the impact of CL on TAL.
The maximum τi→j observed among individuals is ≈ 0.36. This is about three times higher than the highest ensemble mean which is equal to 0.116 (cf. Table 1). This difference reflects the variability of the information flow rate values between individuals.
Effective connectivity variations between individuals
To investigate the variability of the connectivity patterns between individuals, in Figure 8 we plot the frequency of activity, ni→j(τc), defined in Equation 7, for all i ≠ j = 1, …, 15 and for τc = 0.05. The main features evidenced in this plot are as follows:
Almost all the possible (208 out of 210) transmitter i → receiver j inter-dipole connections are active in at least one individual.
Of the 210 × 32 = 6720 pairs of inter-dipole connections that are available in total in the cohort of 32 individuals, only 2821 connections, or about 42% of the total number, are individually active (i.e., their magnitude is not less than τc = 0.05). This means that more than half of the current source dipole pairs in the study cohort are not strongly connected. In these pairs, the transmitter dipole does not strongly affect the receiver dipole.
In light of (1) and (2), we conclude that the active connections vary to some extent between individuals. For example, if the same set of connections (about 42% of all pairs) were active for all 32 individuals in the cohort, only about 88 (i.e., 42% of 210) distinct dipole pairs would be involved. Instead, more than twice as many (i.e., 208) distinct inter-dipole connections are active in at least one individual. In particular, 142 inter-dipole connections are active in ten or more individuals, forty inter-dipole connections (i.e., about 20% of the total connections) are active in twenty or more individuals, and twelve are active in more than 25 individuals.
The connectivity map in Figure 8 exhibits a denser network of connections than the respective map in the bottom plot of Figure 1. The former shows the number of individuals for which a particular inter-dipole connection is individually active. Hence, it includes connections that are active in single individuals. On the other hand, the bottom plot in Figure 1 displays the number of connections that are active on average, which is understandably smaller given the inter-subject variability. It is noteworthy that the five connections with the highest mean τi→j, i.e., FL→TAL, FL→FpM, TAL→CL, FM→FR, FL→FM (cf. Figure 1, Figure 2, and Table 1), have relatively high frequencies of activity ni→j(0.05) = 27, 26, 21, 26, 29 respectively. In other words, these connections are active in most of the individuals (cf. Figure 8).
Discussion
In this section we first discuss methodological aspects of the information flow rate, as well as how it relates to and differs from other connectivity measures. We then analyze the results obtained in this study in the context of the existing literature on effective brain connectivity, focusing on the resting state of the adolescent brain.
Brain connectivity measures and information flow
Functional measures of connectivity estimate non-directional relations and thus lead to undirected brain networks that fail to capture how one brain region influences another. However, such measures are still in common use (Mill et al., 2017). The simplest measure of functional connectivity is Pearson’s linear correlation coefficient (Cohen, 2014). Pearson’s coefficient fails to satisfactorily capture nonlinear dependence. Mutual information is a measure of functional connectivity that is based on information theory and can detect both linear and nonlinear relations (Salvador et al., 2010). Its calculation, however, requires the univariate probability distribution of each individual EEG time series, as well as the bivariate (joint) distribution for each pair of time series. Since long time series are required to estimate the bivariate distribution, the application of mutual information can be computationally intensive. Moreover, the method is sensitive to the number of bins used to estimate the probability histograms, and it fails to distinguish between nonlinear and linear, or positive and negative relations (Cohen, 2014).
On the other hand, measures of effective connectivity are directional variables which can distinguish the direction of information flow between brain regions. Measures of effective connectivity, such as Granger causality (Kamiński et al., 2001; Hesse et al., 2003; Bressler and Seth, 2011; Seth et al., 2015) and transfer entropy (Schreiber, 2000; Liu and Aviyente, 2012; Salvador et al., 2010; Shovon et al., 2014; Hillebrand et al., 2016), have been applied to EEG data to identify patterns of information flow in the functional brain networks during cognitive activity. Recently, Muthuraman et al. (2015) applied renormalized partial directed coherence, a measure based on the principle of Granger causality, to the combination of EEG and magnetoencephalography (MEG) signals to identify the direction of information flow between two signals and ultimately characterize the functional and effective connectivity in resting-state brain connectivity patterns. Thus, effective connectivity measures offer insights into the dynamics of the neuronal clusters that underpin cognitive function. Graphical models provide an intuitive tool for analyzing and visualizing associations and causal relationships and for modelling functional connectivity between brain regions (Li and Wang, 2009).
Granger causality analysis is based on the assumptions that (1) the time series are stationary, (2) interaction between the series can be described by means of a linear relation (typically a multivariate autoregressive model), (3) a specific model order can be defined, which determines how far in the past the coupling between two series extends, and (4) the innovation process of the linear model is described by Gaussian white noise (Seth, 2007; Liu and Aviyente, 2012; Cohen, 2014). This “plain vanilla” variety of Granger causality fails to detect nonlinear causal links (Liu and Aviyente, 2012; Lin et al., 2017). In such cases, nonlinear extensions of Granger causality are necessary (Chen et al., 2004; Marinazzo et al., 2011). However, such approaches are not yet conclusive since the selection of the degree of model nonlinearity and overfitting remain open issues (Marinazzo et al., 2011).
Transfer entropy is an extension of the concept of mutual information. It is based on the notion of relative entropy (also known as Kullback–Leibler divergence) and measures the difference between two probability distributions. For linear autoregressive systems driven by Gaussian white noise, Granger causality has been shown to be equivalent to transfer entropy (Barnett et al., 2009; Liu and Aviyente, 2012). Hence, the latter can be viewed as an extension of the former that can handle the dependence of non-Gaussian time series. Comparisons between Granger causality and transfer entropy are given in (Bressler and Seth, 2011; Liu and Aviyente, 2012). As stated above, Granger causality requires the specification of the order of the autoregressive processes involved. This model order, however, may depend on a number of variables including the conditions, the tasks executed (for task-oriented studies), and the EEG time series segments analyzed (Cohen, 2014). Transfer entropy makes fewer assumptions about the data than the standard Granger causality approach (Vicente et al., 2011). There are, nonetheless, challenges related to the calculation of transfer entropy, e.g., estimation by state-space partitioning, as discussed by Bressler and Seth (2011) and Liang (2014).
Entropy and information content are key concepts in the definition of functional and effective brain connectivity measures (Cohen, 2014). In the thermodynamic sense, entropy is associated with disorder: a higher temperature implies higher entropy. In classical (as opposed to quantum mechanical) thermodynamics, the entropy S is calculated by means of the Gibbs formula S = −kB ∑i pi ln pi, where the summation is over the probabilities pi of the system’s microstates (the index i should not be confused with the location index of current source dipoles) and kB is Boltzmann’s constant. In information theory, the entropy of a system with N states is defined in terms of Shannon’s formula

$$H = -\sum_{i=1}^{N} p_i \ln p_i,$$

where pi, i = 1, …, N, is the probability of the state indexed by i. If the natural logarithm is used in the definition (as was done above), Shannon entropy is measured in terms of natural information units (nats).
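Shannon’s formula is straightforward to evaluate numerically. For instance, a uniform distribution over four states yields H = ln 4 ≈ 1.386 nats, while a distribution concentrated on a single state has zero entropy (perfect predictability). The sketch below adopts the standard convention p ln p → 0 as p → 0.

```python
import numpy as np

def shannon_entropy(probs):
    """Shannon entropy H = -sum_i p_i ln p_i, in nats.

    States with zero probability contribute nothing (p ln p -> 0)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```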
The Shannon entropy quantifies the unpredictability (uncertainty) of a stochastic system. High entropy implies that the result of a measurement is not only a priori unpredictable, but that the measurement itself provides new information which improves our knowledge of the system. On the other hand, low entropy means that the extant knowledge of the system allows us to predict quite well the outcome of the measurement and consequently, the measurement does not contain significant new information. Hence, higher entropy implies a higher level of unpredictability, while lower entropy implies that efficient, “compressed” representations are possible (i.e., a parameter set of lower dimensionality can be used to represent the system). In complex systems, there are interactions between different components. We can intuitively view information flow from component X to component Y as the amount of the uncertainty of Y that is resolved by the past states of X. If the past states of the component X do not affect the current state of Y, there is no information flow from X to Y (Bossomaier et al., 2016). On the other hand, if the past states of X reduce the uncertainty of Y, i.e., improve its predictability, this implies information flow from X to Y. Currently used measures of functional and effective brain connectivity are based on the concept of the absolute Shannon entropy. The concept of Shannon entropy has been generalized to dynamical systems that are not necessarily stochastic, by means of the Kolmogorov-Sinai entropy (Gutzwiller, 1990) which quantifies the unpredictability of future states of the system.
The Liang-Kleeman information flow rate is a recently developed measure which is also based on the concept of Shannon entropy (Liang, 2008, 2013b,a, 2014). However, the information flow formalism can be derived using either absolute or relative entropy. In two dimensions (i.e., for a system of two time series) this was shown by Liang (2013b, 2014). Relative entropy (Kullback–Leibler divergence) measures how much information is added to a given system with respect to the information contained in the initial probability distribution. Recently, Liang (2018) has shown that the relative entropy formulation of the information flow rate is also valid for stochastic dynamical systems with N > 2 dimensions (i.e., systems involving N potentially coupled time series, where N is an arbitrary integer value).
The information flow rate formulation is based on the theory of dynamical systems, in contrast with transfer entropy which is a statistically motivated measure of information transfer. The information flow rate aims to address the computational shortcomings of transfer entropy (requirement for long time series, computational complexity, estimation of bivariate probability distribution) as well as spurious causal associations (Liang, 2016). The information flow rate provides an easy-to-compute directional (asymmetric) measure of dependence between pairs of time series that can be evaluated from a single realization of each series and does not require the estimation of transition probabilities. Unlike Granger causality, the information flow rate concept does not require a specific model structure, Gaussian statistics, or stationarity (Liang, 2015) and can also be applied to deterministic nonlinear systems (Liang, 2016). These could be important advantages of information flow, since the EEG signals exhibit non-stationary features evidenced in transitions between quasi-stationary periods and nonlinear dynamic behavior (Blanco et al., 1995; Kaplan et al., 2005; Klonowski, 2009), while the correct model structure is never known a priori. However, further study is needed to compare in detail the performance of the information flow rate against the standard methods of assessing brain connectivity.
Finally, we draw attention to an ongoing discussion in the literature regarding the very definition of effective connectivity, e.g. (Lindquist, 2008). Friston argues that effective connectivity should be based on dynamic models, such as the Dynamic Causal Models (DCMs) (Friston, 2011). Model-based connectivity methods assume a well-defined biophysical model of neuronal dynamics (Sakkalis, 2011). Friston also opines that data-driven models, such as Granger causality, provide functional connectivity measures, a view that is echoed by Bastos and Schoffelen (2015). On the other hand, several other publications referenced in this paper, including the reviews (Sakkalis, 2011; Bastos and Schoffelen, 2015), refer to causality-based methods, including data-driven methods such as Granger causality and transfer entropy, as effective connectivity measures. We follow the latter viewpoint, according to which the information flow rate is an effective connectivity measure.
Comparison of brain connectivity results with literature
In recent years, a number of advances facilitating the study of functional connectivity of the brain have thoroughly transformed our understanding of the activity present in the brain in the absence of “any imposed stimuli, task performance or other behaviourally salient events” [for a review, see (Snyder and Raichle, 2012)]. This “resting state” of the brain is characterized by spontaneous, coherent fluctuations of blood-oxygen-level-dependent (BOLD) as well as electromagnetic signals from functionally distinct brain regions. fMRI studies were the first to show that subsets of these regions tend to act in concert, giving rise to functionally relevant “resting-state” brain networks (Raichle et al., 2001; Greicius et al., 2009) that provide a basis for information processing and coordinated activity. More recently, Yuan et al. (2016) and Liu et al. (2017) have found that functional resting-state networks can also be extracted from source-space EEG data, and Hillebrand et al. (2016) have done the same using MEG data. The most commonly reported resting-state functional networks observed in children (Muetzel et al., 2016), adolescents (Borich et al., 2015) and adults (Yuan et al., 2016; Liu et al., 2017) (and references therein) include the visual, the fronto-parietal, the sensory motor and the default mode network (DMN). These studies also highlight that the above resting-state functional networks are not independent, and that there is a high degree of interconnection between them. To date, however, very few studies have investigated the information flow between the different networks.
To our knowledge, there are only three studies that have investigated the source space information flow pathways in adults (age: over 20 years) during eyes closed resting state, and considered the relationship between these pathways and the underlying functional networks: (1) Michels et al. (2013) study EEG data using the partial directed coherence (PDC) measure, which is based on Granger causality, to quantify effective connectivity; (2) Muthuraman et al. (2015) analyze both EEG and MEG data, also by means of the partial directed coherence (PDC) measure; and (3) Hillebrand et al. (2016) study MEG recordings using directed phase transfer entropy (dPTE) to assess effective connectivity. All three studies find that the dominant pattern in adults is a posterior to anterior flow, originating in the regions associated with the primary visual cortex and the posterior DMN, and flowing to the frontal regions. Michels et al. (2013) and Muthuraman et al. (2015) observe only one-way connectivity between the brain regions. However, Hillebrand et al. (2016) find that the dominant patterns are complemented by weaker anterior to posterior connections which make the flow bidirectional at finer connection strength resolution.
Only Michels et al. (2013) have investigated source-space resting-state directed connectivity in children (mean age: 10 years). They find that the dominant flow pattern is opposite to that observed in the adults, with activation originating in the anterior (i.e. pre-frontal) regions and terminating in the posterior (parietal/occipital) regions. One possible explanation is that the anterior to posterior flow in children indicates modulation of lower-order sensory-motor information from frontal regions (Emberson et al., 2015; Taylor and Khan, 2000). Admittedly, there are obvious gaps in our understanding of the resting-state dynamics over the course of development. More studies of the resting-state dynamics in children, as well as detailed comparisons with other populations at different stages of development are needed to fully contextualize these findings.
The present study provides the first critical step towards understanding information flow in the brain during a key transition stage between childhood and adulthood. We have analyzed resting-state EEG data from an intermediate population, a cohort of adolescents (mean age: 16 years). Using the Liang-Kleeman information flow rate as a measure of effective brain connectivity, we find that of the 30 active connections in adolescent brains (based on the ensemble means of the normalized information flow rate), the five strongest (cf. red arrows in Figure 2a) mostly originate in the left frontal region of the brain and flow to left temporal and mid-frontal regions. Including the next ten connections (cf. yellow and dark green arrows in Figures 2b and 2c) extends the active areas of the brain beyond the frontal region to encompass adjacent posterior regions (i.e., central and temporal), with the information flow pattern becoming largely bidirectional but still strongly left lateralized. The final fifteen connections (cf. light green arrows in Figure 2d) are characterized by information flows mainly between the left and mid anterior regions. They also show slightly lower-level activity on the right side of the brain, an indication of inter-hemispheric flow between the left and right frontal regions, and the emergence of connections in the posterior regions (i.e., parietal to occipital). Overall, the information flow pattern suggested by the thirty connections is highly left lateralized and comprises mostly short and medium range bidirectional connections that link the frontal, central and temporal regions of the brain.
The above results are reminiscent of the basic directed connectivity pattern observed by Michels et al. (2013) in young children but with one important difference. In early adolescence, the pattern of information flow manifests an additional layer of complexity indicated by bidirectional communication between brain regions that Hillebrand et al. (2016) observe in the adults and which they interpret as feedback loops. In effect, the pattern that we observe in our cohort suggests a progression towards maturation of the adolescent brain.
Similarly, the lateralization of the information flow we observe is also a reflection of an earlier developmental stage. Agcaoglu et al. (2015) studied individuals ranging from 12 to 71 years and observed that the resting-state networks of young individuals are highly lateralized, with the default mode network, attention and frontal networks being strongly left lateralized. With age, however, this lateralization decreases and the network becomes more symmetric. In fact, the degree of interaction between networks, the order in which the networks are activated, the organization and the strength of the interactions within individual networks (including the extent to which they are lateralized), all change over development (Muetzel et al., 2016).
The fact that both functional and effective connectivity change as the brain matures is not entirely surprising. It is well known that the brain undergoes considerable structural changes during the transition from puberty to adulthood (Shaw et al., 2008), as manifested by a significant increase (decrease) in the volume of white (grey) matter (Gogtay et al., 2004; Paus, 2005; Toga et al., 2006; Lebel and Beaulieu, 2011). For example, Lebel and Beaulieu (2011) have shown that while the maturation of the projection fibers linking the primary sensorimotor cortical regions with lower-order subcortical sensory areas and of the commissural fibers connecting the two hemispheres of the brain is mostly complete by late adolescence, the maturation of the association tracts, particularly the superior longitudinal and fronto-occipital fasciculi that connect the occipital and the frontal regions of the brain, continues well into the twenties. Functionally, these long association fibers are correlated with increasing long-range EEG coherence and synchronization (Miskovic et al., 2015).
Finally, we have also identified significant variability of effective connectivity between individuals based on the patterns of information flow rate between brain regions. We have presented and discussed graphical tools for visualizing and characterizing variability between individuals including dipole-dipole connectivity plots that account for all the individuals in the cohort, e.g., Figure 8. The variability of the brain’s resting-state functional and effective connectivity across individuals and over time are topics of considerable interest within both research and clinical settings. Hutchison et al. (2013) and Hirayama et al. (2016) (see also references therein) argue that the variability of the connectivity matrix between individuals is not due to noise but is associated with individual variances in mental/vigilance states and cognitive function. They also note that there are reports of the temporal dynamics of the connectivity matrix being affected by brain health, which raises the exciting possibility that, in the future, the associated features could serve as disease/injury biomarkers. The significant advantages of the new data-driven measure of effective brain connectivity discussed in this paper (i.e., ease of calculation, sensitivity to both linear and nonlinear relations, independence from a specific model structure and the stationarity assumptions), make it especially well suited for exploring these exciting new directions.
Materials and Methods
In this section we briefly describe the EEG dataset. We then present the Liang-Kleeman directional information flow rate that will be used for the analysis of resting-state EEG brain connectivity. We also discuss how to numerically calculate and evaluate the statistical significance of the information flow rate obtained from the EEG data.
Ethics Statement
This study was approved by the University of British Columbia Clinical Research Ethics Board (Approval number: H17-02973). The adolescents’ parents gave written informed consent for their children’s participation under the approval of the ethics committee of the University of British Columbia and in accordance with the Helsinki declaration. All participants provided assent.
Participants
Thirty-two (32) right-handed male adolescents (mean age: 15.8 yrs; SD: ±1.3) participated in this study. Exclusion criteria included focal neurologic deficits, pathology, and the use of prescription medications for neurological or psychiatric conditions. Parents signed an informed consent form that was approved by the University of British Columbia and all participants provided assent.
Description of EEG data
Between 5 and 8 minutes of resting-state EEG data were collected while participants had their eyes closed, using a 64-channel Hydrogel Geodesic SensorNet (EGI, Eugene, OR) connected to a Net Amps 300 amplifier (Virji-Babul et al., 2014). The sensor-space signals were referenced to the vertex (Cz) and recorded at a sampling rate of fs = 250 Hz. The scalp electrode impedance values were typically less than 50 kΩ. To eliminate artifacts associated with attaching (removing) the cap, 750 data points were removed from the beginning (end) of each time series. (This corresponds to removing data with a total duration of 6 s.) The EEG time series were then filtered using a band-pass filter (4–50 Hz) and a notch filter (60 Hz), as described in (Porter et al., 2017) [see also (Rotem-Kohavi et al., 2014, 2017)], to remove signal drift and line noise. In addition, Independent Component Analysis (ICA) was used to identify, decompose and remove eye blinks. Finally, the data were visually inspected and epochs with motion as well as additional ocular artifacts were excluded, as were channels with excessive noise. Each of the resulting EEG series used in this study involves between 67,845 and 114,304 time points.
Next, we used the Brain Electrical Source Analysis (BESA) Version 6.3 software1 (MEGIS Software GmbH, Gräfelfing, Germany) to map the cleaned sensor-space data to source waveforms. The voltages from the available sensor channels were first interpolated to voltages at 81 predefined scalp locations that comprise BESA’s Standard-81 10-10 Virtual Montage (BESA Wiki, 2018) and re-referenced to the average reference by subtracting the mean voltage of the full set of 81 virtual scalp electrodes. BESA uses spherical spline interpolation to perform this mapping (Perrin et al., 1989; Scherg et al., 2002). The interpolation offers a consistent way of dealing with occasional bad channels while maintaining a common montage across all the individuals. Thereafter, we used the BESA montage method (Scherg et al., 2002) to compute source waveforms. Since resting-state activity is not localized, we used the BR_Brain Regions montage, which is derived from 15 pre-defined regional sources that are symmetrically distributed over the entire brain. The respective brain regions involved in this montage are listed in Table 2 and shown in Figure 2. BESA uses a linear inverse operator of the lead field matrix, which accounts for the topography of the sources included in the BR_Brain Regions montage, to calculate the source waveforms (Scherg et al., 2002). The composite source activity in each brain region is represented by a single regional source. Each source is modeled as a current dipole whose moment is specified in terms of a local orthogonal coordinate system with basis vectors commonly labelled as radial (r), horizontal (h), and vertical (v). Thus, the source waveforms represent time series of the fifteen current dipoles. Finally, the resulting data were exported to MATLAB for the analysis described below.
Definition of inter-dipole information flow rate
In the following, $p_i^{(l)}(t_n)$ will denote the time series quantifying the time-varying strength (magnitude) of the current dipole moment at the source location i (where i = 1, …, Ns = 15) for the individual indexed by l (where l = 1, …, L = 32), at time tn = n Δt, where n = 1, …, N is the time index and Δt = 4 ms is the time step. In terms of the dipole moment components in the local (r, v, h) system, the magnitude of the dipole moment is given by $p_i(t_n) = \sqrt{p_{i,r}^2(t_n) + p_{i,v}^2(t_n) + p_{i,h}^2(t_n)}$.
For completeness, we note that in general both the strength and the orientation of the current dipoles vary with time; however, in the present study, we track only their strength. In addition, we drop the individual index l if there is no risk of confusion. For brevity, we will also write pi,n = pi(tn).
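As a minimal sketch of this magnitude computation (in Python, assuming the three component source-waveform series are available as arrays):

```python
import numpy as np

def dipole_magnitude(p_r, p_h, p_v):
    """Magnitude (Euclidean norm) of a current dipole moment from its
    local radial (r), horizontal (h) and vertical (v) components,
    evaluated at each time point."""
    p_r, p_h, p_v = (np.asarray(a, float) for a in (p_r, p_h, p_v))
    return np.sqrt(p_r**2 + p_h**2 + p_v**2)

# Example: a single time point with components (3, 4, 0) has magnitude 5
m = dipole_magnitude([3.0], [4.0], [0.0])
```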
Unlike the Pearson correlation coefficient $r_{i,j}$, which satisfies $|r_{i,j}| \le 1$ due to the Cauchy–Schwarz inequality, the magnitude of the cross-correlation coefficient $r_{i,dj}$ is not constrained to be less than one. This is due to the normalization of $C_{i,dj}$ by the standard deviation of pj instead of the standard deviation of the temporal derivative $\dot{p}_j$ (cf. Equation 3).
Based on the discussion of Shannon entropy (cf. Discussion section), a positive (negative) rate of information flow from i → j (Ti→j) indicates that the interaction between the two series leads to an increase (decrease) in the entropy of the series pj. Equivalently, it signifies that the receiver series becomes more (less) unpredictable due to its interaction with the transmitter series. The predictability of each time series is negatively correlated with the entropy.
While the information flow rate coefficients Ti→j were initially formulated for bi-variate systems that involve two interacting time series, Liang has recently proved theoretically that the equations above are also valid for N-variate, deterministic or stochastic systems (Liang, 2016, 2018). In addition, even though the estimator of Ti→j has been derived using the assumption of a linear system, it has been successfully applied to identify causal connections in nonlinear systems as well (Liang, 2014, 2016).
Normalized information flow rate
The information flow rate is based on the notion of information entropy. A positive T2→1 implies that the transmitter series p2 increases the entropy of the receiver p1, while a negative T2→1 implies the opposite. By comparing Ti→j with Tj→i (in the latter the roles of transmitter and receiver are reversed), we can determine which series transfers more information to the other series. However, this comparison does not reveal which of the two series is affected more due to its interaction with the other, because the coefficient Ti→j does not account for the entropy change of each series due to the intrinsic evolution and possible stochastic effects. In order to quantify the impact of the entropy transferred to the receiver from a transmitter series, we need to know the extent to which the information transfer affects the predictability of the receiver, relative to all the other influences acting on the receiver.
The total rate of entropy change of pj (receiver) depends not only on the information flow from pi (transmitter), which is determined by the rate Ti→j, but also on $dH_j^*/dt$ and $dH_j^{noise}/dt$. The term $dH_j^*/dt$ (intrinsic entropy rate) represents the entropy rate of change due to the change of the phase space in the direction pj. The term $dH_j^{noise}/dt$ (noise-induced entropy rate) represents the impact of stochastic effects in the dynamical system that underlies the evolution of pj (Liang, 2008). Hence, as proposed by Liang (2015), a suitable normalization factor for the information flow rate from pi to pj is derived by adding the absolute values of the three rates that contribute to the total rate of entropy change of the receiver pj, i.e., $Z_{i\to j} = |T_{i\to j}| + |dH_j^*/dt| + |dH_j^{noise}/dt|$ (Equation 8).
Based on Equation 8, Zi→j is a non-negative number bounded from below by |Ti→j|. In addition, Zi→j cannot be zero unless the rates of change of the intrinsic and stochastic entropy components are zero. This can only happen if pj is constant in time, which is not relevant for the EEG time series. Thus, the normalized information flow rate from the transmitter pi to the receiver pj is defined as (Liang, 2015) $\tau_{i\to j} = T_{i\to j}/Z_{i\to j}$ (Equation 9).
According to Equation 9, τi→j measures the percentage of the total entropy rate of change for pj which is due to its interaction with pi. The calculation of the terms which contribute to Zi→j from the data is explained in Appendix 1.
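For concreteness, the bivariate estimation of Ti→j and τi→j can be sketched as follows. This is a minimal Python illustration based on the published maximum-likelihood estimators of Liang (2014, 2015), not the authors' code; in particular, the residual-based noise-variance estimate used for the noise-induced entropy rate is our assumption about one reasonable implementation.

```python
import numpy as np

def liang_flow(x, y, k=1, dt=1.0):
    """Sketch of the bivariate Liang-Kleeman estimators: returns the
    information flow rate T (x -> y) and its normalized counterpart tau.
    Based on the maximum-likelihood estimators in Liang (2014, 2015)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Finite-difference derivative of the receiver series y (step k*dt)
    dy = (y[k:] - y[:-k]) / (k * dt)
    x, y = x[:len(dy)], y[:len(dy)]
    # Sample covariances; Equation 1 of the text uses the equivalent
    # correlation-coefficient form of the same quantities
    C = np.cov(np.vstack([x, y, dy]))
    Cxx, Cyy, Cxy = C[0, 0], C[1, 1], C[0, 1]
    Cx_dy, Cy_dy = C[0, 2], C[1, 2]
    det = Cxx * Cyy - Cxy**2
    # Least-squares drift coefficients of dy regressed on (y, x)
    a_yy = (Cxx * Cy_dy - Cxy * Cx_dy) / det
    a_yx = (Cyy * Cx_dy - Cxy * Cy_dy) / det
    T = a_yx * Cxy / Cyy  # information flow rate x -> y
    # Noise-induced entropy rate from regression residuals (our assumption)
    resid = dy - dy.mean() - a_yy * (y - y.mean()) - a_yx * (x - x.mean())
    b2 = k * dt * resid.var()
    Z = abs(T) + abs(a_yy) + b2 / (2.0 * Cyy)  # normalizer, cf. Equation 8
    return T, T / Z                            # T and tau, cf. Equation 9
```

Note that, per the properties listed below, opposite-direction flows should be compared via the non-normalized T values, since the normalizers of τi→j and τj→i refer to different receiver series.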
Based on the above analysis, the normalized information flow rate, τi→j, has several advantages over the un-normalized coefficient, Ti→j, the most important being that (1) τi→j does not explicitly depend on the finite difference step (item 3 in Box 4), and (2) it measures the importance of information flow from the transmitter to the receiver (items 4 and 5 in Box 4). Hence, τi→j is a suitable measure for investigating patterns of information flow between different regions of the brain and therefore for assessing effective connectivity.
The main properties of the information flow rate Ti→j are as follows:
In general, the cross-correlation coefficients between pi and the time derivative of pj are not symmetric under interchange of i and j, i.e., $r_{i,dj} \ne r_{j,di}$. The asymmetry of $r_{i,dj}$ with respect to the interchange of i and j introduces directionality in the information flow rate coefficients, which implies that, in general, Ti→j ≠ Tj→i.
For i = j, in light of $r_{i,i} = 1$, both the numerator and the denominator on the right-hand side of Equation 1 become zero. Thus, Ti→i is undetermined; however, this is not an issue, because the quantities of interest are the rates of information flow between different time series.
The presence of the cross-correlation coefficients $r_{i,dj}$ and $r_{j,dj}$ in the numerator on the right-hand side of Equation 1 implies that Ti→j is proportional to the inverse of the finite difference time step, i.e., Ti→j ∝ 1/kΔt, where kΔt is the time step used to calculate the time derivative (cf. Equation 4).
The main properties of the normalized information flow rate τi→j are as follows:
The coefficient τi→j is, in general, asymmetric, i.e., τi→j ≠ τj→i for i ≠ j.
The τi→j can take negative or positive values with magnitude less than or equal to one, i.e., −1 ≤ τi→j ≤ 1. Positive values of τi→j imply that the transmitter pi tends to increase the entropy of the receiver pj (i.e., it increases its uncertainty), while negative values imply that pi reduces the entropy of pj.
The τi→j does not explicitly depend on the finite difference step kΔt. This is due to the fact that both the numerator and the denominator in Equation 9 are proportional to 1/kΔt.
The τi→j measures the relative importance of the entropy change in the receiver series pj due to its interaction with the transmitter pi. The impact of pi on pj increases with the magnitude of τi→j.
The τi→j is a relative measure which quantifies the information transfer from pi to pj with respect to the endogenous and noise-induced changes of the latter. However, it cannot be used to compare the information flow rate from pi to pj with that from pj to pi. This is due to the fact that the normalization of τi→j depends on the entropy changes of pj, while the normalization of τj→i depends on the entropy changes of pi. The comparison of the reverse information flows between pi and pj should thus be based on the non-normalized coefficients Ti→j and Tj→i (Liang, 2015).
The information flow rates (normalized and non-normalized) can be calculated without requiring (i) the estimation of conditional probability distributions, (ii) stationarity assumptions, (iii) Gaussian distribution of the fluctuations, or (iv) a specific model structure.
Non-parametric testing of normalized information flow rate
To calculate τi→j for each individual l = 1, …, L, we use all the time points in the series $p_i^{(l)}(t_n)$, for the source locations i = 1, …, 15. Each series represents the strength of the current dipole moment at location i. All the time series for the same individual (indexed by l) have the same length Nl, which varies between 67,845 and 114,304 points.
In order to infer connectivity patterns, it is necessary to know if the estimated values τi→j are statistically significant. Each estimate of an inter-dipole τi→j is a statistic, i.e., a random variable that fluctuates between samples. If the sampling distribution of the statistic is known, the significance of a particular estimate can be assessed using a suitably constructed parametric statistical test. In the case of Ti→j such a test can be constructed (Liang, 2014). For τi→j, however, the sampling distribution is not known. In this case, it is possible to apply non-parametric permutation testing in the spirit used by Lachaux et al. to quantify the significance of phase locking values (Lachaux et al., 1999; Bastos et al., 2015). The goal of non-parametric permutation testing is to determine the probability that the observed test statistic could have been realized if the null hypothesis (i.e., zero information flow) were true. This, in turn, allows us to conclude if an estimated information flow rate is statistically significant: a very small probability (p-value) implies that the observed deviation is not likely under the null hypothesis (Maris and Oostenveld, 2007; Cohen, 2014).
The test statistic that we use is the normalized information flow rate from series pi to series pj, for i ≠ j = 1, …, 15. The null hypothesis that we test is that there is no information flow between series pi and pj. We generate Ms = 1000 randomized states $p_i^{[m]}(t_n)$, where m = 1, …, Ms and n = 1, …, N, from each transmitter time series pi. Each randomized state is obtained by scrambling (by means of random permutations) the N time points of pi. The permutation destroys the temporal ordering of pi and consequently any patterns of information flow from $p_i^{[m]}$ to $p_j$. Hence, the estimated τi[m]→j values based on the shuffled time series pi do not represent meaningful information flow.
The p-value of the statistic τi→j is defined as the percentage of times — calculated over Ms permutation states — that the randomized information flow rate τi[m]→j is more extreme than τi→j (i.e., larger than τi→j if τi→j > 0 and smaller than τi→j if τi→j < 0). A high p-value would indicate that the null hypothesis cannot be rejected. In contrast, a low p-value would provide support for the alternative hypothesis (i.e., that there is significant information flow from pi to pj). The observed value τi→j is then considered as statistically significant, if the respective p-value is below a specified significance level (typically 0.1%–5%).
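The permutation procedure can be sketched as follows. Here `flow_fn` is a placeholder for the normalized information flow rate estimator used in the study; any scalar directed-dependence statistic fits the same scheme, and the lagged-correlation statistic used in the test below is purely an illustrative assumption.

```python
import numpy as np

def permutation_pvalue(stat_obs, transmitter, receiver, flow_fn,
                       n_perm=1000, seed=0):
    """One-sided permutation p-value for an observed directed-dependence
    statistic, following the shuffling procedure described in the text:
    randomly permute the transmitter series (destroying its temporal
    ordering), recompute the statistic, and count how often the surrogate
    value is more extreme than the observed one."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        surrogate = rng.permutation(transmitter)
        stat_m = flow_fn(surrogate, receiver)
        # "More extreme" is taken on the side of the observed statistic
        if (stat_obs > 0 and stat_m > stat_obs) or \
           (stat_obs < 0 and stat_m < stat_obs):
            count += 1
    return count / n_perm
```

For a genuinely coupled pair of series, the shuffled surrogates cluster near zero and the resulting p-value is very small, which is exactly the behavior exploited in the significance analysis above.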
Based on all the simulations performed for the entire cohort of 32 individuals, we find that the magnitudes of the information flow rates between the randomly permuted transmitter current dipoles and the receiver current dipoles are all contained in the interval [−5.5, 5.0] × 10−4. Turning to the inter-dipole information flow rates calculated from the EEG data, we find that all except for 11 out of the 210×32 = 6720 dipole pairs show information flow rates outside the above interval. In fact, the majority of the normalized information flow rates are two or more orders of magnitude larger. Hence, given the size of the above confidence interval, we can conclude that most of the observed τi→j are statistically significant even at the p = 0.1% level.
The above result indicates a low-level global connectivity linking most of the brain regions in the resting state. However, small normalized information flow rates, albeit statistically significant, imply that the contribution of the respective entropy flow rate (information flow) from the transmitter dipole to the receiver dipole is very small compared to the intrinsic entropy changes in the receiver dipole. This argument motivates the introduction of an arbitrary threshold that can be used to count the more important connections.
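Counting the "more important" connections against such a threshold can be sketched as follows; the matrix layout (tau[i, j] holding τi→j) is our assumption, and the value 0.05 follows the threshold used later in the text.

```python
import numpy as np

def count_active_connections(tau, tau_c=0.05):
    """Count directed connections whose normalized information flow rate
    magnitude exceeds the (arbitrary) threshold tau_c. `tau` is an
    Ns x Ns matrix with tau[i, j] = tau_{i->j}; the diagonal (i = j,
    undefined for the flow rate) is ignored."""
    mask = np.abs(np.asarray(tau, float)) > tau_c
    np.fill_diagonal(mask, False)
    return int(mask.sum())
```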
Impact of differencing scheme on connectivity
As stated following the definition of inter-dipole information flow rate, the estimation of the first-order derivatives is based on finite differences (cf. Equation 4). The finite differencing, as shown in Equation 4, can be accomplished by means of different time steps equal to kΔt. Typically, k = 1 or k = 2 is used (Liang, 2014). We have conducted our analysis with k = 2, since this choice tends to reduce the impact of occasional large spikes (e.g., jumps) in the EEG time series on the information flow rate. Using a larger value of k results in effective smoothing of the EEG time series which encroaches on the upper end of the frequency band that is generally of interest in resting state studies. Hence, we did not consider values of k higher than two.
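The differencing scheme can be written compactly; a sketch assuming the forward difference of Liang (2014), with k = 2 and Δt = 4 ms matching the choices in this study:

```python
import numpy as np

def fd_derivative(p, k=2, dt=0.004):
    """Forward finite-difference derivative with time step k*dt
    (cf. Equation 4): dp_n = (p_{n+k} - p_n) / (k*dt). Larger k smooths
    the series at the cost of high-frequency content."""
    p = np.asarray(p, float)
    return (p[k:] - p[:-k]) / (k * dt)
```

For a noise-free linear trend the k = 1 and k = 2 schemes agree exactly; differences appear only through their smoothing of high-frequency fluctuations.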
We have experimented with synthetic data obtained from the simulation of two coupled stochastic differential equations for which Ti→j admits explicit expressions (Liang, 2014). We used synthetic time series of length (N = 60,000–80,000) similar to that of the EEG series, and a number of repetitions equal to the number of individuals in the study (L = 32). Our results show practically no difference between the mean Ti→j estimated from the time series whether k = 1 or k = 2 is used.
In the case of the source-reconstructed EEG data, we repeated the entire analysis using k = 1. This leads to fewer active connections, i.e., 1904 instead of the 2821 for k = 2 shown in Figure 8. On the other hand, the correlation coefficient between the spatial distributions of the frequency of active connections ni→j(0.05) for k = 1 and for k = 2 is equal to 0.89. This indicates that the distribution of active connections is highly correlated between k = 1 and k = 2. Based on our arbitrary threshold τc = 0.05, we determine 42 active connections for the k = 1 scheme versus 92 active connections for the k = 2 scheme. In spite of the fact that fewer active connections appear for k = 1, the overall pattern of information flow, as delineated by the thirty connections with the highest ensemble-mean normalized information flow rates, remains unchanged.
Acknowledgments
DTH acknowledges useful electronic correspondence with X. San Liang regarding the definition and interpretation of the information flow rate coefficient.
Appendix 1
Herein we show how the two entropic components involved, in addition to |Ti→j|, in the normalization term Zi→j, can be estimated from the data. Analysis based on the theory of dynamical systems leads to the following expressions for the rates of change of the entropic components $dH_j^*/dt$ and $dH_j^{noise}/dt$ (Liang, 2008), where the entropy transfer elements $p_{i,j}$, $q_{i,j}$ are given by the following functions of the inter-dipole covariance coefficients:
Using the definition in Equation 2 for the correlation coefficient and the definition in Equation 3 for the cross-correlation coefficient, the elements pi,j and qi,j can be expressed using correlation coefficients r instead of inter-dipole covariances C as follows (for i, j = 1, …, Ns):