An Implementation of Empirical Bayesian Inference and Non-Null Bootstrapping for Threshold Selection and Power Estimation in Multiple and Single Statistical Testing

Bahman Nasseroleslami

doi:10.1101/342964

Abstract

The majority of conclusions and interpretations in quantitative sciences such as neuroscience are based on statistical tests. However, the statistical inferences commonly rely on the p-values, but not on more expressive measures such as posterior probabilities, false discovery rates (FDR) and statistical power (1 - β). The aim of this report is to make these statistical measures further accessible in single and multiple statistical testing. For multiple testing, the Empirical Bayesian Inference (Efron et al., 2001; Efron, 2007) was implemented using non-parametric test statistics (Area Under the Curve of the Receiving Operator Characteristics Curve or Spearman’s rank correlation) and Gaussian Mixture Model estimation of the probability density function of the original and bootstrapped data. For single statistical tests, the same test statistics are used to construct and estimate the null and non-null probability density functions using bootstrapping under null and non-null grouping assumptions. Simulations were used to test the reliability of the results under a wide range of conditions. The results show conformity to the real truth in the simulated conditions, which is held under various conditions imposed on the simulation data. The open-source MATLAB codes are provided and the utility of the approach has been discussed for real-world electroencephalographic signals. This implementation of Empirical Bayesian Inference and informed selection of statistical thresholds are expected to facilitate more realistic scientific deductions in versatile fields, especially in neuroscience, neural signal analysis and neuro-imaging.

1 Introduction

The majority, if not all, of the conclusions and interpretations in quantitative sciences, especially in neuroscience and neuro-imaging are based on statistical tests. While the traditional hypothesis tests based on p-values are still dominant, there has been legitimate remarks on the need for more reliable and thorough statistical procedures and practices (Nuzzo, 2014). It is therefore vital that more meaningful statistical measures be accessible for statistical inference, namely: posterior probabilities, false discovery rates (FDR), and statistical power (1 − β). These measures are especially useful in statistical inferences involving high-dimensional data in neural signal connectivity or imagining.

Empirical Bayesian Inference (EBI) has shown promise in large-scale between-group comparisons (Efron, 2007b, 2004), especially in genomics (Efron, Tibshirani, Storey, and Tusher, 2001) and to some extent in the applications of neuroelectric signal and connectivity analysis (Singh, Asoh, Takeda, and Phillips, 2015). In EBI, constant prior probabilities are estimated from the data in large-scale multi-variable inferences or hypothesis testing and these priors are subsequently used to find the posterior probabilities using the estimated probability density functions of the pooled test statistics and the null distribution. It is possible to relate the posterior probabilities to frequentist concepts such as False Discovery Rate, FDR (Benjamini and Hochberg, 1995; Benjamini, Krieger, and Yekutieli, 2006), as well as power. While the theoretical framework is adequately established, the existing numerical implementations Efron, 2007b are optimised and suitable only for specialised applications (i.e. statistical genetics, where only a small fraction of the tests are real findings), and some of the essential measures such as FDR and power are not immediately available for informed threshold selection. From a practical viewpoint, the existing software package [locfdr] in R (r_core_team_r:_2016) may requires selection of several parameters and not immediately available for neuro-electro-magnetic signal and connectivity analysis in packages such as FieldTrip oostenveld_fieldtrip:_2010 or for neuroimaging analysis in packages such as SPM. Consequently, there is the need for new implementations to facilitate the application of EBI that work in wider range of situations (e.g. small or large proportions of test variables belonging to the affected group), to more explicitly relate the posterior probabilities to FDR and power (allowing informed decision on threshold selection), and to intrinsically account for data with non-normal distributions.

Such informed selection of statistical threshold is also challenging in complex statistical inferences (e.g. with non-normal data distributions) involving single or only a few comparisons or inferences. It would be desirable to similarly select a threshold value for the test statistic that corresponds to a known combination of Type I (α) and Type II (β) errors in a single comparison.

Here, we address these needs using an implementation of EBI using non-parametric test statistics, Gaussian Mixture Models and null bootstrapping. This implementation readily handles one-sample, two-sample (between-group comparison) and correlation problems in multi-dimensional data with arbitrary distributions, which is usable for a wide range of applications. Furthermore, for threshold selection in univariate testing (in the absence of prior probabilities), the non-null distribution is estimated using a non-null bootstrapping. This approach approximates the non-null probability density functions in order to enable the threshold selection for a desired combination of α and β values, regardless of the distribution of data.

2 Methods

2.1 Empirical Bayesian Inference (EBI) for Multiple Inferences

2.1.1 EBI framework

EBI, initially used in genomic applications (Efron et al., 2001) was subsequently expanded theoretically and in terms of computational implementation (Efron, 2004, 2007a). Here, the fundamentals are briefly explained.

Suppose X_ij, i = 1…m, j = 1…N represents N variables or N-dimensional data sampled from m observations/subjects. The grouping information of data is represented by α_i, which is a binary choice (0 or 1) for a two-sample (2-group) comparison. Statistical testing, performed independently in each variables according to the grouping information (e.g. between-group comparisons), using the test statistic z_j, yields N values. The probability density function of z_j, i.e. the probability of data given the hypotheses, is denoted by: where p₀ and p₁ are the prior probabilities of the null and non-null hypotheses (p₁ = 1 − p₀) and f₀(z) and f₁(z)are the probability density functions of z under the null and non-null (grouping) assumptions, respectively. The posterior probability, i.e. the probability of the hypotheses given the data, are subsequently given by:

Comparison of the posterior probability of the non-null hypothesis P₁(z_j) against a threshold P_crit provides a Bayesian inference, as well as subsequent frequentist quantities such as local false discovery rate, fdr_loc(z) = P₀(z), Type I error α, Type II error β, and the FDR value pertaining to the chosen P_crit. The classic EBI includes several stages for estimating the posterior probabilities. First, using a measure of between-group difference (e.g. Student’s t-statistic or p-values) and transforming the values to normal (e.g. by inverse normal cumulative distribution function); second, estimation of f(z) from the z_j histogram; third, estimating f₀(z) by theoretical assumptions on the distribution of z_j or bootstrapping; forth, estimation of p₀, usually through the assumption that f₁(argmax_z f₀(z)) = 0; and finally, P₁(z) is found by (2) - (3). Here, we will explain the details of each stage for the new implementation of the EBI.

2.1.2 Test Statistic

Instead of using Student’s t-statistic or a p-value which reflects the difference of the means of two groups, here a non-parametric measure was used as a test statistic. The Area Under the Receiver Operating Curve (AUROC), A, is closely related to the Mann-Whitney U statistic. AUROC is the probability of data in one group being larger (or smaller) than the other group (Pr(X_g₌₀) < Pr(X_g₌₁)); hence it is considerably independent of the distribution of the original data, as well as any measure (e.g. mean or median) forcomparison. This has been thoroughly discussed elsewhere (Zhou, McClish, and Obuchowski, 2009). AUROC was therefore taken as the test statistic for comparing the m data points in the two groups for each comparison of the N variables.

It is noteworthy that while AUROC is independent of the underlying distribution of the data, the data in all N variables should come from the same null and alternative distributions. The AUROC distributions depend on the number of data in the first and second group, as well as the distribution of the original data. Therefore, the number of data points in each group should be (ideally) the same for all variables, and all of them should come from the same arbitrary distribution (e.g. normal, Beta, Gamma, uniform). This is especially relevant as the curve fitting for null and mixed density functions, as well as bootstrapping (for construction of null data) rely on pooling data from all variables.

While not essential to transform the AUROC values to normal, it’s beneficial to do so computationally. The transformation to normal allows the use of more robust estimation methods such as Gaussian kernel methods that work best in unbounded domain, rather than in bounded ([0 1]) domains. As the AUROC distribution is bounded between 0 and 1, with the expected value of 0.5 under null, a mapping of 2(.) − 1 combined with Fisher’s Z-transform, z_i = 0.5log_e((.)+1)− 0.5log_e((.) −1) (Fisher, 1915; Zhou et al., 2009), can approximately map the data to normal (Zhou et al., 2009; Qin and Hotilovac, 2007): where σ²(.) indicates the variance. The tuning parameter ϵ which is added to the classic definition here, serves to limit the extreme z values; hence, facilitating numerical integration in later steps. To limit the z values to [−10,10], ϵ = 4.55e − 5 was adopted here. To avoid sharp distributions where AUC = 1 and z = 10 (AUC = −1 and z = −10), the values larger than 9 (smaller than −9) were redistributed to a truncated normal distribution. The redistribution of h extreme values larger than 9, assigned the i^th value to , where iCDF is the inverse cumulative distribution function and , is the normal distribution with mean 9, standard deviation 0.25, truncated between [5, 13]. The values smaller than −9 were similarly reassigned.

2.1.3 Estimating the f (z) Histogram

Gaussian Mixture Model (GMM) distributions (McLachlan and Peel, 2004) were used to estimate the probability density f(z), using the pool of z_j, j = 1…N. Using maximum likelihood estimates of GMM parameters, models with increasing number of Gaussian kernels were set for fitting z_j values. The model with minimum Akaike Information Criterion, AIK, (Akaike,1974) was eventually considered as the preferred fit. This was concluded when the increasing number of kernels yielded 3 consecutive increases in the AIK. A similar approach has been previously used (Le, Pan, and Lin, 2003) in statistical genetics applications, but not in the context of EBI.

2.1.4 Estimating the Null Distribution f₀(z)

For robust estimation of the null distribution, the data labels g_i were set for B₀ times re-sampling with substitution; for each set of the obtained labels, the above-mentioned procedures used for the original data and labels were applied to yield A_j and subsequently the z_j values. The data from all the B₀ bootstraps and all the N tested variables were used for pooling to estimate the null distribution. For computational efficiency, it may be helpful to sparse the null distribution. In this implementation, when the null data points exceeded 20000, the null data were sorted and only the every sk data values were kept for GMM estimation (sk: the integer multiples of 20000 in the number of null datavalues). UsingsimilarGMM estimation asfor f(z), the null distribution f₀(z) was estimated.

2.1.5 Estimating the prior p₀

The approach used by EBI for estimation of the prior p₀, relies on the key assumption that at maximum (peak) value of f₀(z), the value of f₁(z) is zero. Due to the smooth and reliable estimation of f₀(z) and f(z) by AIK-guided GMM fits, we may directly use the values of the estimated probability density functions to find the prior p₀: where iCDF_f₀(z)(0.25) is the z value at which the Cumulative Density Function (CDF) of f₀(z) is 0.5, which is the inverse CDF of 0.5, i.e. the median of the null data.

2.1.6 Estimating the Posterior P₀(z)

Given the estimates of p₀, f₀(z) and f(z), the calculation of posterior probabilities from (2) and (3) are straightforward. A bound between 0 and 1 was considered to protect against numerical instability at very small probability values.

2.1.7 Estimating FDR and Power (1 − β)

Following the calculation of p₀, p₁, f₀(z), f₁(z), f(z), P₀(z) and P₁(z), the Type I error α, Type II error β, power (1-β) and FDR (q) can be found by numerical integrations. This is achieved by using a decision threshold value (see Section 2.1.8 for how this threshold is decided on) on either of these measures to infer which variables do or do not show an interesting effect. For a given set decision threshold, a criterion on the Posterior, P_cr, we may write: as the parameter ϵ in section 2.1.2 limits the values of z, the integration would suffice to take place in the range [−20,20]. Additionally, the global values α_g and β_g show the sperability of the probability density distributions regardless of a chosen P_cr:

2.1.8 Threshold Selection

The threshold selection and subsequent inference is driven by setting a P_cr value and comparing the values of P₁(z) against this specific P_cr value, as required in the specific context of application. Alternatively, the availability of the computed values of α, β, q, and the detection ratio #{P₁(z_j) ≥ P_cr}/N as a function of P₁(z), allows setting the P_cr values that correspond to specific values of these measures.

2.2 Non-null Bootstrapping for Single Inference

The above-mentioned procedure is applicable for large-scale multiple testing, as this enables the estimation of empirical mixed density f(z), the priors p₀ and p₁, and eventually the Posteriors P₀(z) and P₁(z). For single statistical testing (N = 1), similar stages can be followed in order to calculate the test statistic, estimate the histograms or probability density functions, and calculate the null distribution. However, it is not possible to estimate the priors; hence, these variables are not available. Notwithstanding, it is possible to estimate f₁(z) by a different approach, namely the non-null bootstrapping which will make it possible to estimate α and β for a specific threshold, which would be a criterion on z (rather than on P₁(z)). This approach is explained below:

2.2.1 Non-Null Bootstrapping & Estimating f₁(z)

To estimate the non-null distribution f₁(z) we may rely on bootstrapping when the grouping information of the data g_i are respected (in contrary to the commonly practised null bootstrapping that aims to find the null distribution where the grouping information/labels are not respected). For this purpose, the following re-sampling algorithm was used:

If m_b₀ is the number of observations in group g = 0, take m_b₀ samples (with substitution) from {X_i.| g_i = 0} to build Ξ_b0.
If m_b₁ is the number of observations in group g = 1, (m_b₀ + m_b₁ = m), take m_b₁ samples (with substitution) from {X_i.|g_i = 1} to build Ξ_b1.
find the A (AUC for ROC curve) and consequently the z value according to (4) using the obtained set of the bootstrapped data Ξ_b0 and Ξ_b1.
Repeat this procedure (1-3) for B₁ times to obtain the needed samples of z_b, b = 1…B₁.

The GMM was then chosen to find the distribution of z_b values, which estimates f₁(z).

2.2.2 Estimating α and Power (1 − β)

Similar to the calculation of α and β for a given P_cr value in (7) and (8), it is possible to use numerical integration to find the relationship between α and β. If F₀(z) is the cumulative density function corresponding to f₀(z), then for a given two-tail decision region Z(α), it is possible to describe β a as function of α:

2.2.3 Threshold Selection

The threshold selection in single testing is straightforward, given the known relationship between α and β in (13).

2.3 Numerical Implementation & Simulations

The numerical programming for the proposed EBI implementation was performed in MATLAB (versions 2016b-2018a, Mathworks Inc., Natick, MA, USA). The Empirical Bayesian Inference Toolbox for MATLAB is publicly available at https://github.com/NeuroMotor-org/EBI and is licensed under BSD 3-Clause “New” or “Revised” License. To demonstrate the utility of the proposed implementation and to test its validity, it was applied to simulated data. Simulated data allow comparison of the performance measures to real truth, which is not available in real life applications. All simulations were performed in MATLAB. Different simulations were carried out as detailed below.

2.3.1 A Demonstrating Example

An example similar to applications in neural signal analysis and neuroimaging was considered. The simulation included N_var = 2000 variables, each with m₀ = 20 observations/subjects in first sample (e.g. controls), and m₁ = 60 observations/subjects in second sample (e.g. patients), totalling m_j = 80 observations/subjects. In control observations, all variables had a normal distribution , while in the other group the first 1600 variables have the same distribution, the next 300 variables come from and the last 100 variables were from .

2.3.2 Comparison Against Real Truth

Using the simulated data from previous section, the β and FDR (or α) values were calculated as a function of the posterior threshold P_cr and were compared to the true values of β and FDR (α). The real values were found by using the original labels of the variables in the simulations. By comparing them to the detected labels by EBI, the true positive (TP), false positive (FP), false negative and true negative (TN) rates were calculated at each threshold, from which the the real β and FDR were found. Additionally, the same data underwent EBI analysis with a previous implementation of EBI (Efron, 2007b) in R (r_core_team_r:_2016), when used with the default value.

2.3.3 Performance Under Different Condition

Several simulations were performed to test the performance of the framework in a broader range of conditions. This controlled variation of the simulation condition can test and inform of the performance in real-life applications, which is not easy with typical experimental data due to the lack of a gold standard or real truth. The parameters for generation of simulated data and the application of the new EBI implementation included: N_var = 200, 2000, p₀ = 0.25, 0.75, m₀ = 25,100, m₁ = 25,100, Normal (σ = 1) vs. Beta (a = 2,b = 10) distribution types, difference or effect size (Cohen’s d = 0.2, d = 0.9 for normal distributions or shift d = 0.05, d = 0.09 for Beta distributions). The estimated p₁, real FDR at expected value of 0.05, and real β at expected value 0.2 were compared across all the 64 simulation conditions. Each simulation condition was repeated 3 times to account for the non-deterministic nature of the implemented bootstrapping and estimation procedures.

2.3.4 Simulation of a Uni-Variate Example

To demonstrate the derivation of the α − β curve, a simple simulation with m₀ = 15 and m₁ = 25, random data with normal distributions for both groups (σ = 1), and shift value (Cohen’s d) of 0.5 was considered.

3 Results

3.1 Example of Multiple Testing with EBI

Figure 1 exemplifies the generated results and a report for a typical simulated case as described in Section 2.3.1. Notice how the two probability density functions f₀(z) and f₁(z), as well as the prior p₁ are the essential components in giving rise to the Posterior distributions P₀(z) and P₁(z). In addition pay attention to how the choice of different threshold levels (color-coded based on the criteria) helps to choose an informed statistical threshold for inference based on the levels of FDR and power they afford.

Figure 1. Report of an Empirical Bayesian Inference, applied on typical simulated data.

The probability density functions of the mixed (f(z)) and null (f₀(z)) data, estimated from the original and permuted, as well as of the non-null (f₁(z)) data, estimated from f(z) and f₀(z) by Bayesian inference are plotted as a function of the z (transformed to z from original test statistics). Following the estimation of the fixed prior probability p₁ = 0.19 (the ratio of affected to non-affected variables), the Posterior probabilities for null (P₀(z)) and non-null (P₁(z)) are estimated. The α_global = 0.02 and β_global = 0.083 show the global level of Type I and II errors regardless of a specific threshold, calculated from the probability density functions and Posterior Probabilities. The threshold selection is facilitated by the plots of FDR and β as a function of P₁,_crit. The table indicates common criteria as a function of p₁, P₁, FDR, β and a (thresholds as bold diagonal values), and other corresponding values afforded by the selected criterion. For example, at p₁ = 0.19, the estimated FDR and β values will be 0.03 and 0.10 respectively. In the large plot, for each chosen criterion labelled by colour-coding in the legend, the projections to left and right indicate the afforded FRR and β by each criterion, respectively.

3.2 Comparison of Typical Behaviour Against Truth

Figure 2 compares the estimated FDR and β as a function of the threshold values (P₁), against the real truth, using simulation labels as described in Section 2.3.2. Notice the similarity of real truth curves and estimates by the new implementation. In addition, as the simulated condition had a low prior p₁, the results from the previous locfdr implementation in R showed a good conformity to the real truth and the new implementation.

Figure 2. Comparison of FDR and β found by the EBI implemntation against thereal truth in simulations.

The data and conditions correspond to Figure 1. The dotted lines show the estimations from the locfdr package in R.

3.3 Performance of the EBI under Various Simulation Conditions

Figure 3 compares the estimated prior p₁ against the real values, as well as the real FDR and β values when estimated at nominalvalues at 0.05 and 0.2 (Described in Section 2.3.3). In the majority of conditions, the estimated measures were very close to the real values and there was negligible difference between the 5 different iterations of the simulation. The exception is at low effect sizes, combined with low number of observation/subjects and extreme prior values, where the estimation errors increase (possibly due to dissociations between the affected and non-affected variables and density functions). In the majority of the simulation cases the locfdr R package did not converge; therefore, the results were not included.

Figure 3. Comparison of estimated prior p₁ against real truth, and real FDR and β values estimated at nominal values 0.05 and 0.2 in simulations.

See Section 2.3.3 for details of methods.

3.4 Example Single Testing

Figure 4 shows the correspondence of different α values to β values, for a an exemplary simulated data (Section 2.3.4).

Figure 4. Correspondence of different α values to β values.

The plot can be used for informed selection of threshold, similar to multivariate EBI.

4 Discussion

While EBI (Efron et al., 2001; Efron, 2007b) provided a comprehensive theoretical framework for multivariate high-dimensional inference, the previous numerical implementation of EBI provided valid results only in limited conditions, namely, low prior p_l values and for rather high threshold values. Importantly, the previous implementation required numerous adjustments and parameter selection. The new implementation eliminates the need for parameter tuning (especially by using AIK for GMD fitting) and allows the method to be used in broader range of conditions. Importantly, the statistical power is explicitly estimated and made available for inference.

4.1 Applications

The new approach suits applications involving neural signal analysis, such as electromyography (EMG), Electroencephalography (EEG), as well as neuroimaging, e.g. Magnetic Resonance Imaging (MRI). More specifically, spectral, time-frequency, as well as functional and effective connectivity analyses can benefit most from the new statistical implementation. In applications such as fMRI, the need for improved statistical inference has been explicitly emphasised by highlighting the limitations with existing techniques that lead to high false discovery rates (Eklund, Nichols, & Knutsson, 2016). The existing attempts to improve the statistical inference in EEG connectivity analysis (Singh, Asoh, & Phillips, 2011; Singh et al., 2015), have yield only partial success to date. Here, we used simulations to compare the EBI reports against the real truth. Importantly, in 2 recent studies, we showed that EBI is reasonably cross-validated against traditional frequentist methods. EBI was cross-validated against the correction of significance level α according to the number of principal components in data (Iyer et al., 2017) when testing the significance of EEG time series. Moreover, the inference of EBI was cross-validated against adaptive False Discovery Rates (aFDR) (Benjamini & Hochberg, 1995; Benjamini et al., 2006) in comparing the average EEG connectivity patterns between healthy individuals and patient groups (Nasseroleslami et al., 2017).

A unique advantage of EBI is its ability to implicitly account for potential positive and negative correlations that may be present in the data. It is therefore a suitable candidate for situations where positive or negative correlations exist in multi-dimensional data (e.g. EEG/MEG network connectivity analysis or structural or functional MR imaging). This is afforded by the way the individual z-values pertaining to each variable are aggregated (i.e. the independent calculation of the test scores) and by the chosen approach for the calculation of a null distribution that similarly corresponds to the same data with or without correlation structures (Efron, 2007a). The flexible estimation of null distribution from permuted data by GMM affords such flexibility for inference in rather broad conditions.

In applications where only simple statistical testing is required, the calculated FDR is an accurate estimation equivalent to the procedures for pooled multivariate permutation tests, which can be used without reference to Bayesian inference.

4.2 Limitations

The practical range for the number of variables is between 100-10000. The performance beyond this range degrades. This limitation originates from the EBI framework, rather than a specific numerical implementation.

Too few variables lead to inaccurate probability density estimations where few isolated data points are not adequately represented by continuous distributions. In this situation, the extreme values of prior probabilities would correspond to fewer data points with real effect; hence, the probability densities fitted to these values will not be very representative and accurate.

On the other hand, too many variables lead to unwanted spread of the null distribution to the extent that inference at low FDR values does not yield significant results. This situation, however, can be partly remedied for by applying the EBI as several independent batches of analyses on the mutually exclusive chunks of data, each containing different variables. This is permissible as the quantities such as FDR and Posterior probability (and to a reasonable extent the power) are not affected by multiple testing (as is the case for p-values).

As the complete procedure for EBI relies on permutations for building the null distribution, the procedure would depend on random number generations and some variability in each run. Additionally, the numerical procedures for estimating the GMM fits to the distributions are subject to minor variability in each run. These 2 factors make the inference a non-deterministic procedure, which is subject to some variability. While important to take this into consideration, the results in Figure 3 idicate that this variability does not change the nature of the results.

Future studies are expected to focus on the factors that lead to inaccurate numerical estimations, further extending the range of operating conditions, as well as theoretical developments for robust estimation of prior when extreme data and conditions are processed.

5 Conclusion

The implementations of statistical inferences such as EBI that can inform of the posterior probabilities and statistical power need to be converted to common practice. This implementation of threshold selection for EBI and single testing has potential to add value to the neural signal analysis and neuroimaging studies by enabling realistic inference on high-dimensional multivariate data.

Acknowledgement

The author would like to thank the students and staff in the Academic Unit of Neurology, at Trinity College Dublin, the University of Dublin for facilitating and supporting this work. The study was supported by Irish Research Council (Government of Ireland Postdoctoral Research Fellowship GOIPD/2015/213 to the author) and by Science Foundation Ireland (SFI/16/ERCD/3854).

Appendix A: Extension of the Tests from Comparison to Correlation Coefficients and Beyond

The original EBI has been primarily used for two-sample one-dimensional location problems (between-group comparison) such as gene discovery by comparing a control group to a treatment or affected group, or similarly by comparison of the healthy individuals against patients as intended in neuro-electro-magnetic signal analysis. Notwithstanding, the framework can be similarly used for virtually any statistical test, such as one-sample location problems (where comparison of data against zero or paired comparison of data are intended), as well as correlation analysis. These options have been implemented in the EBI Toolbox for MATLAB.

5.1 One Sample Inference

For a one-sample 1-dimensional location problem, including n data points x_i, the Wilcoxon’s Signed Rank test statistic is defined as , where R(i) is the rank of the {|x_i|| x_i ≠ 0}. The normalised test static may be defined as W_n = W/(ΣR(i)) which is bounded between 0 and 1. Therefore, W_n can be transformed to the z space using (4), as in the case of AUC and the subsequent procedures will be similar to the two-sample problem. The bootstrapping procedure for building the null distribution, is carried out by performing random sign flips (multiplying data by random −1 or 1 values) and recalculation of the test statistics for the number of bootstrapping cycles.

5.2 Correlation Coefficient

To use the same framework for analysis of correlation coefficients, the Spearman’s correlation coefficient ρ can be mapped to the [0 1] range (as for AUROC, A) by the transformation (ρ + 1)/2. In this case, the grouping information will have equal numberof paired zeros and ones. The null permutation of the data shall consist of using separate re-sampling (with substitution) from the first and second groups of observations for the same data points as the original data, which disregards their pairing information.

References

↵
Akaike, H. (1974, December). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/TAC.1974.1100705
OpenUrl CrossRef
↵
Benjamini, Y. & Hochberg, Y. (1995, January 1). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.
OpenUrl CrossRef Web of Science
↵
Benjamini, Y., Krieger, A. M., & Yekutieli, D. (2006, January 9). Adaptive linear step-up procedures that control the false discovery rate. Biometrika, 93(3),491–507. doi:10.1093/biomet/93.3.491
OpenUrl CrossRef Web of Science
↵
Efron, B. (2004). Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. Journal of the American Statistical Association, 99(465), 96–104.
OpenUrl CrossRef Web of Science
↵
Efron, B. (2007a). Correlation and Large-Scale Simultaneous Significance Testing. Journal of the American Statistical Association, 102(477), 93–103.
OpenUrl CrossRef Web of Science
↵
Efron, B. (2007b, August). Size, power and false discovery rates. The Annals of Statistics, 35(4),1351–1377. doi:10.1214/009053606000001460
OpenUrl CrossRef
↵
Efron, B., Tibshirani, R., Storey, J. D., & Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American statistical association, 96(456), 1151–1160. doi:10.1198 / 016214501753382129
OpenUrl CrossRef Web of Science
↵
Eklund, A., Nichols, T. E., & Knutsson, H. (2016, December 7). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113(28),7900–7905. doi:10.1073/pnas.1602413113. pmid: 27357684
OpenUrl Abstract/FREE Full Text
↵
Fisher, R. A. (1915). Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika, 10(4), 507–521. doi:10.2307/2331838. JSTOR: 2331838
OpenUrl CrossRef
↵
Iyer, P. M., Mohr, K., Broderick, M., Gavin, B., Burke, T., Bede, P., … Vajda, A. (2017). Mismatch negativity as an indicator of cognitive sub-Domain Dys-function in amyotrophic lateral sclerosis. Frontiers in Neurology, 8,395.
OpenUrl
↵
Le, C. T., Pan, W., & Lin, J. (2003, July 1). A mixture model approach to detecting differentially expressed genes with microarray data. Functional & Integrative Genomics, 3(3), 117–124. doi:10.1007/s10142-003-0085-7
OpenUrl CrossRef PubMed
↵
McLachlan, G. & Peel, D. (2004, April 5). Finite Mixture Models. John Wiley & Sons.
↵
Nasseroleslami, B., Dukic, S., Broderick, M., Mohr, K., Schuster, C., Gavin, B., … Hardiman, O. (2017, November). Characteristic Increases in EEG Connectivity Correlate With Changes of Structural MRI in Amyotrophic Lateral Sclerosis. Cerebral Cortex, 1–15. doi:10.1093/cercor/bhx301
OpenUrl CrossRef
↵
Nuzzo, R. (2014). Statistical errors. Nature, 506(7487), 150–152.
OpenUrl CrossRef PubMed Web of Science
↵
Qin, G. & Hotilovac, L. (2007, August 14). Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test. Statistical Methods in Medical Research, 17(2), 207–221. doi:10.1177/0962280207087173
OpenUrl CrossRef
↵
Singh, A. K., Asoh, H., & Phillips, S. (2011, September 1). Optimal detection of functional connectivity from high-dimensional EEG synchrony data. NeuroImage, 58(1), 148–156. doi:10.1016/j.neuroimage.2011.05.082
OpenUrl CrossRef PubMed
↵
Singh, A. K., Asoh, H., Takeda, Y., & Phillips, S. (2015, March 30). Statistical Detection of EEG Synchrony Using Empirical Bayesian Inference. PLoS ONE, 10(3), e0121795. doi:10.1371/journal.pone.0121795
OpenUrl CrossRef
↵
Zhou, X.-H., McClish, D. K., & Obuchowski, N. A. (2009, September 25). Statistical Methods in Diagnostic Medicine. Hoboken, NJ, USA: John Wiley & Sons.

View the discussion thread.

Posted June 08, 2018.

Download PDF

Citation Tools

Subject Area

Neuroscience

Subject Areas

All Articles

Animal Behavior and Cognition (5210)
Biochemistry (11736)
Bioengineering (8749)
Bioinformatics (29186)
Biophysics (14964)
Cancer Biology (12086)
Cell Biology (17403)
Clinical Trials (138)
Developmental Biology (9418)
Ecology (14176)
Epidemiology (2067)
Evolutionary Biology (18299)
Genetics (12235)
Genomics (16795)
Immunology (11863)
Microbiology (28066)
Molecular Biology (11582)
Neuroscience (60936)
Paleontology (451)
Pathology (1870)
Pharmacology and Toxicology (3238)
Physiology (4956)
Plant Biology (10423)
Scientific Communication and Education (1683)
Synthetic Biology (2883)
Systems Biology (7338)
Zoology (1650)

[1] ↵
Akaike, H. (1974, December). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/TAC.1974.1100705
OpenUrl CrossRef

[2] ↵
Benjamini, Y. & Hochberg, Y. (1995, January 1). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.
OpenUrl CrossRef Web of Science

[3] ↵
Benjamini, Y., Krieger, A. M., & Yekutieli, D. (2006, January 9). Adaptive linear step-up procedures that control the false discovery rate. Biometrika, 93(3),491–507. doi:10.1093/biomet/93.3.491
OpenUrl CrossRef Web of Science

[4] ↵
Efron, B. (2004). Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. Journal of the American Statistical Association, 99(465), 96–104.
OpenUrl CrossRef Web of Science

[5] ↵
Efron, B. (2007a). Correlation and Large-Scale Simultaneous Significance Testing. Journal of the American Statistical Association, 102(477), 93–103.
OpenUrl CrossRef Web of Science

[6] ↵
Efron, B. (2007b, August). Size, power and false discovery rates. The Annals of Statistics, 35(4),1351–1377. doi:10.1214/009053606000001460
OpenUrl CrossRef

[7] ↵
Efron, B., Tibshirani, R., Storey, J. D., & Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American statistical association, 96(456), 1151–1160. doi:10.1198 / 016214501753382129
OpenUrl CrossRef Web of Science

[8] ↵
Eklund, A., Nichols, T. E., & Knutsson, H. (2016, December 7). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113(28),7900–7905. doi:10.1073/pnas.1602413113. pmid: 27357684
OpenUrl Abstract/FREE Full Text

[9] ↵
Fisher, R. A. (1915). Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika, 10(4), 507–521. doi:10.2307/2331838. JSTOR: 2331838
OpenUrl CrossRef

[10] ↵
Iyer, P. M., Mohr, K., Broderick, M., Gavin, B., Burke, T., Bede, P., … Vajda, A. (2017). Mismatch negativity as an indicator of cognitive sub-Domain Dys-function in amyotrophic lateral sclerosis. Frontiers in Neurology, 8,395.
OpenUrl

[11] ↵
Le, C. T., Pan, W., & Lin, J. (2003, July 1). A mixture model approach to detecting differentially expressed genes with microarray data. Functional & Integrative Genomics, 3(3), 117–124. doi:10.1007/s10142-003-0085-7
OpenUrl CrossRef PubMed

[12] ↵
McLachlan, G. & Peel, D. (2004, April 5). Finite Mixture Models. John Wiley & Sons.

[13] ↵
Nasseroleslami, B., Dukic, S., Broderick, M., Mohr, K., Schuster, C., Gavin, B., … Hardiman, O. (2017, November). Characteristic Increases in EEG Connectivity Correlate With Changes of Structural MRI in Amyotrophic Lateral Sclerosis. Cerebral Cortex, 1–15. doi:10.1093/cercor/bhx301
OpenUrl CrossRef

[14] ↵
Nuzzo, R. (2014). Statistical errors. Nature, 506(7487), 150–152.
OpenUrl CrossRef PubMed Web of Science

[15] ↵
Qin, G. & Hotilovac, L. (2007, August 14). Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test. Statistical Methods in Medical Research, 17(2), 207–221. doi:10.1177/0962280207087173
OpenUrl CrossRef

[16] ↵
Singh, A. K., Asoh, H., & Phillips, S. (2011, September 1). Optimal detection of functional connectivity from high-dimensional EEG synchrony data. NeuroImage, 58(1), 148–156. doi:10.1016/j.neuroimage.2011.05.082
OpenUrl CrossRef PubMed

[17] ↵
Singh, A. K., Asoh, H., Takeda, Y., & Phillips, S. (2015, March 30). Statistical Detection of EEG Synchrony Using Empirical Bayesian Inference. PLoS ONE, 10(3), e0121795. doi:10.1371/journal.pone.0121795
OpenUrl CrossRef

[18] ↵
Zhou, X.-H., McClish, D. K., & Obuchowski, N. A. (2009, September 25). Statistical Methods in Diagnostic Medicine. Hoboken, NJ, USA: John Wiley & Sons.