Abstract
Data binning can cope with overplotting and noise, making it a versatile tool for comparing many observations. However, it goes awry if the same observations are used for binning and contrasting. This creates an inherent circularity, leaving noise and regression to the mean insufficiently controlled. Here, we use population receptive field analyses – where data binning is commonplace – as an example to expose this flaw through simulations and empirical repeat data.
Main text
Data binning is often applied to large data sets in order to prevent overplotting and control noise. As such, it has become commonplace in population receptive field (pRF) modeling (Dumoulin & Knapen, 2018; Dumoulin & Wandell, 2008), where researchers are commonly interested in comparing visual field maps with thousands of observations between different (experimental) conditions. However, pRF modeling is only one of several research areas where some form of differential data binning has been adopted; others include psychology (Gignac & Zajenkowski, 2020; Holmes, 2009; Preacher, MacCallum, Rucker, & Nicewander, 2005; Shanks, 2017), systems neuroscience (Holmes, 2009; Kriegeskorte, Simmons, Bellgowan, & Baker, 2009), epidemiology (Barnett, van der Pols, & Dobson, 2005), and presumably many more.
Although differential data binning can help us see an overall pattern in the face of an abundance of details, it goes awry if the same noisy observations are used for binning (selection) and contrasting (selective analysis). This is because dipping into noisy data more than once violates assumptions of independence, favoring some noise components over others, and eventually biasing descriptive and inferential statistics (Kriegeskorte et al., 2009). As such, double-dipping in differential data binning prevents us from – amongst other things – controlling for regression to the mean (e.g., Galton, 1886; Gignac & Zajenkowski, 2020; Holmes, 2009; Makin & De Xivry, 2019; Shanks, 2017). Regression to the mean is a statistical phenomenon operating when two variables are imperfectly correlated (e.g., due to random noise). In this case, extreme observations for one variable will on average be less extreme (closer to the mean) for the other variable (Campbell & Kenny, 1999; Cohen, Cohen, West, & Aiken, 2003; Shanks, 2017)¹. The lower the correlation between the variables, the larger regression to the mean tends to be.
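The link between correlation and regression to the mean can be made explicit with a short derivation. It is sketched here under a simplifying assumption that is ours rather than the article's: both measures share a true score T and carry independent noise of equal variance. The symbols T, ε, and σ are introduced purely for illustration.

```latex
\begin{align}
X &= T + \varepsilon_X, \qquad Y = T + \varepsilon_Y, \qquad
\operatorname{Var}(T) = \sigma_T^2, \quad
\operatorname{Var}(\varepsilon_X) = \operatorname{Var}(\varepsilon_Y) = \sigma_\varepsilon^2 \\
r &= \operatorname{Corr}(X, Y) = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2} \\
\hat{z}_Y &= r \, z_X
\quad\Rightarrow\quad
z_X - \hat{z}_Y = (1 - r)\, z_X
\end{align}
```

Here, ẑ_Y is the best linear prediction of the standardized second measure from the standardized first measure. Under this model, the predicted value lies closer to the mean than z_X whenever r < 1, and the gap (1 − r) z_X widens as noise increases and r falls.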
Regression to the mean and/or double-dipping are of particular concern in what is better known as post hoc subgrouping (Preacher et al., 2005), post hoc data selection (Shanks, 2017), and extreme groups approach (Preacher et al., 2005), all of which can be considered as subtypes of data binning. Post hoc subgrouping refers to collecting two measures, defining extreme subgroups post hoc using one measure (e.g., the lower and upper quantile), and then performing statistics on these measures for the extreme subgroups (Preacher et al., 2005). Post hoc data selection is similar but involves only one extreme subgroup (Shanks, 2017). Both of these practices are different from the extreme groups approach, where extreme subgroups are selected a priori based on one measure; that is, without collecting the whole range of the other measure (Preacher et al., 2005). Here, we focus on a post hoc scenario where essentially all subgroups are considered, not just the extreme ones (see also Gignac & Zajenkowski, 2020; Holmes, 2009). We label this procedure post hoc binning analysis.
Imagine we conduct a retinotopic mapping experiment (Dumoulin & Wandell, 2008), where we estimate pRF position and size for each voxel in the brain under a Baseline condition as well as a condition of Interest (see Figure 1 for a single pRF). We can think of the Interest and Baseline conditions as repeat data (e.g., Benson et al., 2018; van Dijk, de Haas, Moutsiana, & Schwarzkopf, 2016), different attention conditions (e.g., de Haas, Schwarzkopf, Anderson, & Rees, 2014; Klein, Harvey, & Dumoulin, 2014; van Es, Theeuwes, & Knapen, 2018; Vo, Sprague, & Serences, 2017), mapping sequences (e.g., Binda, Thomas, Boynton, & Fine, 2013; Infanti & Schwarzkopf, 2020), mapping stimuli (e.g., Alvarez, de Haas, Clark, Rees, & Schwarzkopf, 2015; Binda et al., 2013; Le, Witthoft, Ben-Shachar, & Wandell, 2017; Yildirim, Carvalho, & Cornelissen, 2018), scotoma conditions (e.g., Barton & Brewer, 2015; Binda et al., 2013; Haak, Cornelissen, & Morland, 2012; Prabhakaran et al., 2020), pRF modeling techniques (e.g., Carvalho et al., 2020) or uni- and multisensory conditions (Holmes, 2009) – to name but a few examples. As a pRF model, we adopt a 2D Gaussian, where pRF position represents the center of a pRF in visual space (the center of the Gaussian) and pRF size its spatial extent (the standard deviation of the Gaussian; see Figure 1). We then fit this model to the voxel-wise brain responses we measured in the retinotopic mapping experiment (Dumoulin & Wandell, 2008). To compare pRF positions in the Interest and Baseline condition voxel-by-voxel, we bin the pRF positions from both conditions according to the pRF positions from the Baseline condition. Subsequently, we quantify for each voxel the position shift from the Baseline to the Interest condition (see Figure 1 for a single pRF). Finally, we calculate the bin-wise mean shift. This is conceptually equivalent to calculating the bin-wise simple means for each condition and comparing them subsequently, be it descriptively or inferentially.
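The binning procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Matlab implementation: the function names, the rank-based quantile binning, and the toy eccentricity values are all assumptions made for the example.

```python
from statistics import mean


def quantile_bin_index(values, n_bins=10):
    """Assign each observation to a quantile bin (0 = lowest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def binwise_mean_shift(baseline, interest, binning_values, n_bins=10):
    """Bin-wise mean of (interest - baseline); bins come from binning_values."""
    bins = quantile_bin_index(binning_values, n_bins)
    shifts = [[] for _ in range(n_bins)]
    for b, base, inter in zip(bins, baseline, interest):
        shifts[b].append(inter - base)
    return [mean(s) if s else float("nan") for s in shifts]


# Toy eccentricities (dva) for six voxels; hypothetical values
baseline = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
interest = [1.5, 2.5, 3.0, 4.0, 4.5, 5.5]

# Post hoc binning: the Baseline condition is used for binning AND contrasting
shift_per_bin = binwise_mean_shift(baseline, interest, baseline, n_bins=3)
print(shift_per_bin)  # [0.5, 0.0, -0.5]
```

Because averaging is linear, the bin-wise mean shift equals the difference between the bin-wise means of the two conditions, which is the equivalence noted above.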
Either way, by adopting such a post hoc binning analysis, we essentially assume that the mean pRF position we quantify for each bin in the Baseline condition approximates the true mean pRF position. In particular, we presuppose that binning voxels according to pRF positions from the Baseline condition and aggregating them subsequently for this condition ensures that bin-wise noise components cancel out on average (see also Shanks, 2017). This, however, is not the case.
To illustrate this flaw, we generated a simplified contrast scenario with a null effect. In particular, we used random Gaussian noise to repeatedly disturb voxel-wise x0 and y0 coordinates (Figure 1) of a V1 visual field map from a single participant (N_repeat = 200; sd_noise = 2 degrees of visual angle, dva). We did this twice to generate a Baseline and an Interest condition. We then converted the voxel-wise x0 and y0 samples to eccentricity values (Figure 1), as is often done in the pRF literature (see Figure 1-figure supplement 1 for interpretational difficulties with eccentricity when it comes to position shifts). This resulted in a gamma-like eccentricity distribution. Lastly, we binned the eccentricity values in both conditions according to the eccentricity values in the Baseline condition using deciles and calculated the bin-wise means for each condition².
We plotted the bin-wise eccentricity means in the Baseline and Interest condition against one another along with individual observations per bin and marginal histograms (bin width = 0.5 dva) reflecting the simulated distributions³ (Figure 2, A., 1st column). Importantly, since there was no true difference between conditions, the bin-wise means should lie on the identity line. Contrary to this prediction, the bin-wise means systematically diverged from the identity line. Strikingly, when using the Interest instead of the Baseline condition for binning, the systematic pattern of divergence flipped (Figure 2, A., 2nd column). This bidirectionality is a typical sign of regression to the mean (Campbell & Kenny, 1999; Shanks, 2017) and is due to circularity, which leads to asymmetric bins (see bin-wise ranges of observations for Baseline and Interest, Figure 2, A., 1st and 2nd columns) and biases bin-wise noise components. In particular, for the condition that was used for contrasting and binning (henceforth circular condition), the bin-wise noise components of the x0 and y0 values were skewed on average. For the other condition (henceforth non-circular condition), however, the bin-wise noise components cancelled out on average (Figure 2, B., 1st and 2nd columns).
The skew in average noise renders the bin-wise eccentricity means of the circular condition more extreme, especially for lower and higher decile bins. As a result, the bin-wise eccentricity means for the non-circular condition regress – by statistical necessity – to the overall mean⁴ for this condition (red crosshair); that is, they are less extreme (see different ranges of bin-wise means for the circular and non-circular conditions in Figure 2, A., 1st and 2nd columns). If the Interest condition is then contrasted to the Baseline condition, a mean increase in eccentricity for lower deciles and a mean decrease for higher deciles or vice versa occurs, depending on whether the data are binned on the Baseline or Interest condition (Figure 2, A., 1st and 2nd column). This artifact arises because we did not use independent conditions for binning and contrasting; that is, conditions with independent noise components.
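The null-effect scenario and the flip in the direction of the artifact can be reproduced in spirit with a small simulation. The sketch below is not the authors' code. It assumes hypothetical true pRF centers spread uniformly over a disc of 8 dva radius (the article used an empirical V1 map), a single noise draw per condition instead of 200 repeats, and an arbitrary random seed.

```python
import math
import random
from statistics import mean

random.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical "true" pRF centers, uniform over a disc of radius 8 dva
N = 5000
true_xy = []
while len(true_xy) < N:
    x, y = random.uniform(-8, 8), random.uniform(-8, 8)
    if math.hypot(x, y) <= 8:
        true_xy.append((x, y))


def noisy_eccentricity(xy, sd=2.0):
    """Disturb x0 and y0 with Gaussian noise, then convert to eccentricity."""
    return [math.hypot(x + random.gauss(0, sd), y + random.gauss(0, sd))
            for x, y in xy]


def decile_means(binning, values, n_bins=10):
    """Mean of `values` within decile bins defined by `binning`."""
    order = sorted(range(len(binning)), key=lambda i: binning[i])
    size = len(order) // n_bins
    return [mean(values[i] for i in order[b * size:(b + 1) * size])
            for b in range(n_bins)]


baseline = noisy_eccentricity(true_xy)  # same true map, independent noise
interest = noisy_eccentricity(true_xy)  # null effect: no true difference

results = {}
for label, binning in (("Baseline", baseline), ("Interest", interest)):
    base_means = decile_means(binning, baseline)
    int_means = decile_means(binning, interest)
    # Interest minus Baseline per decile; should be ~0 under a null effect
    results[label] = [i - b for b, i in zip(base_means, int_means)]
    print(f"binned on {label}: lowest decile {results[label][0]:+.2f}, "
          f"highest decile {results[label][-1]:+.2f}")
```

Under these assumptions, binning on the Baseline condition yields an apparent eccentricity increase in the lowest decile and a decrease in the highest, and the signs reverse when binning on the Interest condition, even though no true difference was simulated.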
Importantly, how the artifact manifests can change when data are thresholded across conditions (i.e., corresponding observations are deleted in a pair-wise fashion; Figure 2-figure supplement 1-2, A. and B., 1st and 2nd columns) and/or noise scales with eccentricity (heteroskedasticity; Figure 2-figure supplement 3, A. and B., 1st and 2nd columns; see also Holmes, 2009). In fact, in the event of cross-thresholding, noise components are modified and might not necessarily cancel out for the non-circular condition (Figure 2-figure supplement 1-2, B., 1st and 2nd columns). The case of eccentricity-scaled noise furthermore shows that the artifact can include some clear regression away from the mean⁵ (egression; Figure 2-figure supplement 3, A., 1st and 2nd columns; e.g., Campbell & Kenny, 1999; Schwarz & Reike, 2018).
Condition cross-thresholding is common practice in the pRF literature where data are cleaned across conditions according to eccentricity, goodness-of-fit (R²), pRF size, missing data or other criteria from one or multiple conditions. Eccentricity-scaled noise is an equally likely scenario that might arise from fitting errors due to partial stimulation of pRFs (especially near the edge of the stimulated mapping area), higher variability in pRF position estimates for wider pRFs as well as fluctuations in the signal-to-noise ratio of brain responses due to central fixation and/or manipulating attention across visual space⁶.
The artifact also replicated when simulating a true effect (i.e., a radial shift of 2 dva in the Interest condition; Figure 2-figure supplement 4, A. and B., 1st and 2nd columns). The same was true for equidistant binning (Figure 2-figure supplement 5, A. and B., 1st and 2nd columns), which is frequently applied in the pRF literature. However, unlike decile binning, equidistant binning resulted in a lower number of observations for higher equidistant bins (due to the gamma-like eccentricity distribution; Figure 2-figure supplement 5, A., 1st and 2nd columns). Consequently, for higher equidistant bins, the skew in average noise for the circular condition was generally larger here. Similarly, for higher equidistant bins, noise components did not always cancel out for the non-circular condition (see all Figure 2-figure supplement 5, B., 1st and 2nd columns). This is because for random noise to cancel out on average, the number of observations needs to be sufficiently large.
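The difference in bin occupancy between decile and equidistant binning can be illustrated as follows. This sketch uses a hypothetical gamma-like eccentricity sample (zero-centered 2D Gaussian positions converted to eccentricity, with an arbitrary spread of 4 dva) instead of the simulated V1 data. The bin width of 1.75 dva and the range of 0 to 19.25 dva follow the Materials and Methods.

```python
import math
import random
from bisect import bisect_right

random.seed(2)  # arbitrary seed, for reproducibility only

# Hypothetical gamma-like eccentricity sample (dva)
ecc = [math.hypot(random.gauss(0, 4), random.gauss(0, 4)) for _ in range(2000)]

# Decile bins: edges follow the data, so every bin holds the same number
sorted_ecc = sorted(ecc)
n = len(ecc)
decile_edges = [sorted_ecc[k * n // 10] for k in range(1, 10)]
decile_counts = [0] * 10
for e in ecc:
    decile_counts[bisect_right(decile_edges, e)] += 1

# Equidistant bins: constant width, so occupancy follows the distribution
width, upper = 1.75, 19.25
n_equi = round(upper / width)  # 11 bins
equi_counts = [0] * n_equi
for e in ecc:
    if e < upper:
        equi_counts[int(e // width)] += 1

print("decile counts:     ", decile_counts)
print("equidistant counts:", equi_counts)
```

With few observations in the outer equidistant bins, random noise has little chance to average out there, which is the mechanism the paragraph above points to.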
For all presented simulation cases, the artifact likewise manifested for another kind of binning analysis, namely, when binning the x0 and y0 values according to both eccentricity and polar angle (i.e., 2D segments) and computing shift vectors (Figure 1 as well as Figure 3 and Figure 3-figure supplement 1-4, 1st row). Here, the bin-wise means regressed towards and away from the overall means of the x0 and y0 distribution.
Notably, for empirical repeat data from the Human Connectome Project (Benson et al., 2018, 2020), both kinds of binning analyses produced patterns consistent with the artifact (Figure 4-5 and Figure 4-figure supplement 1-3 and Figure 5-figure supplement 1-3, A.-C.). This establishes its practical relevance. Moreover, some of us recently retracted an article on attention-induced differences in pRF position and size in V1-V3 (de Haas et al., 2014) because an in-house reanalysis suggested that post hoc binning along with condition cross-thresholding and heteroskedasticity yielded artifactual (or artifactually inflated) results in the form of egression from the mean (de Haas et al., 2020). In this case, the apparent significant effect was an increase in eccentricity and pRF size in the Interest vs Baseline condition for eccentricity bins in the middle of the tested range.
Taken together, the heterogeneity in manifestation we exposed here makes it hard to spot the artifact by visual inspection alone and highlights its dependency on exact distributional properties of the data at hand (see also Campbell & Kenny, 1999; Holmes, 2009; Schwarz & Reike, 2018, for similar points).
How can we avoid double-dipping and control for regression to the mean? We could, for instance, use an Independent condition for binning (such as repeat data or odd or even runs for the Baseline condition; Figure 2 and Figure 2-figure supplement 1-5, A., 3rd column as well as Figure 3 and Figure 3-figure supplement 1-4, 2nd row) or an anatomical criterion (Kriegeskorte et al., 2009), such as cortical distance. This way, noise components should cancel out on average in both the Baseline and Interest condition (Figure 2 and Figure 2-figure supplement 1-5, B., 3rd column), albeit not necessarily for sparsely populated bins (Figure 2-figure supplement 5, B., 3rd column as well as Figure 3 and Figure 3-figure supplement 1-3, 2nd row). Similarly, given that cross-thresholding reshapes noise components, they might not average out with an Independent condition (Figure 2-figure supplement 1-2, B., 3rd column as well as Figure 3-figure supplement 1-2, 2nd row). The same can evidently also happen with an anatomical criterion if the Baseline and Interest condition are subjected to cross-thresholding. Consequently, unless cross-thresholding can be avoided or demonstrated to be unbiased, an Independent condition might not be a safe option. Alternatively, we could use analyses without binning that control for circularity and regression artifacts, or we could evaluate effects against appropriate null distributions that take into account all statistical dependencies (e.g., Holmes, 2009; Kriegeskorte et al., 2009). A combination of these approaches might be most fruitful. Regardless of the specific mitigation strategy, we believe that in light of the many layers of complexity in our analysis pipelines, we need to make it common practice to perform sanity checks using null simulations and empirical repeat data.
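A null simulation of the kind recommended here can also show what an Independent binning condition achieves. The sketch below makes several assumptions: hypothetical true positions uniform over a disc of 8 dva radius, homoskedastic noise with sd = 2 dva, no cross-thresholding, and an arbitrary seed. It contrasts circular binning with binning on a third, independent noise draw.

```python
import math
import random
from statistics import mean

random.seed(4)  # arbitrary seed, for reproducibility only

# Hypothetical true pRF centers, uniform over a disc of radius 8 dva
N = 20000
true_xy = []
while len(true_xy) < N:
    x, y = random.uniform(-8, 8), random.uniform(-8, 8)
    if math.hypot(x, y) <= 8:
        true_xy.append((x, y))


def noisy_eccentricity(xy, sd=2.0):
    """Disturb x0 and y0 with Gaussian noise, then convert to eccentricity."""
    return [math.hypot(x + random.gauss(0, sd), y + random.gauss(0, sd))
            for x, y in xy]


def decile_diffs(binning, baseline, interest, n_bins=10):
    """Bin-wise mean of Interest minus Baseline; deciles from `binning`."""
    order = sorted(range(len(binning)), key=lambda i: binning[i])
    size = len(order) // n_bins
    chunks = [order[b * size:(b + 1) * size] for b in range(n_bins)]
    return [mean(interest[i] - baseline[i] for i in chunk) for chunk in chunks]


baseline = noisy_eccentricity(true_xy)
interest = noisy_eccentricity(true_xy)     # null effect
independent = noisy_eccentricity(true_xy)  # e.g., a repeat of Baseline

circular = decile_diffs(baseline, baseline, interest)
controlled = decile_diffs(independent, baseline, interest)
print("binned on Baseline:   ", [round(d, 2) for d in circular])
print("binned on Independent:", [round(d, 2) for d in controlled])
```

In this simplified setting, the bin-wise differences hover around zero when bins come from the Independent draw. As the text cautions, this need not hold once cross-thresholding or sparsely populated bins enter the picture.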
Uncontrolled post hoc binning analyses come in many flavours (e.g., centroids, shift vectors, eccentricity differences, x0 and y0 differences, and 1D or 2D bins) and are not restricted to pRF position estimates. For instance, biases should manifest equally when binning pRF size in a Baseline and Interest condition according to pRF positions from either of these conditions. Moreover, partial stimulation of pRFs likely results in heteroskedasticity and positively correlated errors for pRF size and position. This would, for instance, bias bin-wise pRF size vs eccentricity or pRF size vs pRF size comparisons where binning is based on non-independent eccentricity values. Likewise, fitting errors due to partial stimulation should be more pronounced whenever pRF size is larger, leading to stronger artifactual effects (for simulations using different levels of noise see Holmes, 2009). The same is to be expected based on a higher variability in pRF position estimates for wider pRFs. These factors might potentially explain why pRF position and size differences have been reported to be larger in higher-level areas where pRFs are wider. Moreover, the distribution of errors likely depends on the toolbox that was used for fitting (Lerma-Usabiaga, Benson, Winawer, & Wandell, 2020), making it hard to generalize across studies. Importantly, uncontrolled single bin (i.e., region of interest) analyses are equally affected by post-hoc binning (Kriegeskorte et al., 2009). And of course, delineations of visual areas in post hoc binning analyses should ideally also be based upon independent criteria as this is where selection starts.
The application of uncontrolled post hoc binning analyses in the pRF literature might have led to spurious claims about the plasticity of pRFs (see de Haas et al., 2014, 2020, for a possible example). Consequently, we urge researchers who have engaged in post hoc binning to check for the severity of biases in their analyses by running adequate simulations and reanalyzing the original data wherever possible.
Without doubt, circularity and/or regression to the mean are thorny and omnipresent problems that can manifest subtly and diversely (e.g., Ball, Squeglia, Tapert, & Paulus, 2020; Barnett et al., 2005; Campbell & Kenny, 1999; Eriksson & Häggström, 2014; Gignac & Zajenkowski, 2020; Holmes, 2009; Kilner, 2013; Kriegeskorte et al., 2009; Preacher et al., 2005; Shanks, 2017; Vul, Harris, Winkielman, & Pashler, 2009). As such, we need to ensure that the validation of analysis procedures becomes part and parcel of the scientific process.
Materials and Methods
Post hoc binning using simulations
Stimuli and procedure
For the simulation analyses, we used data from a population receptive field (pRF) experiment involving a dynamic horizontal bar aperture (length of major axis: 17.15 degrees of visual angle, dva; length of minor axis: 1.27 dva). The bar aperture was centered and presented within the boundaries of a circular mapping area (diameter: 17.15 dva). It moved consecutively across the mapping area along cardinal (0/180° and 90/270°) and oblique axes (45/225° and 135/315°) and was superimposed onto a random dot kinematogram (RDK). The RDK comprised moving black dots (diameter: 0.13 dva) positioned within a square field (size: 17.03 × 17.03 dva). If a dot left the square field, it was moved back by 1 field width/height. The dots had a density of 6.89 dots/dva², a lifetime of 36 frames, were repositioned randomly once they had died, and oscillated according to a sine wave (A = 1.29 dva, f = 1 Hz, ω = 6.28 rad/s, φ = 0 rad). The sine wave was rotated with the current orientation of the bar aperture. The bar aperture and RDK were centered at the screen’s midpoint.
A semi-transparent (α = 50%) array of 5 vertical ovals was superimposed onto the bar aperture. One of the ovals was centered at the screen’s mid-point (length of major axis: 0.43 dva; length of minor axis: 0.28 dva) and the remaining ovals at an eccentricity of 4.29 dva (length of major axis: 0.86 dva; length of minor axis: 0.57 dva) and different polar angles (45°, 135°, 225°, and 315°). The ovals were presented as a rapid serial visual presentation (RSVP) task, where each trial started with 200 ms of oval presentation, followed by a blank (no ovals) of 600 ms. The ovals’ orientation (45° left- or rightwards from vertical) and color (red, yellow, cyan, orange, brown, white, black, green, and blue) changed pseudorandomly in each trial with the exception that ovals of the same color were never presented simultaneously. Participants had to press a button whenever a rightwards oriented oval was presented in blue or green color. A black radar grid (line width: 0.02 dva) at low opacity (α = 20%) with 12 radial lines (at polar angles: 0 to 330° with a step size of 30°) and 18 circles (diameters: 0.95 to 51.42 dva with a step size of 2.97 dva) was superimposed onto the screen. The radial lines ran from the midpoint of the screen to the outermost circle.
The experiment comprised 4 attention conditions, in which participants were required to perform the RSVP task on different oval streams whilst ignoring other streams and the bar aperture. The condition of relevance here is the Center condition, where participants performed the task on the central oval stream. This condition resembles a standard pRF mapping experiment. Participants performed 2 sessions of 4 runs per condition on consecutive days. The order of conditions was pseudorandomized.
Within each run, the bar aperture moved along each axis twice, so that the starting point covered all chosen polar angles. Specifically, the sequence of starting points in each run was: 90°, 225°, 180°, 315°, 270°, 45°, 0°, and 135°. One bar sweep lasted 28 s (1 step/s). Consecutive bar apertures overlapped by 50%. After 4 bar sweeps, a blank interval of 28 s (without the bar apertures and RDK) was presented, during which participants had to refrain from doing the RSVP task (a brief tone cued the beginning and end of this interval). The position and lifetime of each dot in the RDK at the start of every 28-s interval were randomized. Experimental procedures were implemented in Matlab 2014a (8.3; https://uk.mathworks.com/) using Psychtoolbox-3 (3.0.11; Brainard, 1997; Kleiner et al., 2007) and approved by the University College London ethics committee. Written informed consent was obtained from all participants.
Apparatus
Functional and anatomical images were acquired at a field strength of 1.5 T on a Siemens Avanto magnetic resonance imaging (MRI) scanner. All stimuli were projected onto a screen (resolution: 1920 × 1080 pixels; refresh rate: 60 Hz; background color: gray) at the back of the MRI scanner. Participants viewed the experiment through a head-mounted mirror. The viewing distance was approximately 67 cm. To ensure that participants could view the screen without obstruction, the front visor of a 32 channel coil was removed, leaving 30 effective channels.
MRI acquisition
We collected anatomical images using a T1-weighted magnetization-prepared rapid acquisition with gradient echo sequence (repetition time, TR = 2.73 s; echo time, TE = 3.57 ms; voxel size = 1 mm isotropic; flip angle = 7°; field of view, FoV = 256 mm × 224 mm; matrix size = 256 × 224; 176 sagittal slices) and functional images using a T2*-weighted multiband 2D echo-planar imaging sequence (Breuer et al., 2005; TR = 1 s; TE = 55 ms; voxel size = 2.3 mm isotropic; flip angle = 75°; FoV = 224 mm × 224 mm; no gap; matrix size = 96 × 96; acceleration = 4; 36 transverse slices). The slice slab for the functional images was aligned to be roughly parallel to the calcarine sulcus so that the posterior third of the cortex was well covered.
Preprocessing
The initial 10 volumes of each run were discarded to allow for magnetisation to reach equilibrium. Using SPM8 (6313; https://www.fil.ion.ucl.ac.uk/spm/software/spm8/), functional images were then bias-corrected, realigned, unwarped, coregistered to the anatomical image, and finally projected onto an anatomical surface model constructed in FreeSurfer (5.3.0; Dale, Fischl, & Sereno, 1999; Fischl, Sereno, & Dale, 1999). We generated vertex-wise functional MRI (fMRI) time series per run by determining the functional voxel at half the distance between corresponding vertices in the pial surface and gray-white matter mesh. We then applied linear detrending to the time series of each run and z-standardized them. Surface projection, detrending, and z-standardization were performed in Matlab 2016b (9.1; https://uk.mathworks.com/) using SamSrf7 (7.05; https://github.com/samsrf/samsrf/tree/3c7a0e25090e9097d5e2fd95696c00774acd26d6).
PRF estimation and delineations
The vertex-wise preprocessed time series of the Center condition were averaged across the 2 sessions. We then fit a 2D isotropic Gaussian pRF model with 5 free parameters (x0, y0, σ, β0, and β1) to the vertex-wise average time series. To this end, we first predicted pRF responses by calculating the overlap between the pRF model and an indicator function of the bar aperture for each volume using a 100 × 100 pixel matrix. Specifically, we used a 3D search space of possible values for σ (8.5 × 2^(−5.6:0.2:1)), x0 and y0, and generated pRF responses for each combination of these values. Values for x0 and y0 were first sampled from the polar angle system (polar angles: 0:10:350°; eccentricities: 8.5 × 2^(−5:0.2:0.6)) and then transformed to Cartesian coordinates. The pRF response per volume was expressed as mean percent overlap with the pRF model.
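To make the coarse search space concrete, the grid can be enumerated as below. This sketch assumes that the notation denotes 8.5 × 2^k, with k given as a Matlab-style inclusive start:step:stop range. The helper `colon_range` is a hypothetical stand-in for Matlab's colon operator and is not part of SamSrf.

```python
import math


def colon_range(start, step, stop):
    """Inclusive start:step:stop range, built from integer steps."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


# Assumed reading: values are 8.5 * 2**k for each exponent k in the range
sigma_grid = [8.5 * 2 ** k for k in colon_range(-5.6, 0.2, 1.0)]
ecc_grid = [8.5 * 2 ** k for k in colon_range(-5.0, 0.2, 0.6)]
angle_grid = colon_range(0, 10, 350)  # polar angles in degrees

# Polar search grid transformed to Cartesian (x0, y0) candidates
xy_grid = [(e * math.cos(math.radians(a)), e * math.sin(math.radians(a)))
           for a in angle_grid for e in ecc_grid]

print(len(sigma_grid), len(ecc_grid), len(angle_grid), len(xy_grid))
# 34 29 36 1044
```

Under this reading, σ candidates are spaced logarithmically up to 17 dva, and position candidates are densest near fixation.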
To obtain a predicted fMRI time series, we then convolved these pRF responses with a canonical hemodynamic response function (de Haas et al., 2014). Next, we calculated the Pearson correlation between the predicted and the observed fMRI time series and retained the combination of parameter values showing the largest R² with all R²s ≥ .01. These initial parameter estimates were then used as seeds for an optimization procedure aimed at further maximizing the Pearson correlation between the observed and predicted fMRI time series using a Nelder-Mead algorithm (Lagarias, Reeds, Wright, & Wright, 1998; Nelder & Mead, 1965). Lastly, we estimated β0 and β1 by performing linear regression between the observed and predicted time series. The final parameter maps were smoothed with a spherical Gaussian kernel (FWHM = 3 mm). Vertices with a very poor R² (< .01) or artifacts (σ ≤ 0, β1 ≤ 0 or β1 > 3) were removed prior to smoothing. V1 hemifield maps were manually delineated based on smooth polar angle maps using polar angle reversals (Engel, Glover, & Wandell, 1997; Sereno et al., 1995; Wandell, Dumoulin, & Brewer, 2007). These delineations were used as a mask to extract V1 vertices. Fitting, smoothing, and manual delineations were performed in Matlab 2016b (9.1; https://uk.mathworks.com/) using SamSrf7 (7.05; https://github.com/samsrf/samsrf/tree/3c7a0e25090e9097d5e2fd95696c00774acd26d6).
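The two-stage logic of correlating first and estimating β0 and β1 afterwards can be illustrated with a small example. The functions below are generic textbook formulas and not SamSrf code. The toy time series are hypothetical, and the hemodynamic convolution step is omitted.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_betas(predicted, observed):
    """Ordinary least squares: observed ~ beta0 + beta1 * predicted."""
    mp, mo = mean(predicted), mean(observed)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sxx = sum((p - mp) ** 2 for p in predicted)
    beta1 = sxy / sxx
    beta0 = mo - beta1 * mp
    return beta0, beta1


# Hypothetical predicted series and a noise-free "observed" series
predicted = [0.0, 1.0, 3.0, 2.0, 0.5]
observed = [2.0 + 3.0 * p for p in predicted]

r = pearson_r(predicted, observed)
r_squared = r ** 2
beta0, beta1 = fit_betas(predicted, observed)
print(round(r_squared, 6), round(beta0, 6), round(beta1, 6))
```

Because the Pearson correlation is unaffected by offset and positive scaling, x0, y0, and σ can be optimized on the correlation alone, and the intercept and gain can be recovered afterwards by regression.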
Simulations
As outlined in the main text, we generated 6 simulation cases: a null effect, a null effect with condition cross-thresholding based on the Baseline condition, a null effect with condition cross-thresholding based on both the Baseline and Interest condition, a null effect with eccentricity-scaled noise, a true effect, and a null effect with equidistant binning (instead of decile binning which was applied to the other cases). These cases were chosen to illustrate a given issue in a clear fashion using an empirical pRF parameter distribution as a basis, not to mimic the exact properties of empirical data (which is unfeasible without explicit knowledge of the noise distribution).
For all simulation cases, x0 and y0 estimates from both cortical hemispheres were pooled and empty data points or obvious artifacts removed (σ ≤ 0 and β1 ≤ 0). Moreover, all simulation cases followed the same general procedure of the null effect involving eccentricity as outlined in the main text (including parameter settings and the same seed for random number generation) with exceptions as follows.
1D post hoc binning analyses on eccentricity
For the simulation cases involving condition cross-thresholding, we removed simulated observations falling outside a certain eccentricity range (≥ 0 and ≤ 6 dva) in the Baseline or Baseline and Interest condition from all conditions (i.e., Baseline, Interest, and Independent). For the simulation case involving eccentricity-scaled noise, we used a small standard deviation (sd = 0.25 dva) of random Gaussian noise to disturb original observations with smaller eccentricities (≥ 0 and < 3 dva) and a larger standard deviation (sd = 2 dva) to disturb original observations with larger eccentricities (≥ 3 dva). For the simulation case involving a true effect, we induced a radial increase in eccentricity of 2 dva in the Interest condition. For the simulation case involving equidistant binning, we used a constant bin width of 1.75 dva and an overall binning range of 0 to 19.25 dva eccentricity. For all simulation cases, the Independent condition consisted of a second draw (resample) of the Baseline condition.
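The eccentricity-scaled noise and the true radial effect can be sketched as two small functions. The parameter values follow the description above, but the function names are hypothetical. The order in which noise and the radial shift are combined is not specified in the text, so the two steps are kept separate here.

```python
import math
import random

random.seed(3)  # arbitrary seed, for reproducibility only


def disturb(x, y, small_sd=0.25, large_sd=2.0, cutoff=3.0):
    """Add Gaussian noise whose sd depends on the original eccentricity."""
    sd = small_sd if math.hypot(x, y) < cutoff else large_sd
    return x + random.gauss(0, sd), y + random.gauss(0, sd)


def radial_shift(x, y, shift=2.0):
    """Move a position outward along its own polar angle by `shift` dva.

    A position exactly at the origin has no defined angle; atan2 returns 0
    there, so it would be moved along the positive x-axis.
    """
    ecc = math.hypot(x, y)
    angle = math.atan2(y, x)
    return (ecc + shift) * math.cos(angle), (ecc + shift) * math.sin(angle)


# Example: a position at 5 dva eccentricity moves to 7 dva
shifted = radial_shift(3.0, 4.0)
print(shifted, math.hypot(*shifted))

# Example: positions inside 3 dva receive much less noise than those outside
near = [disturb(1.0, 0.0)[0] - 1.0 for _ in range(500)]
far = [disturb(5.0, 0.0)[0] - 5.0 for _ in range(500)]
```

The step change in noise level at 3 dva produces the bimodal eccentricity distribution referred to in the footnotes.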
2D post hoc binning analyses on x0 and y0
Apart from a 1D binning analysis on eccentricity, we also conducted a 2D binning analysis on the simulated x0 and y0 values. To this end, we converted the x0 and y0 values to polar coordinates, that is, polar angle and eccentricity (Figure 1). We then binned the x0 and y0 values in the Baseline or Interest condition according to their polar coordinates in the Baseline, Interest, or Independent condition using equidistant bins and calculated the bin-wise x0 and y0 means for each condition. The condition-wise means were visualized as vector graphs. The polar angle bins ranged from 0° to 360° with a constant bin width of 45°. The eccentricity bins ranged from 0 to 22 dva (for the simulation case involving a true effect) or from 0 to 20 dva (for all other simulation cases) with a constant bin width of 2 dva. The 2D binning analysis was performed for all aforementioned simulation cases (apart from the case of equidistant binning of course).
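The 2D segment binning can be sketched as follows. This is an illustrative reimplementation with hypothetical function names and toy positions, using the 45° polar angle and 2 dva eccentricity bin widths stated above.

```python
import math
from collections import defaultdict
from statistics import mean


def segment(x, y, angle_width=45.0, ecc_width=2.0):
    """Return (polar angle bin, eccentricity bin) for a position."""
    n_angle_bins = int(360 / angle_width)
    angle = math.degrees(math.atan2(y, x)) % 360.0
    ecc = math.hypot(x, y)
    return int(angle // angle_width) % n_angle_bins, int(ecc // ecc_width)


def segment_means(binning_xy, values_xy):
    """Mean (x0, y0) of `values_xy` within segments defined by `binning_xy`."""
    groups = defaultdict(list)
    for (bx, by), value in zip(binning_xy, values_xy):
        groups[segment(bx, by)].append(value)
    return {seg: (mean(v[0] for v in vals), mean(v[1] for v in vals))
            for seg, vals in groups.items()}


# Toy positions (dva) for three vertices; hypothetical values
baseline_xy = [(2.0, 1.0), (2.2, 0.8), (-3.0, 0.1)]
interest_xy = [(2.5, 1.0), (2.7, 0.8), (-3.0, 0.6)]

# Circular variant: Baseline positions define the segments
base_means = segment_means(baseline_xy, baseline_xy)
int_means = segment_means(baseline_xy, interest_xy)

# One shift vector per segment, from the Baseline mean to the Interest mean
shift_vectors = {seg: (int_means[seg][0] - base_means[seg][0],
                       int_means[seg][1] - base_means[seg][1])
                 for seg in base_means}
print(shift_vectors)
```

Passing positions from an Independent condition as `binning_xy` gives the non-circular variant of the same analysis.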
Post hoc binning using repeat data
For the repeat data analysis, we used publicly available pRF estimates from the Human Connectome Project 7 T Retinotopy Dataset (Benson et al., 2018, 2020). These estimates stem from a split-half analysis where a 2D isotropic Gaussian with a subadditive exponent was fit to fMRI time series from the first and second half of 6 pRF mapping runs. For each half, 6 estimates were obtained for each grayordinate (vertex), that is, pRF polar angle, pRF eccentricity, pRF size, pRF gain, percentage of R², and mean signal intensity. The maximal eccentricity of the mapping area subtended 8 dva. For further details, see Benson et al. (2018).
Following Benson et al. (2018), we analysed complexes of visual areas across hemispheres for the 25th and 75th percentile participants of the R² distribution using delineations from Wang et al.’s (2015) atlas. Benson et al. (2018) generated the R² distribution by calculating the median R² for each participant across grayordinates from both cortical hemispheres within all areas of Wang et al.’s (2015) atlas. The posterior complex consisted of V1-V3, the ventral complex of VO-1/2 and PHC-1/2, the dorsal complex of V3A/B and IPS0-5, and the lateral complex of LO-1/2 and TO-1/2. For our purposes, we focused on the posterior and dorsal complexes, as those came with a larger number of available data points (which was particularly necessary to perform the 2D post hoc binning analysis and generate vector graphs).
To obtain x0 and y0 estimates, polar angle and eccentricity estimates were converted to Cartesian coordinates. The eccentricity, x0, and y0 estimates of the first half were used as a Baseline condition and those of the second half as an Interest condition. Grayordinates with unusual/implausible values (R² ≤ 0% or σ ≤ 0) in either condition were removed from both conditions.
Similar to the simulation-based analyses, binning was either based on the Interest or Baseline condition and bin-wise means were calculated. Moreover, binning was either performed with or without condition cross-thresholding. In the case of cross-thresholding, we removed observations falling outside a certain eccentricity range (≥ 0 and ≤ 8 dva) or below a certain R² cut-off (≤ 2.2%) in the Baseline or Baseline and Interest condition from both conditions. The R² cut-off of 2.2% was adopted from Benson et al. (2018).
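Condition cross-thresholding amounts to pair-wise deletion, which can be sketched as follows. The data layout (one (eccentricity, R²) tuple per grayordinate and condition) and the function names are assumptions for illustration. The thresholds are those stated above.

```python
def passes(obs, ecc_max=8.0, r2_cutoff=2.2):
    """True if an (eccentricity in dva, R2 in %) observation survives."""
    ecc, r2 = obs
    return 0.0 <= ecc <= ecc_max and r2 > r2_cutoff


def cross_threshold(baseline, interest, use_interest=False):
    """Remove observations from BOTH conditions in a pair-wise fashion.

    Thresholds are evaluated on Baseline only, or on Baseline and Interest
    when use_interest is True.
    """
    kept = [i for i in range(len(baseline))
            if passes(baseline[i])
            and (not use_interest or passes(interest[i]))]
    return [baseline[i] for i in kept], [interest[i] for i in kept]


# Toy (eccentricity, R2) pairs for four grayordinates; hypothetical values
baseline = [(1.0, 10.0), (9.0, 12.0), (4.0, 1.0), (6.0, 5.0)]
interest = [(1.2, 9.0), (7.5, 11.0), (4.1, 3.0), (8.5, 6.0)]

base_a, int_a = cross_threshold(baseline, interest)
base_b, int_b = cross_threshold(baseline, interest, use_interest=True)
print(len(base_a), len(base_b))  # 2 1
```

Note that thresholding on the Baseline condition alone leaves an Interest observation above 8 dva in the data (the last grayordinate), which illustrates how cross-thresholding can reshape the two conditions asymmetrically.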
The 1D binning analysis involving eccentricity and the 2D binning analysis involving x0 and y0 were conducted as for the simulated data, although, here, the eccentricity bins for the 2D analysis ranged from 0 to 18 dva with a constant bin width of 2 dva. All binning analyses (including those on simulated data) were implemented in Matlab 2016b (9.1; https://uk.mathworks.com/) using custom code.
Data and code availability
Preprocessed data, custom code, and figures are available at https://doi.org/10.17605/OSF.IO/WJADP.
Declaration of competing interest
The authors declare no conflict of interest.
Supplementary figures
Acknowledgements
This research was supported by European Research Council Starting Grants to DSS (WMOSPOTWU, 310829) and BdH (INDIVISUAL, 852885). BdH was further supported by the Deutsche Forschungsgemeinschaft (222641018–SFB/TRR 135 TP A8).
Footnotes
¹ To be precise, regression to the mean refers to standard scores (z-scores; Campbell & Kenny, 1999; Kenny, 2005).
² Note that when evaluating data distributions with unequal means, variances, or non-linearity, z-standardization might be necessary to detect regression to or away from the mean (Campbell & Kenny, 1999; Shanks, 2017). In particular, z-standardization makes data distributions directly comparable. As such, bin-wise means should regress to wherever they intersect the identity line. Here, we always display data in native space, as this is typically done in the pRF literature. However, we use crosshairs to indicate the location of the mean and thus provide a visual guideline.
³ Note that apart from the visualizations provided here, it might be beneficial to additionally look at Galton squeeze diagrams to detect regression to or away from the mean (Campbell & Kenny, 1999; Shanks, 2017).
⁴ Note that for skewed distributions (such as the gamma-like distribution here), the regression effect might actually be towards the mode and away from the mean of the overall distribution (Schwarz & Reike, 2018). If the location of the overall mode and mean are sufficiently close, our visualizations would be unable to distinguish these two cases.
⁵ Note that the regression was presumably towards the nearest modes of the simulated bimodal distribution (see marginal histograms in Figure 2-figure supplement 3, A., 1st and 2nd columns; Schwarz & Reike, 2018).
⁶ Note that floor/ceiling effects (due to physiological and methodological constraints on the minimum and maximum observable value) and/or the calculation of absolute (raw) vs proportional (%) differences are further factors influencing the artifact’s appearance (de Haas et al., 2014; de Haas, Schwarzkopf, Anderson, & Rees, 2020; Holmes, 2009).