ABSTRACT
Fixation-related potentials (FRPs), neural responses aligned to saccade offsets, are a promising tool to study the dynamics of attention and cognition under natural viewing conditions. In the past, four methodological problems have complicated the analysis of combined eye-tracking and EEG experiments: (i) the synchronization of data streams, (ii) the removal of ocular artifacts, (iii) the condition-specific temporal overlap between the brain responses evoked by consecutive fixations, and (iv) the fact that numerous low-level stimulus and saccade properties also influence the post-saccadic neural responses. While effective solutions exist for the first two problems, the latter two are only beginning to be addressed. In the current paper, we present and review a unified framework to deconvolve overlapping potentials and control for linear and nonlinear confounds on the FRPs. An open software implementation is provided for all suggested procedures. We then demonstrate the advantages of this analysis approach for three commonly studied free viewing paradigms: face perception, scene viewing, and natural sentence reading. First, for a traditional ERP face recognition experiment, we show how deconvolution can separate stimulus-ERPs from the overlapping muscle and brain potentials produced by small (micro)saccades on the face. Second, in natural scene viewing, we isolate multiple nonlinear influences of saccade parameters on the FRP. Finally, for a reading experiment using the classic boundary paradigm, we show how it is possible to study the neural correlates of parafoveal preview after removing the spurious overlap effects caused by differences in average fixation duration. Our results suggest a principled way of measuring reliable fixation-related potentials during natural vision.
INTRODUCTION
During everyday life, we make two to four eye movements per second to extract new information from our visual environment. Despite the fundamentally active nature of natural vision, the electrophysiological correlates of visual cognition have mostly been studied under passive viewing conditions that minimize eye movements. Specifically, in most event-related potential (ERP) experiments, participants are instructed to fixate the screen center, while stimuli are presented at a comparatively slow pace.
An alternative approach that has gained popularity in recent years is the simultaneous recording of eye movements and the electroencephalogram (EEG) during the free viewing of complex stimuli. In such co-registration studies, the EEG signal can then be aligned to the end of naturally occurring eye movements, yielding fixation-related brain potentials (FRPs; for reviews, see Baccino, 2011; Dimigen, Sommer, Hohlfeld, Jacobs, & Kliegl, 2011; Nikolaev, Meghanathan, & van Leeuwen, 2016). Compared to traditional passive stimulation paradigms without eye movements, this data-rich technique has the advantage that it combines the behavioral information gained from eye tracking (such as fixation durations and locations) with the high temporal resolution and neurophysiological markers provided by EEG, allowing the researcher to resolve the attentional, cognitive, and affective processes occurring within individual fixations.
However, the co-registration of eye movements and EEG during free viewing is also complicated by several data-analytic challenges (Dimigen et al., 2011) which have hampered the more widespread adoption of this technique in neurocognitive research. These problems, as illustrated in Figure 1, are (1) the synchronization and integration of the two data streams, (2) the ocular measurement artifacts caused by movements of the eyeballs, eyelids, and extraocular muscles, (3) the strong and condition-specific temporal overlap between the brain responses evoked by successive fixations, and (4) the strong and nonlinear influences of visual and oculomotor low-level variables on the neural responses produced by each eye movement. Whereas good solutions exist for the first two problems, the latter two are only beginning to be solved. In the current paper, we describe an integrated analysis framework for EEG analyses during natural vision which addresses these two remaining problems (overlap and low-level influences). We also provide a tutorial review on how this framework can be implemented using a recently introduced open-source toolbox that offers all of the necessary procedures (Ehinger & Dimigen, 2018). Finally, to demonstrate how this approach can improve the analysis of unconstrained viewing experiments and produce new theoretical insights, we will apply it to co-registered datasets from three domains of neurocognitive research: face perception, scene viewing, and sentence reading.
Four problems related to free viewing
The four methodological problems are illustrated in Figure 1: The first problem comprises several technical issues related to the simultaneous recording, precise synchronization, and joint representation of the two data streams. Nowadays, these issues are largely solved by optimizing the laboratory setup and by sending shared trigger pulses to both systems during each trial (e.g. Baccino & Manunta, 2005). The two recordings can then be aligned offline with millisecond precision using existing software (e.g. the EYE-EEG toolbox; Dimigen et al., 2011; see also Baekgaard et al., 2014; Xue et al., 2017) that also adds saccade and fixation onsets as additional event markers to the EEG.
The second problem concerns the massive distortions of the EEG generated by movements of the eyeballs, eyelids, and extraocular muscles during free viewing (for recent reviews, see Plöchl et al., 2012; Dimigen, 2018). The eyeballs, in particular, act as electrostatic dipoles that rotate with each eye movement, producing large voltage distortions across the scalp (corneoretinal artifact). Two smaller artifacts are produced by the relative movement of the eyelids over the cornea during upward saccades and by the recruitment of the extraocular muscles at saccade onset (saccadic spike potential; Keren, Yuval-Greenberg, & Deouell, 2010). All three ocular artifacts – corneoretinal, eyelid, and spike potential – need to be removed from the EEG without distorting brain activity. Fortunately, algorithms like independent component analysis (ICA; Jung et al., 1998) are effective at isolating and removing most of these ocular artifacts, even under free viewing conditions (Nikolaev et al., 2016). The correction can be further improved by taking into account the information provided by the eye-tracker. Specifically, the simultaneously recorded eye-tracking signal is useful to select optimal training data for the ICA algorithm (Keren et al., 2010; Dimigen, 2018), to identify artifact components (Plöchl et al., 2012), and to objectively evaluate the results of the correction. With such optimized procedures, ocular artifacts can be removed almost completely from free viewing experiments (Dimigen, 2018).
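The eye-tracker-guided identification of artifact components can be illustrated with a small sketch. The snippet below applies a saccade/fixation variance-ratio criterion in the spirit of Plöchl et al. (2012) to simulated component activations; the data, the function name `variance_ratio`, and the exact masking are illustrative assumptions, not the published implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy activations of 3 independent components (ICs) over 1000 EEG samples.
# IC 0 mimics an ocular artifact: extra variance during saccade intervals.
n_samples = 1000
saccade_mask = np.zeros(n_samples, dtype=bool)
saccade_mask[100:140] = True  # saccade intervals taken from eye-tracking data
saccade_mask[500:550] = True

ic_acts = rng.normal(0.0, 1.0, (3, n_samples))
ic_acts[0, saccade_mask] += rng.normal(0.0, 8.0, saccade_mask.sum())

def variance_ratio(activations, saccade_mask):
    """Per-IC variance during saccades divided by variance during fixations."""
    return (activations[:, saccade_mask].var(axis=1)
            / activations[:, ~saccade_mask].var(axis=1))

ratios = variance_ratio(ic_acts, saccade_mask)
# ICs whose ratio exceeds a threshold (1.1 in Plöchl et al., 2012) are
# flagged as saccade-related artifacts and removed before back-projection.
flagged = np.flatnonzero(ratios > 1.1)
print(ratios.round(2), flagged)
```

With real data, the saccade intervals would come from the synchronized eye-tracking record rather than being hard-coded as above.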
The last two problems, differences in temporal overlap and differences in terms of low-level covariates, are much more serious and a direct consequence of the fast pace and quasi-experimental nature of normal visual exploration behavior. In traditional EEG laboratory experiments, the experimenter has full control over the timing and sequence of the presented stimuli and the participant’s motor behavior is often restricted to a single button press. In most cases, it is also possible to match the visual low-level properties of the stimuli between conditions. In contrast, in any experiment with multiple saccades, the participant rather than the experimenter decides where to look and when to look at a stimulus belonging to a given condition. This means that the durations of fixations, the size and direction of saccades, and the low-level features of the stimulus at the currently foveated location (e.g. the local luminance and local contrast of an image) are not only intercorrelated with each other (Nuthmann, 2017), but usually different between the experimental conditions. Since all of these factors also influence the post-saccadic brain response, this can easily lead to condition-specific distortions and incorrect conclusions. In the following sections, we will describe these two problems and some proposed solutions in more detail.
Varying temporal overlap
Event-related potentials that index cognitive processes often last up to a second before the signal tapers off and returns to baseline. In contrast, the average fixation lasts only about 200-400 ms (Rayner, 2009) in most viewing tasks. This rapid pace of natural vision means that brain responses elicited by any given fixation will strongly overlap with those of preceding and following fixations. While this overlapping activity is smeared out due to the normal variation in fixation durations, overlap becomes a problem whenever the distribution of fixation durations differs between conditions.
As an example, consider a visual search task, where participants search for a target item within a set of distractors (e.g. Kamienkowski et al., 2012). It is well-known that on average, task-relevant target items are fixated longer than irrelevant distractors (Brouwer, Reuderink, Vincent, van Gerven, & van Erp, 2013). This in turn means that the visually-evoked lambda response (the analogue of the P1 component in FRPs) from the next fixation will overlap at an earlier latency in the distractor condition than in the target condition (see also Figure 2C for a similar example). Cognitive effects of target processing on the P300 component of the FRP (Kamienkowski et al., 2012) will therefore be confounded with trivial condition differences produced by the change in overlap. In other words, any difference in mean fixation duration between conditions will produce spurious differences in the EEG, even if the real underlying brain activity is the same in both conditions. Furthermore, if the duration of the pre-target fixation differs between conditions (e.g. because of the extrafoveal preprocessing of the target), these distortions will also affect early parts of the FRP waveform, including the baseline interval before fixation onset. The confounding effect of overlapping potentials is also illustrated in Figure 2 for a simulated experiment with cars and faces.
A second, but frequently overlooked type of overlap is that between stimulus-onset ERPs and FRPs. In most free viewing experiments, a single visual stimulus – for example a search array, a sentence, or a scene – is presented at the beginning of each trial. The fixation-related potentials therefore not only overlap with each other, but also with this stimulus-ERP, which is often strong and long-lasting (cf. Figure 1E in Dimigen et al., 2011). This means that fixations that happen early and late during a trial differ systematically in terms of their baseline activity and cannot be directly compared to each other. In practice, this type of overlap can be just as problematic as that from neighboring fixations (e.g. Coco, Nuthmann, & Dimigen, 2018).
Finally, overlapping potentials are also relevant in traditional EEG experiments in which the stimulus-ERP is the signal of interest, and eye movements are just a confound. In particular, potentials from involuntary (micro)saccades executed during the trial have been shown to distort stimulus-locked EEG analyses in the frequency (Yuval-Greenberg, Tomer, Keren, Nelken, & Deouell, 2008) and time domain (Dimigen, Valsecchi, Sommer, & Kliegl, 2009). We therefore need effective methods to disentangle overlapping activity from multiple events.
Several workarounds to the overlap problem have been proposed that center on data selection. One simple approach to reduce overlap is to analyze only long fixations with a minimum duration (e.g. > 500 ms) (e.g. Brouwer et al., 2013; Kaunitz et al., 2014; Kornrumpf, Niefind, Sommer, & Dimigen, 2016); another is to analyze only the first or last fixation in a sequence of fixations, which eliminates overlapping activity from the preceding and subsequent fixation, respectively (e.g. Hutzler et al., 2007). Of course, solutions like these are not optimal because they either exclude a large portion of the data or place strong constraints on the possible experimental designs.
A marked improvement over these workarounds is offered by the deconvolution approach, first used for EEG analyses in the 1980s (Eysholdt & Schreiner, 1982; Hansen, 1983). Here, the measured continuous EEG signal is understood as the convolution of the experimental events (i.e., a vector that contains impulses at the latencies of the events) with the isolated brain responses generated by each type of event (as illustrated in Figure 2B). The inverse operation is deconvolution, which recovers the unknown isolated brain responses given only the measured (convolved) EEG signal and the latencies of the experimental events (Figure 2G; Ehinger & Dimigen, 2018). Deconvolution is only possible because the events show variable temporal overlap with each other; without any jitter (i.e., with a constant stimulus-onset asynchrony or SOA), the responses would be inseparable because it is ambiguous whether the observed activity was evoked by the current event or the preceding one. The necessary jitter can be introduced experimentally by varying event onset times, or it arises naturally because fixation durations vary from fixation to fixation. Such temporal variability allows us to recover the unknown isolated responses, under two assumptions: (1) the brain signals evoked by different events add up linearly, and (2) the degree of temporal overlap between the events changes neither the processing in the brain itself nor the neural response evoked by each event. The first assumption is met due to the linear summation of electrical fields (Nunez & Srinivasan, 2006). The second assumption – that the underlying brain responses are the same regardless of the amount of overlap in the experiment – is likely incorrect, but in practice still a useful approximation (see Discussion).
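This convolution view of the continuous EEG can be made concrete with a few lines of code: a hypothetical recording is generated as the sum of impulse vectors convolved with two unknown response kernels. All numbers here (sampling rate, kernel shapes, event spacings) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 2000  # 20 s of hypothetical EEG at 100 Hz

# Isolated ("true") brain responses for two event types
kernel_a = np.sin(np.linspace(0, np.pi, 30))         # 300-ms positive deflection
kernel_b = -0.5 * np.sin(np.linspace(0, np.pi, 50))  # 500-ms negative deflection

# Event latencies with naturally jittered spacing, like variable fixation durations
onsets_a = np.cumsum(rng.integers(25, 60, size=30))
onsets_a = onsets_a[onsets_a < n_samples - 60]
onsets_b = onsets_a + rng.integers(10, 20, size=len(onsets_a))  # B trails each A

def convolve_events(onsets, kernel, n_samples):
    """Continuous signal = impulse vector (1 at each event latency) convolved with kernel."""
    impulses = np.zeros(n_samples)
    impulses[onsets] = 1.0
    return np.convolve(impulses, kernel)[:n_samples]

# The measured EEG is the linear sum of all overlapping responses plus noise
eeg = (convolve_events(onsets_a, kernel_a, n_samples)
       + convolve_events(onsets_b, kernel_b, n_samples)
       + rng.normal(0, 0.1, n_samples))
```

Deconvolution is the inverse problem: given only `eeg` and the event latencies, recover `kernel_a` and `kernel_b`.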
Early iterative deconvolution techniques for EEG (in particular the ADJAR algorithm; Woldorff, 1993) have proven difficult to bring to convergence (Kristensen, Rivet, & Guérin-Dugué, 2017; Talsma & Woldorff, 2004), do not allow one to simultaneously control for the influences of continuous covariates on the EEG, or were designed for specialized applications (Ouyang et al., 2011). More recently, a linear deconvolution method based on least-squares estimation has been successfully applied to solve the overlap problem in EEG (Burns, Bigdely-Shamlo, Smith, Kreutz-Delgado, & Makeig, 2013; Cornelissen, Sassenhagen, & Võ, 2019; Dandekar, Privitera, Carney, & Klein, 2012; Guérin-Dugué et al., 2018; Kristensen, Guerin-Dugué, & Rivet, 2017; Kristensen, Rivet, et al., 2017; Litvak, Jha, Flandin, & Friston, 2013; Lütkenhöner, 2010; Sassenhagen, 2018; N. J. Smith & Kutas, 2015b; Spitzer, Blankenburg, & Summerfield, 2016). This approach, initially applied to fMRI (e.g. Dale & Buckner, 1997; Glover, 1999), has crucial advantages over the previous, often iterative approaches. Not only are the properties of the linear model very well understood, but embedding deconvolution in the linear model also allows for multiple regression, meaning that many different event types (such as stimulus onsets, fixation onsets, and button presses) can be modeled together with continuous covariates within the same model. We provide a non-mathematical review of this linear deconvolution approach in the Tutorial Review section further below.
Low-level covariates influencing eye movement-related responses
After adequate correction for overlap, the fourth, equally serious problem remains: the massive influence of visual and oculomotor low-level variables on the shape of the saccade- or fixation-related brain responses. As an example, consider the lambda response, the dominant visually-evoked component of the FRP, which peaks at occipital scalp sites around 80-100 ms after fixation onset. The lambda response is the equivalent of the P1 in FRPs, and like the P1, it is generated in striate and/or extrastriate visual cortex (Dimigen et al., 2009; Kazai & Yagi, 2003). Like much of the rest of the FRP waveform (see also Figure 6C further below), the lambda response is strongly influenced by the size of the incoming saccade (Armington & Bloom, 1974; Dandekar, Privitera, et al., 2012; Dimigen et al., 2011; Ries, Slayback, & Touryan, 2018b; Thickbroom, Knezevic, Carroll, & Mastaglia, 1991). If saccade amplitudes differ between conditions, a problem analogous to the previously discussed fixation duration bias will occur – the condition with larger saccades will also show larger lambda responses. Increasing saccade amplitude also increases the amplitude of the pre-saccadic cortical motor potentials that ramp up slowly before eye movement onset in saccade-onset locked ERPs (Becker, Hoehne, Iwase, & Kornhuber, 1972; Everling, Krappmann, & Flohr, 1997; Herdman & Ryan, 2007; Richards, 2003). Due to these premotor potentials, saccade amplitude can also affect the typical baseline interval of the FRP (Nikolaev, Jurica, Nakatani, Plomp, & Leeuwen, 2013; Nikolaev et al., 2016).
Other visual and oculomotor covariates will introduce similar biases. For example, both the stimulus features in the currently foveated image region (Dimigen, Sommer, & Kliegl, 2013; Gaarder, Krauskopf, Graf, Kropfl, & Armington, 1964; Kristensen, Rivet, et al., 2017; Ossandon, Helo, Montefusco-Siegmund, & Maldonado, 2010; Ries et al., 2018b), the fixation location on the screen (Cornelissen et al., 2019; Dimigen et al., 2013), and the angle of the incoming saccade (Cornelissen et al., 2019; Meyberg, Werkle-Bergner, Sommer, & Dimigen, 2015; see also the results in this paper) modulate the FRP waveshape. It is therefore reasonable to conclude that most existing FRP results (including our own) are confounded to some degree, since they did not fully control for overlap and low-level covariates.
Fixation matching: Limitations and problems
One proposed method to partially control for these confounding factors is post-hoc fixation matching (Dias, Sajda, Dmochowski, & Parra, 2013; Dimigen et al., 2011; Kamienkowski et al., 2012; Luo, Parra, & Sajda, 2010; Nikolaev et al., 2016). The underlying idea is simple: After the experiment, the researcher selects those fixations from each experimental condition that are most similar in terms of overlap and a few of the most important visuomotor covariates (e.g. incoming saccade amplitude), for example by choosing fixation pairs with a small distance in this multi-dimensional feature space (e.g. the Mahalanobis distance; Nikolaev et al., 2016). These matched subsets of fixations are then compared, while the remaining fixations are discarded. After matching, the oculomotor covariates are as similar as possible across conditions and all conditions are affected by overlap to a similar degree. Put differently, with fixation matching we attempt to convert a quasi-experimental, naturalistic situation (free viewing) back into an orthogonal, well-controlled experiment.
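A sketch of such a matching procedure (not any published implementation) is given below: two conditions with different fixation durations are simulated, and fixations are greedily paired by their Mahalanobis distance. The covariate values, the greedy strategy, and the distance threshold are all arbitrary illustrative choices:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(4)

# Hypothetical covariates per fixation: [duration (ms), saccade amplitude (deg)].
# Condition B has longer fixation durations and slightly larger saccades.
cov_a = rng.normal([250.0, 4.0], [50.0, 1.5], size=(300, 2))
cov_b = rng.normal([300.0, 4.5], [50.0, 1.5], size=(250, 2))

# Mahalanobis distance between every A/B fixation pair (pooled covariance)
vi = np.linalg.inv(np.cov(np.vstack([cov_a, cov_b]).T))
dist = cdist(cov_a, cov_b, metric="mahalanobis", VI=vi)

# Greedy one-to-one matching: repeatedly pair the closest remaining fixations
pairs = []
while np.isfinite(dist).any():
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    if dist[i, j] > 0.5:   # arbitrary distance threshold for this illustration
        break
    pairs.append((i, j))
    dist[i, :] = np.inf    # each fixation may be used in only one pair
    dist[:, j] = np.inf

matched_a = cov_a[[i for i, _ in pairs]]
matched_b = cov_b[[j for _, j in pairs]]
# After matching, the covariate means are much closer across conditions;
# all unmatched fixations are discarded.
print(len(pairs), (matched_a.mean(0) - matched_b.mean(0)).round(1))
```

The print statement makes the cost of the method visible: the discarded, unmatched fixations are exactly the data loss discussed next.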
Matching procedures are relatively easy to implement, and if the durations of the fixations in the different conditions are also matched, they also address the overlap problem. However, this method has several limitations. First, there is a loss of data due to fixations that cannot be matched. Second, the number of covariates that can be simultaneously matched under practical conditions is limited and likely smaller than the number of variables already known to modulate the FRP. In particular, variables shown to affect the waveform include the durations of the preceding and current fixation, the temporal overlap with the stimulus-onset ERP (Coco et al., 2018), the amplitude (e.g. Gaarder et al., 1964; Thickbroom et al., 1991) and direction (Cornelissen et al., 2019; Meyberg et al., 2015) of the incoming saccade, the fixated region of the screen (Dimigen et al., 2013), and local image properties at the foveated spot, such as luminance (Dimigen et al., 2013; Gaarder et al., 1964; Kristensen, Rivet, et al., 2017; Ossandon et al., 2010) and spatial frequency (Ries et al., 2018b). In our experience, in a task like sentence reading, it is rarely possible to match more than two to three covariates, at least if these covariates show sizeable condition differences in the first place. Third, matching approaches are limited to simple factorial designs; it is hard to imagine how a matching procedure would work if the independent variable manipulated in the experiment (e.g. the saliency of an image region) were continuous rather than categorical in nature. Fourth, currently proposed algorithms for fixation matching are based on predefined thresholds or null-hypothesis testing. In the latter case, it is assumed that there is no longer a difference in the means of the covariates once the difference is no longer statistically significant. However, a non-significant difference between the covariates after matching does not mean that the null hypothesis is correct (Sassenhagen & Alday, 2016).
Finally, it is possible that the matching of saccade or fixation properties reduces the actual psychological effect in the data. For example, once fixation durations are matched, we are comparing the neural correlates of two pools of fixations that, on average, did not differ in terms of the behavioral outcome. The psychologically most relevant fixations – those at the tail of the distribution, which contribute the most to the behavioral effect – tend to be eliminated from the FRP by the matching. Discarding these fixations might therefore lead to more false negative findings.
Towards a unified model based on linear deconvolution and spline-regression
Based on these problems, it is clear that another solution is needed. Instead of selecting fixations, we need to correct for the aforementioned effects. One tool to account for these confounds is multiple linear regression with continuous regressors. This approach, called mass-univariate linear modeling, is already frequently applied to traditional EEG datasets (Amsel, 2011; Hauk, Davis, Ford, Pulvermüller, & Marslen-Wilson, 2006; Pernet, Chauveau, Gaspar, & Rousselet, 2011; Rousselet, Pernet, Bennett, & Sekuler, 2008; N. J. Smith & Kutas, 2015a) and more recently also to account for linear influences of saccade parameters on FRPs (Weiss, Knakker, & Vidnyánszky, 2016). Importantly, the linear modeling of covariates has also recently been shown to combine easily with the linear deconvolution approach introduced above, both for ERPs (N. J. Smith & Kutas, 2015b) and FRPs (Cornelissen et al., 2019; Ehinger & Dimigen, 2018; Guérin-Dugué et al., 2018; Kristensen, Rivet, et al., 2017).
A fact that complicates the problem even further is that the relationship between saccade properties and the FRP is often nonlinear. For example, Dandekar et al. (2012) and Ries et al. (2018b) found that with increasing saccade amplitude, the lambda response increases in a nonlinear fashion. Boylan and Doig (1989) reported a similar nonlinear relationship for the saccadic spike potential, the burst of eye muscle activity at saccade onset (but see also Keren et al., 2010). As we will confirm below, the influence of some oculomotor covariates on FRPs is indeed highly nonlinear. Ignoring these nonlinear relationships and modeling the data with a simple linear predictor can therefore produce suboptimal fits and bias the results in arbitrary ways (e.g. Tremblay & Newman, 2015).
Fortunately, due to the flexible nature of the linear model, it is also possible to model nonlinear relationships within this framework. For this purpose, the linear model is augmented with spline regression, as used in the generalized additive model (GAM; Wood, 2017). Recently, spline regression has been applied to ERPs (Hendrix, Baayen, & Bolger, 2017; Kryuchkova, Tucker, Wurm, & Baayen, 2012; Tremblay & Baayen, 2010; Tremblay & Newman, 2015) and also to FRPs (Van Humbeeck, Meghanathan, Wagemans, van Leeuwen, & Nikolaev, 2018). In this paper, we demonstrate how spline regression can be combined with deconvolution to control for nonlinear influences of some predictors (exemplified here for saccade amplitude and saccade direction) during the free viewing of pictures.
Current paper
Combining the ideas presented above, we propose that the combination of linear deconvolution with nonlinear spline regression (GAM) can solve both of the remaining problems: overlap and confound control. In the remainder of this paper, we will first describe both methods on an intuitive level by building up a model for a typical free viewing experiment, step by step. To illustrate the advantages of this approach on real data, we then use the recently introduced unfold toolbox (http://www.unfoldtoolbox.org; Ehinger & Dimigen, 2018) to analyze combined eye-tracking/EEG data from three paradigms: face recognition, visual search in scenes, and reading.
(NON)LINEAR DECONVOLUTION: A TUTORIAL REVIEW
In this section, we first review the basic principles of deconvolution modeling within the linear regression framework. Afterwards, we will outline how deconvolution combines with the concept of spline predictors to model nonlinear effects. Introductions to linear models, regression-based ERPs (rERP), and deconvolution within the rERP framework can be found in Smith and Kutas (2015a, 2015b), Ehinger & Dimigen (2018), and Sassenhagen (2018). Recent applications to FRPs are found in Dandekar et al. (2012), Kristensen et al. (2017b), Guérin-Dugué et al. (2018), Coco et al. (2018), Burwell et al. (2019) and Cornelissen et al. (2019). A more technical description of some of the steps summarized below is provided in Ehinger & Dimigen (2018). In the following, we restrict ourselves to a description of the general principles in a less technical and more intuitive fashion.
Deconvolution in the Linear Model
Linear modeling of the EEG
Before we introduce linear deconvolution, let us first look at another common way that multiple regression is applied to EEG: mass-univariate linear modeling (Pernet et al., 2011). Figure 3A illustrates this approach for a simple experiment with two conditions, but the concept can be generalized to arbitrary designs. In a first step, we cut the data into epochs around the onsets of experimental events, for example fixations (e.g. Weiss et al., 2016). For each of the n time points in the epoch, we then fit a separate regression model, which tries to explain the observed data over trials at the given time point t:

EEG(t) = b0 + b1·x1 + b2·x2 + … + bk·xk + e
The same model can also be written in matrix notation:

y = Xb + e
Here, X is the design matrix. Each of its rows represents one observation (e.g. one fixation onset on a target object) and each of its columns represents one predictor and its value in the respective trial (e.g., the type of object fixated or the incoming saccade size). b is a vector of to-be-estimated parameters (regression coefficients or betas) and e is the error term, a vector of residuals. For each time point relative to the event, we run the model again and estimate the respective betas. The result is a series of betas for each predictor, also called a regression-ERP (rERP), which can be plotted against time just like a traditional ERP waveform. Such mass-univariate models have been successfully applied to ERP studies in which many covariates affect the neural response (see also Amsel, 2011; Hauk et al., 2006; Rousselet et al., 2008), but they cannot account for varying temporal overlap between neighboring events. In other words, if overlap differs between conditions, this approach will produce biased estimates.
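A minimal sketch of this mass-univariate procedure on simulated epochs might look as follows; all effect sizes, predictor names, and latencies are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_times = 200, 100

# Design matrix X: intercept, condition (0 = car, 1 = face), saccade amplitude
condition = rng.integers(0, 2, n_trials)
amplitude = rng.uniform(1, 8, n_trials)
X = np.column_stack([np.ones(n_trials), condition, amplitude])

# Simulated epoched EEG (trials x time): a condition effect at time point 50
# and a linear saccade-amplitude effect at time point 30, plus noise
epochs = rng.normal(0, 1, (n_trials, n_times))
epochs[:, 50] += 2.0 * condition
epochs[:, 30] += 0.5 * amplitude

# Mass-univariate approach: one separate least-squares fit per time point
betas = np.empty((n_times, X.shape[1]))
for t in range(n_times):
    betas[t], *_ = np.linalg.lstsq(X, epochs[:, t], rcond=None)

# betas[:, 1] is the regression-ERP (rERP) for the condition effect; it
# should peak near time point 50, and betas[:, 2] near time point 30.
```

Note that each time point is fit independently here, which is exactly why this approach cannot account for overlap: every epoch sample is attributed to a single event.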
Deconvolution within the linear model
The linear deconvolution approach – as illustrated in panels B, C, and D of Figure 2 – directly addresses this issue of overlapping brain responses evoked by subsequent events. As an example, consider Figures 3B and 3C, which show a small part of a continuous EEG recording. We can see that the EEG recorded at sample 25 of the experiment (grey dashed box) is the sum of the responses to three different events: the early part of the brain response to an event of type A (also occurring at this sample), the late part of the response to an event of type B (which had occurred four samples earlier), and the late part of the response to a second event of type A (which had occurred five samples earlier). During free viewing, the temporal overlap with neighboring fixation events is slightly different for each fixation. Additionally, in many experiments, participants look at different categories of stimuli during each trial, leading to variable event sequences. For example, in the hypothetical car/face experiment depicted in Figure 1, fixations on cars are sometimes followed by a fixation on a face and sometimes by a fixation on another car. Due to this variability – in temporal overlap, in the sequence of events, or both – it is possible to recover the non-overlapped signals in a regression model.
One property that distinguishes linear deconvolution from the mass-univariate approach is that the input EEG data needs to be continuous rather than epoched. This is because, for a correct estimation, we need to consider the temporal relationship between all event-related responses that happened close to each other in time. If we cut the EEG into short epochs, the epochs would likely not contain all of the preceding and following events that also influenced the signal. If we instead cut the EEG into very long epochs, the epochs would start to overlap with each other, meaning that some data points would enter the model multiple times, biasing the estimation. For these reasons, we need to model the continuous EEG (after it has been preprocessed and eye movement markers have been added, for example with the EYE-EEG toolbox).
We set up the deconvolution model by generating a new design matrix Xdc (where dc stands for deconvolution), which spans all samples of the continuous EEG recording (Figure 3C). Like in the mass-univariate model (cf. Figure 3A), the columns of this design matrix code the condition of the events in our model (e.g. “Is this an event of type A?”). In order to explain linear deconvolution, we need to introduce the concept of local time (τ), which describes the time (in samples) relative to the onset of a given type of event. In the simplified example of Figure 3, we model just the n = 5 time points following each event (from τ = 1 to τ = 5 after event onset). In a realistic scenario, one would model a few hundred sampling points before and after the fixation onset (e.g. in a time window from -200 to 800 ms). The time range should be chosen so that it captures the entire fixation-related EEG response, including oculomotor potentials that reflect saccade planning and precede fixation onset (e.g. Becker et al., 1972; Everling et al., 1997; Herdman & Ryan, 2007; Richards, 2003). To set up the deconvolution model, we then have to add n = 5 new predictors to the model per event type. These predictors directly model the time course of the event-related response, a process that we will call time-expansion and explain in the following (Figure 3C):
The first n columns in the new design matrix Xdc belong to the predictor A. The first column codes the first time point after that type of event occurred (τ = 1). This column will be 0 at every sample, except at those latencies of the EEG when an event of that type occurred; there, we set it to 1. The second column codes the second time point after the event, τ = 2. This column will get a 1 at all continuous EEG samples that were recorded one sample after an event of type A and zero everywhere else. We repeat this process for all five sampling points relative to each event A. Afterwards, we repeat the same for all occurrences of event B, adding five more columns to the design matrix.
If we look at the expanded design matrix produced by this process (Figure 3C), we see diagonal staircase-like patterns. We can now also immediately see which parts of the individual evoked responses contribute to the observed continuous EEG signal, as highlighted here for sample number 25. The resulting time-expanded design matrix Xdc is large; it has as many rows as there are samples in the EEG. The number of its columns is given by the number of predictors (here: 2) multiplied by the number of modeled time points (here: n = 5). Thus, to solve the regression model, we need to solve a large system of linear equations. Fortunately, this computationally demanding problem can nowadays be handled efficiently by modern solvers for sparse linear systems (time-expansion produces a very sparse design matrix, i.e. a matrix mostly filled with zeros, as illustrated in Figure 3C). In summary, the regression formula changes little from the mass univariate model to the deconvolution model:

EEG = Xdc · b + e
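The time-expansion step and the subsequent sparse solve can be made concrete with a small simulation. The unfold toolbox implements this in MATLAB; the following Python sketch (numpy/scipy; all variable names, event counts, and kernel shapes are hypothetical) builds Xdc for two event types with n = 5 local time points and recovers the two non-overlapping responses from a synthetic, overlapped signal:

```python
import numpy as np
from scipy.sparse import lil_matrix, hstack
from scipy.sparse.linalg import lsqr

def time_expand(latencies, n_samples, tau):
    """Build the time-expanded design matrix for one event type:
    column j (local time tau = j) is 1 at sample (latency + j) of every event."""
    X = lil_matrix((n_samples, tau))
    for lat in latencies:
        for j in range(tau):
            if lat + j < n_samples:
                X[lat + j, j] = 1.0
    return X.tocsr()

# Simulate one continuous "EEG" channel as the sum of overlapping responses
rng = np.random.default_rng(0)
n_samples, tau = 2000, 5
kernel_A = np.array([0.0, 2.0, 3.0, 1.0, 0.0])    # true response to event type A
kernel_B = np.array([1.0, -1.0, -2.0, 0.0, 1.0])  # true response to event type B
lat_A = rng.choice(n_samples - tau, size=80, replace=False)
lat_B = rng.choice(n_samples - tau, size=80, replace=False)

Xdc = hstack([time_expand(lat_A, n_samples, tau),
              time_expand(lat_B, n_samples, tau)]).tocsr()
eeg = Xdc @ np.concatenate([kernel_A, kernel_B]) + 0.1 * rng.standard_normal(n_samples)

# Solve the sparse system EEG = Xdc * b + e; one beta per predictor and time point
b = lsqr(Xdc, eeg)[0]
beta_A, beta_B = b[:tau], b[tau:]
```

Because Xdc is stored as a sparse matrix, iterative solvers such as LSQR handle even recording-length systems with many predictors efficiently.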
Solving this formula for b results in n betas for each event type/predictor, one for every time point modeled (see Figure 3C, right side). However, in contrast to the mass univariate model, we do not need to calculate a separate model for each time point; instead, all betas for a given EEG channel are returned in just one fit of the model. This time series of betas, or rERP, represents the non-overlapping brain response for each predictor. Like traditional ERP waveforms, rERPs can be visually inspected, plotted as waveforms or scalp topographies, entered into a dipole analysis, and compared statistically (N. J. Smith & Kutas, 2015a, 2015b). Furthermore, because the estimation is linear, linear operations like a baseline correction can also be applied to the rERPs after the deconvolution (N. J. Smith & Kutas, 2015b).
Modeling nonlinear effects
As explained above, some predictors have nonlinear influences on the EEG. As we will show in the empirical part of this paper, considering nonlinear effects is especially important in free viewing experiments. In the linear regression framework, nonlinear effects can be accounted for in different ways. Common approaches are to transform individual predictors (e.g. via a log transform) or to include higher-order terms in the model (e.g. quadratic and cubic terms; polynomial regression). However, these approaches have drawbacks. The transformation of individual predictors (such as saccade amplitude) necessitates a priori knowledge about the correct shape of the relationship, but for ERPs/FRPs, this shape is often unknown. For more complex relationships, e.g. for circular predictors with non-sinusoidal shapes, it can be difficult to find a good transformation function without resorting to an inefficient Fourier set. With polynomial regression, one could in principle fit any arbitrary relationship, but in practice one often observes oscillatory patterns (Runge, 1901). These patterns occur because each additional term added to the polynomial acts on the entire range of the predictor, that is, it affects the fit at all saccade amplitudes rather than just locally.
A more flexible option is spline regression, a technique commonly subsumed under the framework of generalized additive models (GAMs; Wood, 2017). Splines can be understood as local smoothing functions (Figure 4), and they have the advantage that they are defined locally, that is, over only a short range of the continuous predictor (e.g. just for saccade amplitudes between 2° and 4°). This solves the problem of oscillatory patterns and makes the predictors easily interpretable and less dependent on the exact fit of the other parameters. The following section provides a brief and intuitive overview of spline regression.
Figure 4A shows a hypothetical nonlinear relationship between a predictor and the EEG. As an example, the predictor might be saccade amplitude, and we see that the difference between a saccade of 1° and 2° has a much larger influence on the amplitude of the fixation-related lambda response than the difference between an 11° and a 12° saccade. Obviously, a linear function would not fit these data well. An alternative way to model this relationship is to represent the continuous predictor, that is, the independent variable, by a set of basis functions (Figure 4B to 4E). One simple way to do this is to split up the range of the independent variable into multiple distinct bins (as used by Dandekar, Privitera, et al., 2012).
In the regression model, such a basis set is implemented by adding one additional column to the design matrix for each bin or basis function. For example, if we split the values of the saccade amplitude into six bins (Figure 4B), we would add five columns plus the intercept to the design matrix X. The independent variable now covers several columns and each column models a certain range of possible saccade amplitudes (e.g. 0-2°, 2-4°, 4-6°, …). When we solve the model, we estimate the beta weights for each basis function. As Figure 4D shows, this produces a better fit to the data as it captures the nonlinear relationship (Dandekar et al., 2012b). However, it also produces abrupt jumps between category borders, which can decrease statistical power and increase type-1 errors (Austin & Brunner, 2004). So instead, it is strongly recommended to keep the continuous predictors continuous (Altman & Royston, 2006; Bennette & Vickers, 2012; Collins, Ogundimu, Cook, Manach, & Altman, 2016; Royston, Altman, & Sauerbrei, 2006).
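To make the dummy-coding concrete, the following Python sketch (the bin edges and amplitude values are invented for illustration) builds the design-matrix columns for six saccade-amplitude bins: one reference bin absorbed by the intercept plus five indicator columns, matching the "five columns plus the intercept" described above:

```python
import numpy as np

def binned_design(x, edges):
    """Dummy-code a continuous predictor into distinct bins:
    an intercept column (reference bin) plus one 0/1 column per other bin."""
    idx = np.digitize(x, edges)      # bin index for each observation
    n_bins = len(edges) + 1
    X = np.zeros((len(x), n_bins))
    X[:, 0] = 1.0                    # intercept; bin 0 is the reference
    for b in range(1, n_bins):
        X[idx == b, b] = 1.0
    return X

# Hypothetical saccade amplitudes (degrees) and bins 0-2, 2-4, 4-6, 6-8, 8-10, >10
sacc_amp = np.array([0.5, 1.5, 3.0, 5.0, 7.0, 11.0])
X = binned_design(sacc_amp, edges=[2, 4, 6, 8, 10])
```

Each row of X is all zeros except for the intercept and the single column of the bin that the saccade falls into, which is exactly what produces the abrupt jumps between category borders criticized in the text.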
A better alternative, illustrated in panels D and E of Figure 4, is to use spline functions as a basis set. This is conceptually similar to the categorization approach but makes use of the assumption that the fit should be smooth. Instead of defining distinct bins, we cover the range of possible values of the independent variable with a number of spline functions that overlap with each other, as shown in Figure 4D (how exactly the spline set is constructed is outside the scope of this paper, but the interested reader is referred to Wood, 2017).
If we now evaluate this set of splines at a given value of the independent variable (e.g. for a saccade of 3.1°), we obtain non-zero values not just for a single predictor, but for several neighboring spline functions as well (e.g. three functions are non-zero in Figure 4D). The different spline functions will contribute with different strengths, depending on how much they overlap with this value. When we solve the model, the spline functions are again weighted by their respective beta coefficients and summed up to obtain the modeled values. In contrast to the categorization approach, the result is a smooth, nonlinear fit (Figure 4E). In practice, however, we are not interested in the betas for each of the individual spline functions; instead, we want to evaluate the whole set of overlapping splines at certain values of the independent variable. For example, to visualize the effect of saccade amplitude on FRPs (cf. Figure 6C), we might want to evaluate the spline set, weighted by the previously estimated betas, at several saccade sizes of interest (e.g. 0.6°, 5°, 10°, and 15°).
Analyzing a free viewing dataset with the unfold toolbox
In the following, we will briefly go through the practical steps to run and analyze a (non)linear deconvolution model, using the unfold toolbox and the simulated example from Figure 2. Parts of the following description are adapted from our toolbox paper (Ehinger & Dimigen, 2018). For a detailed technical documentation of the toolbox and its features, we refer the reader to that paper.
To begin, we need a continuous EEG dataset in the EEGLAB format (Delorme & Makeig, 2004) that also contains event markers (i.e. triggers) for the experimental events of interest (e.g. stimulus onsets). For free viewing experiments, we need additional events that code the onsets of saccade and/or fixations as well as the respective properties of these eye movements (e.g. amplitude and direction of saccades, duration and location of fixations, the type of object fixated). With existing software like the EYE-EEG toolbox, such eye movement events can be easily imported or detected in the synchronized eye tracking data. In most cases, the EEG data that we wish to analyze should have already been preprocessed (e.g. filtered) and corrected for ocular artifacts, for example with the eye tracker-guided ICA procedures (Dimigen, 2018; Plöchl et al., 2012) also implemented in EYE-EEG.
We then start the modeling process by writing down the model formula, which defines the design matrix X of the regression model. In the unfold toolbox, models are specified using the common Wilkinson notation (Wilkinson & Rogers, 1973) that is also used in other statistics software like R. Using this notation, we might define the following model for the hypothetical free viewing experiment depicted in Figure 2, during which participants looked at cars and faces:

FRP ~ 1 + cat(is_car) + sacc_amplitude
Here, the FRP is modeled by an intercept term (1) that describes the overall waveform, by a categorical variable (or factor) is_car that codes whether the currently fixated object is a car (1) or a face (0), and by a continuous linear predictor that codes the amplitude of the saccade preceding fixation onset. It is also possible to define interactions (e.g., is_car * sacc_amplitude).
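In Wilkinson notation, such a formula expands into design-matrix columns in a mechanical way. The sketch below (plain numpy; the four hypothetical fixations and their values are invented for illustration) shows the columns implied by an intercept, a treatment-coded factor, a linear covariate, and an interaction:

```python
import numpy as np

# Hypothetical event table: one row per fixation (values invented for illustration)
is_car = np.array([0, 1, 1, 0])                  # factor: face = 0 (reference), car = 1
sacc_amplitude = np.array([1.2, 4.5, 0.8, 7.0])  # incoming saccade amplitude in degrees

# Columns implied by a formula of the form  y ~ 1 + cat(is_car) + sacc_amplitude
X = np.column_stack([
    np.ones(len(is_car)),   # "1": intercept, the overall waveform
    is_car.astype(float),   # treatment-coded factor
    sacc_amplitude,         # linear continuous predictor
])

# An interaction (is_car * sacc_amplitude) adds the elementwise product as a column
X_int = np.column_stack([X, is_car * sacc_amplitude])
```

The interaction column is non-zero only for car fixations, so its beta captures how the saccade-amplitude slope differs between the two object categories.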
As explained earlier, it is unrealistic to always assume a linear influence of oculomotor behavior on the EEG. We can therefore relax this assumption and model saccade amplitude as a nonlinear predictor. With the following formula, the effect of saccade amplitude would be modeled by a basis set consisting of six splines1:

FRP ~ 1 + cat(is_car) + spl(sacc_amplitude, 6)
In the same model, we can simultaneously model brain responses evoked by other experimental events, such as stimulus onsets or button presses. Each of these other event types can be modeled by its own formula. For example, in the car/face task, it would be important to also model the stimulus-ERP that is elicited by the onset of the car/face display on the screen. Otherwise, this long-lasting stimulus-ERP will distort the baseline intervals of the following FRPs (Dimigen et al., 2011). This issue is crucial in experiments in which stimuli belonging to different conditions are fixated at slightly different average latencies after stimulus onset (e.g. Coco et al., 2018). For example, if the first fixation in a trial is aimed more often at a face than at a car, the face-FRP will be distorted differently by the overlapping stimulus-locked waveform than the car-FRP. Fortunately, the ERP evoked by the stimulus presentation can be simply accounted for by adding an additional intercept model for all stimulus events. In this way, it will be removed from the estimation of the FRPs. The complete model would then be:

Stimulus events: ERP ~ 1
Fixation events: FRP ~ 1 + cat(is_car) + spl(sacc_amplitude, 6)
Once the formulas are defined, the design matrix X is time-expanded to Xdc and now spans the duration of the entire EEG recording. Subsequently, the equation (EEG = Xdc * b + e) is solved for b, the betas. This is done for each channel separately. The resulting regression coefficients, or betas, correspond to the subject-level ERP waveforms in a traditional ERP analysis (Smith & Kutas, 2015a). For example, in the model above, for which we used treatment coding, the intercept term of the FRP corresponds to the subject-level average FRP elicited by a face fixation (the reference level of the factor). The other betas, for cat(is_car), will capture the partial effect of that particular predictor – here the effect of fixating a car rather than a face – and therefore correspond to a difference wave in a traditional ERP analysis (here: car-FRP minus face-FRP). For data visualization or for second-level statistical analyses at the group level, these regression-based waveforms can therefore be treated just like any other subject-level ERP. In the following, we will apply this approach to real datasets.
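The interpretation of treatment-coded betas can be verified with a toy example at a single time point (Python/numpy; the condition means and noise level are arbitrary): the intercept recovers the reference-condition mean and the factor's beta recovers the between-condition difference:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
is_car = rng.integers(0, 2, n)            # 0 = face (reference), 1 = car
face_mean, car_mean = 5.0, 3.0            # true condition amplitudes at one time point
y = np.where(is_car == 1, car_mean, face_mean) + 0.01 * rng.standard_normal(n)

# Treatment coding: intercept column plus 0/1 factor column
X = np.column_stack([np.ones(n), is_car.astype(float)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b[0] recovers the reference (face) mean; b[1] the car-minus-face difference
```

Repeating this fit at every time point and channel yields exactly the waveform and difference-wave interpretation described above.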
EXPERIMENTAL METHODS
The empirical part of this paper will demonstrate the possibilities and advantages of (non)linear deconvolution modeling for analyzing free viewing experiments. For this, we will use co-registered data from three commonly studied paradigms: face recognition, scene viewing, and natural sentence reading. The following section briefly summarizes the experimental methods that are common to all three experiments.
Participants
Participants in all three experiments were young adults, mostly psychology students, with normal or corrected-to-normal visual acuity (verified using Bach, 2006). Different participants took part in the three studies. Experiments were conducted in compliance with the declaration of Helsinki (2008) and participants provided written informed consent before participation.
Apparatus & Eye-Tracking
All datasets were recorded in an electromagnetically shielded laboratory at Humboldt-University using identical eye-tracking, EEG, and stimulation hardware. In all experiments, stimuli were presented at a viewing distance of 60 cm on a 22-inch CRT monitor (Iiyama Vision Master Pro 510, resolution 1024 × 768 pixels, vertical refresh 100 or 160 Hz depending on the experiment). Binocular eye movements were recorded at a rate of 500 Hz with a table-mounted eye tracker (IView-X Hi-Speed, SMI GmbH) that was frequently (re-)calibrated using a 9-point or 13-point grid and validated on a 4-point grid. Stimulus presentation and recordings were controlled by Presentation software (Neurobehavioral Systems, Albany, CA). Saccades and fixations were detected offline using the algorithm of Engbert & Kliegl (2003) as implemented in the EYE-EEG toolbox.
Electrophysiological recordings
Electrophysiological signals were recorded from either 46 (Face and Scene experiments) or 64 (Reading experiment) Ag/AgCl electrodes. EEG electrodes were mounted in a textile cap at standard 10-10 system positions. Electro-oculogram (EOG) electrodes were positioned at the outer canthus and infraorbital ridge of each eye. Data were amplified with BrainAmp amplifiers (Brain Products GmbH) and digitized at 500 Hz, with electrode impedances kept < 5 kΩ. Electrodes were initially referenced against an electrode on the left mastoid bone, but digitally re-referenced to an average reference. The Face and Scene data were acquired with a time constant of 10 s, whereas the Reading data were acquired as DC recordings. Offline, the data of all experiments were high-pass filtered with the cutoff (−6 dB) set to 0.1 Hz using EEGLAB's windowed-sinc finite impulse response (FIR) filter with default settings. Datasets were also low-pass filtered at 40 Hz (Reading and Scene experiments) or 100 Hz (Face experiment) using the same function.
EEG and eye-tracking were synchronized offline using EYE-EEG (version 0.81) based on shared trigger pulses sent on each trial from the presentation computer to the two computers recording EEG and eye movements. The mean synchronization error of the two time series, computed from the trigger alignment, was < 1 ms. Proper signal synchronization was additionally verified by computing the cross-correlation function between the horizontal gaze position signal and the horizontal bipolar electrooculogram, which consistently peaked at or near lag zero (function checksync.m in EYE-EEG).
EEG data from the Scene experiment were corrected for ocular artifacts using the Infomax ICA algorithm, trained on band-pass filtered training data (Dimigen, 2018). Ocular components were then removed using the eye-tracker-guided selection method proposed by Plöchl et al. (2012; variance ratio threshold: 1.1). Data for the Reading experiment were artifact-corrected using Multiple Source Eye Correction (Berg & Scherg, 1994) as implemented in BESA (BESA GmbH). Data from the Face experiment were not corrected for ocular artifacts, because there was a central fixation instruction during this experiment and the data therefore contained only comparatively small saccades. Instead, for this dataset, artifact-contaminated intervals were identified offline by moving a window with a length of 2000 ms in steps of 100 ms across the continuous recording. Whenever this window contained a peak-to-peak voltage difference > 150 µV at any channel, the corresponding EEG interval was removed from the deconvolution model. This is accomplished by setting all columns of the time-expanded design matrix (Xdc) to zero for these “bad” time intervals; this way, they will be ignored in the regression model.
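Excluding such intervals amounts to zeroing the corresponding rows of the time-expanded design matrix, which can be sketched as follows (Python/scipy; the matrix contents and the artifact interval are placeholders):

```python
import numpy as np
from scipy.sparse import random as sparse_random, diags

# Stand-in time-expanded design matrix (contents are arbitrary for this sketch)
n_samples, n_cols = 1000, 10
Xdc = sparse_random(n_samples, n_cols, density=0.05, random_state=0, format="csr")

# Hypothetical artifact interval (in samples); zeroing its rows removes these
# samples' contribution to the least-squares fit without cutting the recording
bad = np.zeros(n_samples, dtype=bool)
bad[300:450] = True
Xdc_clean = (diags((~bad).astype(float)) @ Xdc).tocsr()
```

Because the continuous recording itself is left intact, the temporal relationships between all remaining events are preserved, which is what the deconvolution relies on.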
Statistics
Statistical comparisons in all experiments were performed using the threshold-free cluster-enhancement method (Mensen & Khatami, 2013; S. M. Smith & Nichols, 2009), a data-driven permutation test that controls for multiple comparisons across time points and channels (using at least n = 1000 random permutations).
EXPERIMENT 1: FACE PERCEPTION
In typical EEG experiments, participants are instructed to avoid eye movements. Yet, high-resolution eye-tracking reveals that even during attempted fixation, the eyes are not completely motionless, but frequently perform microsaccades, small involuntary eye movements with a typical amplitude of less than 1° (Rolfs, 2009). Depending on the task, these microsaccades are often found together with small exploratory saccades that are aimed at informative regions of the stimulus, for example the eye region of a face. In the following, we will therefore refer to both kinds of small eye movements simply as “miniature saccades” (Yuval-Greenberg et al., 2008).
Previous co-registration studies have shown that despite their small size, miniature saccades can generate sizable eye muscle (Yuval-Greenberg & Deouell, 2009) and brain potentials (Dimigen et al., 2009; Gaarder et al., 1964; Meyberg et al., 2015) in EEG. Furthermore, because the amplitude and rate of miniature saccades often differs systematically between experimental conditions (Rolfs, 2009), these additional signals can seriously distort stimulus-locked analyses in the time and frequency domain (Dimigen et al., 2009; Yuval-Greenberg et al., 2008). In the following, we will demonstrate how deconvolution and spline regression can be used to control for the effects of miniature saccades and improve the data quality even in a standard ERP experiment in which participants were told to maintain fixation.
Participants
Twelve participants took part in the study. Here we analyze the data of 10 participants (19-35 years old, 8 female), because the data of two participants could not be synchronized across the whole recording. Single-subject data from one participant of this study was also shown in Ehinger & Dimigen (2018).
Methods
During this experiment, previously described in the supplement of Dimigen et al. (2009), participants saw 480 pictures of human faces (7.5° × 8.5°) with a happy, neutral, or angry facial expression. The participants’ task was to classify the face’s emotional expression using three manual response buttons. At the start of each trial, a small (0.26°) central fixation cross appeared for 1000 ms. It was then replaced by the face for 1350 ms. Afterwards, the fixation cross re-appeared. Before the experiment, participants received written instruction to keep their fixation at the screen center for the duration of each trial.
We synchronized the EEG and eye-tracking data and then specified the following model in the unfold toolbox: The formula for the stimulus onset events was ERP ~ 1, meaning that the stimulus-ERP was modeled by a constant intercept term (the factor “emotion” was ignored for the current analysis). Potentials from miniature saccades were modeled by the formula Saccade-ERP ~ 1 + spl(sacc_amplitude,6), that is, by a constant intercept term and by a predictor coding the saccade amplitude. Because the effect of saccade amplitude on the post-saccadic brain response is nonlinear, this effect was modeled here by six splines. Neural responses were estimated in the time window between -500 and 800 ms around each stimulus/saccade event.
Results & Discussion
Results are shown in Figure 5. The eye-tracking data revealed that participants made at least one miniature saccade in the vast majority (99%) of trials. With a median amplitude of 1.5° (Figure 5B), most of these saccades were not genuine microsaccades, but rather small exploratory saccades that were aimed at the eyes or at the mouth region, two parts of the face that were informative for the emotion classification task (see Figure 5A).
The histogram in the lower part of Figure 5C shows that the rate of miniature saccades reached a maximum around 240 ms after stimulus onset. Each miniature saccade elicits its own visually-evoked lambda response (Dimigen et al., 2009) which peaks around 110 ms after saccade onset. Therefore, we would expect an impact of saccades on the stimulus-ERP waveform that begins around 350 ms (i.e. 240 + 110 ms) after stimulus onset. Indeed, if we compare the stimulus-ERP with and without deconvolution at the occipital electrode Oz (panels C versus E), we see a positive shift in the uncorrected signal that starts around 350 ms and continues until the end of the analysis time window.
Figure 5D shows the saccade-ERP, again with and without deconvolution. As expected, the saccadic response was also changed by the deconvolution, because we removed from it the overlapping influences of the stimulus-ERP as well as those of other miniature saccades. Similar results have recently been reported by Kristensen et al. (2017; see their Figure 6).
This example shows how linear deconvolution can separate eye movement-related potentials from stimulus-locked activity to obtain an unbiased version of the stimulus-ERP that is not contaminated by saccade-evoked activity. Deconvolution is especially important in experiments in which the rate, direction, or amplitude of saccades differs between conditions (Dimigen et al., 2009; Engbert & Kliegl, 2003; Meyberg, Sommer, & Dimigen, 2017; Yuval-Greenberg et al., 2008). However, even in cases where oculomotor behavior is the same in all conditions, the signal-to-noise ratio of the stimulus-locked ERP should improve after removing the brain-signal variance produced by miniature saccades. Another major advantage compared to traditional averaging approaches is that we also obtain a clean, unbiased version of the (micro)saccade-related brain potentials in the task. This is interesting, since brain potentials from microsaccades have been shown to carry valuable information about the participant's attentional state (Meyberg et al., 2015). With deconvolution, we can now mine these additional brain responses to learn more about the participant's attentional or cognitive state.
EXPERIMENT 2: SCENE VIEWING
Next, we will model fixation-related activity during natural scene viewing. As explained in the Introduction, the properties of eye movement-related brain potentials are not fully understood. What is clear, however, is that in addition to local stimulus features, saccade properties strongly influence the neural response following fixation onset (Armington & Bloom, 1974; Thickbroom et al., 1991). This means that even a slight mismatch in oculomotor behavior between two conditions will produce spurious differences between the respective brain signals. Fortunately, linear deconvolution modeling can simultaneously control for overlapping potentials and for the effects of numerous oculomotor (e.g. saccade size) and visual (e.g. foveal image luminance) low-level variables.
In the following example, we model FRPs from a visual search task on natural scenes. For the sake of clarity, we focus on the results for only two oculomotor variables: saccade amplitude and direction. These variables are highlighted here because, as we will show below, they both have clearly nonlinear effects on the FRP. Previous findings already indicate a nonlinear influence of saccade amplitude on FRPs (Armington & Bloom, 1974; Dandekar, Privitera, et al., 2012; Dimigen et al., 2009; Ries et al., 2018b; Thickbroom et al., 1991). For example, when executed on a very high-contrast background, microsaccades of just 0.3° generate a lambda response (P1) that is not much smaller than that following a 4.5° saccade (cf. Figure 2 in Dimigen et al., 2009). Results like these suggest that it is insufficient to model saccade amplitude as a linear predictor. Effects of saccade direction on the post-saccadic neural response have only recently been described (Meyberg et al., 2015; Cornelissen et al., 2019). As we show below, the direction of the preceding saccade has a significant and nonlinear effect on the FRP during scene viewing.
Participants
Ten young adults (age: 19-29 years, 4 female) with normal or corrected-to-normal visual acuity participated in the study.
Methods
In the scene viewing experiment, participants searched for a target item hidden within images of natural scenes. Scenes consisted of grayscale versions of the first 33 images of the Zurich Natural Image Database (Einhäuser, Kruse, Hoffmann, & König, 2006), a collection of photographs taken in a forest (see Figure 6A for an example). The scenes spanned 28.8° × 21.6° of visual angle (800 × 600 pixels) and were centered on an otherwise empty black screen (resolution: 1024 × 768 pixels). During each trial, one image was shown. The participant's task was to find a small dark gray dot (0.4 cd/m²) that appeared at a random location within the scene at a random latency 8-16 seconds after scene onset. At first, the dot appeared with a diameter of just 0.07°, but it then gradually increased in size over the course of several seconds. Once the participants found the target, they pressed a button, which terminated the trial. A full manuscript on this dataset is currently in preparation.
We modeled two types of events: the stimulus onset at the beginning of each trial and the fixation onsets. The models we specified were:
For the stimulus-ERP we included only an intercept term, which captures the long-lasting ERP evoked by the initial presentation of the scene. Including it in the model ensures that FRPs are not distorted by the overlap with this stimulus-ERP. For the fixation onset events we modeled the horizontal and vertical fixation position on the screen and the saccade amplitude using spline predictors. In addition, we modeled the direction of the incoming saccade that precedes the fixation. Because the angle of a saccade is a circular predictor (from 0° to 360°), it was modeled by a set of five circular splines (Ehinger & Dimigen, 2018). Responses were modeled in a time window from -400 to 800 ms around each event.
To sum up the model, we modeled the ERP elicited by the scene onset and the FRPs elicited by each fixation on the scene and allowed for several nonlinear effects of saccade properties on the FRP. In the following, we will exemplarily focus on the results for two of these predictors: saccade amplitude and saccade angle.
Results & Discussion
Figure 6B summarizes eye movement behavior in the visual search task. Saccades had a median amplitude of 4.9° and fixations lasted 264 ms on average. The electrophysiological results are summarized in Figure 6C to 6F. These panels show the isolated effects of saccade amplitude and saccade direction, corrected for overlapping potentials. As Figure 6C shows, the isolated effects reveal a long-lasting impact of saccade amplitude on the FRP: At electrode Oz, located centrally over primary visual cortex, saccade amplitude influenced all time points of the FRP up to 600 ms after fixation onset. The results also confirmed that this effect is indeed highly nonlinear: For example, the increase in lambda response amplitude as a function of saccade size was steep for smaller saccades (< 6°) but slowly leveled off for larger saccades. Such nonlinearities were observed for all ten participants (Figure 6D). It is obvious that a nonlinear model is more appropriate for this data than a linear one.
Interestingly, the angle of the incoming saccade also influenced the FRP in a highly nonlinear manner. In Figure 6E, this is shown for lateralized posterior electrode PO8, located over the right hemisphere. The corresponding scalp topographies for saccades of different directions are shown in Figure 6F, in the time window 200 to 300 ms after fixation onset. It can be seen how saccade direction modulates the FRP’s scalp distribution, with rightward-going saccades generating higher amplitudes over the left hemisphere and vice versa (see Cornelissen et al., 2019; and Meyberg et al., 2015 for similar findings). Note that this effect is not due to corneoretinal artifacts, which were successfully suppressed with ICA. This effect of saccade direction is also not explained by different fixation locations on the screen following saccade offset (Dimigen et al., 2013), since horizontal and vertical fixation position were also included as predictors in the model (results not shown here).
Discussion
In this example, we simultaneously modeled the effects of some low-level oculomotor covariates on FRPs. During scene viewing, these covariates are commonly intercorrelated (Nuthmann, 2017) and also correlated with higher-level cognitive factors (e.g. whether the fixated item is the search target). Furthermore, as we show here, they influence long intervals of the FRP waveform in a nonlinear way. A strictly linear model (e.g. linear saccade amplitude in Cornelissen et al., 2019; Kristensen, Guerin-Dugué, et al., 2017; Weiss et al., 2016) is therefore not ideal to capture these complex relationships.
In addition to the covariates discussed here, one could easily enter more low-level predictors into the model, such as the local luminance (Ossandon et al., 2010; Dimigen et al., 2013; Kristensen et al., 2017b) and spatial frequency spectrum (Ries et al., 2018) of the currently foveated part of the image. Finally, to study higher-level cognitive effects on the fixation-related P300 in this task (Dandekar, Ding, Privitera, Carney, & Klein, 2012; Dias et al., 2013; Kamienkowski et al., 2012), one could for example add a categorical predictor (0 or 1) coding whether or not the fixated screen region contained the search target participants were looking for. The next example will illustrate how we can reliably study the time course of such psychologically interesting effects despite differences in motor behavior.
EXPERIMENT 3: NATURAL READING
During free viewing experiments, the psychological manipulation of interest is typically linked to a change in fixation duration, which will distort the FRPs. A classic task to illustrate this problem and its solution via deconvolution modeling is reading. In ERP research on reading, sentences are traditionally presented word-by-word at the center of the screen. While this serial visual presentation procedure controls for overlapping potentials, it differs in important ways from natural reading (Kliegl, Dambacher, Dimigen, & Sommer, 2014; Sereno & Rayner, 2003). One key property of visual word recognition that is neglected by serial presentation procedures is that the upcoming word in a sentence is usually previewed in parafoveal vision (eccentricity 2-5°) before the reader looks at it (Schotter, Angele, & Rayner, 2012). The parafoveal preprocessing then facilitates the recognition of the word once the word is fixated. This facilitation is evident in the classic preview benefit (Rayner, 1975) in behavior: words that were parafoveally visible during the preceding fixations receive 20-40 ms shorter fixations (Vasilev & Angele, 2017) than words that were gaze-contingently masked with a different word or a meaningless letter string before being fixated.
Combined eye-tracking/EEG studies have recently established tentative neural correlates of this preview benefit in FRPs during natural reading: an early effect, a reduction of the late parts of the occipitotemporal N1 component between about 180-280 ms after fixation onset (“preview positivity”), which is sometimes followed by a later effect at around 400 ms that may reflect a reduction of the N400 component by preview (Degno et al., 2019; Dimigen, Kliegl, & Sommer, 2012; Kornrumpf et al., 2016; Li, Niefind, Wang, Sommer, & Dimigen, 2015; Niefind & Dimigen, 2016). However, an inherent problem with these studies is that the difference in fixation times measured on the target word also changes the amount of overlap with the brain responses of the following fixations. This raises the question to which degree the reported neural preview effects are genuine or just a trivial consequence of the different overlap situations in the conditions with and without an informative preview. Below we will demonstrate how deconvolution modeling can answer this question by separating genuine preview effects from spurious overlap effects.
Participants
Participants were native speakers of German with normal or corrected-to-normal visual acuity (aged 18-45 years, 27 female). Here we present results from the first 48 participants recorded in this study. A manuscript describing the full dataset and all experimental manipulations in detail is currently in preparation2.
Task and procedure
Participants read 144 pairs of German sentences belonging to the Potsdam Sentence Corpus 3, a set of materials previously used in psycholinguistic ERP research and described in detail in Dambacher et al. (2012). On each trial, two sentences were successively presented as single lines of text on the monitor (Figure 7A). Participants read the sentences at their own pace and answered occasional multiple-choice comprehension questions that were presented randomly after one third of the trials. Sentences were displayed in a black font (Courier, 0.45° per character) on a white background. The second sentence contained a target word (e.g. “weapon”) for which the parafoveal preview was manipulated using Rayner’s (1975) boundary paradigm (Figure 7A): Before fixating the target word, that is, during the preceding fixations, readers either saw a correct parafoveal preview for the target word (e.g. “weapon”) or a non-informative preview that consisted of a meaningless but visually similar letter string of the same length (e.g. “vcrqcr”). During the saccade to the target, while visual sensitivity is reduced (Matin, 1974), the preview mask was always gaze-contingently exchanged with the correct target word (e.g. “weapon”). This display change was executed with a mean latency of < 10 ms and typically not noticed by the participants, as validated with a questionnaire after the experiment (Dimigen et al., 2012).
In a first analysis step, we marked all trials that contained eye blinks, a loss of eye-tracking data, a late display change (executed > 10 ms after saccade offset), a skipping of the target word, or excessive non-ocular EEG artifacts. Remaining non-ocular artifacts were detected by shifting a moving window (4000 ms long) across the continuous EEG and marking all intervals in which the window contained peak-to-peak voltage differences > 150 µV in any channel (Ehinger & Dimigen, 2019). In the deconvolution framework, such “bad” intervals can then be easily excluded by setting all predictors in the design matrix to zero during these intervals (Smith & Kutas, 2015b). The mean number of remaining trials per participant was 59.8 (range: 38-71) for the invalid and 51.9 (range: 41-69) for the valid preview condition.
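To make this artifact-rejection step concrete, a minimal Python sketch of the moving-window peak-to-peak criterion could look as follows (our own illustration, not the implementation used in unfold; the function name mark_bad_intervals is hypothetical):

```python
import numpy as np

def mark_bad_intervals(eeg, srate, win_ms=4000, thresh_uv=150.0):
    """Flag continuous-EEG samples inside any moving window whose
    peak-to-peak amplitude exceeds a threshold in at least one channel.
    eeg: array of shape (channels, samples); returns a boolean vector."""
    win = int(win_ms / 1000 * srate)
    n_chan, n_samp = eeg.shape
    bad = np.zeros(n_samp, dtype=bool)
    step = max(win // 2, 1)  # 50% window overlap so no artifact slips through
    for start in range(0, max(n_samp - win, 0) + 1, step):
        seg = eeg[:, start:start + win]
        ptp = seg.max(axis=1) - seg.min(axis=1)  # peak-to-peak per channel
        if np.any(ptp > thresh_uv):
            bad[start:start + win] = True
    return bad
```

The returned boolean vector can then be used to zero out the corresponding rows of the time-expanded design matrix, so that “bad” samples contribute nothing to the model fit.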
In the second step, we modeled both the ERP elicited by the sentence onset (with an intercept term) as well as the FRP evoked by each reading fixation. The model specified was

Stimulus onset: y ~ 1
Fixation: y ~ 1 + is_targetword * is_previewed

where is_targetword and is_previewed are binary categorical predictors coding whether a fixation was on the manipulated target word and whether that target word had been visible during the preceding fixations, respectively. For reasons of simplicity, saccade amplitude was left out of the model, because it was similar between preview conditions. We estimated responses from -600 to +1000 ms around each event. For group statistics, the interval between -300 and +600 ms after fixation onset was submitted to the TFCE permutation test.
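To make the deconvolution step itself concrete, the following minimal Python sketch (our own illustration, not the unfold implementation; the function name time_expand is hypothetical) shows how event onsets are time-expanded into a design matrix for a single intercept predictor, and how the overlap-corrected response is then recovered by ordinary least squares:

```python
import numpy as np

def time_expand(events, n_samples, tmin_smp, tmax_smp):
    """Build a time-expanded design matrix for a single intercept
    predictor: one column per time lag (tmin_smp to tmax_smp-1 samples)
    relative to each event onset in the continuous recording."""
    n_lags = tmax_smp - tmin_smp
    X = np.zeros((n_samples, n_lags))
    for ev in events:
        for j, lag in enumerate(range(tmin_smp, tmax_smp)):
            t = ev + lag
            if 0 <= t < n_samples:
                X[t, j] = 1.0
    return X

# Overlap-corrected response (one beta per time lag), estimated jointly
# for all events by least squares on the continuous EEG channel y:
# beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Even when responses overlap in time, the columns of X keep the contributions of the different latencies separable, so the least-squares solution recovers the unmixed waveform; categorical predictors such as is_targetword simply add further column blocks to X.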
Results & Discussion
Table 1 reports the fixation durations in the target region of the sentence. The average duration of all fixations during sentence reading was only 212 ms, meaning that FRPs were strongly overlapped by those from preceding and subsequent fixations. Figure 7B visualizes the distribution of first-fixation durations on the target word (e.g. “weapon”) as a function of whether the parafoveal preview for this word was valid (blue line) or invalid (red line). Note that the two conditions only differ in terms of what the participant saw as a preview during the preceding (pre-target) fixation; the foveated word is the same in both conditions. As expected, fixation durations on the pre-target word were not affected by the preview manipulation. However, the subsequent first fixation on the target word itself was on average 38 ms shorter when a valid rather than an invalid preview had been provided. This preview benefit was significant, t(47) = 11.42, p < 0.0001. In gaze duration, which is the summed duration of all first-pass fixations on the target word, the effect was also significant, with a difference of 46 ms, t(47) = 12.47, p < 0.0001. Results therefore replicate the classic preview benefit in behavior (Rayner, 1975).
Figure 7C presents the corresponding FRP waveforms. The plotted channel is PO9, a left-hemispheric occipitotemporal electrode where the strongest preview effects have been observed previously (Dimigen et al., 2012; Kornrumpf et al., 2016). Time zero marks the fixation onset on the target word. Note that at this time, readers are always looking at the correct target word; conditions only differ in terms of what was parafoveally visible during the preceding fixation. As Figure 7C shows, permutation testing revealed a significant effect (p < 0.05, under control of multiple comparisons) of preview condition on the FRP after conventional averaging (without deconvolution). The black bars in panels 7C, 7D, and 7E highlight the duration of the underlying spatiotemporal clusters at electrode PO9. Please note that because these clusters are computed during the first stage of the permutation test (Sassenhagen & Draschkow, 2019) they are themselves not stringently controlled for multiple comparisons (unlike the overall test result). However, their temporal extent provides an indication of the intervals that likely contributed to the overall significant effect of preview. With conventional averaging, clusters extended across three intervals after fixation onset: early: 228-308 ms, middle: 378-404 ms, and late: 464-600 ms (black bars in Figure 7C). However, if we look at the underlying single-trial EEG activity sorted by the fixation duration on the target word (lower panel of Figure 7C), it becomes obvious that a relevant part of the brain potentials after 200 ms is actually not generated by the fixation on the target word but by the next fixation (n+1) in the sentence. Since this next fixation begins on average 38 ms later in the invalid preview condition (see Table 1), this creates a spurious condition difference in the FRP.
Figure 7D shows the same data, corrected for overlapping potentials. In the sorted single-trial data (lower panel), the activity related to the current fixation is preserved, whereas that from the next fixation is removed. Crucially, the overall significant effect of preview (p < 0.05) is preserved. However, instead of three, we now only observe two clusters, one extending from 216 to 318 ms (reflecting the early “preview positivity”) and another beginning at 464 ms and lasting until the end of the epoch at 600 ms (possibly reflecting a late preview effect on the N400). In contrast, the middle cluster around 390 ms has disappeared (see Figure 7C), because this difference was caused by overlapping activity. This confounding effect of overlapping potentials is confirmed in Figure 7E, which shows only the activity produced by the neighboring fixations. The permutation test confirmed that this overlapping activity alone indeed produced a significant difference between conditions (p < 0.05), with the underlying cluster extending from 352 to 426 ms.
Discussion
Existing studies on neural preview effects did not control for the signal distortions produced by the difference in fixation time. Our analysis confirms for the first time that the previously reported neural preview effects are not trivial artifacts of overlapping activity, but genuine consequences of parafoveal processing. This insight is important because during natural vision, most visual objects are already partially processed in the parafoveal or peripheral visual field before they enter the fovea. In other words, while ERP researchers have traditionally presented isolated objects during steady fixation, visual objects during natural vision are typically primed and/or predicted based on a coarse extrafoveal preview of the object. Indeed, a similar preview effect, as shown here for words, was recently reported for previewed human faces (Buonocore, Dimigen, & Melcher, 2019; Edwards, VanRullen, & Cavanagh, 2018; Huber-Huber et al., 2018). This indicates that the modulation of the N1 component by previews may be a characteristic feature of visual object recognition under real-world conditions.
In summary, this application shows how deconvolution can disentangle cognitive effects from spurious overlap effects, allowing us to resolve the precise time course of attentional and cognitive processes during individual fixations.
GENERAL DISCUSSION
We applied deconvolution modeling and nonlinear spline regression to combined eye-tracking and EEG datasets recorded during free viewing. Examples from three commonly used paradigms demonstrate that this approach is suitable to control spurious effects and can also provide new theoretical insights into cognitive processes during natural vision. In the face recognition study, our analysis confirmed that the overlapping muscle and brain potentials produced by small involuntary eye movements during the task are problematic. However, we showed that they can be effectively removed with deconvolution modeling to obtain clean stimulus-locked ERPs. Additionally, this opens the interesting possibility to use the now isolated eye movement-related potentials as an additional electrophysiological marker of attentional, cognitive, or affective processing in the task (Meyberg et al., 2015; Guérin-Dugué et al., 2018). The scene viewing example was included to demonstrate that at least some of the (numerous) low-level influences on the FRP waveform during free viewing are highly nonlinear. For example, our analysis revealed a nonlinear effect of the angle of the incoming saccade on the FRP. Finally, the application to natural reading showed how spurious effects due to different fixation durations can be removed. This allowed us for the first time to describe the time course of neural preview benefits in an unbiased manner. In contrast, the simple averaging approach used in previous studies (e.g. Dimigen et al., 2012; Kornrumpf et al., 2016; Degno et al. 2018) will have necessarily produced incorrect conclusions about the exact time course and morphology of this effect.
In the following, we will further discuss the underlying assumptions, possibilities, and existing limitations of using the (non)linear deconvolution approach for combined eye-tracking/EEG research and outline some interesting future perspectives.
Assumptions of deconvolution models
Traditional ERP averaging is based on the assumption that the underlying event-related response is invariant across all epochs of a given condition and that the average is therefore a sensible description of the individual trials (Otten & Rugg, 2005). This assumption is likely incorrect, since cortical information processing may vary between trials, as indicated by trial-to-trial differences in reaction times. Deconvolution models are based on the same assumption, namely that the fixation-related response is the same, regardless of the amount of overlap. In other words, we rely on the somewhat unrealistic assumption that the fixation-related neural response does not differ between short and long fixations. The assumption also concerns sequences of fixations. From ERP research, we know that the processing of one stimulus can change the processing of the next one due to adaptation, habituation, or priming (e.g. Schweinberger & Neumann, 2016). Again, it would be surprising if these factors did not also modulate the FRP while a stimulus is rapidly scanned with several saccades. As stated in Ehinger & Dimigen (2018), if such sequential effects occur often enough in an experiment, they can be explicitly modeled within the deconvolution framework. For example, in a scene viewing study, one could add an additional predictor that codes whether a fixation happened early or late after scene onset (Fischer, Graupner, Velichkovsky, & Pannasch, 2013) or whether it was the first fixation or a refixation on a particular image region (Van Humbeeck et al., 2018).
Baseline correction and placement
In ERP research, baseline correction is performed to compensate for slow drifts in the signal, for example due to changes in skin potential (Luck, 2014). The baseline interval is therefore typically placed in a “neutral” interval before stimulus onset. In experiments with multiple saccades, it is more difficult to find an appropriate neutral baseline (Dimigen et al., 2011; Nikolaev et al., 2016).
The first reason is of a methodological nature and directly linked to the problems of overlap and covariates: The baseline for the FRP is often biased because of differences in the duration of the preceding fixation, differences in the size of the preceding saccade, or a different overlap with the stimulus-onset ERP. Several workarounds have been proposed to deal with this problem, such as placing the baseline before trial onset (e.g. Dimigen et al., 2011; Nikolaev et al., 2016), before an earlier fixation (e.g. Coco et al., 2018; Degno et al., 2018; Huber-Huber et al., 2018), or in the first few milliseconds after fixation onset (e.g. Simola, Fevre, Torniainen, & Baccino, 2014; see Nikolaev et al., 2016 for an illustration of different baseline placement options). With deconvolution, this is not necessary, because we can effectively remove the overlapping activity and covariate effects from the baseline interval. In our experience, the baselines of the deconvolved FRPs are essentially flat (as also visible in Figure 7D, where only the spike potential of the current fixation remains). This means that a conventional baseline correction can be applied to the deconvolved FRPs.
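Once the deconvolved betas are essentially flat before fixation onset, the conventional correction amounts to subtracting the pre-event mean, as in this minimal sketch (our own illustration; the helper name baseline_correct is hypothetical):

```python
import numpy as np

def baseline_correct(erp, times, bl_start=-0.2, bl_end=0.0):
    """Subtract the mean voltage in the baseline window from each channel.
    erp: array of shape (channels, timepoints); times: latencies in seconds."""
    mask = (times >= bl_start) & (times < bl_end)
    return erp - erp[:, mask].mean(axis=1, keepdims=True)
```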
However, a second reason why the baseline should still be chosen carefully are effects of parafoveal and peripheral processing. Because viewers obtain information about soon-to-be fixated items in peripheral vision (Coco et al., 2018; Buonocore et al., 2019), EEG effects may in some cases already begin before an object is foveated (e.g. Baccino & Manunta, 2005; Luo et al., 2010). In paradigms where such parafoveal-on-foveal effects are likely to occur, it may be sensible to place the baseline interval further away from fixation onset, even if overlapping potentials were removed. An even better option would be to capture these parafoveal-on-foveal effects in the model itself by adding the pre-target fixations as a separate type of event to the model. For example, in the reading experiment reported above, we could have coded the status of each reading fixation (in the predictor is_targetword) not just as “non-target” (0) or “target” (1), but could have added a third category of “pre-target” fixations (those on the word before the manipulated target word). In this way, any potential parafoveal-on-foveal effects produced by seeing the parafoveal non-word mask during the pre-target fixation (Schotter et al., 2012) can be disentangled from the EEG preview benefits that occur after fixating the target.
Time-frequency analysis
While most EEG recordings during free viewing have been analyzed in the time domain, it is also possible to study eye movement-related changes in oscillatory power and phase (Bodis-Wollner et al., 2002; Gaarder, Koresko, & Kropfl, 1966; Hutzler, Vignali, Hawelka, Himmelstoss, & Richlan, 2016; Kaiser, Brunner, Leeb, Neuper, & Pfurtscheller, 2009; Kornrumpf, Dimigen, & Sommer, 2017; Metzner, von der Malsburg, Vasishth, & Rösler, 2015; Nikolaev et al., 2016; Ossandon et al., 2010). Event-related responses in the frequency domain – such as induced changes in power – can last for several seconds and are likely biased by overlapping activity in much the same way as FRPs (Litvak et al., 2013; Ossandon et al., 2019). To address this problem, Ossandón and colleagues (e.g. Ossandón, König, & Heed, 2019) recently used the Hilbert transformation to obtain instantaneous power of the EEG in the alpha band. They then deconvolved the bandpass-filtered and rectified signal, showing that deconvolution can also be applied to EEG oscillations. Deconvolution is also an interesting option to correct for the spectral artifacts that are produced by involuntary microsaccades in time-frequency analyses (Yuval-Greenberg et al., 2008). Specifically, the results of the face recognition experiment (see Figure 5D) show that deconvolution is able to isolate the saccadic spike potential, the eye muscle artifact that is known to produce strong distortions in the gamma band (> 30 Hz) of the EEG. This suggests that cognitive influences on stimulus-locked gamma power can be disentangled from microsaccade artifacts if the continuous gamma band power (rather than the raw EEG) is entered into the model.
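The band-limited power envelope that serves as input to such a model can be computed with a Hilbert-style transform. The following numpy-only sketch is our own illustration (not the procedure used by Ossandón and colleagues); it combines band-pass filtering and the analytic-signal step in a single frequency-domain operation:

```python
import numpy as np

def instantaneous_power(eeg, srate, band=(8.0, 12.0)):
    """Estimate the instantaneous band power of a continuous 1-D EEG
    signal: band-pass via FFT masking, then take the magnitude of the
    analytic (Hilbert) signal as the power envelope."""
    n = eeg.size
    freqs = np.fft.fftfreq(n, d=1.0 / srate)
    spec = np.fft.fft(eeg)
    # keeping only the positive frequencies inside the band (doubled)
    # performs band-pass filtering and the Hilbert transform in one step
    mask = (freqs >= band[0]) & (freqs <= band[1])
    analytic_spec = np.zeros_like(spec)
    analytic_spec[mask] = 2.0 * spec[mask]
    return np.abs(np.fft.ifft(analytic_spec))
```

The resulting envelope, rather than the raw EEG, is then entered as the dependent variable of the deconvolution model.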
An unresolved question concerns the most suitable measure of spectral power to put into the deconvolution model. On this issue, Litvak and colleagues (2013) conducted simulations where they compared the model fits (R2) for different measures of spectral power (raw power, log power, square root of power) and obtained the best results for the square root of power. Further simulations are needed to see which transformation is the most suitable (e.g., power or square root of power) or whether the practical differences are negligible.
Improving the understanding of fixation-related brain activity
There are many ways to further improve the estimation of FRPs during free viewing. For example, the lambda response, the predominantly visually-evoked P1 response following fixation onset, is not fully understood. Existing evidence suggests that it is itself a compound response, consisting of at least two separate subcomponents: a visual “off”-response produced by the beginning of the saccade and a visual “on”-response that follows the inflow of new visual information at saccade offset (Kazai & Yagi, 2003; Kurtzberg & Vaughan, 1982; Thickbroom et al., 1991). Potentially, deconvolution could be used to separate the saccade onset and saccade offset-related contributions to the lambda response. Another promising application is to isolate the neural correlates of peri-saccadic stimulation, e.g. from gaze-contingent changes to the stimulus made during the saccade (Buonocore et al., 2019; Dimigen et al., 2012; Kleiser, Skrandies, & Anagnostou, 2000; Skrandies & Laschke, 1997).
Another interesting feature of linear deconvolution is that it is possible to add temporally continuous signals – rather than only discrete event onsets – as predictors to the time-expanded design matrix (Gonçalves, Whelan, Foxe, & Lalor, 2014; Lalor, Pearlmutter, Reilly, McDarby, & Foxe, 2006). For example, to partially correct for corneoretinal artifacts, one could add a column to the design matrix that contains the continuous gaze position signal of the eye-tracker, which will then be regressed out from the EEG (as suggested by Dandekar et al., 2012b). Yet another possibility is to add the pupil diameter as a time-continuous predictor. The idea would be that the momentary level of arousal and mental load – as indexed by pupil size – will correlate with the amplitude of the neural response. Other continuous signals that could be added to the model include the luminance profile of a continuously changing stimulus (e.g. of a video watched by the participant) or the sound envelope of concurrent auditory stimuli (e.g. the sound channel of the movie). Finally, signals from accelerometers and other motion sensors could help to account for the head, neck, and body movements that characterize visual exploration behavior outside of the laboratory (Ehinger et al., 2014; Gramann, Jung, Ferris, Lin, & Makeig, 2014).
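As a toy illustration of this idea (our own sketch, not an unfold feature demonstration; the function name regress_out_continuous is hypothetical), regressing a single time-continuous nuisance signal out of one EEG channel amounts to:

```python
import numpy as np

def regress_out_continuous(eeg, gaze):
    """Regress a time-continuous nuisance signal (e.g. horizontal gaze
    position) out of one continuous EEG channel by ordinary least squares.
    In a full deconvolution model, the gaze signal would simply form an
    extra column of the time-expanded design matrix."""
    X = np.column_stack([np.ones_like(gaze), gaze])  # intercept + gaze
    beta, *_ = np.linalg.lstsq(X, eeg, rcond=None)
    return eeg - X @ beta  # residual EEG with the gaze contribution removed
```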
Integrating deconvolution with linear mixed models
In the present work, we used the two-stage statistical approach that is also commonly used with massive univariate models (Pernet et al., 2011): Regression-ERPs (betas) are first computed individually for each participant and then entered into a second-level group analysis (e.g. a repeated-measures ANOVA or a permutation test). Compared to this hierarchical approach, linear mixed-effects models (e.g. Gelman & Hill, 2007) provide a number of advantages (Baayen et al., 2008; Kliegl et al., 2011), for example the option to include crossed random effects for subjects and items. Mixed models have therefore been applied to analyze fixation durations (e.g. Kliegl, 2007; Nuthmann, 2017; Ehinger et al., 2018) and more recently also FRPs (Dimigen et al., 2011; Degno et al., 2018). In the long term, it will be promising to integrate deconvolution with mixed-effects modeling, but this will require large computational resources (because the EEG data of all participants have to be fitted simultaneously) as well as new algorithms for estimating sparse mixed-effects models (e.g. Wood, Li, Shaddick, & Augustin, 2017).
Towards a full analysis pipeline for free viewing EEG
In Figure 1, we summarized four challenges that have complicated eye-tracking/EEG research in the past. We believe that there are now adequate solutions to all four problems. The unfold toolbox that was used for the current analyses is fully compatible with the existing EYE-EEG toolbox. In a first step, EYE-EEG can be used to synchronize the recordings, to add saccade and fixation events to the data, and to suppress eye movement artifacts with specialized ICA procedures (Dimigen, 2018; Plöchl et al., 2012). In the next step, the artifact-corrected EEG can be read into the unfold toolbox to model the fixation-related responses. Taken together, the two toolboxes provide a full pipeline to address the four problems of co-registration.
Conclusions
In this paper we have presented a framework for analyzing eye movement-related potentials and exemplified its advantages for three popular paradigms. By controlling for overlapping potentials and low-level influences, the (non)linear deconvolution framework allows us to study exciting new phenomena that were previously confounded by these issues and therefore difficult or impossible to investigate. In combination with existing approaches for data integration and artifact correction, this opens new possibilities to investigate natural viewing behavior with EEG without compromising data quality.
Footnotes
The authors would like to thank Linda Gerresheim and Anna Pajkert for their help with collecting some of the datasets used here as well as Peter König, Anna Lisa Gert, and Lisa Spiering for feedback on this work. Collection of the reading dataset was supported by a grant from Deutsche Forschungsgemeinschaft (DFG FG 868-A2).
1. The number of splines that cover the range of the predictor determines how flexible the fit is. A larger number of splines allows us to model more complex relationships but also increases the risk of overfitting the data. See Ehinger & Dimigen (2018) for a discussion.
2. The target word in the second sentence was also manipulated in terms of its contextual predictability and lexical frequency (Dambacher et al., 2012). Here we focus only on the factor preview.