Neural Signatures of Reward and Sensory Prediction Error in Motor Learning

Dimitrios J. Palidis; Joshua G.A. Cashaback; Paul L. Gribble

doi:10.1101/262576

Abstract

At least two distinct processes have been identified by which motor commands are adapted according to movement-related feedback: reward based learning and sensory error based learning. In sensory error based learning, mappings between sensory targets and motor commands are recalibrated according to sensory error feedback. In reward based learning, motor commands are associated with subjective value, such that successful actions are reinforced. We designed two tasks to isolate reward and sensory error based motor adaptation, and recorded electroencephalography (EEG) from humans to identify and dissociate the neural correlates of reward and sensory error processing. We designed a visuomotor rotation task to isolate sensory error based learning which was induced by altered visual feedback of hand position. In a reward learning task, we isolated reward based learning induced by binary reward feedback that was decoupled from the visual target. We found that a fronto-central event related potential called the feedback related negativity (FRN) was elicited specifically by reward feedback but not sensory error feedback. A more posterior component called the P300 was evoked by feedback in both tasks. In the visuomotor rotation task, P300 amplitude was increased by sensory error induced by perturbed visual feedback, and was correlated with learning rate. In the reward learning task, P300 amplitude was increased by reward relative to non reward and by surprise regardless of feedback valence. We propose that during motor adaptation, the FRN might specifically mark reward prediction error while the P300 might reflect processing which is modulated more generally by prediction error.

New and Noteworthy We studied the event related potentials evoked by feedback stimuli during motor adaptation tasks that isolate reward and sensory error learning mechanisms. We found that the feedback related negativity was specifically elicited by reward feedback, while the P300 was observed in both tasks. These results reveal neural processes associated with different learning mechanisms and elucidate which classes of errors, from a computational standpoint, elicit the FRN and P300.

Introduction

It is thought that sensorimotor adaptation is driven by two distinct error signals, sensory prediction error (SPE) and reward prediction error (RPE), and that both can simultaneously contribute to learning (Huang et al. 2011; Izawa and Shadmehr 2011; Shmuelof et al. 2012; Galea et al. 2015; Nikooyan and Ahmed 2015). Electroencephalography (EEG) has been used to identify neural signatures of error processing in various motor learning and movement execution tasks, but it remains unclear how these neural responses relate to distinct reward and sensory-error based motor learning mechanisms (Krigolson et al. 2008; Torrecillos et al. 2014; MacLean et al. 2015). Here we identified neural signatures of sensory error and reward feedback processing in motor learning using separate learning paradigms that produce comparable changes in behavior.

In theories of motor adaptation, SPE occurs when the sensory consequences of motor commands differ from the predicted outcomes. SPE is thought to occur in visuomotor rotation (VMR) paradigms in which visual feedback of hand position is rotated relative to the actual angle of reach. Adaptation, in which motor output is adjusted to compensate for perturbations, is thought to be driven largely by SPE in these tasks (Izawa and Shadmehr 2011; Marko et al. 2012). Sensory error feedback activates brain regions including primary sensory motor areas, posterior parietal cortex, and cerebellum (Inoue et al. 2000, 2016; Krakauer et al. 2004; Diedrichsen et al. 2005; Bédard and Sanes 2014). Tanaka et al. (2009) propose that SPEs computed by the cerebellum produce adaptation via changes in synaptic weighting between the posterior parietal cortex and motor cortex. Strategic aiming also contributes to behavioural compensation for visuomotor rotations in a manner that is largely independent from the automatic visuomotor recalibration that is driven by cerebellar circuits (Mazzoni and Krakauer 2006; Benson et al. 2011; Taylor et al. 2014; McDougle et al. 2016).

Recent research suggests that a reward based learning process can also contribute to motor adaptation in parallel to a sensory-error based learning system, and that reward feedback can drive motor learning even in the absence of sensory error feedback (Izawa and Shadmehr 2011; Therrien et al. 2015; Holland et al. 2018). Reward learning has been isolated experimentally by providing participants with only binary reward feedback, indicating success or failure, without visual feedback of hand position. Reward-based motor learning has been modelled as reinforcement learning, which maps actions to abstract representations of reward or success, rather than to the sensory consequences of action. In reinforcement learning theory, if the outcome of an action is better than the predicted outcome, a positive reward prediction error (RPE) occurs which drives an increase in the agent’s estimate of expected value of that reward outcome, along with an increase in the future likelihood of selecting that particular action. Conversely, if the outcome is worse than expected, a negative RPE diminishes the estimated value and likelihood of selecting that action. Phasic dopaminergic signaling in the VTA and striatum is consistent with encoding of RPE (Glimcher 2011), and dopaminergic activity has been implicated in reward based motor learning (Galea et al. 2013; Pekny et al. 2015).
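The value update described in reinforcement learning theory can be illustrated with a minimal delta-rule sketch. The function name `update_value`, the learning rate `alpha`, and the coding of reward as 1 and non-reward as 0 are illustrative assumptions and are not parameters taken from this study.

```python
def update_value(value, reward, alpha=0.1):
    """Return (new_value, rpe) after one outcome under a simple delta rule.

    value  : current estimate of expected reward for the chosen action
    reward : obtained outcome (assumed coding: 1.0 = reward, 0.0 = non-reward)
    alpha  : learning rate (hypothetical value, chosen only for illustration)
    """
    rpe = reward - value          # positive if the outcome is better than predicted
    new_value = value + alpha * rpe
    return new_value, rpe


# A rewarded outcome raises the estimate; an unrewarded outcome lowers it.
value_after_reward, rpe_pos = update_value(0.5, reward=1.0)
value_after_miss, rpe_neg = update_value(0.5, reward=0.0)
```

In this sketch the sign of the RPE determines the direction of the update, and its magnitude scales with how surprising the outcome was.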

EEG has been used to identify neural correlates of error monitoring, but few studies have employed motor adaptation tasks. An event-related potential (ERP) known as the feedback related negativity (FRN) has been proposed to reflect processing of RPE in the context of reinforcement learning. There is evidence that FRN reflects activity in anterior cingulate or supplementary motor cortical areas, and the reinforcement learning theory of the FRN states that it is driven by phasic dopaminergic signaling of RPE (Holroyd and Coles 2002; Nieuwenhuis et al. 2004; Walsh and Anderson 2012; Heydari and Holroyd 2016). Other accounts attribute the FRN to processes involving conflict monitoring or general prediction error as opposed to RPE (Alexander and Brown 2011; Baker and Holroyd 2011a; Botvinick 2011). Motor adaptation paradigms are particularly opportune to test the reinforcement learning theory of the FRN as they afford dissociation between reinforcement learning processes and learning through sensory prediction error.

The FRN potential is superimposed on the P300, a well characterized positive ERP component that peaks later than FRN, and with a more posterior scalp distribution. It has been proposed that the P300 reflects the updating of a model of stimulus context (Donchin and Coles 1988). Both the FRN and the P300 have been observed in response to errors in motor tasks (Krigolson et al. 2008; Torrecillos et al. 2014; MacLean et al. 2015; Reuter et al. 2018; Savoie et al. 2018). In this paper we describe experiments in which we isolated and compared EEG responses to both reward and sensory error feedback using separate adaptation paradigms that produced comparable changes in behavior. We tested the idea that the FRN is a neural signature of feedback processing that specifically supports reward based motor learning, while the P300 reflects a process which is generally related to feedback processing but is particularly important for sensory error based learning.

Material and Methods

Experimental Design and Statistical Analysis

Participants made reaching movements toward a visual target while holding the handle of a robotic arm. A screen displayed visual information related to the task but occluded vision of the arm. The setup and procedure regarding the reaching movements is described under “Apparatus/Behavioral Task”. Feedback pertaining to reach angle was provided at movement endpoint, and feedback was manipulated such that participants adapted their reach direction to compensate for the manipulations. Participants were instructed that each reach terminating within the target would be rewarded with a small monetary bonus. Participants underwent alternating experimental blocks of a reward learning task and a visuomotor rotation task. The design for each task is described under “reward learning task” and “visuomotor rotation task” methods subsections, respectively.

In the visuomotor rotation task, a cursor appeared at movement endpoint to represent the position of the hand. In a randomly selected 50% of trials, a perturbation was imposed, such that the cursor feedback was rotated around the starting position by a fixed angle, indicating a reach angle that was shifted relative to the unperturbed feedback. The magnitude of the perturbation was varied across blocks, and was either 0.75 deg or 1.5 deg. Behaviourally, we tested for trial by trial adaptive responses that compensated for the perturbations. We compared the neural responses to rotated and non-rotated feedback to assess the neural correlates of processing sensory error feedback during adaptation, and we tested whether this effect was modulated by the size of the perturbations. The perturbations were small relative to the size of the target, such that participants nearly always landed in the target, fulfilling the goal of the task and earning a monetary reward, even on the perturbed trials. Thus, reward and task error were constant between perturbed and non-perturbed feedback, and by comparing the two conditions we could assess the neural and behavioural response to sensory error without the confounds of reward or task error processing.

In the reward learning task, no cursor appeared to indicate the position of the hand, but instead binary feedback indicated whether or not participants succeeded in hitting the target and earning monetary reward. This allowed us to assess the neural and behavioral responses to reward feedback in isolation from sensory error processing, as visual information revealing the position of the hand was not provided. Reward was delivered probabilistically, with a higher probability of reward for reaches in one direction than the other, relative to participants’ recent history of reach direction. This was intended to induce adaptation such that participants would adjust their reaches towards the direction that was rewarded at a higher probability. The overall reward frequency was controlled and manipulated across blocks, so that participants experienced both reward and non-reward feedback in both low and high overall reward frequency conditions. We compared the neural responses to reward and non-reward feedback to assess the neural correlates of reward processing during adaptation. We compared the responses to frequent and infrequent feedback to assess effects related to expectation, under the assumption that outcomes which occurred less frequently in a given block would violate expectations more strongly. Reward feedback in the high reward frequency condition and non-reward feedback in the low reward frequency condition were deemed “frequent”, while non-reward feedback in the high reward frequency condition and reward feedback in the low reward frequency condition were deemed “infrequent”.

The trial averaging procedures used to estimate the neural responses to various feedback conditions are described under “Event Related Component Averaging”, and the analysis of these neural responses is described under “P300 Analysis” and “Feedback Related Negativity Analysis”.

Results of statistical tests are reported in the Results section, under “Behavioral Results”, “Feedback Related Negativity Results”, and “P300 Results”. Linear relationships between behavioural and EEG measures were assessed using robust regression, implemented by the MATLAB fitlm function with the robust fitting option. This method uses iteratively reweighted least squares regression, assigning lower weight to outlier data points. Student’s t-tests were performed using MATLAB R2016b, and the Lilliefors test was used to test the assumption of normality. In the case of non-normal data, the Wilcoxon signed-rank test was used to test pairwise differences. Repeated measures analyses of variance (ANOVAs) were conducted using IBM SPSS Statistics version 25. For all ANOVAs, Mauchly’s test was used to validate the assumption of sphericity.
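The idea behind iteratively reweighted least squares can be sketched in a few lines for a single predictor. This is a simplified illustration and not the MATLAB routine itself: the function names are hypothetical, bisquare weights with a tuning constant of 4.685 are assumed, and the leverage adjustment of residuals is omitted, so results need not match fitlm numerically.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def robust_line_fit(x, y, tune=4.685, n_iter=50):
    """Iteratively reweighted least squares with bisquare weights (assumed)."""
    w = [1.0] * len(x)                      # first pass is ordinary least squares
    for _ in range(n_iter):
        intercept, slope = weighted_line_fit(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate from the median absolute residual.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        cutoff = tune * scale
        # Points with large residuals receive low or zero weight.
        w = [(1 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
             for r in resid]
    return intercept, slope


# Toy data on the line y = 1 + 2x with one gross outlier at the last point.
x_demo = [float(i) for i in range(10)]
y_demo = [1.0 + 2.0 * xi for xi in x_demo]
y_demo[-1] = 60.0
ols_intercept, ols_slope_demo = weighted_line_fit(x_demo, y_demo, [1.0] * 10)
rob_intercept, rob_slope_demo = robust_line_fit(x_demo, y_demo)
```

In the toy data the ordinary least-squares slope is pulled upward by the outlier, whereas the reweighted fit progressively discounts that point.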

Participants

A total of n=20 healthy, right-handed participants were included in our study (23.21 ± 3.09 years old, 12 females). Three participants underwent the experimental procedure but were excluded due to malfunction of the EEG recording equipment. One participant who reported that they performed movements based on a complex strategy that was unrelated to the experimental task was excluded. Participants provided written informed consent to experimental procedures approved by the Research Ethics Board at The University of Western Ontario.

Experimental Procedure

Participants first performed a block of 50 practice trials. After the practice block the experimenter fitted the EEG cap and applied conductive gel to the electrodes before beginning the behavioral task (see “EEG Data Acquisition” below). The behavioral procedure consisted of 8 experimental blocks, including four 115-trial blocks of a reward learning task and four 125-trial blocks of a visuomotor rotation task. The order of the blocks alternated between the two task types but was otherwise randomized. Participants took self-paced rests between blocks.

Apparatus/Behavioral Task

Participants produced reaching movements with their right arm in a horizontal plane at chest height while holding the handle of a robotic arm (InMotion2, Interactive Motion Technologies, Massachusetts, United States; Fig 1). Position of the robot handle was sampled at 600 Hz. A semi-silvered mirror obscured vision of the arm and displayed visual information related to the task. An air-sled supported each participant’s right arm.

Figure 1:

Experimental setup. A, Participants reached to visual targets while holding the handle of a robotic arm. Vision of the arm was obscured by a screen that displayed visual information related to the task. B, During reaches, hand position was hidden but an arc shaped cursor indicated the extent of the reach without revealing reach angle. Feedback was provided at reach endpoint. C, In the reward learning condition, binary feedback represented whether reaches were successful or unsuccessful in hitting the target by turning green or red, respectively. Reach adaptation was induced by providing reward for movements that did not necessarily correspond to the visual target. D, In the visuomotor rotation condition, feedback represented the endpoint position of the hand. Adaptation was induced by rotating the angle of the feedback relative to the actual reach angle.

Participants began each trial with their hand at a start position in front of their chest at body midline (mid-sagittal plane). The start position was displayed using a red circle with a diameter of 1 cm (Fig 1a). A white circular target was displayed 14 cm away from the start position (Fig 1a). A cursor indicated hand position only while the hand was within the start circle. The start position turned green to cue the onset of each reach once the handle had remained inside the start position continuously for 750 ms. Participants were instructed that they must wait for the cue to begin each reach, but that it was not necessary to reach immediately or react quickly upon seeing the cue.

Participants were instructed to make forward reaches and to stop their hand within the target. An arc shaped cursor indicated reach extent throughout the movement, without revealing the angle of the hand relative to the start position. In the first 5 baseline trials of each block, continuous position feedback was provided in the form of an additional circular cursor indicating the position of the hand throughout the reach. In all subsequent reaches for each block, the cursor indicating hand position disappeared when the hand left the start position, and only the arc shaped cursor indicating movement extent was shown. A viscous force field assisted participants in braking their hand when the reach extent was greater than 14 cm.

The robot ended each movement by fixing the handle position when the hand velocity decreased below 0.03 m/s. While the hand was held in place (for 700 ms), visual feedback was provided. This feedback indicated either reach endpoint position, a binary reward outcome, or movement speed (see below). Visual feedback was then removed and the robot guided the hand back to the start position. During this time no visual information relating to hand position was displayed.

Reach endpoint was defined as the position at which the reach path intersected the perimeter of a circle (14 cm radius), centered at the start position. Reach angle was calculated as the angle between a line drawn from the start position to reach endpoint and a line drawn from the start position to the center of the target, such that reaching straight ahead corresponds to 0 deg and counter-clockwise reach angles are positive (Fig 1a). Feedback about reach angle was provided either in the form of endpoint position feedback or binary reward feedback. The type of feedback, as well as various feedback manipulations, varied according to the assigned experimental block type (see “Reward Learning Task” and “Visuomotor Rotation Task”). Endpoint position feedback consisted of a stationary cursor indicating the position of movement endpoint, while reward feedback consisted of the target turning either red or green to indicate that the reach endpoint missed or hit the target, respectively.
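The reach angle definition can be written as a signed angle between two vectors. The sketch below assumes a workspace viewed from above with x pointing rightward and y pointing away from the body; the function name `reach_angle_deg` and the coordinate convention are assumptions made for illustration.

```python
import math


def reach_angle_deg(start, endpoint, target):
    """Signed angle in degrees from the start->target line to the
    start->endpoint line, with counter-clockwise angles positive.

    Points are (x, y) tuples in cm; x rightward, y forward (assumed).
    """
    ex, ey = endpoint[0] - start[0], endpoint[1] - start[1]
    tx, ty = target[0] - start[0], target[1] - start[1]
    cross = tx * ey - ty * ex     # > 0 when the endpoint lies counter-clockwise
    dot = tx * ex + ty * ey
    return math.degrees(math.atan2(cross, dot))


# Endpoint on the 14 cm circle, 5 deg counter-clockwise of a straight-ahead target.
start_pos = (0.0, 0.0)
target_pos = (0.0, 14.0)
endpoint_pos = (-14.0 * math.sin(math.radians(5.0)),
                14.0 * math.cos(math.radians(5.0)))
angle_demo = reach_angle_deg(start_pos, endpoint_pos, target_pos)
```

Using `atan2` on the cross and dot products keeps the sign convention explicit and avoids the ambiguity of an unsigned arccosine.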

Movement duration was defined as the time elapsed between the hand leaving the start position and the moment hand velocity dropped below 0.03 m/s. If movement duration was greater than 700 ms or less than 450 ms no feedback pertaining to movement angle was provided. Instead, the gray arc behind the target turned blue or yellow to indicate that the reach was too slow or too fast, respectively. Participants were informed that movements with an incorrect speed would be repeated but would not otherwise affect the experiment.

To minimize the impact of eye-blink related EEG artifacts, participants were asked to fixate their gaze on a black circular target in the center of the reach target and to refrain from blinking throughout each arm movement and subsequent presentation of feedback.

Practice Block

Each participant completed a block of practice trials before undergoing the reward learning and visuomotor rotation experiments. Practice continued until participants achieved 50 movements within the desired range of movement duration. Continuous position feedback was provided during the first five trials, and only endpoint position feedback was provided for the subsequent 10 trials. After these initial 15 trials, no position feedback was provided outside the start position.

Reward Learning Task

Binary reward feedback was provided to induce adaptation of reach angle. Each participant completed 4 blocks in the reward learning condition. We manipulated feedback with direction of intended learning and reward frequency as factors using a 2×2 design (direction of learning x reward frequency) across blocks. For each direction of intended learning (clockwise and counter-clockwise), each participant experienced a block with high reward frequency and a block with low reward frequency. Participants performed blocks with each direction of intended learning to control for bias in reach direction. Reward frequency was manipulated to assess effects related to expectation, under the assumption that outcomes which occurred less frequently in a given block would violate expectations more strongly. Each reward learning block continued until the participant completed 115 reaches with acceptable movement duration. Participants reached towards a circular target 1.2 cm (4.9 deg) in diameter. The first 15 reaches were baseline trials during which the participant did not receive reward feedback. Continuous position feedback was provided during the first five trials, and only endpoint position feedback was provided for the subsequent 10 trials. After these baseline trials, no position feedback was provided, and binary reward feedback was provided at the end of the movement. Participants were told that they would earn additional monetary compensation for reaches that ended within the target, up to a maximum of CAD$10 for the whole experiment. Participants were told that rewarded and unrewarded reaches would be indicated by the target turning green and red, respectively.

Unbeknownst to participants, reward feedback was delivered probabilistically. The likelihood of reward depended on the difference between the current reach angle and the median reach angle of the previous 10 reaches. In the high reward frequency condition, reward was delivered with a probability of 100% if the difference between the current reach angle and the running median was in the direction of intended learning, and at a probability of 30% otherwise (eq. 1). When the running median was at least 6 deg away from zero in the direction of intended learning, reward was delivered at a fixed probability of 65%. This was intended to minimize conscious awareness of the manipulation by limiting adaptation to ± 6 deg. In the low reward frequency condition, reward was similarly delivered at a probability of either 70% or 0% (eq. 2). When the running median was at least 6 deg away from zero in the direction of intended learning, reward was delivered at a fixed probability of 35%. Reach angle and feedback throughout a representative experimental block are shown in Figure 2.

Figure 2:

Reach angles of a representative participant. A) We show the reward learning block assigned to the clockwise adaptation with high reward frequency condition. Reaches were rewarded with 100.0% probability for reach angles less than the median of the previous 10 reaches, and with 30.0% probability for reach angles greater than this running median. Reward was delivered at a fixed probability of 65.0% when the running median was less than −6 deg, indicated by the ‘Non-Adaptation’ portion of the block. B) The visuomotor rotation block assigned to the 1.5 degree rotation condition is shown. The rotation is imposed randomly in 50% of trials. The rotation is initially counterclockwise but reverses when the mean of the previous five reach angles becomes less than −6.0 deg.

We employed this adaptive, closed loop reward schedule so that the overall frequency of reward was controlled. While participants adapted their reach angle to the task, the task adapted to the changing reach angle, as each reach was assessed relative to the recent history of reaches. This allowed us to assess correlations between neural measures and behavior without confounding learning and reward frequency.

Where p is probability of reward described separately for the high and low reward frequency conditions, θ is the reach angle on trial i, z = 1 for counter-clockwise learning blocks, and z = −1 for clockwise learning blocks.
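The reward schedule described in the text can be sketched as a small function. This is one reading of the verbal description rather than a transcription of eqs. 1 and 2: the function names are hypothetical, and the handling of a reach angle exactly equal to the running median (treated here as not being in the direction of intended learning) is an assumption.

```python
import random
import statistics


def reward_probability(theta, previous_angles, z, high_frequency, limit_deg=6.0):
    """Probability of reward for the current reach.

    theta           : current reach angle in deg (counter-clockwise positive)
    previous_angles : earlier reach angles; the last 10 form the running median
    z               : +1 for counter-clockwise, -1 for clockwise learning blocks
    high_frequency  : True for the high reward frequency condition
    """
    running_median = statistics.median(previous_angles[-10:])
    # Once adaptation reaches the limit, reward no longer depends on reach angle.
    if z * running_median >= limit_deg:
        return 0.65 if high_frequency else 0.35
    in_learning_direction = z * (theta - running_median) > 0  # tie: assumed False
    if high_frequency:
        return 1.0 if in_learning_direction else 0.30
    return 0.70 if in_learning_direction else 0.0


def deliver_reward(p, rng=random):
    """Draw a binary reward outcome with probability p."""
    return rng.random() < p


# Clockwise block (z = -1), high reward frequency, running median at 0 deg.
p_demo = reward_probability(-1.0, [0.0] * 10, z=-1, high_frequency=True)
```

If reaches fall on either side of the running median equally often, the two adaptive probabilities average to 65% and 35%, which matches the fixed probabilities used once the adaptation limit is reached.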

Visuomotor Rotation Task

Endpoint feedback was rotated relative to the actual reach angle to induce sensory-error based adaptation. Each participant completed 4 blocks in the visuomotor rotation condition. We manipulated feedback with initial rotation direction and perturbation size as factors using a 2×2 design across blocks. For each direction of initial rotation (clockwise and counter-clockwise) each participant experienced a block with large rotation (1.5 deg) and a block with small rotation (0.75 deg). Each block continued until participants completed 125 reaches within acceptable movement duration limits. Participants reached towards a circular target 2.5 cm (10.2 deg) in diameter. Participants first performed 25 baseline reaches during which position feedback reflected veridical reach angle. A cursor continuously indicated hand position during the first 10 trials. A static cursor indicated movement endpoint position for the subsequent 15 trials. After the baseline reaches, the adaptation portion of each block began unannounced to participants and continued until the participant performed 100 additional reaches of acceptable movement duration.

During the adaptation trials, endpoint position feedback was provided that did not necessarily correspond to the true hand position. Participants were instructed that endpoint feedback within the target would earn them bonus compensation, but no explicit reward feedback was provided during the experiment. To determine the feedback angle in the small and large perturbation conditions, in a randomly selected 50% of trials we added a rotation of 0.75 deg or 1.5 deg, respectively, to the true reach angle. In addition, on every trial, we subtracted an estimate of the current state of reach adaptation (eq. 3).

To illustrate the purpose of subtracting a running estimate of the current state of adaptation, we can consider the case in which reach angle was adapted by +2 deg through cumulative exposure to a −0.75 deg rotation. If the state of adaptation is accurately estimated and subtracted from the true reach angle, then a reach angle of +2 deg will result in either unperturbed feedback at 0 deg or rotated feedback at −0.75 deg. The online estimate of adaptation consisted of a running average of the previous 6 reach angles and a model of reach adaptation which assumed that participants would adapt to a fixed proportion of the reach errors experienced during the previous 3 trials. A windowed average centered around the current reach angle could serve as a low pass filter to estimate the current state of reach adaptation, but the running average during the experiment was necessarily centered behind the current reach angle. Thus, an online model was necessary to account for adaptation that would occur in response to errors experienced on trials included in the running average. Mis-estimation of the learning rate in the online model would lead to systematic bias in the feedback angle, while a perfect model would lead to unperturbed feedback that is distributed around 0 deg with variance that reflects the natural movement variability. An adaptation rate of 0.25 was chosen for the online model on the basis of pilot data.

This design allowed us to compare perturbed and unperturbed feedback in randomly intermixed trials, which is generally advantageous in ERP experiments. Previous studies have imposed a fixed perturbation throughout a block of trials and compared early trials, in which the visual error is large, to late trials, in which the error has been minimized through adaptation (Tan et al. 2014; MacLean et al. 2015). However, in such designs the independent variable is not experimentally controlled, and differences in neural response might be attributed to changes in the state of adaptation or simply habituation to feedback, as opposed to sensory error per se. Another alternative is to impose random rotations in either direction, but previous work has demonstrated that neural and behavioural responses are larger for consistent perturbations, presumably because the sensorimotor system attributes variability in feedback to noise processes and downweights adaptive responses accordingly (Tan et al. 2014).

As in the reward learning condition, we sought to limit the magnitude of adaptation to 6 deg in an attempt to minimize conscious awareness of the manipulation. This limit was implemented by reversing the direction of the perturbation whenever the average reach angle in the previous six movements differed from zero by at least 6 deg in the direction of intended reach adaptation. Reversing the direction of the perturbation caused participants to adapt in the opposite direction. Reach angle and feedback angle throughout a representative experimental block are shown in Figure 2.

X denotes feedback angle, θ denotes reach angle, and q denotes the perturbation. z denotes the direction of the perturbation (z = 1 for counter-clockwise perturbations, and z = −1 for clockwise perturbations). s denotes the size of the perturbation (0.75 deg or 1.5 deg in the small and large error conditions, respectively). u is a discrete random variable that is realized as either 1 or 0 with equal probability (50%).
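Two parts of the perturbation schedule can be sketched from the symbol definitions and the reversal rule in the text. The product form q = z·s·u is inferred from the definitions of z, s, and u and is an assumption, as are the function names. The subtraction of the online adaptation estimate is deliberately left out because its exact form is not reproduced in this text.

```python
import random


def draw_perturbation(z, s, rng=random):
    """Perturbation for one trial, assuming q = z * s * u.

    z : +1 (counter-clockwise) or -1 (clockwise) perturbation direction
    s : perturbation size in deg (0.75 or 1.5)
    u : 1 or 0 with equal probability, drawn here from rng
    """
    u = 1 if rng.random() < 0.5 else 0
    return z * s * u


def update_direction(z, recent_angles, limit_deg=6.0, window=6):
    """Reverse the perturbation direction once adaptation reaches the limit.

    Adaptation is assumed to proceed opposite to the perturbation, so the
    direction of intended reach adaptation is taken to be -z.
    """
    recent = recent_angles[-window:]
    mean_angle = sum(recent) / len(recent)
    if -z * mean_angle >= limit_deg:
        return -z
    return z


# A counter-clockwise perturbation (z = +1) reverses once the recent mean
# reach angle has drifted to -6 deg or beyond.
z_demo = update_direction(1, [-6.5] * 6)
```

The six-trial window follows the description of the reversal rule in the text; it is a parameter here because the Figure 2 caption mentions five trials.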

EEG Data Acquisition

EEG data were acquired from 16 cap-mounted electrodes using an active electrode system (g.Gamma; g.tec Medical Engineering) and amplifier (g.USBamp; g.tec Medical Engineering). We recorded from electrodes placed according to the 10-20 system at sites Fz, FCz, Cz, CPz, Pz, POz, FP1, FP2, FT9, FT10, FC1, FC2, F3, F4, F7, F8, referenced to an electrode placed on participants’ left earlobe. Impedances were maintained below 5 kΩ. Data were sampled at 4800 Hz and filtered online with band-pass (0.1-1,000 Hz) and notch (60 Hz) filters. The amplifier also recorded data from a photo-diode attached to the display monitor to determine the timing of stimulus onset.

Behavioral Data Analysis

Reward learning task

Motor learning scores were calculated for each participant as the difference between the average reach angle in the counter-clockwise learning blocks and the average reach angle in the clockwise learning blocks. We chose to assess reach angle throughout the entire task, as opposed to only in a window of trials at the end of each block, primarily because reach direction was often unstable and a smaller window was susceptible to drift. Furthermore, this metric of learning measured not only the final state of adaptation but also reflected the rate of adaptation throughout the block without assuming a particular function for the time course of learning. Lastly, this metric was not dependent on the choice of a particular subset of trials.
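The learning score reduces to a difference of two means. In the sketch below the function name and the data layout (one list of reach angles per block) are hypothetical, and trials are pooled across the two blocks of each direction, which equals averaging the block means only when blocks contain the same number of included trials.

```python
from statistics import mean


def learning_score(ccw_blocks, cw_blocks):
    """Mean reach angle in counter-clockwise learning blocks minus mean
    reach angle in clockwise learning blocks (deg).

    Each argument is a list of blocks; each block is a list of reach angles
    from trials that met the inclusion criteria (baseline trials removed).
    """
    ccw_angles = [angle for block in ccw_blocks for angle in block]
    cw_angles = [angle for block in cw_blocks for angle in block]
    return mean(ccw_angles) - mean(cw_angles)


# Toy example: positive angles in CCW blocks, negative angles in CW blocks.
score_demo = learning_score([[2.0, 4.0], [3.0, 3.0]],
                            [[-1.0, -3.0], [-2.0, -2.0]])
```

Because counter-clockwise angles are positive, a positive score indicates that reaches shifted in the intended direction in both block types.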

We excluded baseline trials and trials that did not meet the movement duration criteria, as no feedback related to reach angle was provided on these trials (6.5% of trials in the VMR task, 7.4% of trials in the RL task).

Visuomotor rotation task

To quantify trial-by-trial learning we first calculated the change in reach angle between successive trials, as in eq. 4.

We then performed a linear regression on Δθ_i with the rotation imposed on the trial i as the predictor variable. The rotation was either 0, ± 0.75, or ± 1.5 deg. This regression was performed on an individual participant basis, separately for each of the four visuomotor rotation conditions (corresponding to feedback rotations of −1.5, −0.75, 0.75, and 1.5 deg). For these regressions, we excluded trials that did not meet the duration criteria or that resulted in a visual error of greater than 10 deg (M=2.65 trials per participant, SD=4.3), as these large errors were thought to reflect execution errors or otherwise atypical movements. We took the average of the resulting slope estimates across blocks, multiplied by −1, as a metric of learning rate for each participant, as it reflects the portion of visual errors that participants corrected with a trial-by-trial adaptive process. Based on simulations of our experimental design using a standard memory updating model (Thoroughman and Shadmehr 2000) (not described here), we found that it was necessary to perform the regression separately for each rotation condition, as collapsing across the different rotation sizes and directions could introduce bias to the estimate of learning rate.
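The learning-rate estimate can be sketched as follows, taking the trial-to-trial change to be Δθ_i = θ_{i+1} − θ_i. That reading of eq. 4 is an assumption, since the equation is not reproduced in this text. The function names are hypothetical, ordinary least squares is used, and the trial exclusions described above are omitted for brevity.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def block_learning_rate(reach_angles, rotations):
    """Learning rate for one block: minus the slope of the change in reach
    angle after trial i regressed on the rotation imposed on trial i."""
    delta = [reach_angles[i + 1] - reach_angles[i]
             for i in range(len(reach_angles) - 1)]
    return -ols_slope(rotations[:-1], delta)


def participant_learning_rate(blocks):
    """Average the per-block estimates; blocks is a list of
    (reach_angles, rotations) pairs, one per rotation condition."""
    rates = [block_learning_rate(angles, rots) for angles, rots in blocks]
    return sum(rates) / len(rates)


# Noise-free toy block in which 25% of each rotation is corrected on the next trial.
rotations_demo = [1.5, 0.0, 0.0, 1.5, 1.5, 0.0, 1.5, 0.0]
angles_demo = [0.0]
for q in rotations_demo[:-1]:
    angles_demo.append(angles_demo[-1] - 0.25 * q)
rate_demo = block_learning_rate(angles_demo, rotations_demo)
```

Multiplying the slope by −1 makes the estimate positive when participants shift their reach opposite to the imposed rotation.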

EEG Data Denoising

EEG data were resampled to 480 Hz and filtered offline between 0.1-35 Hz using a second order Butterworth filter. Continuous data were segmented into 2 second epochs time-locked to feedback stimulus onset at 0 ms (time range: −500 to +1500 ms). Epochs containing artifacts were removed using a semi-automatic algorithm for artifact rejection in the EEGLAB toolbox (see Delorme and Makeig (2004) for details). Epochs flagged for containing artifacts as well as any channels with bad recordings were removed after visual inspection. Subsequently, extended infomax independent component analysis was performed on each participant’s data (Delorme and Makeig 2004). Components reflecting eye movements and blink artifacts were identified by visual inspection and subtracted by projection of the remaining components back to the voltage time series.
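The epoching step can be illustrated for a single channel. This sketch covers only the segmentation and is not the EEGLAB procedure; the function name, the representation of the signal as a plain list of samples, and the choice to drop epochs that run past the edges of the recording are assumptions.

```python
def extract_epochs(signal, onset_samples, fs=480, t_min=-0.5, t_max=1.5):
    """Cut fixed-length epochs around stimulus onsets.

    signal        : list of voltage samples for one channel
    onset_samples : sample indices of feedback stimulus onset
    fs            : sampling rate in Hz after resampling
    t_min, t_max  : epoch limits in seconds relative to onset
    """
    n_pre = round(-t_min * fs)     # 240 samples before onset at 480 Hz
    n_post = round(t_max * fs)     # 720 samples from onset onward
    epochs = []
    for onset in onset_samples:
        start, stop = onset - n_pre, onset + n_post
        if start >= 0 and stop <= len(signal):   # skip incomplete epochs
            epochs.append(signal[start:stop])
    return epochs


# Toy recording in which each sample value equals its own index.
signal_demo = list(range(2000))
epochs_demo = extract_epochs(signal_demo, [100, 500, 1900])
```

With these settings each epoch holds 960 samples and stimulus onset falls at index 240 within the epoch.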

Event Related Component Averaging

We computed event related potentials (ERPs) on an individual participant basis by trial averaging EEG time series epochs after artifact removal. We selected trials corresponding to various feedback conditions in each task as described in the following sections. For each feedback condition ERPs were computed on an individual participant basis separately for recordings from channels FCz and Pz. All ERPs were baseline corrected by subtracting the average voltage in the 75 ms period immediately following stimulus onset. We chose to use a baseline period following, as opposed to preceding, stimulus onset because stimuli were presented immediately upon movement termination, and the period before stimulus presentation was more likely to be affected by movement related artifacts. Importantly, we did not observe any ERPs with onsets within the baseline period. Trials in which reaches did not meet the movement duration criteria were excluded, as feedback relevant to reach adaptation was not provided on these trials (6.5% of trials in the VMR task, 7.4% of trials in the RL task).
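Trial averaging with a post-stimulus baseline can be sketched for one channel and one feedback condition. The function name and the epoch layout (960 samples at 480 Hz with stimulus onset at index 240) are assumptions. The baseline is subtracted from the averaged waveform, which gives the same result as correcting each trial before averaging.

```python
def baseline_corrected_erp(epochs, fs=480, onset_index=240, baseline_ms=75):
    """Average epochs sample by sample, then subtract the mean voltage in
    the window that starts at stimulus onset and lasts baseline_ms.

    epochs : list of equal-length lists of voltage samples
    """
    n_baseline = round(baseline_ms / 1000 * fs)   # 36 samples at 480 Hz
    n_trials = len(epochs)
    n_samples = len(epochs[0])
    erp = [sum(epoch[k] for epoch in epochs) / n_trials
           for k in range(n_samples)]
    window = erp[onset_index:onset_index + n_baseline]
    baseline = sum(window) / n_baseline
    return [v - baseline for v in erp]


# Toy epochs: flat before onset, a constant level during the baseline window,
# and a different level afterwards.
epoch_a = [0.0] * 240 + [5.0] * 36 + [7.0] * 684
epoch_b = [0.0] * 240 + [3.0] * 36 + [9.0] * 684
erp_demo = baseline_corrected_erp([epoch_a, epoch_b])
```

After correction the waveform is zero on average within the 75 ms window, so later deflections are expressed relative to the voltage immediately after feedback onset.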