Abstract
Cortical entrainment of the auditory cortex to the broad-band temporal envelope of a speech signal is crucial for speech comprehension. This entrainment results in phases of high and low neural excitability which structure and decode the incoming speech signal. Entrainment to speech is strongest in the theta frequency range (4–8 Hz), the average frequency of the speech envelope. If a speech signal is degraded, for example masked by irrelevant information such as noise, entrainment to the speech envelope is weaker and speech intelligibility declines.
Besides perceptually evoked cortical entrainment, transcranial alternating current stimulation (tACS) can entrain neural oscillations by applying an electric signal to the brain. Accordingly, tACS-induced entrainment in auditory cortex has been shown to improve auditory perception. The aim of the current study was to externally modulate speech intelligibility by means of tACS such that the electric current corresponds to the envelope of the presented speech stream.
Participants performed the Oldenburg sentence test with sentences presented in noise in combination with tACS. Critically, the time lag between sentence presentation and tACS was manipulated from 0 to 250 ms in 50-ms steps (auditory stimuli were simultaneous to or preceded tACS).
First, we were able to show that envelope-tACS modulated sentence comprehension: on average, sentence comprehension at the time lag of the best performance was significantly better than at the time lag of the worst performance. Second, sentence comprehension across time lags was modulated sinusoidally.
In sum, envelope tACS modulates intelligibility of speech in noise presumably by enhancing (time lag with in-phase stimulation) and disrupting (time lag with out-of-phase stimulation) cortical entrainment to the speech envelope in auditory cortex.
Introduction
The life span of the world population is continuously increasing. This increase in age entails a strong increase in hearing loss. In 2030, the prevalence of clinically relevant hearing loss is expected to be doubled. Consequently, hearing loss will be one of the seven most prevalent chronic diseases (WHO 2004; WHO 2013).
Despite huge advances in the development of hearing aid technology, challenging listening situations such as the so-called cocktail party problem (Cherry, 1953) are still not entirely compensated for. The present study will suggest transcranial alternating current stimulation (tACS) as a possibility to improve speech comprehension in adverse listening situations.
The ability to comprehend speech is tightly linked to the neural oscillatory response in auditory cortex. That is because auditory cortex tracks the spectro-temporal structure of speech also known as cortical entrainment (e.g., Abrams et al., 2008; Lalor & Foxe, 2010; Luo & Poeppel, 2007). In general, neural oscillations are known to entrain to the temporal structure of external stimuli. Across different sensory modalities, it has been shown that a rhythmically presented stimulus leads to a phase and amplitude alignment of neural oscillations within the frequency of the presented rhythm. Entrainment of oscillations results in distinct phases of high and low excitability on the population level, thereby generating time windows of a higher firing likelihood (Engel et al., 2001; Lakatos et al., 2005). This synchronization of neural oscillations and external stimuli facilitates the processing of relevant information (Lakatos et al., 2008; Schroeder et al., 2010).
Entrainment to the broad-band temporal envelope of speech has been shown to be crucial for speech comprehension (e.g., Doelling et al., 2014). The strongest response can be found in the theta range around 4–8 Hz, which corresponds to the average frequency rate of a speech signal (Chandrasekaran et al., 2009; Ghitza & Greenberg, 2009). The functional role of cortical entrainment to the envelope has been widely discussed (for a review, see Ding & Simon, 2014). Whereas some argue that entrainment or phase locking to the envelope only tracks the acoustic properties of the envelope (Howard & Poeppel, 2010; Millman et al., 2015; Steinschneider et al., 2013), others argue that entrainment is a mechanism of syllabic parsing (Giraud & Poeppel, 2012) or sensory selection (i.e., segregating the speech stream from background noise; Schroeder & Lakatos, 2009). Overall, neural oscillations in the 4–8-Hz range are supposed to be a means to structure and decode the incoming acoustic stream (for a review, see Peelle & Davis, 2012).
The extent to which auditory cortex is able to track a speech stream depends on the amount of spectral detail. That is, the more degraded the speech stream the worse the neural tracking of the envelope (Peelle & Davis, 2012). Along these lines, Ahissar et al. (2001) could show that the ability to comprehend speech depends on how well auditory cortex tracks the speech signal (see also Peelle et al., 2013). A study by Ding and Simon (2013) manipulated the signal-to-noise ratio of continuous speech and stationary noise. They were able to show that entrainment to the envelope in the 4 to 8-Hz range decreases with increasing noise level. These findings indicate that entrainment to the speech envelope is correlated with noise level and speech intelligibility.
If successful entrainment of oscillations in the auditory cortex is essential for speech intelligibility, it should be possible to increase the intelligibility of an acoustically degraded speech signal by inducing cortical entrainment and to decrease intelligibility by disrupting cortical entrainment. If this were established, the speech comprehension of hearing aid users who experience hearing impairments could be improved by externally inducing cortical entrainment to the speech envelope. One technique that is capable of modulating ongoing brain oscillations is transcranial alternating current stimulation (tACS). tACS induces cortical entrainment in a frequency-specific manner (for reviews, see Antal & Paulus, 2013 and Herrmann et al., 2013; Helfrich et al., 2014; Zaehle et al., 2010). By modulating the resting membrane potential of neurons via tACS, oscillatory activity is guided by the external oscillations on the population level (Fröhlich & McCormick, 2010). At the same time, tACS is able to disrupt entrainment by means of stimulation that is anti-phasic to synchronized cortical activity (Helfrich, Knepper, et al., 2014; Strüber et al., 2014).
Similar to entrainment to external stimuli, tACS-induced entrainment has been shown to modulate auditory perception (for a review, see Heimrath et al., 2016). For example, tACS in the alpha frequency range (10 Hz) led to cortical entrainment, which in turn modulated the detection of pure tones embedded in noise in a sinusoidal manner. That is, pure-tone detection depended on the phase of the stimulation signal (Neuling et al., 2012). For tACS in the delta frequency range (4 Hz), phase-dependent detection of near-threshold auditory click trains could be shown as well (Riecke et al., 2015). Both studies indicate that entrainment induced by tACS modulates the perception of near-threshold auditory stimuli. Two studies that investigated the impact of tACS on speech stimuli showed that phoneme detection with syllables is modulated by 40-Hz tACS in younger as well as in hearing-impaired adults (Rufener, Oechslin, et al., 2016; Rufener, Zaehle, et al., 2016).
The aim of the current study was to test whether tACS modulates speech intelligibility. Participants performed the well-established Oldenburg sentence test (OLSa; Wagner et al., 2001) with sentences presented in noise in combination with tACS. Critically, the tACS signal corresponded to the broad-band envelope of the presented sentences (i.e., envelope tACS), in order to simulate the LFPs during presentation of clear speech. Here, we manipulated the time lag between sentence presentation and tACS from 0 to 250 ms in 50-ms steps (auditory stimuli were simultaneous to or preceded tACS). This was done for two reasons: First, so far it is unclear what time lag is appropriate such that tACS onset and auditory stimulation onset are aligned in auditory cortex. Second, in order to observe a modulation of sentence comprehension by tACS, different time lags allowed for aligned/in-phase and possibly beneficial entrainment as well as out-of-phase or disrupting entrainment possibly impeding sentence comprehension. Overall, we expected to show that envelope tACS in auditory cortex modulates sentence comprehension and that sentence comprehension fluctuates across time lags.
Methods
Participants
Nineteen healthy young adults (11 female, mean age = 23.5 years) without prior or current neurological or psychiatric disorders participated in the study. The experimental protocol was approved by the ethics committee of the University of Oldenburg and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki. All participants gave written informed consent before the beginning of the experiment.
Experimental procedure
Each participant completed the experiment in two sessions on two different days in order to keep the tACS duration below 20 minutes per day. The participants were seated upright in a recliner in a dimly lit room before the stimulation electrodes and headphones were attached. Afterwards, the tACS threshold was determined. To this end, we started the stimulation (3 Hz, 5 s) with an intensity of 400 μA and asked the participants to indicate whether they perceived a skin sensation or phosphene. The stimulation intensity was increased in steps of 100 μA until the participant indicated a skin sensation or phosphene perception or an intensity of 1500 μA was reached. If the participant already reported an adverse effect at 400 μA, the intensity was reduced to a start level of 100 μA. The stimulation intensity throughout the experiment then corresponded to the highest intensity that was not perceived by the participant. Next, the participant’s absolute threshold of hearing was determined for each ear (i.e., subjective hearing threshold). After that, the participant was familiarized with the speech test (Oldenburg sentence test, see below). Subsequently, the participant completed 9 lists of the test that were randomly assigned to 9 tACS conditions (Figure 1A), which took around 45 minutes. The design was double-blind, i.e., neither the participant nor the experimenter knew the order of the conditions before the end of the experiment. After each list, the participants had to indicate whether they perceived the electrical stimulation. Following the speech test, the participants were asked to complete a questionnaire on possible adverse effects of tACS (Neuling et al., 2013). See Figure 1A for a schematic display of the experimental procedure.
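The threshold procedure described above can be sketched as a simple ascending staircase. This is an illustrative reading of the text, not the code used in the study: the callback `perceives` (returning True when the participant reports a skin sensation or phosphene) is hypothetical, and returning None when even 100 μA is perceived is an assumption, since the text does not state how that case was handled.

```python
def find_stimulation_intensity(perceives, start=400, step=100,
                               maximum=1500, fallback_start=100):
    """Return the highest intensity (in microampere) that was not perceived.

    `perceives(intensity)` is a hypothetical callback standing in for the
    participant's report at a given intensity.
    """
    intensity = start
    if perceives(intensity):
        # Sensation already at the start level: restart from the lower level.
        intensity = fallback_start
        if perceives(intensity):
            return None  # assumption: no usable intensity was found
    highest_unperceived = intensity
    # Increase in fixed steps until a sensation is reported or the cap is hit.
    while intensity + step <= maximum:
        intensity += step
        if perceives(intensity):
            break
        highest_unperceived = intensity
    return highest_unperceived


# Example: a simulated participant who notices stimulation from 900 microampere on.
chosen = find_stimulation_intensity(lambda microampere: microampere >= 900)
```

In this example the staircase stops at 900 μA, so the experiment would run at 800 μA.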
Oldenburg sentence test
The Oldenburg sentence test (OLSa; Wagner et al., 2001) is an adaptive test that can be used to obtain a speech comprehension threshold (SCT) in noise. For the test, 30 sentences of five words each are presented and the participant is asked to repeat the sentences orally. The test material consists of 40 test lists, of which 22 (across the two days) were randomly chosen for each participant. In each test session, two test lists were used to familiarize the participant with the test material according to the handbook. Subsequently, nine lists were presented, separated by self-paced breaks.
The presented sentences were sampled at 44100 Hz and delivered via MATLAB to a D/A converter (NIDAQ, National Instruments, TX, USA), attenuated separately for each ear (Tucker-Davis Technologies, Alachua, FL, USA; model PA5), and presented via headphones (HDA 200, Sennheiser, Germany). The noise of the OLSa was presented at 65 dB above the individual hearing threshold; the intensity of the sentences was adjusted according to the test manual. The presented noise had the same spectral characteristics as the presented sentences, in order to obtain an optimal masking effect. The noise was presented from 0.5 s prior to sentence onset until 0.5 s after sentence offset. For each OLSa list, a speech comprehension threshold (SCT) was computed, which is the difference between the sound pressure levels of the speech signal and the noise across the last 20 sentences. For example, an SCT of −7 dB implies that 50 % of the sentences presented at an intensity 7 dB below that of the noise were repeated correctly.
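As a rough illustration of how such a threshold can be obtained from an adaptive track, the sketch below averages the speech-to-noise level difference over the last 20 sentences of a list. Averaging is an assumption made here for illustration (the text only states that the difference is taken over the last 20 sentences), and the function name and the example levels are hypothetical.

```python
def speech_comprehension_threshold(speech_levels_db, noise_level_db, n_last=20):
    """Mean speech-minus-noise level (dB) over the last n_last sentences."""
    last = speech_levels_db[-n_last:]
    return sum(level - noise_level_db for level in last) / len(last)


# Hypothetical adaptive track of 30 sentences: the speech level starts at the
# noise level (65 dB) and settles 7 dB below it for the last 20 sentences.
track = [65.0] * 10 + [58.0] * 20
sct = speech_comprehension_threshold(track, noise_level_db=65.0)
```

A lower (more negative) SCT means that the sentences were still understood at a less favorable signal-to-noise ratio.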
Transcranial alternating current stimulation
tACS was applied using a battery-driven NeuroConn DC Stimulator Plus (NeuroConn GmbH, Ilmenau, Germany). Stimulation electrodes were placed in a bipolar montage on Cz (7 × 5 cm) and bilaterally over the primary auditory cortices on T3 and T4 (4.18 × 4.18 cm) according to the international 10-20 EEG system (Figure 1B). The impedance was kept below 10 kΩ by applying a conductive paste (Ten20, D.O. Weaver, Aurora, CO, USA).
The stimulation signal corresponded to the envelope of the concurrent speech signal (i.e., envelope tACS). As control conditions, anodal and cathodal direct current stimulation (DC+, DC−), matched to the duration of the speech signal, and no electrical stimulation (sham) were applied (see Figure 1B). The envelope-tACS signal was generated by extracting the envelope of each OLSa sentence. Here, the absolute values of the Hilbert transform of the audio signal were computed and then filtered with a second-order Butterworth filter (10 Hz, low-pass). To maximize tACS efficacy, peaks and troughs exceeding 25 % of the highest absolute peak were scaled to 100 %. This way, the peak-to-peak intensity was normalized.
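The envelope extraction can be sketched as follows. This is a minimal illustration using only the Python standard library, not the MATLAB code used in the study. Several choices are assumptions: the analytic signal is computed with a direct discrete Fourier transform (slow, but adequate for a short demonstration signal), the low-pass filter is applied in a single causal pass, and the subsequent peak scaling step is omitted because its exact implementation is not specified. All function names are hypothetical, and the demonstration uses a 1000-Hz sampling rate instead of 44100 Hz to keep the run time short.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a direct DFT (O(n^2); for short signals only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0   # keep DC (and Nyquist for even n) unchanged
        elif k < (n + 1) // 2:
            gain = 2.0   # double the positive frequencies
        else:
            gain = 0.0   # remove the negative frequencies
        spectrum[k] *= gain
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def butter2_lowpass(x, cutoff_hz, fs):
    """Second-order Butterworth low-pass (bilinear transform), one causal pass."""
    k = math.tan(math.pi * cutoff_hz / fs)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1, b2 = 2.0 * b0, b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def speech_envelope(x, fs, cutoff_hz=10.0):
    """Magnitude of the analytic signal, low-pass filtered at cutoff_hz."""
    magnitude = [abs(z) for z in analytic_signal(x)]
    return butter2_lowpass(magnitude, cutoff_hz, fs)


# Demonstration: a 100-Hz carrier whose amplitude is modulated at 4 Hz.
fs = 1000.0
t = [i / fs for i in range(250)]
signal = [(1.0 + 0.5 * math.sin(2 * math.pi * 4 * ti)) * math.sin(2 * math.pi * 100 * ti)
          for ti in t]
raw_envelope = [abs(z) for z in analytic_signal(signal)]
smoothed_envelope = speech_envelope(signal, fs)
```

For this test signal the unfiltered envelope follows the 4-Hz modulator, varying between roughly 0.5 and 1.5. A causal filter delays the smoothed envelope slightly; a zero-phase (forward and backward) filter would avoid this, but the text does not say which variant was used.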
Kubanek and colleagues (2013) reported a lag of around 100 ms between the speech envelope and the auditory cortex oscillations that are entrained by the signal. In order to find the time lag at which auditory stimulation and tACS were aligned in auditory cortex (highest behavioral effect) and also to test how tACS modulates speech comprehension across different time lags, envelope tACS was applied in six different delay conditions (see Figure 1A). Envelope tACS was initiated between 0 and 250 ms (50-ms step size) after the onset of the speech signal. The tACS signal (sampled at 44100 Hz) was delivered via MATLAB to the D/A converter and fed into the external signal input of the DC Stimulator.
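At a sampling rate of 44100 Hz, each 50-ms step corresponds to 2205 samples. The sketch below shows one way to realize the six delay conditions, by prepending zeros to the stimulation signal. Zero-padding is an assumption made for illustration; the text does not describe how the delay was implemented, and the function names are hypothetical.

```python
FS = 44100                                # sampling rate of the tACS signal (Hz)
DELAYS_MS = [0, 50, 100, 150, 200, 250]   # the six envelope-tACS time lags


def delay_in_samples(delay_ms, fs=FS):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * fs / 1000)


def delayed_stimulation(envelope, delay_ms, fs=FS):
    """Prepend zeros so that stimulation starts delay_ms after speech onset."""
    return [0.0] * delay_in_samples(delay_ms, fs) + list(envelope)


offsets = [delay_in_samples(d) for d in DELAYS_MS]
```

Here `offsets` holds the sample offset of each condition relative to speech onset.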
Data Analysis
For each participant, we computed the SCT for each of the nine tACS conditions (see Figure 2A). We contrasted the best and worst performance of each participant (considering only the six envelope-tACS conditions) with a t-test, regardless of the time lag. Then, in order to understand the nature of the modulation of sentence comprehension by tACS, each participant’s performance data were peak-aligned to the best performance and z-transformed. We placed this peak at bin three, the position of the 100-ms time lag, which corresponds to the median time lag of the best performances of all participants (see below). Due to the circular fluctuation of the SCTs (see Figure 2A), we fitted a sine wave to the data, comparable to the approach of Neuling et al. (2012; see also Naue et al., 2011). As a control, we also performed a linear fit and a quadratic fit on the aligned performance data of each participant. For the sine wave fit, the following equation was fitted: y = a * sin(f * 2π * x + c), where x corresponds to the tACS bins [0; 250 ms]. The parameters a, f, and c were estimated by the fitting procedure, where a was the amplitude of the sine wave, f the frequency, and c the phase shift. Parameter a was bounded between zero and three (note that the data were z-transformed and not likely to exceed a z-value of three), f between two and eight (we expected the modulation frequency to be in the theta range, comparable to the syllable rate of the sentences), and c between zero and 12.57 (4 * π). The initial parameter values passed to the fitting function were drawn randomly from within the range restrictions of each parameter. For the linear fit (y = ax), the coefficient a was computed with a random starting value. The quadratic fit (y = ax² + bx) was computed to estimate the quadratic coefficient a and the linear coefficient b. Note that none of the equations contains an intercept parameter because the fitted data were z-transformed and hence had a mean of zero.
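The alignment and z-transformation can be illustrated with a short sketch. Two points are assumptions: the alignment is implemented as a circular shift of the six time-lag bins (suggested by the circular nature of the fluctuation, but not stated explicitly), and the z-transform uses the sample standard deviation. The SCT values are made up for illustration.

```python
import statistics


def peak_align(scts, target_index=2):
    """Circularly shift the bins so the best (lowest) SCT lands on target_index."""
    n = len(scts)
    best = scts.index(min(scts))
    shift = (target_index - best) % n
    return [scts[(i - shift) % n] for i in range(n)]


def z_transform(values):
    """Standardize to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical SCTs (dB) of one participant for the lags 0, 50, ..., 250 ms.
scts = [-6.0, -6.5, -7.0, -6.2, -7.8, -6.1]
aligned = peak_align(scts)     # best value (-7.8) moves to bin three
z_scores = z_transform(aligned)
```

After this step every participant's best performance sits at the same bin, so the curves can be averaged and fitted on a common axis.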
The model fits were computed with the lsqcurvefit function in MATLAB (version 8.0, Optimization Toolbox), which allowed for 1000 iterations in order to find the best model. The coefficient of determination (R²) was reported for each fit as an indicator of the goodness of the model fit.
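A bounded sine fit of this kind can be sketched without an optimization toolbox. The example below is not the lsqcurvefit procedure used in the study: it replaces the iterative solver with a grid search over f and c and computes the least-squares amplitude a in closed form for each grid point, clipped to its bounds. The function names, grid resolution, and example data are hypothetical, and x is assumed to be expressed in seconds so that f is in Hz.

```python
import math


def fit_sine(x, y, f_range=(2.0, 8.0), c_range=(0.0, 4 * math.pi),
             a_range=(0.0, 3.0), n_f=121, n_c=200):
    """Grid search for y = a * sin(2*pi*f*x + c); returns (sse, a, f, c)."""
    best = None
    for i in range(n_f):
        f = f_range[0] + (f_range[1] - f_range[0]) * i / (n_f - 1)
        for j in range(n_c):
            c = c_range[0] + (c_range[1] - c_range[0]) * j / (n_c - 1)
            s = [math.sin(2 * math.pi * f * xi + c) for xi in x]
            ss = sum(v * v for v in s)
            if ss == 0.0:
                continue
            # Least-squares amplitude for fixed f and c, clipped to its bounds.
            a = sum(yi * si for yi, si in zip(y, s)) / ss
            a = min(max(a, a_range[0]), a_range[1])
            sse = sum((yi - a * si) ** 2 for yi, si in zip(y, s))
            if best is None or sse < best[0]:
                best = (sse, a, f, c)
    return best


def r_squared(y, sse):
    """Coefficient of determination from the residual sum of squares."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic check: data generated from a known 5-Hz sine at the six time lags.
x = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25]
y = [1.2 * math.sin(2 * math.pi * 5.0 * xi + 1.0) for xi in x]
sse, a, f, c = fit_sine(x, y)
r2 = r_squared(y, sse)
```

With only six data points and three free parameters, a sine model is flexible, which is one reason to compare the models with a criterion that penalizes the number of parameters.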
Finally, the Bayesian information criterion (BIC; Schwarz, 1978) was calculated for each function in order to compare the goodness of fit of the sinusoidal, quadratic, and linear fits. The BIC corrects for an unequal number of parameters. BIC scores were compared with an rmANOVA and post-hoc t-tests in order to determine the function that best represents the modulation in sentence comprehension. T-tests were FDR-corrected. All analyses were performed exclusively on the envelope-tACS conditions and not on the tDCS or sham conditions (see Figure 1) in order to assess the possible modulation of speech comprehension solely based on the effect of varying time lags.
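For least-squares fits, the BIC is commonly computed from the residual sum of squares as BIC = n * ln(SSE / n) + k * ln(n), where n is the number of data points and k the number of fitted parameters. Whether exactly this form was used in the study is an assumption; the sketch below only illustrates how the penalty for additional parameters enters the comparison. The SSE values are made up.

```python
import math


def bic_least_squares(sse, n_obs, n_params):
    """BIC for a least-squares fit, assuming Gaussian residuals."""
    return n_obs * math.log(sse / n_obs) + n_params * math.log(n_obs)


# Hypothetical residual sums of squares for one participant (six time lags).
bic_sine = bic_least_squares(sse=0.5, n_obs=6, n_params=3)       # a, f, c
bic_quadratic = bic_least_squares(sse=4.2, n_obs=6, n_params=2)  # a, b
bic_linear = bic_least_squares(sse=4.8, n_obs=6, n_params=1)     # a
```

A lower BIC indicates the preferred model. In this made-up example the sine fit wins despite its larger parameter penalty because its residual error is much smaller.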
Results
The aim of the present study was to find out whether envelope tACS modulates sentence comprehension in noise and to specify the nature of the modulation. Participants listened to sentences with different signal-to-noise ratios. Their ability to comprehend the presented sentences was quantified by the sentence comprehension threshold (SCT), the signal-to-noise ratio at which participants comprehended 50 % of a sentence. tACS was induced simultaneously or with a time lag of 50 to 250 ms. Figure 2A depicts the SCTs for each time lag (i.e., delay between auditory onset and tACS onset), as well as anodal and cathodal tDCS and sham. To test whether tACS at different time lags led to a modulation of sentence comprehension at all, we sorted the SCTs by best and worst performance of each participant and compared both groups with a t-test (Figure 2B). The t-test showed that the SCTs of the best performances were significantly lower than the SCTs of the worst performances (t(18) = 8.8, p < 0.0001, Figure 2B). This effect indicates that tACS modulated sentence comprehension. The overall improvement in sentence comprehension was 0.86 dB on average (sd = 0.43 dB). The median time lag that led to the best performance was 100 ms and for the worst was 50 ms. Critically, time lags improving or impeding performance were not consistent across participants.
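The comparison of best and worst performance is a within-participant contrast, which is consistent with the reported 18 degrees of freedom for 19 participants. A paired t statistic of this kind can be sketched as follows; the function name and the three example value pairs are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched samples."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical best and worst SCTs (dB) of three participants.
best_scts = [-7.5, -7.0, -8.0]
worst_scts = [-6.5, -6.2, -7.3]
t_value, df = paired_t(best_scts, worst_scts)
```

The sign of the statistic depends only on the order of subtraction; here it is negative because lower SCTs indicate better performance.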
Therefore, for further analyses all participants’ SCTs of all six time lags were peak-aligned such that the best performance condition was at the third position, originally corresponding to 100 ms (Figure 3A, left panel). Next, a sine fit, a quadratic fit, and a linear fit were performed on the aligned data, in order to understand the nature of the tACS induced modulation of speech comprehension.
The grand average and single-subject sinusoidal fits are displayed in Figure 3A. The average goodness-of-fit was R² = 0.73 (sd = 0.13). The estimated parameters, averaged across participants, resulted in this equation: y = 1.15 * sin(6.1 * 2π * x + 6.04). The quadratic fit had an average goodness-of-fit of R² = 0.13 (sd = 0.11) with coefficients of y = 23.03x² − 4.63x (see Figure 3B for grand average and single-subject fits). The average goodness-of-fit of the linear fit was R² = 0.04 (sd = 0.04) with the following coefficient: y = 0.08x. The rmANOVA across the BIC scores of the three fits showed that the goodness-of-fit differed significantly (F(2,36) = 41.09, p < 0.0001). Post-hoc t-tests revealed that the BIC score of the sinusoidal fit was significantly lower than the scores of the quadratic and linear fits (quadratic: t(18) = −7.18, p < 0.0001; linear: t(18) = −5.95, p = 0.0002). That means that the goodness-of-fit of the sinusoidal fit was significantly better than that of the quadratic and linear functions. Thus, the sine wave represents the modulation of speech intelligibility by envelope-tACS best.
Discussion
The aim of the present study was to show that envelope-tACS in auditory cortex modulates sentence comprehension. We employed a speech intelligibility task in combination with envelope-tACS induced at different time lags to find the delay with the highest behavioral benefits.
The present study provides first evidence that sentence comprehension can be modulated by envelope-tACS and that this modulation fluctuates sinusoidally across different time lags. This finding supports previous claims on the important role of cortical entrainment in auditory cortex for speech comprehension (for a review, see Zoefel & VanRullen, 2015). The sinusoidal fluctuation is in line with the results of Neuling et al. (2012), who were able to show that pure-tone detection depends on the phase of the tACS signal. That is, the optimal tACS phase allowed increased excitability of auditory cortex and consequently improved auditory perception. Similarly, time lags in the present study were varied to assess at which time lag the envelope-tACS signal and the speech envelope are aligned and in phase in auditory cortex. That way, tACS is assumed to induce a phase reset at the point in time when the speech signal reaches auditory cortex at the optimal phase. Consequently, with a different time lag, tACS and the speech signal were not aligned. This most likely disrupted entrainment of auditory cortex to the speech envelope and thereby disrupted speech comprehension.
However, the sinusoidal modulation of speech comprehension implies that after one cycle of the oscillation or sine curve, tACS again appears to be more beneficial for sentence comprehension. This is most likely due to the periodicity of entrained neural oscillations and the speech envelope. The circular regularities of the envelope (speech as well as tACS) most probably induced an oscillatory neural response in auditory cortex due to cortical entrainment to the envelope. Consequently, speech comprehension, which depends on entrainment to the envelope, oscillates in this circular manner. The average frequency of the fitted sine waves was 6.1 Hz, which is within the range of the amplitude modulation of speech at frequencies that are crucial for intelligibility (4 to 16 Hz; Drullman et al., 1994; Shannon et al., 1995).
The median of the time lags with the best performance was at 100 ms after speech onset. This time lag corresponds to the latency of the auditory N100, a negative evoked potential elicited by auditory stimulation after approximately 100 ms (for a review, see Näätänen & Picton, 1987). The N100 has been argued to be a signature of auditory pattern recognition and integration (Näätänen & Winkler, 1999). Although its characteristics such as amplitude, latency, and origin can be manipulated by different features of a stimulus, the N100 remains one of the robust initial responses to speech (and non-speech sounds) in auditory cortex. Moreover, studies computing cross correlations between the envelope of a sentence and the respective EEG response or computing spectro-temporal response functions found the peak of the effects at a latency of 100 ms (Ding & Simon, 2012; Horton et al., 2013). Ding and Simon explain that this latency indicates that the EEG response to a speech signal is driven by the amplitude modulations of the stimulus 100 ms ago.
Aiken and Picton (2008) report that their transient response models, another measure of the relationship between speech signal and EEG response, peak at a later latency of approximately 180 ms. These divergent findings on the latency of the EEG response are in line with our findings. Although our data show that an envelope-tACS lag of 100 ms has a strong sentence comprehension benefit, the individual latency of best comprehension performance was found at any of the six time lags ranging from 0 ms to 250 ms. It appears as if the latency from stimulus presentation to entrainment in auditory cortex varies strongly across individuals. Thus, to estimate the individual time lag of auditory stimulation and tACS, further experiments will be needed to measure individual entrainment latencies with EEG prior to envelope-tACS application.
The importance of studying transcranial electric stimulation (tES) lies in its possibilities for clinical application. Different studies have demonstrated the diverse clinical benefits of tACS (for a summary of therapeutic applications of tES, see Vosskuhl et al., 2015). Prehn-Kristensen et al. (2014), for example, showed that 0.75-Hz tES improves slow wave sleep, thereby enhancing the consolidation of declarative memory in children with attention-deficit/hyperactivity disorder. Furthermore, different studies investigated the effect of repetitive transorbital alternating current stimulation (rtACS) on neurological deficits. Here, the stimulating electrodes are placed at or near the eyeball. rtACS has been shown to significantly restore vision in optic neuropathy (Sabel et al., 2011) as well as in stroke patients suffering from optic nerve damage (Gall et al., 2016). Fedorov et al. (2010) also applied rtACS to ischemic stroke patients and showed that non-invasive rtACS modulates brain plasticity and enhances neurological recovery. tACS has also been successfully applied in the treatment of Parkinson’s disease. Brittain et al. (2013) were able to significantly reduce tremor by applying individually adjusted tACS in and out of phase with the tremor rhythm. With a different approach, Krause et al. (2014) improved motor performance in patients suffering from Parkinson’s disease by applying 20-Hz tACS to the motor cortex.
Along these lines, the results of the present study pave the way for another clinical application of tACS. The reported data support the hypothesis that envelope-tACS is beneficial for the intelligibility of sentences with a poor signal-to-noise ratio (SNR). People suffering from hearing loss experience poor SNRs constantly. Hearing aids are fitted to improve the SNR by filtering, amplifying, and compressing acoustic signals. Digital hearing aids, in contrast to analog hearing aids, even respond to ongoing analysis of the signal and the background on a more complex level (for a review, see Pichora-Fuller & Singh, 2006). In addition to the conventional mode of operation of hearing aids, envelope-tACS could be applied to counteract hearing impairment. Envelope-tACS would then improve the neural SNR in auditory cortex by enhancing cortical entrainment to the relevant speech envelope, thereby improving speech comprehension. However, cortical responses to speech in older adults, the main group of hearing aid users, need to be investigated in order to successfully implement envelope tACS in hearing devices.
Conclusion
Altogether, in this study we have demonstrated that tACS indeed modulates intelligibility of speech in noise presumably by enhancing and disrupting cortical entrainment to the speech envelope in auditory cortex. We were able to show that there is a time lag at which tACS appears to be most beneficial albeit the optimal time lag varies between participants. Moreover, we suggest that envelope tACS has the potential to counteract hearing impairment when implemented in hearing aids.
Author contributions
TN, CSH: designed the study; TN: acquired the data; TN, AW: analyzed the data; AW, TN, CSH: wrote the article.
Conflict of Interest Statement
CSH has applied for a patent for the method of envelope-tACS and has received honoraria as editor from Elsevier Publishers, Amsterdam.
Acknowledgments
This research was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG Cluster of Excellence 1077 “Hearing4all”). We thank Jonas Obleser for fruitful discussions.