Abstract
Spoken language is thought to be facilitated by an ensemble of predictive mechanisms, yet the neurobiology of prediction for both speech perception and production remains unknown. We used intracranial recordings (31 patients, 6580 electrodes) from depth probes implanted along the anteroposterior extent of the supratemporal plane during rhythm listening, speech perception, and speech production. This revealed a frequency-multiplexed encoding of sublexical features during entrainment and a traveling wave of high-frequency activity across Heschl’s gyrus. Critically, we isolated two predictive mechanisms in early auditory cortex with distinct anatomical and functional characteristics. The first mechanism, localized to bilateral Heschl’s gyrus and indexed by low-frequency phase, predicts the timing of acoustic events (“when”). The second mechanism, localized to planum temporale in the language-dominant hemisphere and indexed by gamma power, predicts the acoustic consequence of speech motor plans (“what”). This work grounds cognitive models of speech perception and production in human neurobiology, illuminating the fundamental acoustic infrastructure – both architecture and function – for spoken language.
Introduction
Humans efficiently extract speech information from noisy acoustic signals and segment it into meaningful linguistic units. This complex and poorly understood process is fluidly accomplished for a wide range of voices, accents, and speaking rates1,2. Given the quasi-periodic nature of speech3, the computational load associated with its decoding can be reduced by temporal prediction. Anticipating the arrival of salient acoustic information enables optimal potentiation of neural networks4–7 and discretization of the continuous signal into linguistic elements4,8,9. Evidence for cortical entrainment (the synchronization of intrinsic neural activity with extrinsic pseudo-periodic stimuli) to speech10–14 has driven speculation that cortical oscillations may encode temporal prediction.
The production of speech is another human ability that relies on predictive mechanisms. There is now strong evidence that speech planning involves prediction of the sensory consequences of the action15–20. It remains unclear, however, which levels of auditory cortical processing are involved in this process and where such mechanisms are instantiated in the cortex.
The identification and analysis of predictive mechanisms for language function requires a methodology with high temporal resolution, fine spatial resolution, and direct access to neuronal populations in human early auditory cortex. We used large-scale intracranial recordings (31 patients, 6580 electrodes), focusing on depth electrodes placed in an innovative trajectory along the anteroposterior extent of the supratemporal plane. We investigated cortical entrainment and prediction in response to a novel amplitude-modulated white noise stimulus, as well as to natural language speech. These experiments yield crucial insights into the rapid, transient dynamics of prediction for “when” and “what” in Heschl’s gyrus and planum temporale.
Results
Entrainment to Low-Level Acoustic Stimuli
Recordings along the supratemporal plane revealed entrainment of early auditory cortex to rhythmic amplitude-modulated white noise (80% depth at 3 Hz for 3 seconds, then constant amplitude for 833 ms; Figure 1A). Heschl’s gyrus and the transverse temporal sulcus (HG/TTS; Figure 1B) encoded stimulus features in gamma power (65-115 Hz, Figure 1C) and low-frequency phase (2-15 Hz, Figure 1D). Following a low-latency, high-magnitude broadband response to stimulus onset, this region entrained to subsequent acoustic pulses. Phase space trajectories of gamma power (Figure 1G) and low-frequency phase (Figure 1H) revealed three clearly dissociable states corresponding to rest (pre-stimulus), stimulus onset, and entrainment (beginning with the second pulse). These results were robust across the patient cohort: at least one electrode recorded an acoustic entrainment response in every patient with a supratemporal depth probe in the language-dominant hemisphere (n = 18). Patients with homologous electrodes in the language non-dominant hemisphere (n = 4) demonstrated equivalent entrainment.
Gamma, beta, and low-frequency power together yielded a frequency-multiplexed encoding of the acoustic envelope (Figure 1E). Gamma power was in phase with the stimulus, beta power was resynchronized at the trough of the stimulus, and low-frequency power was modulated by the rising edge of each pulse. The distinct encoding in each frequency band suggests separate functional channels for acoustic processing. In contrast, only low-frequency phase was reset at the rising slope of each pulse – the acoustic edge; phase reset was not observed in beta or gamma. These encodings in power and phase generalized to faster modulations of the temporal envelope (5 Hz and 7 Hz, Extended Data 3).
During entrainment, we also resolved the spatiotemporal topography of gamma power along the mediolateral extent of HG/TTS (Figure 1F). A traveling wave of cortical activity coincided with each acoustic pulse, beginning at medial HG/TTS adjacent to the inferior circular sulcus of the insula and propagating laterally across the supratemporal plane to the lip of the lateral fissure (see video: Figure 1I). Each wave began approximately 80 ms before the acoustic pulse maximum and ended approximately 80 ms afterwards, traversing HG/TTS at a speed of 0.1 m/s.
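The reported speed can be sanity-checked with simple arithmetic: a wave running from roughly 80 ms before to 80 ms after each pulse maximum lasts about 160 ms, and at 0.1 m/s (equivalently 0.1 mm/ms) it would cover roughly 16 mm of cortex. The sketch below shows one way such a speed could be estimated, assuming peak gamma-power latencies are available at contacts of known mediolateral position. The contact spacing, the function name, and the linear-fit approach are illustrative assumptions, not the procedure documented here.

```python
import numpy as np


def wave_speed_m_per_s(positions_mm, peak_latencies_ms):
    """Estimate propagation speed from a linear fit of peak latency on position.

    positions_mm: contact positions along the medial-to-lateral axis (mm).
    peak_latencies_ms: latency of peak gamma power at each contact (ms).
    """
    # Slope of latency vs. position, in ms per mm.
    slope_ms_per_mm, _intercept = np.polyfit(positions_mm, peak_latencies_ms, 1)
    # Speed is the inverse slope; 1 mm/ms is numerically equal to 1 m/s.
    return 1.0 / slope_ms_per_mm


# Hypothetical example: six contacts 3.5 mm apart and a wave moving at 0.1 mm/ms.
positions = np.arange(6) * 3.5
latencies = positions / 0.1  # ms needed for the wave to reach each contact
speed = wave_speed_m_per_s(positions, latencies)
print(f"{speed:.2f} m/s")  # 0.10 m/s
```

Under this sign convention, a negative speed would indicate lateral-to-medial propagation.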
Distinct Substrates Encode Acoustic Onset and Entrainment
Immediately posterior to HG/TTS in the planum temporale (PT), a distinct functional region generated a transient response to white noise. This region featured a high-magnitude increase in gamma power accompanied by broadband low-frequency phase reset that returned to pre-stimulus baseline activity after a single acoustic pulse. We separated this transient response from entrainment using non-negative matrix factorization – an unsupervised clustering algorithm – across all supratemporal electrodes (n = 289). This analysis revealed a distinct anteroposterior response gradient from sustained entrainment in HG/TTS to transient activity in PT (Figure 2B,D,G). This spatial distribution was significant for both gamma power (Figure 2A; rs = 0.4392, p < 10⁻⁵) and low-frequency phase (Figure 2C; rs = 0.7508, p < 10⁻¹⁶). Classifications by the two measures were strongly correlated (Figure 2E; rs = 0.4768, p < 10⁻⁸); only 4 of 289 electrodes showed a mixed classification (i.e. entrainment bias in gamma power with transient bias in low-frequency phase, or the reverse; Figure 2F). The entrainment response was noted in both language-dominant and non-dominant cortex, but the transient response was limited to language-dominant cortex.
The spatial topology of early auditory cortical responses was further elucidated within a single patient who underwent two separate implants (Extended Data 1): one with surface grid and another with depth electrodes. Strong entrainment encoding in HG/TTS and a robust transient response in PT were observed at electrodes along the supratemporal depth probe, but not at any subdural electrodes directly overlying superior temporal gyrus. This unique case indicates that the entrainment and transient responses are selectively encoded by early auditory cortex. Crucially, lateral superior temporal gyrus does not appear to be engaged in acoustic entrainment driven by sublexical features21.
Prediction of Low-Level Acoustic Stimuli
To isolate neural mechanisms supporting prediction in HG/TTS, we quantified the persistence of entrainment to a 3 Hz acoustic envelope after the stimulus rhythm ceased (Figure 3A). Low-frequency phase maintained an entrained state for one cycle after the last acoustic pulse (Figure 3B); by the second cycle, this temporal organization of cortical phase was not significantly distinct from pre-stimulus baseline. In contrast, the entrained relationship between gamma power and the acoustic envelope did not carry predictive information in either cycle after the last acoustic pulse (Figure 3C). Thus, prediction in early auditory cortex is best modeled by low-frequency phase reset at acoustic edges. This neural mechanism is engaged within a single cycle of a rhythmic acoustic stimulus and remains active for at least one cycle afterwards. Such a neuro-computational solution for entrainment and prediction provides a neurobiological basis for cognitive models of speech perception4,8,9,22.
Entrainment to Natural Language Speech
In a second experiment, patients (n = 20) named common objects cued by short spoken descriptions (e.g. they heard “a place with sand along a shore” and articulated “beach”). For each sentence (Figure 4A), we extracted two key features suggested by our analysis of rhythmic white noise: acoustic envelope and edges. The former describes the instantaneous amplitude of speech, while the latter marks moments of rapid amplitude gain. We then evaluated whether the neural substrates that entrained to the white noise stimulus were also engaged during natural language speech. Power in HG/TTS was significantly correlated with the acoustic envelope of speech (low-frequency, rs = −0.0659, p < 10⁻³; beta, rs = −0.0532, p < 10⁻³; gamma, rs = 0.0736, p < 10⁻³; Figure 4B) at a frequency-specific delay (low-frequency, 135 ms; beta, 100 ms; gamma, 60 ms; Figure 4C). Low-frequency phase organization in HG/TTS was significantly increased during the 125 ms following acoustic edges in speech (p < 10⁻³; Figure 4D). Furthermore, it was significantly greater following acoustic edges than following syllabic onsets (p = 0.0061; Figure 4D), a comparable feature that is derived from, and specific to, speech. These findings are concordant with the frequency-multiplexed encoding of acoustic envelope and the low-frequency phase reset at acoustic edges observed during entrainment to the white noise stimulus. The neural response was preserved during reversed speech, emphasizing the sublexical nature of this process. The cortical encoding of the speech envelope (Figure 4C) and of edges (Figure 4E) was localized to HG/TTS, the same supratemporal region that demonstrated entrainment and prediction for the white noise stimulus.
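The two speech features and the frequency-specific delay can be illustrated with a short sketch. It rests on stated assumptions rather than on the exact pipeline: the envelope is taken as the low-pass-filtered Hilbert magnitude (the 10 Hz cutoff is hypothetical), an edge is defined as a local maximum of the envelope's slope above its 90th percentile (a hypothetical rule), and the delay is the lag that maximizes the absolute Spearman correlation between a stimulus feature and a neural time series sampled at a common rate. Function names are illustrative.

```python
import numpy as np
from scipy.signal import butter, find_peaks, hilbert, sosfiltfilt
from scipy.stats import spearmanr


def envelope_and_edges(audio, fs, env_cutoff_hz=10.0, min_gap_s=0.1):
    """Return the smoothed amplitude envelope and sample indices of acoustic edges."""
    env = np.abs(hilbert(audio))                      # instantaneous amplitude
    sos = butter(4, env_cutoff_hz, btype="low", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)                       # zero-phase smoothing of the envelope
    slope = np.gradient(env) * fs                     # amplitude change per second
    # Hypothetical edge rule: local maxima of the slope above its 90th percentile,
    # separated by at least min_gap_s seconds.
    edges, _ = find_peaks(slope, height=np.percentile(slope, 90),
                          distance=max(1, int(min_gap_s * fs)))
    return env, edges


def best_lag_s(neural, stimulus, fs, max_lag_s=0.3):
    """Lag (s) by which neural activity trails the stimulus, maximizing |Spearman rho|."""
    best_rho, best_lag = 0.0, 0
    for lag in range(int(max_lag_s * fs) + 1):
        rho, _ = spearmanr(stimulus[: len(stimulus) - lag], neural[lag:])
        if abs(rho) > best_rho:
            best_rho, best_lag = abs(rho), lag
    return best_lag / fs


# Demo 1: a 3 Hz amplitude-modulated 1 kHz tone yields roughly one edge per cycle.
fs_audio = 8000
t = np.arange(2 * fs_audio) / fs_audio
audio = (1 + 0.8 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 1000 * t)
env, edges = envelope_and_edges(audio, fs_audio)

# Demo 2: a synthetic "neural" trace that copies the stimulus 60 ms later.
rng = np.random.default_rng(0)
stim = rng.standard_normal(1000)
neural = np.concatenate([np.zeros(6), stim[:-6]])
lag = best_lag_s(neural, stim, fs=100)  # 0.06
```

Taking the absolute correlation lets negatively correlated bands (such as low-frequency and beta power above) yield a delay as well.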
Natural language speech recruited a much broader set of neuroanatomic substrates than white noise, including planum polare, lateral superior temporal gyrus, and superior temporal sulcus (Figure 4F). In the patient with both surface grid and depth electrodes, only speech induced significant activity in the lateral temporal grid electrodes (Extended Data 1). This supplementary speech-specific cortex is presumably engaged for the downstream processing of higher-order language features (e.g. phonemes23).
Supratemporal Dissociations in Speech Perception and Production
We compared neural activity in both HG/TTS and PT during listening and speaking, i.e. externally and internally generated speech. In each patient with a supratemporal depth probe, the pair of electrodes with the strongest entrainment and transient responses was identified during the rhythmic white noise condition. These criteria selected electrodes in HG/TTS and PT, respectively (Figure 5A). Gamma power in these regions was analyzed relative to sentence and articulation onset for a representative individual (Figure 5B) and across the group (Figure 5C). HG/TTS responded strongly during both listening and speaking, remaining active for the duration of each sentence and throughout articulation. PT likewise responded strongly following sentence onset, with peak activity at 100 ms; in contrast to HG/TTS, however, this region was quiescent during articulation (Figure 5D).
We further characterized the spatial distribution of the transient response during speech listening and its suppression during speech production using non-negative matrix factorization. As for the white noise stimulus, gamma power yielded sustained entrainment and transient response types (Figure 6A) along a robust anteroposterior distribution (Figure 6E,G; rs = 0.4869, p < 10⁻¹⁰). These were strongly correlated with the class biases for white noise listening (Figure 6B; rs = 0.6029, p < 10⁻²⁰). When this factorization was applied to gamma power during articulation (Figure 6C,F,H), the sustained entrainment response was preserved (rs = 0.6764, p < 10⁻¹⁶) while the transient response type was suppressed (rs = 0.0935, p = 0.4726). Of the 30 electrodes demonstrating a transient response during speech listening, only 1 retained this classification during articulation (Figure 6D). The functional dissociation at PT between externally and internally generated speech provides the first direct evidence for the theory of predictive coding during speech production22,24 via motor-to-sensory feedback25–27.
Discussion
Direct intracranial recordings of the supratemporal plane resolved the functional architecture of entrainment and prediction in human early auditory cortex at an unprecedented resolution and scale. Entrainment to speech engages a frequency-multiplexed encoding of two sublexical acoustic features: envelope and edge. A pair of distinct neuroanatomic substrates perform predictive encoding: “when” by HG/TTS in low-frequency phase and “what” by PT in gamma power. The identification and characterization of these mechanisms advances the understanding of how human cerebral cortex parses continuous acoustic input during both speech perception and production.
Entrainment and Prediction of “When” in Heschl’s Gyrus
Entrainment is the synchronization of two quasi-periodic systems – intrinsic neural oscillations with extrinsic rhythmic signals. It is thought to play an important role in a variety of cognitive processes including attentional selection7,28,29 and internal timekeeping30–33. Furthermore, entrainment is axiomatic to leading models of speech comprehension4,8,9,22. These theories are supported by evidence that speech envelope distortions impair comprehension34–36 independent of spectral content37–39 and that the degree of neuro-acoustic entrainment modulates intelligibility40–43.
Entrainment has been variably characterized as either the encoding of envelope amplitude in bandlimited cortical power44–47 or of discrete segmental events in evoked response potentials7,10,41,48,49. Using electrodes positioned along the anteroposterior extent of the supratemporal plane, we localized the cortical signature of entrainment strictly to early auditory cortex: Heschl’s gyrus and the transverse temporal sulcus50–52. This signature was considerably more complex than that suggested by prior studies, comprising a frequency-multiplexed encoding of envelope phase – distinct for rising and falling amplitudes of the same magnitude – in low-frequency, beta, and gamma power. We also identified a separate, concurrent encoding of acoustic edges in low-frequency phase reset. This encoding uniquely persisted after the entraining stimulus ended, consistent with the behavior of a predictive neural mechanism. Importantly, identical cortical substrates of entrainment were engaged during natural sentence listening.
Our findings generate insights into the nature of entrainment and its support of speech perception. First, acoustic entrainment has been described by others as either a “continuous mode” for acoustic processing29,53–57 or simply a recurring series of transient evoked responses10,58,59. The former interpretation is most consistent with our observations of a non-adapting entrained state that is distinct from the evoked response at stimulus onset and that endures after the entraining stimulus ends. Second, in contrast to prior studies42,45, we found that reversed speech drove an equivalent degree of entrainment in early auditory cortex; furthermore, acoustic edges were more strongly encoded in cortical phase than syllabic onsets – a linguistic feature with similar frequency and periodicity. This supports the assertion60 that entrainment is driven by sublexical acoustic processing, perhaps even inherited from subcortical regions (e.g. medial geniculate nucleus). Third, the multiband encoding of acoustic envelope in cortical power is richer than has been previously suggested4,8,9. While gamma power does track the instantaneous acoustic envelope, both beta and low-frequency power contribute unique information. This supports frequency-multiplexed acoustic processing22,61–63 with each band representing distinct channels of information exchange64–67.
The utility of entrainment is thought to be the organization of transient excitability states within neuronal populations68–72. Discrete high excitability periods constitute “windows of opportunity” for input into sensory cortex, as evidenced by peri-threshold detection studies in somatosensory73, visual33,74,75, and auditory46,60,76 regions. During listening, such windows might serve to segment speech to facilitate comprehension77. More generally, the temporal organization of high excitability periods could serve to minimize temporal uncertainty in stimulus processing and detection7,78–80. This view was corroborated by a behavioral study of responses to the same white noise stimulus used in these experiments that revealed a striking relationship between detection accuracy and the preceding rhythmic stimulus81. With direct intracranial recordings, we found that low-frequency phase reset anticipates the first “missing” acoustic edge.
These results constitute strong evidence for neural mechanisms in early auditory cortex supporting entrainment and prediction, both fundamental computational elements in models of speech perception4,8,9,22. The characteristics of these mechanisms contrast with the presumption of “a principled relation between the time scales present in speech and the time constants underlying neuronal cortical oscillations that is both a reflection of and the means by which the brain converts speech rhythms into linguistic segments”4 supported by “cascaded cortical oscillations”8 or a “hierarchy of nested oscillations”9. Our results are instead consistent with the predictive encoding of “when” by a bandlimited complex of discrete computational channels, each arising from distinct patterns of hierarchical cortical connectivity22.
Transient Response and Prediction of “What” in Planum Temporale
While entrainment was constrained to Heschl’s gyrus and the transverse temporal sulcus, we observed a distinct transient response in planum temporale. The transient response was characterized by a brief spike in gamma power and rapid reset of low-frequency phase immediately following acoustic onset. Interestingly, this response was not engaged during self-generated speech. Such preferential engagement for unexpected sound is consistent with predictive encoding during speech production26,27. Upon execution of a speech motor plan, a learned internal model generates an efference copy82–84 – an expected sensory result. When the acoustic input matches this efference copy, no cortical signal is generated; however, when a mismatch occurs (e.g. externally generated sound or speech), an error signal results22. This is precisely what we observed in the planum temporale, distinct from entrainment in Heschl’s gyrus.
Our results advance understanding of the neurobiology of predictive speech coding in two respects. First, functional studies have revealed single-unit preference in primary auditory cortex for listening or speaking in both non-humans85 and humans82. It has recently been asserted that these response tunings overlap – an “intertwined mosaic of neuronal populations”86 in auditory cortex. Instead, the complete anteroposterior mapping of the supratemporal plane in 31 patients enabled us to identify a distinct neuroanatomical organization in planum temporale. Second, several groups report cortical response suppression specific for self-generated speech19,86–89. We reveal two distinct modes that enable this suppression: a partial reduction of activity in Heschl’s gyrus and a complete absence of the transient response in planum temporale. The stapedius reflex85,90 does not explain the latter mode, suggesting a neural mechanism of suppression. Altogether, we provide compelling evidence for efference copies – predictive encoding of “what”22 – and their essential role in speech production26,27.
Author Contributions
Conceptualization: GH, NT; Methodology: KJF, GH, NT; Software: KJF, PSR; Formal Analysis: KJF; Investigation: KJF; Data Curation: KJF, PSR, NT; Writing – Original Draft: KJF; Writing – Review & Editing: KJF, GH, NT; Visualization: KJF; Supervision: NT; Project Administration: NT; Funding Acquisition: NT.
Competing Interests
The authors declare no competing interests.
Materials & Correspondence
Please direct all correspondence and material requests to Nitin Tandon.
Methods
Population
31 patients (18 male, 13 female; mean age 31 ± 8; mean IQ 96 ± 15) undergoing evaluation of intractable epilepsy with intracranial electrodes were enrolled in the study after obtaining informed consent. Study design was approved by the committee for the protection of human subjects at the University of Texas Health Science Center. A total of 6580 electrodes (5742 depths, 838 grids) were implanted in this cohort. Only the 4003 electrodes (3494 depths, 509 grids) unaffected by epileptic activity, artifacts, or electrical noise were used in subsequent analyses.
Hemispheric language dominance was evaluated in all patients with intra-carotid sodium amytal injection91 (n = 5), fMRI laterality index92,93 (n = 7), cortical stimulation mapping94 (n = 8), or the Edinburgh Handedness Inventory95 (EHI; n = 11). Twenty-nine patients were confirmed to be left-hemisphere language-dominant. One patient was found to be left-handed by EHI and did not undergo alternative evaluation; this patient was assumed to be left-hemisphere dominant but was excluded from laterality analysis. Three patients were found to be right-hemisphere language-dominant: two by intra-carotid sodium amytal injection and one by fMRI laterality index.
Paradigms
Two distinct paradigms were used. The first experiment featured amplitude-modulated white noise, while the second contained natural speech. Both were designed to evaluate the response of early auditory cortex to external acoustic stimuli. Stimuli were played to patients using stereo speakers (44.1 kHz, 15” MacBook Pro 2013) driven by either MATLAB (first experiment) or Python (second experiment) presentation software.
The first experiment presented patients with a single-interval two-alternative forced-choice perceptual discrimination task81. The stimulus comprised two periods. In the first, wideband Gaussian noise was modulated (3 Hz, 80% depth) for 3 seconds. In the second, the modulation waveform ended on the cosine phase of the next cycle to yield 833 ms of constant-amplitude noise. Furthermore, 50% of trials featured a peri-threshold tone (1 kHz, 50 ms duration, 5 ms rise-decay time) that was presented at one of 6 temporal positions and at one of 3 amplitude levels. The temporal positions were separated by a quarter-cycle of the modulation frequency, beginning with the constant-amplitude noise. The amplitude levels covered a range of 12 dB. On each trial, the patient was required to indicate via a key press whether a tonal signal was present during the unmodulated segment of the masking noise. Each patient completed 100 trials.
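A minimal sketch of a stimulus with these properties follows. It assumes a sinusoidal modulator 1 + 0.8·sin(2π·3t), for which the modulation depth equals 0.8, that ends at its mean level after nine whole cycles and is then held constant for 833 ms; the exact phase convention of the original modulator may differ. The raised-cosine ramp shape, the placement of the first tone position at the onset of the constant segment, and the reference tone amplitude are illustrative assumptions, since absolute levels are not given here.

```python
import numpy as np


def am_noise_stimulus(fs=44100, mod_hz=3.0, depth=0.8, mod_s=3.0, flat_s=0.833, seed=0):
    """Gaussian noise with a sinusoidal amplitude envelope, then constant-amplitude noise.

    Assumed envelope: 1 + depth*sin(2*pi*mod_hz*t), which returns to its mean (1.0)
    after mod_s*mod_hz whole cycles and is then held at 1.0 for flat_s seconds.
    """
    rng = np.random.default_rng(seed)
    n_mod, n_flat = int(round(mod_s * fs)), int(round(flat_s * fs))
    t = np.arange(n_mod) / fs
    env = np.concatenate([1.0 + depth * np.sin(2 * np.pi * mod_hz * t), np.ones(n_flat)])
    return env * rng.standard_normal(n_mod + n_flat), env


def add_tone(stim, fs, onset_s, amplitude, freq_hz=1000.0, dur_s=0.050, ramp_s=0.005):
    """Add a 1 kHz, 50 ms tone with 5 ms raised-cosine rise/decay ramps (ramp shape assumed)."""
    n, n_ramp = int(round(dur_s * fs)), int(round(ramp_s * fs))
    tone = np.sin(2 * np.pi * freq_hz * np.arange(n) / fs)
    ramp = np.sin(np.linspace(0.0, np.pi / 2, n_ramp)) ** 2
    tone[:n_ramp] *= ramp
    tone[-n_ramp:] *= ramp[::-1]
    out = stim.copy()
    start = int(round(onset_s * fs))
    out[start:start + n] += amplitude * tone
    return out


# Six candidate tone onsets, a quarter modulation cycle (1/12 s) apart,
# assumed to begin at the start of the constant-amplitude segment.
onsets_s = 3.0 + np.arange(6) / (4 * 3.0)
# Three levels spanning 12 dB (0, -6, -12 dB re. a hypothetical reference amplitude).
levels = 10 ** (-np.array([0.0, 6.0, 12.0]) / 20)

stim, env = am_noise_stimulus()
trial = add_tone(stim, 44100, onsets_s[2], 0.1 * levels[1])
```

With these assumptions the last tone ends about 3.47 s into the stimulus, well inside the 833 ms constant-amplitude segment.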
In the second experiment, patients engaged in an auditory-cued naming task: naming to definition. The stimuli were single-sentence descriptions (average duration of 1.97 ± 0.36 seconds) recorded by both male and female speakers. These were designed such that the last word always contained crucial semantic information without which a specific response could not be generated (e.g., “A round red fruit.”)96. Patients were instructed to articulate aloud the object described by the stimulus. In addition, temporally reversed speech was used as a control condition. These stimuli preserved the spectral content of natural speech but communicated no meaningful linguistic content. For each reversed-speech stimulus, patients were instructed to articulate aloud the gender of the speaker. Twenty patients each completed 180 trials.
MR acquisition
Pre-operative anatomical MRI scans were obtained using a 3T whole-body MR scanner (Philips Medical Systems) fitted with a 16-channel SENSE head coil. Images were collected using a magnetization-prepared 180° radiofrequency pulse and rapid gradient-echo sequence with 1 mm sagittal slices and an in-plane resolution of 0.938 × 0.938 mm97. Pial surface reconstructions were computed with FreeSurfer (v5.1)98 and imported into AFNI99. Post-operative CT scans were registered to the pre-operative MRI scans to localize electrodes relative to cortical landmarks. Grid electrode locations were determined by a recursive grid partitioning technique and then optimized using intra-operative photographs100. Depth electrode locations were informed by implantation trajectories from the ROSA surgical system.
ECoG acquisition
Stereo-electroencephalographic depth probes with platinum-iridium electrode contacts (PMT Corporation; 0.8 mm diameter, 2.0 mm length cylinders; adjacent contacts separated by 1.5-2.43 mm) were implanted using the Robotic Surgical Assistant (ROSA; Zimmer-Biomet, Warsaw, IN) registered to the patient using both a computed tomographic angiogram and an anatomical MRI101,102. Each depth probe had 8-16 contacts and each patient had multiple (12-16) such probes implanted. Surface grids – subdural platinum-iridium electrodes embedded in a silastic sheet (PMT Corporation, Chanhassen, MN; top-hat design; 3 mm diameter cortical contact) – were surgically implanted via a craniotomy93,100,103. ECoG recordings were performed at least two days after the craniotomy to allow for recovery from the anesthesia and narcotic medications. Twenty-nine patients were implanted with depth probe electrodes and four with surface grid electrodes. Notably, two patients had two separate implants: first with depth probe electrodes and subsequently with surface grid electrodes.
Data were collected at a 2000 Hz sampling rate and 0.1-700 Hz bandwidth using NeuroPort NSP (Blackrock Microsystems, Salt Lake City, UT). Stimulus presentation software triggered a digital pulse at trial onset that was registered to ECoG via digital-to-analog conversion (MATLAB: USB-1208FS, Measurement Computing, Norton, MA; Python: U3-LV, LabJack, Lakewood, CO). Continuous audio registered to ECoG was recorded with an omnidirectional microphone (30-20,000 Hz response, 73 dB SNR, Audio Technica U841A) placed adjacent to the presentation laptop. For the naming to definition and reversed speech experiments, articulation onset and offset were determined by offline analysis of the amplitude increase and spectrographic signature associated with each verbal response.
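Articulation onsets above were determined by offline analysis of amplitude and spectrographic cues. Purely as an illustration, a hypothetical automatic first pass could flag the first sustained rise of short-time RMS above a baseline threshold. The window length, threshold multiplier, baseline duration, and minimum duration below are assumptions, and any such detector would still need verification against the spectrogram.

```python
import numpy as np


def detect_articulation_onset(audio, fs, win_s=0.02, k=5.0, baseline_s=0.2, min_dur_s=0.05):
    """Return the first time (s) at which short-time RMS stays above threshold for min_dur_s.

    Threshold = mean + k * SD of the RMS over the first baseline_s seconds (hypothetical rule).
    Returns None if no sustained crossing is found.
    """
    win = max(1, int(win_s * fs))
    # Moving-average power centred on each sample ("same" mode), then square root.
    rms = np.sqrt(np.convolve(audio ** 2, np.ones(win) / win, mode="same"))
    baseline = rms[: int(baseline_s * fs)]
    above = rms > baseline.mean() + k * baseline.std()
    need = max(1, int(min_dur_s * fs))
    # run[j] counts supra-threshold samples in above[j : j + need].
    run = np.convolve(above.astype(int), np.ones(need, dtype=int), mode="valid")
    hits = np.flatnonzero(run == need)
    return hits[0] / fs if hits.size else None


# Synthetic check: quiet noise for 1 s, then loud "speech-like" noise.
fs = 8000
rng = np.random.default_rng(0)
audio = np.concatenate([0.01 * rng.standard_normal(fs), rng.standard_normal(fs // 2)])
onset = detect_articulation_onset(audio, fs)  # close to 1.0 s
```

Because the RMS window is centred, the estimate tends to lead the true onset by up to half a window (about 10 ms here).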
Cortical areas with potentially abnormal physiology were excluded by removing channels that demonstrated inter-ictal activity or that recorded in proximity to the localized seizure onset sites. Additional channels contaminated by >10 dB of line noise or regular saturation were also excluded from further analysis. The remaining channels were referenced to a common average composed of all electrodes surviving these criteria. Any trials manifesting epileptiform activity were removed. Furthermore, trials for the naming to definition and reversed speech experiments in which the patient answered incorrectly or after more than 2 seconds were eliminated.
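The re-referencing and behavioral trial-exclusion steps can be expressed compactly. This sketch assumes data arranged as channels × samples, a precomputed boolean mask of channels that survived the exclusion criteria, and per-trial correctness and response-time arrays; all names are illustrative.

```python
import numpy as np


def common_average_reference(data, keep_channels):
    """Re-reference retained channels to their own common average.

    data: array of shape (n_channels, n_samples).
    keep_channels: boolean mask of channels that survived exclusion.
    """
    kept = data[keep_channels]
    return kept - kept.mean(axis=0, keepdims=True)


def valid_trials(correct, response_time_s, max_rt_s=2.0):
    """Keep trials answered correctly within max_rt_s seconds."""
    return np.asarray(correct, dtype=bool) & (np.asarray(response_time_s) <= max_rt_s)


rng = np.random.default_rng(0)
data = rng.standard_normal((5, 1000))
mask = np.array([True, True, False, True, True])   # channel 2 excluded (e.g. inter-ictal activity)
car = common_average_reference(data, mask)          # shape (4, 1000)
trials = valid_trials([True, True, False], [1.2, 2.4, 0.9])  # [True, False, False]
```

Computing the average only over retained channels keeps excluded, potentially abnormal channels from contaminating the reference.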
Digital signal processing
Line noise was removed with zero-phase, second-order Butterworth bandstop filters at 60 Hz and its first 2 harmonics. The analytic signal was generated with frequency-domain bandpass Hilbert filters featuring paired sigmoid flanks (half-width 1 Hz)104–107. For spectral decompositions, this was generalized to a filter bank with logarithmically spaced center frequencies (2 to 16 Hz, 50 steps) and passband widths (1 to 4 Hz, 50 steps). Instantaneous amplitude and phase were subsequently extracted from the analytic signal. In this fashion, we used narrowband and wideband analyses to precisely quantify the frequency driving a local cortical response and its timing, respectively.
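A minimal sketch of these filtering steps follows. Assumptions: the bandstop width (±1 Hz around each line harmonic) is not stated above and is hypothetical; the sigmoid flanks are modeled as logistic functions with a 1 Hz scale, which is one reading of "half-width 1 Hz"; and each filter-bank passband is taken as center ± width/2. Negative frequencies are zeroed and positive frequencies doubled, so the inverse FFT yields the band-limited analytic signal directly.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.special import expit


def remove_line_noise(x, fs, line_hz=60.0, n_harmonics=3, half_bw_hz=1.0):
    """Zero-phase 2nd-order Butterworth bandstops at 60, 120 and 180 Hz (width assumed)."""
    for k in range(1, n_harmonics + 1):
        f0 = k * line_hz
        sos = butter(2, [f0 - half_bw_hz, f0 + half_bw_hz], btype="bandstop", fs=fs, output="sos")
        x = sosfiltfilt(sos, x, axis=-1)
    return x


def sigmoid_hilbert(x, fs, f_lo, f_hi, flank_hz=1.0):
    """Analytic signal of x band-limited to [f_lo, f_hi] by logistic (sigmoid) flanks."""
    n = x.shape[-1]
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    gain = expit((freqs - f_lo) / flank_hz) * expit((f_hi - freqs) / flank_hz)
    gain = np.where(freqs > 0, 2.0 * gain, 0.0)  # keep (and double) positive frequencies only
    return np.fft.ifft(np.fft.fft(x, axis=-1) * gain, axis=-1)


# Filter bank: 50 log-spaced centers (2-16 Hz) paired with log-spaced widths (1-4 Hz).
centers = np.geomspace(2.0, 16.0, 50)
widths = np.geomspace(1.0, 4.0, 50)
bands = [(c - w / 2, c + w / 2) for c, w in zip(centers, widths)]

fs = 1000
t = np.arange(10 * fs) / fs
x = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
clean = remove_line_noise(x, fs)
z = sigmoid_hilbert(clean, fs, 4.0, 12.0)
amplitude, phase = np.abs(z), np.angle(z)
```

Soft sigmoid flanks avoid the ringing that a brick-wall frequency-domain filter would introduce; the price is a passband gain slightly below 1 for narrow bands.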
Statistical analysis
Analyses were performed with trials time-locked to stimulus onset (all experiments) and to articulation onset (only naming to definition and reversed speech experiments). The baseline period for all experiments was defined as −300 to −50 ms relative to stimulus onset.
Instantaneous amplitude was squared and then normalized to the baseline period, yielding percent change in power from baseline. Statistical significance was set at the 0.01 level, evaluated with the Wilcoxon signed-rank test, and subjected to familywise error correction. The alignment of instantaneous phase $\theta_n$ across trials was quantified with inter-trial coherence (ITC), defined as follows, where $N$ is the number of trials:
$$\mathrm{ITC} = \left| \frac{1}{N} \sum_{n=1}^{N} e^{i\theta_n} \right|$$
Statistical significance was set at the 0.01 level, evaluated with the Rayleigh z test, and subjected to FDR control. All time traces were smoothed after statistical analysis with a Savitzky-Golay polynomial filter (3rd order, 83 ms frame length) for visual presentation.
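The power normalization, ITC, Rayleigh test, and display smoothing described above can be sketched as follows. Assumptions: the baseline is averaged within each trial before normalization (a trial-averaged baseline is an equally plausible reading), the Rayleigh p-value uses Zar's standard approximation, and the 83 ms Savitzky-Golay frame is rounded to an odd number of samples (167 at 2000 Hz). Function names are illustrative.

```python
import numpy as np
from scipy.signal import savgol_filter


def percent_power_change(amp, times_s, baseline=(-0.300, -0.050)):
    """amp: (n_trials, n_times) instantaneous amplitude. Returns % power change vs. baseline."""
    power = amp ** 2
    in_base = (times_s >= baseline[0]) & (times_s <= baseline[1])
    ref = power[:, in_base].mean(axis=1, keepdims=True)  # per-trial baseline (assumption)
    return 100.0 * (power - ref) / ref


def inter_trial_coherence(phase):
    """phase: (n_trials, n_times) in radians. ITC = |mean over trials of exp(i*theta_n)|."""
    return np.abs(np.exp(1j * phase).mean(axis=0))


def rayleigh_test(itc, n_trials):
    """Rayleigh z and approximate p-value (Zar's approximation)."""
    R = n_trials * itc                      # length of the resultant vector
    z = R ** 2 / n_trials
    p = np.exp(np.sqrt(1 + 4 * n_trials + 4 * (n_trials ** 2 - R ** 2)) - (1 + 2 * n_trials))
    return z, np.clip(p, 0.0, 1.0)


def smooth_for_display(trace, fs=2000, frame_s=0.083, order=3):
    """Savitzky-Golay smoothing applied after statistics, for plotting only."""
    frame = int(round(frame_s * fs)) | 1   # force an odd frame length (167 samples at 2 kHz)
    return savgol_filter(trace, frame, order)
```

Smoothing only after the statistics are computed, as the text specifies, keeps the filter from altering the tested values.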
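Non-negative matrix factorization (NNMF) is an unsupervised clustering algorithm108. This method expresses a non-negative matrix $A \in \mathbb{R}^{m \times n}$ as the product of a “class weight” matrix $W \in \mathbb{R}^{m \times k}$ and a “class archetype” matrix $H \in \mathbb{R}^{k \times n}$, minimizing:
$$D = \frac{\lVert A - WH \rVert_F}{\sqrt{mn}}, \qquad W \ge 0,\; H \ge 0$$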
The factorization rank k = 2 was chosen for all analyses in this work. Repeat analyses with higher ranks did not identify additional response types. We optimized the matrix factorization with 1000 replicates of a multiplicative update algorithm (MATLAB R2018b Statistics and Machine Learning Toolbox). Two types of inputs were separately factorized: mean gamma power and low-frequency phase ITC. Gamma power values less than 20% above baseline and low-frequency phase ITC values less than baseline were rectified. These were calculated for the m electrodes in the supratemporal plane at n time points. Factorization thus generated a pair of class weights for each electrode and a pair of class archetypes, the basis function for each class. Class bias was defined as the difference between the class weights at each electrode. Response magnitude was defined as the sum of class weight magnitudes at each electrode. Separate factorizations were estimated for white noise listening and for natural speech listening. The latter was then applied to self-generated speech by conserving the class archetypes $H$ and recalculating the class weights:
$$W_{\text{speak}} = \underset{W \ge 0}{\arg\min} \; \frac{\lVert A_{\text{speak}} - WH \rVert_F}{\sqrt{mn}}$$
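A NumPy sketch of rank-2 NNMF with multiplicative (Lee-Seung) updates is given below as a stand-in for the MATLAB routine, with far fewer replicates by default. Rows of H are rescaled to unit length after fitting; this normalization convention is our assumption. The second function holds the archetypes fixed and re-estimates only the non-negative weights, one way to project self-generated speech onto the listening factorization. Which class corresponds to entrainment and which to the transient response must be identified afterwards, for example by inspecting the archetypes.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the update rules


def nnmf(A, k=2, n_iter=500, n_replicates=10, seed=0):
    """Factorize non-negative A (m x n) as W (m x k) @ H (k x n) by multiplicative updates.

    Returns the replicate with the smallest RMS residual ||A - WH||_F / sqrt(m*n).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    best = None
    for _ in range(n_replicates):
        W, H = rng.random((m, k)), rng.random((k, n))
        for _ in range(n_iter):
            H *= (W.T @ A) / (W.T @ W @ H + EPS)
            W *= (A @ H.T) / (W @ H @ H.T + EPS)
        rms = np.linalg.norm(A - W @ H) / np.sqrt(m * n)
        if best is None or rms < best[0]:
            best = (rms, W, H)
    rms, W, H = best
    scale = np.linalg.norm(H, axis=1, keepdims=True) + EPS   # unit-length rows of H
    return W * scale.T, H / scale, rms                       # product W @ H is unchanged


def refit_weights(A_new, H, n_iter=1000, seed=0):
    """Keep archetypes H fixed; re-estimate non-negative class weights for new data."""
    rng = np.random.default_rng(seed)
    W = rng.random((A_new.shape[0], H.shape[0]))
    for _ in range(n_iter):
        W *= (A_new @ H.T) / (W @ H @ H.T + EPS)
    return W
```

Multiplicative updates keep every entry non-negative without explicit projection, which is why they pair naturally with the non-negativity constraints above.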
Response classifications were established by applying a binary threshold to the class biases. Spearman correlations were calculated using only electrodes with a large response magnitude (>20).
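The classification and correlation step can be sketched as follows, assuming class bias equals the weight of class 1 minus the weight of class 2, a binary threshold at zero (the actual threshold is not stated above), and one anteroposterior coordinate per electrode. Only electrodes whose response magnitude exceeds 20 enter the Spearman correlation. The data in the example are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def classify_and_correlate(W, ap_position_mm, bias_threshold=0.0, min_magnitude=20.0):
    """W: (n_electrodes, 2) class weights. Returns labels, Spearman rho and p-value."""
    bias = W[:, 0] - W[:, 1]                       # class bias per electrode
    magnitude = np.abs(W).sum(axis=1)              # response magnitude per electrode
    labels = (bias > bias_threshold).astype(int)   # 1 = class-1 bias, 0 = class-2 bias
    keep = magnitude > min_magnitude               # drop weakly responsive electrodes
    rho, p = spearmanr(bias[keep], ap_position_mm[keep])
    return labels, rho, p


# Hypothetical data: class-1 bias grows with position; the last electrode responds weakly.
W = np.array([[30.0 + i, 30.0 - i] for i in range(10)] + [[5.0, 4.0]])
ap = np.arange(11, dtype=float)
labels, rho, p = classify_and_correlate(W, ap)
```

Because Spearman correlation is rank-based, a monotonic but nonlinear gradient of class bias along the anteroposterior axis still yields a strong correlation.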
Surface-based mixed-effects multilevel analysis (SB-MEMA) was used to provide statistically robust109–111 and topologically precise107,112,113 effect estimates of band-limited power change from the baseline period. This method, developed and described previously by our group105,114, accounts for sparse sampling, outlier inferences, as well as intra- and inter-subject variability to produce population maps of cortical activity. SB-MEMA was run on short, overlapping time windows (150 ms width, 10 ms spacing) to generate the frames of a movie portraying cortical activity. All maps were smoothed with a geodesic Gaussian smoothing filter (3 mm full-width at half-maximum) for visual presentation.
Data Availability
The datasets collected and analyzed during the current study are not publicly available, as patients did not consent to such distribution, but grouped data representations are available from the corresponding author on reasonable request.
Acknowledgements
We thank all the patients who participated in this study; laboratory members at the Tandon lab (Matthew Rollo and Jessica Johnson); neurologists at the Texas Comprehensive Epilepsy Program (Jeremy Slater, Giridhar Kalamangalam, Omotola Hope, Melissa Thomas) who participated in the care of these patients; and all the nurses and technicians in the Epilepsy Monitoring Unit at Memorial Hermann Hospital who helped make this research possible.
This work was supported by the National Institute on Deafness and Other Communication Disorders 5R01DC014589-04, National Institute of Neurological Disorders and Stroke 5U01NS098981-03, and National Institute on Deafness and Other Communication Disorders 1F30DC017083-01.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.