Abstract
Speech processing is highly incremental. It is widely accepted that listeners continuously use the linguistic context to anticipate upcoming concepts, words, and phonemes. However, previous evidence supports two seemingly contradictory models of how predictive cues are integrated with bottom-up evidence. Classic psycholinguistic paradigms suggest a two-stage model, in which acoustic input is represented fleetingly in a local, context-free manner but is quickly integrated with contextual constraints. This contrasts with the view that the brain constructs a single unified interpretation of the input, which fully integrates available information across representational hierarchies and predictively modulates even the earliest sensory representations. To distinguish these hypotheses, we tested magnetoencephalography responses to continuous narrative speech for signatures of unified and local predictive models. The results provide evidence for aspects of both. Local context models, one based on sublexical phoneme sequences and one based on the phonemes of the current word alone, each uniquely predict a portion of early neural responses; at the same time, even early responses to phonemes reflect a unified model that incorporates sentence-level constraints to predict upcoming phonemes. Neural source localization places the anatomical origins of the different predictive models in non-identical parts of the superior temporal lobes bilaterally, although the more local models tend to be right-lateralized. These results suggest that speech processing recruits both local and unified predictive models in parallel, reconciling previously disparate findings. Parallel models might make the perceptual system more robust, facilitate the processing of unexpected inputs, and serve a function in language acquisition.
Competing Interest Statement
The authors have declared no competing interest.