Summary
A long-standing controversy persists in psycholinguistic research regarding how phonemes are coded in the human auditory cortex during speech perception. The motor theory of speech perception [1, 2] describes phoneme perception in terms of the articulatory gestures that produce phonemes. According to this theory, the objects of speech perception are the intended phonetic gestures of the speaker, such as 'lip rounding' or 'jaw raising'. Alternatively, auditory theories argue that phonetic processing depends directly on properties of the auditory system [3–6]. According to this view, listeners identify spectro-temporal patterns in phoneme waveforms and match them with stored abstract acoustic representations. Here we recorded spiking activity in the auditory cortex (superior temporal gyrus; STG) of six neurosurgical patients who performed a listening task with phoneme stimuli. Using a Naïve-Bayes model, we show that single-cell responses to phonemes are governed by articulatory features that have acoustic correlates (manner-of-articulation) and are organized according to sonority, with two main clusters for sonorants and obstruents. Using the same set of phonemes, we further find that 'neural similarity' (i.e., the similarity of evoked spiking activity between pairs of phonemes) is comparable to 'perceptual similarity' (i.e., how similar a pair of phonemes sound), based on perceptual confusion assessed behaviorally in healthy subjects. Thus, phonemes that were perceptually similar also evoked similar neural responses. Our findings establish that phonemes are encoded according to manner-of-articulation, supporting auditory theories of speech perception, and that the perceptual representation of phonemes can be reflected in the activity of single neurons in the STG.
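As a minimal, purely illustrative sketch of the Naïve-Bayes decoding approach mentioned above: a Gaussian Naive Bayes classifier assigns a trial to the phoneme whose class-conditional likelihood of the observed spike counts is highest. The phoneme labels, "neurons", and spike counts below are synthetic stand-ins, not the paper's data or code.

```python
# Sketch: Gaussian Naive Bayes decoding of phoneme identity from
# single-unit spike counts. All data here are synthetic/hypothetical.
import math

def fit_gnb(X, y):
    """Estimate per-class mean, variance, and prior for each feature (neuron)."""
    stats = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(zip(*rows), means)]
        stats[label] = (means, variances, n / len(y))
    return stats

def predict(stats, x):
    """Return the phoneme label with the highest log-posterior for trial x."""
    best, best_lp = None, -math.inf
    for label, (means, variances, prior) in stats.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Synthetic spike counts from 3 "neurons" over repeated presentations of
# a sonorant (/m/) vs. an obstruent (/t/) -- illustrative numbers only.
X = [[12, 3, 7], [11, 4, 8], [13, 2, 6],   # /m/ trials
     [4, 10, 2], [5, 9, 3], [3, 11, 1]]    # /t/ trials
y = ["m", "m", "m", "t", "t", "t"]

model = fit_gnb(X, y)
print(predict(model, [12, 3, 7]))  # -> m
print(predict(model, [4, 10, 2]))  # -> t
```

In this scheme, phonemes whose evoked spike-count distributions overlap more are more often confused by the decoder, which is one way the 'neural similarity' between phoneme pairs can be quantified and compared with behavioral confusion data.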