Acquiring absolute pitch in adulthood is difficult but possible

Yetta Kwailing Wong; Kelvin F. H. Lui; Ken H.M. Yip; Alan C.-N. Wong

doi:10.1101/355933

Abstract

Absolute pitch (AP) refers to the rare ability to name the pitch of a tone without an external reference. It is widely believed that acquiring AP in adulthood is impossible, because AP is thought to be reserved for the select few with a rare genetic makeup and early musical training. In three experiments, we trained adults to name pitches for 12 to 40 hours. After training, 14% of the participants (6 out of 43) were able to name twelve pitches at 90% accuracy or above, with semitone errors counted as incorrect. This performance level was comparable to that of real-world ‘AP possessors’. AP training showed the classic characteristics of perceptual learning, including performance enhancement, generalization of learning and improvement sustained for at least one to three months. Exploratory extrapolation analyses suggest that 39.5% and 58.1% of the participants would acquire AP if training lasted 60 and 180 hours, respectively, indicating that the majority of participants could potentially acquire AP. We demonstrate that AP remains learnable in adulthood. The extent to which one acquires AP may thus be better explained by the amount and type of perceptual experience.

Absolute pitch (AP) refers to the ability to name the pitch of a tone (e.g., naming a tone as “C”) or to produce it without external reference tones (Takeuchi & Hulse, 1993; Ward, 1999). While most of us can effortlessly identify countless faces, objects, and visual and auditory words, most people find it very difficult to name the twelve pitches, and professional musicians are no exception (Athos et al., 2007; Levitin & Rogers, 2005; Zatorre, 2003). The most extreme estimate is that only one in 10,000 people is an ‘AP possessor’ who can make AP judgments accurately and effortlessly (Takeuchi & Hulse, 1993). This rare ability is considered a special talent and endowment of gifted musicians (Deutsch, 2002; Takeuchi & Hulse, 1993; Ward, 1999). The genesis of AP has therefore been a perplexing research topic among musicians, psychologists and neuroscientists for more than a century (Deutsch, 2002; Levitin & Rogers, 2005; Takeuchi & Hulse, 1993; Ward, 1999).

Two prerequisites for acquiring AP have been widely accepted: a rare genetic predisposition to AP, and an early onset of musical training within a critical period in childhood, similar to that of language development (Baharloo, Johnston, Service, Gitschier, & Freimer, 1998; Chin, 2003; Drayna, 2007; Levitin & Rogers, 2005; Ross, Olson, Marks, & Gore, 2004; Takeuchi & Hulse, 1993; Zatorre, 2003). According to this view, most professional musicians fail to acquire AP because they do not carry the specific genes and/or did not start their musical training within the critical period. Training AP in adulthood, long after the critical period has passed, should thus be practically impossible.

Past studies have shown that AP can improve to a limited extent in adulthood with deliberate practice (Brady, 1970; Cuddy, 1968, 1970; Meyer, 1899; Van Hedger, Heald, Koch, & Nusbaum, 2015), but there is no convincing evidence that adults can attain a performance level comparable to that of AP possessors through training (Levitin & Rogers, 2005; Ward, 1999). However, all previous training studies had low training intensity (roughly 1-4 hours; Cuddy, 1970; Van Hedger et al., 2015) and/or small sample sizes (three or fewer participants per condition; Brady, 1970; Cuddy, 1968; Meyer, 1899). It thus remains unknown whether AP can be acquired in adulthood with a more rigorous training protocol.

The current study examined the possibility of AP acquisition in adulthood using perceptual learning paradigms. Perceptual learning studies have repeatedly demonstrated that humans can pick up environmental input and fine-tune their perceptual representations in all sensory modalities (Fujioka, Ross, Kakigi, Pantev, & Trainor, 2006; Goldstone, 1998; Kraus & Banai, 2007; Sasaki, Nanez, & Watanabe, 2010; Seitz & Watanabe, 2009; Y. K. Wong, Folstein, & Gauthier, 2011). In three experiments, we trained 43 adults to name the pitch of tones with different combinations of timbres and octaves for 12-40 hours in laboratory and mobile online settings (Table 1). If a specific genetic predisposition and an early onset of musical training are essential for AP acquisition, AP training in adulthood should be largely in vain, yielding very limited improvement in all participants. Alternatively, if AP can be trained in adulthood as a type of perceptual learning, it should be possible, at least for some individuals, to attain a performance level similar to that of real-world ‘AP possessors’. We should also observe performance enhancement, generalization of learning, and sustained improvement, as in other perceptual learning studies (Fahle & Poggio, 2002; Goldstone, 1998; Y. K. Wong et al., 2011).


Table 1.

Details of the training protocols in the three experiments. N indicates the number of participants in each experiment. ‘Mus/Non-Mus’ indicates the number of musicians and nonmusicians in each experiment respectively.

Experiment 1

In Experiment 1, we used a longer and more intensive perceptual learning protocol than previous studies (Cuddy, 1970; Van Hedger et al., 2015; Brady, 1970; Cuddy, 1968; Meyer, 1899) to test whether learning AP in adulthood is possible, and whether the training-induced improvement follows the classic characteristics of perceptual learning. Because the amount of training needed to enable AP acquisition in adulthood is an untested empirical question, we chose a 12-hour training protocol for this experiment, comparable to some previous perceptual learning studies in our laboratory (A. C. Wong, Palmeri, & Gauthier, 2009; Y. K. Wong et al., 2011).

Methods

Participants

Ten adults were recruited at City University of Hong Kong and completed the training. They included 2 males and 8 females with a mean age of 23.1 years (SD = 4.50). Seven had 2-10 years of musical training, with piano (N = 5), violin (N = 1) or flute (N = 1) as their major instrument. The remaining three were non-musicians with no prior formal musical training. One additional participant dropped out during training and was excluded from all analyses. All participants completed a questionnaire about their musical training background, including their musical instruments and the highest ABRSM exam passed, and reported whether they regarded themselves as ‘AP possessors’. They received monetary compensation for the training and testing. Informed consent was obtained in accordance with the requirements of the Ethics Committee of City University of Hong Kong.

The sample size was estimated with G*Power 3.1.9.2 based on a recent AP training study (Van Hedger et al., 2015). That study observed a large effect size for the training improvement in adults (pretest vs. posttest; f = 1.34). Using the same f, the sample size required to detect a training effect at α = .05 with a power of .95 was 5 participants. To be more conservative, we recruited 10 participants. This sample size was also consistent with those used in previous perceptual training studies (Chung & Truong, 2013; Y. K. Wong et al., 2011).

Materials

The experiment was conducted on personal computers using MATLAB (MathWorks, Natick, MA) with the PsychToolbox extension (Brainard, 1997; Pelli, 1997) at the Cognition and Neuroscience Laboratory at City University of Hong Kong. Participants were asked to bring their own earphones to the training and testing sessions, and they adjusted the volume to a comfortable level before each session began.

Experiment 1 used 120 tones from octaves 3 to 6: complex sine wave tones and piano tones in octaves 3 to 6, and violin tones in octaves 4 and 5. The complex sine wave tones were identical to those used in prior AP tests and were generated by summing a series of sinusoidal waveforms comprising the fundamental frequency and its harmonics (Bermudez, Lerch, Evans, & Zatorre, 2009). The piano tones were recorded with an electric keyboard (Yamaha S31). The violin tones were recorded by a volunteer violinist in a soundproof room. The tuning of the tones was checked with a tuner during recording. The sound clips were 32-bit with a sampling rate of 44,100 Hz. They were edited in Audacity to last 1 s, with a 0.1-s linear onset and a 0.1-s linear offset, and were matched in perceptual magnitude.
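The complex sine wave tones are described as sums of a fundamental and its harmonics with 0.1-s linear ramps. A minimal Python sketch of that construction follows. It assumes equal temperament with A4 = 440 Hz, and the function names, the six harmonics and the 1/k amplitude roll-off are illustrative assumptions rather than the parameters of Bermudez et al. (2009).

```python
import math

SAMPLE_RATE = 44100  # Hz, matching the sampling rate of the stimulus files
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_frequency(pitch, octave, a4=440.0):
    """Equal-tempered frequency of a note, assuming A4 = 440 Hz tuning."""
    midi = 12 * (octave + 1) + PITCH_CLASSES.index(pitch)  # C4 = 60, A4 = 69
    return a4 * 2 ** ((midi - 69) / 12)


def complex_tone(freq, n_harmonics=6, duration=1.0, ramp=0.1, sr=SAMPLE_RATE):
    """Sum of a fundamental and its harmonics with linear onset/offset ramps.

    The harmonic count and the 1/k amplitude roll-off are illustrative
    assumptions, not the published stimulus parameters.
    """
    n = int(round(duration * sr))
    n_ramp = int(round(ramp * sr))
    samples = []
    for i in range(n):
        t = i / sr
        value = sum(math.sin(2 * math.pi * k * freq * t) / k
                    for k in range(1, n_harmonics + 1))
        if i < n_ramp:  # 0.1-s linear onset starting from silence
            value *= i / n_ramp
        elif i >= n - n_ramp:  # 0.1-s linear offset, reaching 0 at the last sample
            value *= (n - 1 - i) / n_ramp
        samples.append(value)
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # normalize to the range [-1, 1]
```

For example, `complex_tone(note_frequency("A", 4))` returns 44,100 samples, i.e., one second of an A4 tone at the stated sampling rate.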

Absolute pitch training

The training included 48 tones (12 pitches × 2 octaves × 2 timbres) from two octaves (C4 to B5) in two timbres, complex sine wave and piano. A pitch-naming task was used. On each trial, an isolated tone was presented for 1 s. Then an image mapping the 12 pitch names to 12 keys of the keyboard (the top row of keys from ‘1’ to ‘=’ on a standard keyboard) was presented. Participants were required to name the pitch of the presented tone by keypress within 5 s.
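The response mapping and the strict scoring rule can be made concrete with a small sketch. The assignment of pitch names to keys in chromatic order starting from C is a hypothetical layout (the text states only that the twelve top-row keys from ‘1’ to ‘=’ were used), and `score_trial` is an illustrative helper, not the study's MATLAB code.

```python
from typing import Optional

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Top-row keys '1' to '=' on a standard keyboard (12 keys).
KEYS = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "-", "="]
# Hypothetical layout: keys assigned to pitch names in chromatic order from C.
KEY_TO_PITCH = dict(zip(KEYS, PITCH_CLASSES))
RESPONSE_WINDOW_S = 5.0


def score_trial(target: str, key: Optional[str], rt: Optional[float]) -> bool:
    """Strict scoring: only an exact pitch-name match within 5 s counts.

    A semitone error is scored as incorrect. Treating a missing or late
    keypress as incorrect is an assumption.
    """
    if key is None or rt is None or rt > RESPONSE_WINDOW_S:
        return False
    return KEY_TO_PITCH.get(key) == target
```

Under this layout, pressing ‘1’ for a C# tone is a one-semitone error and is scored as incorrect, exactly like a six-semitone error.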

The training was gamified and structured into levels. If participants achieved 90% accuracy at a level, they proceeded to the next level; otherwise they stayed at the same level. Training ended when participants had completed 12 hours of training or had passed all 80 levels at 90% accuracy. Participants completed one hour of training per day, trained on at least four days per week, and finished the training within three weeks.
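Because each level contains 20 trials (described in the next paragraph), the 90% criterion requires at least 18 correct responses. The advancement and stopping rules might be sketched as follows. The function names are hypothetical, and the integer ceiling avoids floating-point edge cases at exactly 90%.

```python
TRIALS_PER_LEVEL = 20
TOTAL_LEVELS = 80
MAX_HOURS = 12.0


def min_correct(n_trials=TRIALS_PER_LEVEL):
    """Smallest number of correct trials giving at least 90% accuracy."""
    return (9 * n_trials + 9) // 10  # integer ceiling of 0.9 * n_trials


def next_level(level, n_correct, n_trials=TRIALS_PER_LEVEL):
    """Advance one level on a pass; otherwise repeat the same level."""
    return level + 1 if n_correct >= min_correct(n_trials) else level


def training_complete(levels_passed, hours_trained):
    """Training ends after all 80 levels are passed or 12 hours are logged."""
    return levels_passed >= TOTAL_LEVELS or hours_trained >= MAX_HOURS
```

For instance, `next_level(5, 17)` keeps a participant at level 5, whereas 18 correct responses move them to level 6.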

The 80-level training protocol was organized into ten 8-level parts with an increasing number of pitches (from 3 pitches in the first 8 levels to 12 pitches in the last 8 levels). Each 8-level part consisted of four types of levels, which included tones that were progressively richer in timbres and octaves. Each of the four level types was presented twice, first with trial-by-trial feedback and then without feedback. For example, participants began the training with three pitches (E, F and F#). At levels 1-2, complex sine wave tones of these three pitches in octave 4 were included, with feedback at level 1 and without feedback at level 2. At levels 3-4, complex sine wave tones in octaves 4 and 5 were included, first with and then without feedback. At levels 5-6, complex sine wave tones and piano tones in octave 4 were included, first with and then without feedback. At levels 7-8, complex sine wave tones and piano tones in octaves 4 and 5 were included, first with and then without feedback. At the no-feedback levels, participants received no external feedback about the correctness of their responses, so they could not establish any external reference for naming pitches. Instead, they could only generate answers internally in an absolute manner. These no-feedback levels therefore served as mini milestones marking AP performance at 90% accuracy. If participants achieved 90% accuracy at the 8th level, a new pitch was added to the training set, and they went through the same 8-level part again with the enlarged set. Each level included 20 trials, with tones distributed as evenly as possible among the training pitches, octaves and timbres. Semitone errors were counted as errors during training. Before each level, participants could freely listen to sample tones of the training pitches as many times as they wished before proceeding. Because each training session lasted an hour, individual participants might have completed different numbers of training trials depending on their pace of learning (e.g., the time spent on training trials or on listening to sample tones).
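The level structure above lends itself to a generated schedule. This sketch builds the 80 levels and draws a 20-trial list for a level. The order in which pitches beyond E, F and F# are added is not stated, so the chromatic continuation used here is a hypothetical placeholder. Balancing pitches and octave-by-timbre formats separately is likewise one assumed way of distributing tones ‘as evenly as possible’.

```python
import itertools
import random
from dataclasses import dataclass

INITIAL_PITCHES = ["E", "F", "F#"]
# Hypothetical order in which the remaining nine pitches are added, one per part.
ADDED_PITCHES = ["G", "G#", "A", "A#", "B", "C", "C#", "D", "D#"]
# The four level types within each 8-level part: (timbres, octaves), progressively richer.
LEVEL_TYPES = [
    (("sine",), (4,)),
    (("sine",), (4, 5)),
    (("sine", "piano"), (4,)),
    (("sine", "piano"), (4, 5)),
]


@dataclass(frozen=True)
class Level:
    number: int
    pitches: tuple
    timbres: tuple
    octaves: tuple
    feedback: bool


def build_schedule():
    """Ten 8-level parts (3 to 12 pitches); each level type with, then without, feedback."""
    levels = []
    for part in range(10):
        pitches = tuple(INITIAL_PITCHES + ADDED_PITCHES[:part])
        for timbres, octaves in LEVEL_TYPES:
            for feedback in (True, False):
                levels.append(Level(len(levels) + 1, pitches, timbres, octaves, feedback))
    return levels


def _even_fill(options, n, rng):
    """Repeat options so each appears floor(n/k) or ceil(n/k) times, shuffled."""
    options = list(options)
    full, remainder = divmod(n, len(options))
    picks = options * full + rng.sample(options, remainder)
    rng.shuffle(picks)
    return picks


def balanced_trials(level, n_trials=20, rng=None):
    """Balance pitches and octave-by-timbre formats separately, then pair them."""
    rng = rng or random.Random()
    pitches = _even_fill(level.pitches, n_trials, rng)
    formats = _even_fill(itertools.product(level.octaves, level.timbres), n_trials, rng)
    return [(p, o, t) for p, (o, t) in zip(pitches, formats)]
```

In the first level, each of the three pitches appears 6 or 7 times across the 20 trials; in the final 12-pitch levels, each pitch appears once or twice.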

Participants normally earned one point for each correct response. To motivate participants, a special trial worth three points appeared at random with a probability of 1/80. Participants also received 1, 2 or 3 tokens for achieving 60%, 75% or 90% accuracy at a training level, respectively. With ten tokens, participants could initiate a three-point special trial whenever they preferred, up to three times per level. Special trials neither appeared nor could be initiated at the no-feedback levels, which ensured that participants completed these levels without any scoring assistance.
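The scoring and token rules can be expressed as a small state machine. In this sketch, two details are assumptions not stated in the text: token tiers are not cumulative (a level at 90% earns 3 tokens, not 1 + 2 + 3), and initiating a special trial spends the ten tokens. All names are hypothetical.

```python
import random

SPECIAL_TRIAL_PROB = 1 / 80
SPECIAL_TRIAL_POINTS = 3
TOKENS_PER_SPECIAL = 10
MAX_SPECIALS_PER_LEVEL = 3
# (minimum accuracy in percent, tokens awarded); the highest tier reached wins (assumption).
TOKEN_TIERS = [(90, 3), (75, 2), (60, 1)]


def trial_points(correct, special):
    """One point per correct response; three for a correct special trial."""
    if not correct:
        return 0
    return SPECIAL_TRIAL_POINTS if special else 1


def tokens_for_level(n_correct, n_trials=20):
    """Tokens earned from accuracy at a level, compared in integer percent."""
    for percent, tokens in TOKEN_TIERS:
        if 100 * n_correct >= percent * n_trials:
            return tokens
    return 0


def random_special(feedback_level, rng=random):
    """Special trials appear at random (p = 1/80), never at no-feedback levels."""
    return feedback_level and rng.random() < SPECIAL_TRIAL_PROB


class Wallet:
    def __init__(self):
        self.points = 0
        self.tokens = 0
        self.initiated_this_level = 0

    def start_level(self):
        self.initiated_this_level = 0

    def can_initiate_special(self, feedback_level):
        return (feedback_level
                and self.tokens >= TOKENS_PER_SPECIAL
                and self.initiated_this_level < MAX_SPECIALS_PER_LEVEL)

    def initiate_special(self, feedback_level):
        if not self.can_initiate_special(feedback_level):
            raise ValueError("special trial not available at this level")
        self.tokens -= TOKENS_PER_SPECIAL  # assumption: the ten tokens are spent
        self.initiated_this_level += 1
```

Keeping the no-feedback check inside both `random_special` and `can_initiate_special` mirrors the requirement that no-feedback levels run without any scoring assistance.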

Test for generalization

The test for generalization was performed before training and within three days after training to examine how well pitch-naming ability generalized to untrained octaves and timbres. It used the 120 tones in octaves 3 to 6, of which octaves 4 and 5 were trained and octaves 3 and 6 were untrained. Three timbres were included: complex sine wave and piano as trained timbres, and violin as an untrained timbre. The tones fell into three conditions: trained octave and timbre, trained octave and untrained timbre, or untrained octave and trained timbre. On each trial, a tone was presented for 1 s, followed by the same image used in training, which mapped the 12 pitch names to 12 keys of the keyboard. Participants were required to name the pitch of the presented tone by keypress within 5 s. Each tone was presented twice, giving 240 trials in total, in randomized order. No practice trials were provided in these tests. The dependent measure was the precision of pitch naming, i.e., the average semitone error of participants’ responses relative to the correct responses. We adopted this measure instead of overall naming accuracy because the size of each judgment error reveals how precise pitch naming is, which is more informative than the binary correctness captured by naming accuracy. An identical test was administered a month later to examine whether AP learning was sustained for at least a month.
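The 120 test tones consist of 12 pitches × 4 octaves for each of the sine and piano timbres (96 tones) plus 12 pitches × 2 octaves for violin (24 tones). The sketch below enumerates them, assigns each to its condition, and computes mean semitone error. Measuring error as the shorter distance around the 12-pitch cycle (so naming B for C is a 1-semitone error and the maximum is 6) is an assumption about how semitone error was computed.

```python
import itertools

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
TRAINED_OCTAVES = {4, 5}
TRAINED_TIMBRES = {"sine", "piano"}
# Octave ranges per timbre in the test set; violin tones covered octaves 4-5 only.
OCTAVES_BY_TIMBRE = {"sine": range(3, 7), "piano": range(3, 7), "violin": range(4, 6)}


def generalization_stimuli():
    """All (pitch, octave, timbre) combinations in the generalization test."""
    return [(pitch, octave, timbre)
            for timbre, octaves in OCTAVES_BY_TIMBRE.items()
            for octave, pitch in itertools.product(octaves, PITCH_CLASSES)]


def condition(octave, timbre):
    """Test condition of a tone; untrained octave with untrained timbre never occurs."""
    if octave in TRAINED_OCTAVES and timbre in TRAINED_TIMBRES:
        return "trained octave & timbre"
    if octave in TRAINED_OCTAVES:
        return "trained octave, untrained timbre"
    return "untrained octave, trained timbre"


def semitone_error(target, response):
    """Circular distance between pitch names, 0-6 semitones (assumed metric)."""
    d = abs(PITCH_CLASSES.index(target) - PITCH_CLASSES.index(response))
    return min(d, 12 - d)


def mean_semitone_error(trials):
    """trials: iterable of (correct pitch, response) pairs."""
    errors = [semitone_error(t, r) for t, r in trials]
    return sum(errors) / len(errors)
```

Presenting each of the 120 tones twice yields the 240 trials, split into 96 trained, 48 untrained-timbre and 96 untrained-octave presentations.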

Results

Acquisition of AP

In general, participants made substantial progress in learning to name pitches. At the end of training, they were able to name on average 8.1 pitches (out of 12) at 90% accuracy without any externally provided reference tones or scoring assistance (see Methods), under the stringent scoring criterion of treating semitone errors as incorrect (Figure 1A). Importantly, one of the ten participants passed all levels of training, meaning that he was able to name all twelve pitches at 90% accuracy without any externally provided reference tones. This suggests that he had acquired AP through perceptual learning in adulthood.

Figure 1.

The number of pitches learned in the course of training in the three experiments. For each individual, the number of pitches learned was defined as the number of pitches included in the highest level passed at 90% accuracy with no feedback or scoring assistance. The solid line shows the average number of learned pitches across all individuals in each experiment. The other lines show the course of training of the individuals who passed all levels of training in each experiment. TrainedAP_s6 in Experiment 3 learned to name 12 pitches at 90% accuracy with a subset of training tones (in one octave and one timbre only) by the end of the 16th hour of training, but she passed all levels of training during the 18th hour of training.
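The Figure 1 measure can be computed from the Experiment 1 level schedule described in Methods: levels 1-8 use 3 pitches, each subsequent 8-level part adds one, and within a part the feedback and no-feedback versions alternate, so the no-feedback levels are the even-numbered ones. A minimal sketch follows. The function names are hypothetical, and returning 0 when no no-feedback level has been passed is an assumption.

```python
LEVELS_PER_PART = 8
TOTAL_LEVELS = 80
INITIAL_PITCH_COUNT = 3


def pitches_at_level(level):
    """Pitches in the training set at a 1-indexed level (3 at levels 1-8, 12 at 73-80)."""
    if not 1 <= level <= TOTAL_LEVELS:
        raise ValueError("level must be between 1 and 80")
    return INITIAL_PITCH_COUNT + (level - 1) // LEVELS_PER_PART


def is_no_feedback(level):
    """Each level type is given with feedback (odd level), then without (even level)."""
    return level % 2 == 0


def pitches_learned(passed_levels):
    """Pitch count at the highest no-feedback level passed; 0 if none (assumption)."""
    no_feedback = [lv for lv in passed_levels if is_no_feedback(lv)]
    return pitches_at_level(max(no_feedback)) if no_feedback else 0
```

A participant who has passed levels 1 to 17 has level 16 as the highest no-feedback level passed and is therefore credited with 4 pitches.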

Is this level of AP performance representative of that of real-world ‘AP possessors’? Although the verbal definition of ‘AP possessors’ (people who can name pitches accurately without external references) is widely agreed upon, there is no single objective definition of the performance level of ‘AP possessors’. On 19 April 2017, we searched Web of Science for the term ‘absolute pitch’ in the topic field and identified 133 empirical papers. These papers used highly varied definitions of ‘AP possessors’, including self-report, AP performance measurements, or relative performance on AP tasks (such as AP accuracy 3 SDs higher than that of ‘non-AP possessors’). We focused on the 66 publications that defined AP objectively based on AP performance rather than self-report, and found that they also varied widely in the performance measures used to define ‘AP possessors’, including scoring methods (treating semitone errors as correct, partially correct or incorrect; using accuracy or the average size of errors, etc.) and cut-off points. Given this variability, we saw no strong reason to adopt the definition from any particular publication. Instead, we applied the definition specified in each of the 66 papers to our successfully trained participant, recalculating the participant’s performance where needed, to see whether this participant would be considered an ‘AP possessor’ in each paper. This participant would be considered an ‘AP possessor’ in 83.3% (55 out of 66) of the papers that adopted an objective, performance-based definition of AP. In other words, the level of AP performance achieved by this successfully trained participant was representative of and comparable to that of real-world ‘AP possessors’ as defined in the literature.
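The re-scoring step can be illustrated for the accuracy-based definitions. In this sketch, the `APDefinition` fields, the partial-credit value and the example responses are all hypothetical. Definitions based on average error size or on relative performance would need their own scoring functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class APDefinition:
    """A hypothetical accuracy-based criterion for labelling an 'AP possessor'."""
    semitone_credit: float  # 1.0: semitone errors correct; 0.0: incorrect; else partial
    cutoff: float           # minimum accuracy required


def accuracy(errors, semitone_credit):
    """errors: absolute semitone error per trial (0 means an exact match)."""
    score = sum(1.0 if e == 0 else semitone_credit if e == 1 else 0.0 for e in errors)
    return score / len(errors)


def qualifies(errors, definition):
    return accuracy(errors, definition.semitone_credit) >= definition.cutoff


def share_qualifying(errors, definitions):
    """Proportion of definitions under which the same responses qualify."""
    return sum(qualifies(errors, d) for d in definitions) / len(definitions)
```

Suppose a participant gives 18 exact answers and makes 2 semitone errors out of 20. These responses pass a strict 90% criterion and a lenient 90% criterion but fail a strict 95% criterion, so `share_qualifying` returns 2/3 for those three definitions. The reported proportion follows the same arithmetic: 55/66 ≈ 0.833, i.e., 83.3%.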

Generalization & Sustainability of AP learning

The improved pitch-naming performance generalized to untrained octaves and timbres (Figure 2A). A 2 × 3 ANOVA with Prepost (pretest / posttest) and Stimulus Type (octave & timbre trained / octave trained & timbre untrained / octave untrained & timbre trained) as factors revealed a significant main effect of Prepost, F(1,9) = 38.76, p < .001, ηp² = .812, with smaller pitch naming error at posttest than at pretest. No other main effect or interaction was significant, ps > .19; that is, we observed no difference in naming performance between tones in trained and untrained timbres and octaves.

Figure 2.

Pitch naming error in the pretest and posttests in the three experiments. Pitch naming error was defined as the distance, in semitones, between the response and the correct pitch. Error bars represent 95% CIs for the interaction between Prepost and Stimulus Type in Experiment 1 (A), and for the interaction among Prepost, Octave and Timbre in Experiments 2 and 3 (B and C).

To check whether the improvement was sustained for a month, a one-way ANOVA with Prepost (pretest / posttest / a month later) as a factor was performed on pitch naming error for the trained tones¹. It revealed a significant main effect of Prepost, F(2,16) = 19.15, p < .001, ηp² = .705. Post-hoc LSD tests showed that pitch naming error decreased after training, p < .001, and remained similar a month later, p > .250.
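The effect sizes reported for the Prepost effects (.812 and .705) are consistent with partial eta squared, which can be recovered from an F statistic and its degrees of freedom: ηp² = SS_effect / (SS_effect + SS_error) = F·df1 / (F·df1 + df2). The check below is a sketch for verifying the reported values, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


print(round(partial_eta_squared(38.76, 1, 9), 3))   # 0.812 (Prepost, generalization test)
print(round(partial_eta_squared(19.15, 2, 16), 3))  # 0.705 (Prepost, one-month follow-up)
```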

Discussion

After the 12-hour AP training, all participants improved their pitch naming performance substantially. On average, they were able to name 8.1 pitches at 90% accuracy. In particular, one participant was able to name all twelve pitches at 90% accuracy without externally provided reference tones. Based on our survey of the literature, this level of AP performance was representative of and comparable to that of real-world ‘AP possessors’. This indicates that AP acquisition is possible in adulthood, and that a 12-hour training protocol was sufficient for at least one participant to acquire AP.

In addition, the characteristics of AP learning matched those of perceptual learning well (Fahle & Poggio, 2002; Goldstone, 1998). Specifically, AP performance improved after training, and the improvement did not differ between tones in trained and untrained timbres and octaves, suggesting that AP learning generalized to untrained octaves and timbres. AP performance was also similar right after training and a month later, suggesting that AP learning was sustained for a month. Overall, AP learning showed the classic characteristics of perceptual learning: performance enhancement, generalization and sustainability.

Experiment 2

In Experiment 2, we aimed to replicate the finding that AP can be acquired in adulthood through perceptual learning and to further characterize AP learning in adulthood. First, we tested the robustness of AP acquisition in adulthood using a different training protocol, with different training tones, training tasks, training duration and design. Second, we asked whether training with a smaller set of stimuli, i.e., tones in one octave and one timbre only, would lead to greater specificity in AP learning, as one would expect from the perceptual learning literature (Fahle & Poggio, 2002; Goldstone, 1998; Y. K. Wong et al., 2011). Third, we explored whether musicians benefit from the training more than non-musicians due to their prior musical training.