Abstract
Individual differences in delay-discounting correlate with important real-world outcomes, e.g. education, income, drug use, and criminality. As such, delay-discounting has been extensively studied by economists, psychologists, and neuroscientists to reveal its behavioral and biological mechanisms in both human and non-human animal models. However, two major methodological differences hinder comparing results across species: human studies present long time-horizon options verbally, whereas animal studies employ experiential cues and short delays. To bridge these divides, we developed a novel language-free experiential task inspired by animal decision-making studies. We find that subjects’ time-preferences are reliable across both the verbal/experiential and the seconds/days divides. When we examined whether discount factors shifted or scaled across the tasks, we found a surprisingly strong effect of temporal context. Taken together, this indicates that subjects have a stable, but context-dependent, time-preference that can be reliably assessed using different methods, thereby providing a foundation for bridging studies of time-preferences across species.
Introduction
Intertemporal choices involve a trade-off between a larger outcome received later and a smaller outcome received sooner. Many individual decisions have this temporal structure, such as whether to purchase a cheaper refrigerator but forgo the ongoing energy savings. Since research has found that intertemporal preferences are predictive of a wide variety of important life outcomes, ranging from SAT scores, college graduation, and income to anti-social behaviors such as gambling or drug abuse [1,13,24,28,45], they are frequently studied in both humans and animals across multiple disciplines, including marketing, economics, psychology, and neuroscience.
A potential obstacle to understanding the biological basis of intertemporal decision-making is that human studies differ from non-human animal studies in two important ways: long versus short time-horizons and choices that are made based on verbal versus non-verbal (i.e. “experiential”) stimuli. In animal studies, the subjects experience the delay between their choice and the reward (sometimes cued with a ramping sound or a diminishing visual stimulus) before they can proceed to the next trial [8,11,73]. Generally, there is nothing for the subject to do during this waiting period. In human studies, subjects usually make a series of choices (either via computer or a survey) between smaller sooner offers and larger offers delayed by months or years [2,49]. (We are aware of only a handful of studies that have used delays of minutes [48] or seconds [25,31,40,60,72]). During the delay (e.g. if the payout is in 6 months) the human subjects go about their lives, likely forgetting about the delayed payment, just as individuals do not actively think about their retirement savings account each moment until their retirement.
Animal studies of delay-discounting take several forms [11,18,62,79], but all require experiential learning that some non-verbal cue is associated with waiting. Subjects experience the cues, delays, and rewards, and slowly build an internal map from the cues to the delays and magnitudes. Subjects may have only implicit knowledge of this map, which likely engages neural substrates distinct from the explicit processes engaged by humans when considering a verbal offer [59,61].
Whether animal studies can inform human studies depends on the answers to the following questions. Do decisions that involve actively waiting for seconds invoke the same cognitive and neural processes as decisions requiring passively waiting for months? Do decisions based on experiential and perceptual stimuli invoke the same cognitive and neural processes as decisions based on explicitly written information?
The animal neuroscience literature on delay-discounting mostly accepts as a given that the behavior of animals will give insight into the biological basis for human impulsivity [23,34,65,70] and rarely [8,66] addresses the methodological gaps considered here. This view is not unfounded. Neural recordings from animals [11] and brain imaging studies in humans [37,49] both find that the prefrontal cortex and basal ganglia are involved in delay-discounting decisions, suggesting common neural mechanisms. Animal models of attention-deficit hyperactivity disorder (ADHD) have reasonable construct validity: drugs that shift animal behavior in delay-discounting tasks can also improve the symptoms of ADHD in humans [23,56]. Thus, most neuroscientists would likely predict that our experiments would find high within-subject reliability across both time-horizons and verbal/experiential dimensions.
Reading the literature from economics, a different picture emerges. Traditional economic models dating back to Samuelson [68] posit that agents make consistent intertemporal decisions, thereby implying a constant discount rate regardless of context. In contrast, growing evidence from behavioral economics provides support for the view that discounting over a given time delay changes with the time-horizon [3,7]. Yet, there remains debate in the empirical economics literature about how well discounting measures elicited in human studies truly reflect the rates of time-preference used in real-world decisions since they have been found to vary by the type of task (hypothetical, potentially real, and real), stakes being compared, age of participants and across different domains [15]. Thus, most economists surveying the empirical evidence would be surprised if a design that varied both type of tasks and horizons would generate results with high within-subject reliability.
Here, we have addressed these questions by measuring the discount factors of human subjects in three ways. First, we used a novel language-free task involving experiential learning with short delays [20,42,43,46]. Then, we measured discount factors more traditionally, with verbal offers over both short and long delays. This design allowed us to test whether, for each subject, a single process is used for intertemporal choice regardless of time-horizon or verbal vs. experiential stimuli, or whether the choices in different tasks could be better explained by distinct underlying mechanisms.
Results
In our main experiment, 63 undergraduate students from NYU Shanghai participated in 5 experimental sessions. In each session, subjects completed a series of intertemporal choices. Across sessions, subjects completed 160 trials of each of the following 3 tasks: i) non-verbal short delay (NV, 3 - 64 seconds), ii) verbal short delay (SV, 3 - 64 seconds), and iii) verbal long delay (LV, 3 - 64 days). In each trial, irrespective of the task, subjects made a decision between the sooner (blue circle) and the later (yellow circle) options. In the non-verbal task (Fig. 1A) the parameters of the later option were mapped to an amplitude-modulated pure tone. The reward magnitude was mapped to the frequency of the tone (larger reward ∝ higher frequency). The delay was mapped to the amplitude modulation rate (longer delay ∝ slower modulation). Across trials, the delay and the magnitude of the sooner option were fixed (4 coins, immediately). For the short delay tasks, when subjects chose the later option, a clock appeared on the screen, and only when the clock image disappeared could they collect their reward by clicking in the reward port. The rewards were accumulated for the duration of the task and used for the subject’s payment. In the verbal tasks, the verbal description of the offers appeared within the blue and yellow circles in place of the amplitude-modulated sound (Fig. 1B). In the verbal long delay task, after each choice, subjects were given feedback confirming their choice and then proceeded to the next trial. At the end of the session, a single long-verbal trial was selected randomly to determine the payment. If the selected trial corresponded to the subject having chosen the later option, she received her reward via an electronic transfer after the delay.
Subjects’ time-preferences are reliable across both verbal/experiential and second/day differences
Subjects’ impulsivity was estimated by fitting their choices with a hierarchical Bayesian model of hyperbolic discounting with decision noise (Materials and Methods). The model (M6p,4s) had 6 population-level parameters (a discount factor, k, and decision noise, τ, for each of the three tasks) and 4 parameters per subject: kNV, kSV, kLV and τ. The subject-level effects are drawn from a normal distribution with mean zero. Subjects’ choices were well-fit by the model (Fig. 2 & Fig. S1). Since we did not have a strong ex ante hypothesis about how subjects’ impulsivity measures in one task would translate across tasks, we first examined ranks of impulsivity and found significant correlations across experimental tasks (Table 1). In other words, the most impulsive subject in one task is likely to be the most impulsive subject in another task. This result is robust to different functional forms of discounting and estimation methods (Fig. S2 & Table S3). For example, if we ranked the subjects by the fraction of trials on which they chose the later option in each task, we obtained a similar result (Spearman r: SV vs. NV r = 0.71; SV vs. LV r = 0.49; NV vs. LV r = 0.30, all p < 0.05) (see SI Results for additional confirmations). The correlations of discount factors across tasks extended to Pearson correlations of log(k) (Fig. 3 & Table 1). We found that k, for all tasks, had a log-normal distribution across our subjects (as in [69] and shown in Fig. 3C); hence we present our results in log(k).
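The core of the fitted model, hyperbolic discounting combined with a softmax decision rule, can be sketched in a few lines. The parameter values and function names below are ours, chosen for illustration; they are not the paper's estimates.

```python
import math

def subjective_value(amount, delay, k):
    """Hyperbolic discounting: utility falls off as 1 / (1 + k * delay)."""
    return amount / (1.0 + k * delay)

def p_choose_later(a_soon, d_soon, a_late, d_late, k, tau):
    """Softmax (logit) choice rule: decision noise tau flattens the choice curve."""
    dv = subjective_value(a_late, d_late, k) - subjective_value(a_soon, d_soon, k)
    return 1.0 / (1.0 + math.exp(-dv / tau))

# A hypothetical subject (k = 0.05 per second) facing 4 coins now vs. 8 coins in 16 s
p = p_choose_later(4, 0, 8, 16, k=0.05, tau=1.0)
```

With these illustrative numbers the delayed option is worth 8/(1 + 0.05·16) ≈ 4.44 coins, slightly more than the immediate 4 coins, so the model predicts a better-than-chance probability of choosing the later option.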
Consistent with existing research, we find that time-preferences are stable in the same task within subjects between the first half and the second half of a block within sessions, and also across experimental sessions that take place every two weeks (SI Results) [4,50]. In our verbal experimental sessions the short and long tasks were alternated and the order was counter-balanced across subjects. We did not find any order effects (Materials and Methods) in either the main experiment (bootstrapped mean test, SV-LV-SV-LV vs. LV-SV-LV-SV order for SV and LV log(k), respectively, all p > 0.4) or the control experiments (SI Results).
In our experimental design, the SV task shares features with both the NV and LV tasks. First, the SV task shares a time-horizon with the NV task. Second, the SV and LV tasks are both verbal and were undertaken at the same time. The NV and LV tasks differ in both time-horizon and verbal/non-verbal stimuli. The only potential feature shared across all tasks is delay-discounting. To test whether the correlation between NV and LV might be accounted for by their shared correlation with the SV task, we performed linear regressions of the discount factors in each task as a function of the other tasks (e.g. log(kNV) = βSV log(kSV) + βLV log(kLV) + β0 + ϵ). For NV the two predictors explained 63% of the variance (F(60, 2) = 50.63, p < 10−9). log(kSV) significantly predicted log(kNV) (βSV = 1.28 ± 0.15, p < 10−9) but log(kLV) did not (βLV = −0.12 ± 0.09, p = 0.181). For LV we were able to predict 40% of the variance (F(60, 2) = 19.64, p < 10−6) and found that log(kSV) significantly predicted log(kLV) (βSV = 1.26 ± 0.26, p < 10−5) but log(kNV) did not (βNV = −0.24 ± 0.18, p = 0.181). For SV the two predictors explained 72% of the variance (F(60, 2) = 78.93, p < 10−9). Coefficients for both predictors were significant (βNV = 0.435 ± 0.050, p < 10−9; βLV = −0.223 ± 0.046, p < 10−5), where β = mean ± std. error. We further verified these results by generating 1-predictor reduced models based on the stronger of the 2 predictors for each task and comparing the nested models using the Akaike Information Criterion (AIC) and likelihood ratio tests (LR test) (Table 2).
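The logic of these cross-task regressions can be illustrated on synthetic data, where a shared trait drives all three tasks' discount factors. The coefficients, noise levels, and sample structure below are invented for the sketch and are not the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 63                                        # subjects, as in the main experiment
log_k_sv = rng.normal(-3.5, 1.0, n)           # shared impulsivity trait (hypothetical)
log_k_nv = 1.2 * log_k_sv + rng.normal(0, 0.6, n)
log_k_lv = 1.2 * log_k_sv + rng.normal(0, 1.0, n)

# Regress NV on the other two: log(k_NV) = b_SV*log(k_SV) + b_LV*log(k_LV) + b0
X = np.column_stack([log_k_sv, log_k_lv, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, log_k_nv, rcond=None)
resid = log_k_nv - X @ beta
r2 = 1.0 - resid.var() / log_k_nv.var()
```

Because LV carries no information about NV beyond what SV already provides in this toy setup, the fitted LV coefficient hovers near zero while the SV coefficient recovers the generating slope, mirroring the pattern reported in the text.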
In order to test whether the verbal/non-verbal gap or the time-horizons gap accounted for more variation in discounting we used a linear mixed-effects model where we estimated log(k) as a function of the two gaps (as fixed effects) with subject as a random effect (using the lme4 R package [5,6]). We created two predictors: days was false in NV and SV tasks for offers in seconds and was true in the LV task for offers in days; verbal was true for the SV and LV tasks and false for the NV task. We found that time-horizon (βdays = −0.524 ± 0.235, p = 0.026) but not verbal/non-verbal (βverbal = −0.317 ± 0.235, p = 0.178) contributed significantly to the variance in log(k). This result was further supported by comparing the 2-factor model with reduced 1-factor models (i.e. that only contained either time or verbal fixed effects). Dropping the days factor significantly decreased the likelihood, but dropping the verbal factor did not (Table 3).
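The paper fits this model with lme4 in R; as a rough Python illustration of the same fixed-effects logic, within-subject centering below stands in for the random intercept. All numbers, including the assumed effect sizes, are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj = 63
subj = rng.normal(0, 1.0, n_subj)                    # random intercept per subject
coding = {"NV": (0, 0), "SV": (0, 1), "LV": (1, 1)}  # (days, verbal) dummies from the text
X, y, groups = [], [], []
for s in range(n_subj):
    for task, (d, v) in coding.items():
        X.append([d, v])
        # assumed generative model: days effect -0.5, no verbal effect
        y.append(-3.3 - 0.5 * d + 0.0 * v + subj[s] + rng.normal(0, 0.3))
        groups.append(s)
X, y, groups = np.array(X, float), np.array(y), np.array(groups)

# Within-subject centering absorbs each subject's intercept before the fixed-effects fit
for s in range(n_subj):
    m = groups == s
    X[m] -= X[m].mean(axis=0)
    y[m] -= y[m].mean()
beta, *_ = np.linalg.lstsq(X, y, rcond=None)         # [beta_days, beta_verbal]
```

In this sketch the recovered days coefficient is close to the assumed −0.5 while the verbal coefficient stays near zero, the qualitative pattern the mixed-effects analysis reports.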
We found that subjects’ time-preferences were highly correlated across tasks. However, correlation is invariant to shifts or scales across tasks. Our hierarchical model allows us to directly estimate the posterior distributions of log(k) (Fig. 3C) and report posterior means and credible intervals (NV posterior mean = -3.2, 95% credible interval [-3.77, -2.64]; SV mean = -3.49, 95% credible interval [-3.86, -3.11]; LV mean = -3.95, 95% credible interval [-4.55, -3.34]). Similarly, we can use the posterior probability to test whether log(k) shifted and/or scaled between tasks (Materials and Methods). We find that subjects in both the SV and NV tasks are more impatient than in the LV task, but not different from each other (i.e. significant shifts between SV and LV, and between NV and LV, Table 4). There is significant scaling between SV and the other two tasks (Table 4). This is likely driven by subgroups that were exceptionally patient in the LV task (Fig. 3B) or impulsive in the NV task (Fig. 3A).
Controlling for visuo-motor confounds
In the main experiment, we held the following features constant across three tasks: the visual display and the use of a mouse to perform the task. However, after observing the strong correlations between the tasks (Fig. 3) we were concerned that the effects could have been driven by the superficial (i.e. visuo-motor) aspects of the tasks. In other words, the visual and response features of the SV and LV tasks may have reminded subjects of the NV task context and nudged them to use a similar strategy across tasks. While this may be interesting in its own right, it would limit the generality of our results. To address this, we ran a control experiment (n=25 subjects) where the NV task was identical to the original NV task, but the SV and LV tasks were run in a more traditional way, with a text display and keypress response (control experiment 1, SI Method & Fig. S6). We replicated the main findings of our original experiment for ranks of log(k) (Table S5) and correlation between log(k) in SV and LV tasks (Fig. 4B). The Pearson correlation between NV and SV tasks (Fig. 4A) was lower than expected given the 95% confidence intervals of the resampled correlations of the main experiment and assuming 25 subjects (SI Results). This suggests that some of the correlation between SV and NV tasks in the main experiment may be driven by visuo-motor similarity in experimental designs. We did not find shifts or scaling between the posterior distributions of log(k) across tasks in this control experiment (Fig. 4C, NV posterior mean = -3.98, 95% credible interval [-5.44, -2.67], SV mean = -3.8, 95% credible interval [-4.94, -2.75], LV mean = -3.76, 95% credible interval [-4.79, -2.76]).
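Comparing an observed control-experiment correlation against what the main experiment predicts at n=25 can be done with a simple pairs bootstrap, resampled at the control sample size. The correlation strength and distributions below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_main, n_ctrl = 63, 25

# Hypothetical paired log(k) values with a main-experiment-like correlation (~0.8)
x = rng.normal(-3.2, 1.0, n_main)
y = -3.5 + 0.8 * (x + 3.2) + rng.normal(0, 0.6, n_main)

# Resample pairs at the control-experiment sample size to get a 95% interval for r
boot = []
for _ in range(2000):
    idx = rng.integers(0, n_main, n_ctrl)
    boot.append(np.corrcoef(x[idx], y[idx])[0, 1])
lo, hi = np.percentile(boot, [2.5, 97.5])
# A control-experiment correlation falling below `lo` would be lower than expected
```

This mirrors the test described in the text: the control experiment's NV-SV Pearson r is judged against the resampled interval rather than against the main-experiment point estimate.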
Strong effect of temporal context
We described above that the discount factors in the LV task, kLV, were almost equivalent (ignoring unexplained variance) to those in the SV task, kSV (Fig. 3B). However, the units of kLV are 1/day and the units of kSV are 1/second. This finding implies that if a subject would decrease their subjective utility of a specific reward by 50% for an increase from 5 to 10 seconds in the SV task, they would also decrease their subjective utility of that reward by 50% for an increase from 5 to 10 days in the LV task. This seems implausible, particularly from a neoclassical economics perspective. However, reward units also change when moving from the SV to the LV task. In our sessions, the exchange rate in SV was 0.05 CNY per coin (since all coins are accumulated and subjects are paid the total profit), whereas in LV, subjects were paid on the basis of a single trial chosen at random using an exchange rate of 4 CNY per coin. These exchange rates were set to, on average, equalize the possible total profit between the short and long delay tasks. However, even accounting for both the magnitude effect [29,30] and unit conversion (calculations presented in SI Results), the discount rates are still scaled by 4 orders of magnitude from the short to the long time-horizon tasks [53].
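The raw size of the unit mismatch is just arithmetic: before any correction for the magnitude effect or exchange rates, equal numeric discount factors in 1/second and 1/day differ by log10(86,400) ≈ 4.9 orders of magnitude once expressed in a common unit. The k value below is hypothetical.

```python
import math

seconds_per_day = 24 * 60 * 60        # 86,400

# Hypothetical subject with the same numeric discount factor in both tasks
k_sv = 0.03                           # units: 1/second (short-verbal task)
k_lv = 0.03                           # units: 1/day (long-verbal task)

# Express both in common units (1/second) and compare
k_lv_common = k_lv / seconds_per_day
orders_of_magnitude = math.log10(k_sv / k_lv_common)   # log10(86,400), ~4.94
```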
One interpretation of this result is that subjects are simply ignoring the units and only focusing on the number. This would be consistent with an emerging body of evidence that numerical value, rather than conversion rate or units matter to human subjects [17,26]. A second possible interpretation is that subjects normalize the subjective delay of the offers based on context, just as they normalize subjective value based on current context and recent history [39,41,76,78]. A third possibility is that in the short delay tasks (NV and SV) subjects experience the wait for the reward on each trial as quite costly, in comparison to the delayed gratification experienced in the LV task. This “cost of waiting” may share some intersubject variability with delay-discounting but may effectively scale the discount factor in tasks with this feature [55].
In an attempt to disentangle these possibilities, we ran a control experiment (n=16 subjects) using two verbal discounting tasks (control experiment 2, SI Method). In one task, the offers were in days (DV). In the other, the offers were in weeks (WV). This way, we could directly test whether subjects would discount the same for 1 day as for 1 week (i.e. ignore units) or for 7 days as for 1 week (i.e. convert units). We found strong evidence for the latter (Fig. 5A). Subjects did not ignore the units: their choices were consistent with rational agents that converted all offers into a common time unit. There is almost perfect correlation (Pearson r = 0.97, p < 0.01) within subjects across estimated log(k) (k ~ 1/day) between the verbal tasks with delays in days and delays in weeks (see control experiment 2 results in SI Results).
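The two readings can be contrasted with a toy calculation under hyperbolic discounting; the discount factor and offer below are hypothetical.

```python
def value(amount, delay_days, k_per_day):
    """Hyperbolic discounting with delay expressed in days."""
    return amount / (1.0 + k_per_day * delay_days)

k = 0.2          # hypothetical discount factor, units 1/day
offer = 10       # coins, delayed by "1 week"

convert_units = value(offer, 7, k)   # 1 week read as 7 days
ignore_units = value(offer, 1, k)    # 1 week read as the number 1
# Converting units discounts a 1-week offer exactly as much as a 7-day offer;
# ignoring units discounts it far less.
```

The data favor the first reading: subjects behaved as if a week equals 7 days, not as if "1 week" equals the number 1.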
Having ruled out the possibility that subjects ignore units of time, we tested our second potential explanation: that subjects make decisions based on a subjective delay that is context dependent. We reasoned that if choices are context dependent, then it may take some number of trials in each task before the context is set. Consistent with this reasoning, we found a small but significant adaptation effect in early trials: subjects are more likely to choose the later option in the first trials of the SV task (Fig. 5B,C). It seems that, at first, seconds in the current task are interpreted as being smaller than days in the preceding task, but within several trials the days are forgotten and time-preferences adapt to the new time-horizon of seconds.
Discussion
Using three tasks, we set out to test whether the same delay-discounting process is employed regardless of the verbal/non-verbal nature of the task and the time-horizon. We found significant correlations between subjects’ discount factors in the three tasks, providing evidence that there are common cognitive (and presumably underlying neural) mechanisms driving the decisions in the three tasks. In particular, the strong correlation between the short time-horizon non-verbal and verbal tasks (r = 0.79, Fig. 3A) provides the first evidence for the generalizability of the non-verbal task, suggesting that this task can be applied in both human and animal research for direct comparison of the cognitive and neural mechanisms underlying delay-discounting. However, the correlation between the short-delay/non-verbal task and the long-delay/verbal task is lower (r = 0.36). Taken together, our results suggest that animal models of delay-discounting may have more in common with short time-scale consumer behavior, such as impulse purchases and “paying-not-to-wait” in mobile gaming [22], and that caution is warranted in extending conclusions from these models to long time-horizon real-world decisions, such as buying insurance or saving for retirement.
Stability of preferences
The question of stability is of central importance to applying in-lab studies to real-world behavior. There are several concepts of stability that our study addresses. First, is test/re-test stability; second, stability across the verbal/non-verbal gap; third, stability across the second/day gap. Consistent with previous studies [4,40,50], we found high within-task reliability. Choices in the same task did not differ when made at the beginning or the end of the session nor when they were made in sessions held on different days even 2 weeks apart (SI Results).
To our knowledge, there are no studies comparing stability across the verbal/non-verbal gap for delay-discounting. The closest literature that we are aware of finds that value encoding (the convexity of the utility function), but not probability weighting, is similar across the verbal/non-verbal gap in sessions that compare responses to a classic verbal risky economic choice task with an equivalent task in the motor domain [81]. It may be that, unlike time or value, probability is processed differently in verbal vs. non-verbal settings [33].
There are two aspects to the time-horizon gap that may contribute independently to differences in subjects’ preferences between our short and long tasks. First, there is the difference in order of magnitudes of the delays. Second, there is a difference in the experience of the delay, in that all delays are experienced in the short tasks, but only one delay is experienced in the long task.
Our control study comparing discounting of days vs. weeks eliminated the second factor, since only one delay was experienced in both the days and weeks tasks. We found almost perfect correspondence between the choices in the two tasks (Fig. 5A): subjects discounted 7 days as much as they discounted one week. However, days and weeks are only separated by one order of magnitude, while seconds vs. days are five orders of magnitude apart. So while the days/weeks experiment provides some evidence that the magnitude of the delays does not contribute substantially to variance in choice, it may be that larger differences (e.g. comparing hours vs. weeks) would produce an effect. The evidence from the literature on this issue is mixed. On the one hand, some have found that measures of discount factors over month-long delays are not predictive of discount factors for year-long horizons (a difference of one order of magnitude) [44,74], but others have found consistent discounting for the same ranges [36]. Other studies that compared the population distributions of discount factors for short (up to 28 days) vs. long (years) delays (2 orders of magnitude) found no differences in subjects’ discount factors [2,21]. Some of these discrepancies can be attributed to the framing of the choice options: the standard larger-later vs. smaller-sooner framing compared to a negative framing [44], in which subjects want to be paid more the longer they have to worry about some negative event in the future.
Several previous studies have compared discounting in experienced-delay tasks (as in our short tasks) with tasks where delays were hypothetical or only one was experienced [36,40,53,64]. For example, Lane et al. [40] also used a within-subject design to examine short vs. long delays (similar to our short-verbal and long-verbal tasks) and found similar correlations (r ~ 0.5 ± 0.1) with a smaller sample size (n=16). Consistent with our findings, they found (but did not discuss) a 5 order of magnitude scaling factor between subjects’ discounting of seconds and days, suggesting that this scaling is a general phenomenon.
Cost of waiting vs. discounting future gains
It may seem surprising that human subjects would discount later rewards, i.e. choose immediate rewards, in a task where delays are in seconds. After all, subjects cannot consume their earnings immediately. Yet, this result is consistent with earlier work suggesting that individuals derive utility from receiving money irrespective of when it is consumed [48,49,63]. In our design, a pleasing (as reported by subjects) ‘slot machine’ sound accompanied the presentation of the coins in the short-delay tasks. This sound can be interpreted as an instantaneous secondary reinforcer [38]. Further, this result is consistent with studies which find that humans exhibit discount rates comparable to other species when consuming liquid rewards [35]. On the other hand, this would not be surprising to those who develop (or study) “pay-not-to-wait” video games [22], which exploit players’ impulsivity to acquire virtual goods with no actual economic value.
Using a seconds time-horizon may lead one to question whether we are measuring delay-discounting or capturing the cost of waiting [52]. Waiting, or doing nothing, “builds up anxiety and stress in an individual due both to the sense of waste and the uncertainty involved in a waiting situation” [54]. These different interpretations (Paglieri [55] described the delayed option as being framed as ‘waiting’ in a seconds time-horizon compared to ‘postponing’ in a days time-horizon) may lead one to question what we can learn from comparing within-subject behavior across tasks. Although it is not known exactly how time is perceived (e.g. subjects could overestimate the duration of the short delay, which would lead to greater discounting), we argue that the significant correlations observed indicate that there are shared biological mechanisms underlying each of the three delay-discounting tasks, which could explain why our inability to resist a candy in a seconds time-horizon self-control task predicts our ability to complete college and other long time-horizon behaviors [13,19,28,51] (but see [77]).
Subjective scaling of time
The range of discount rates we observed in the long-verbal task was consistent with that observed in other studies. For example, in a population of more than 23,000 subjects, the log of the discount factors ranged from -8.75 to 1.4 ([69], compare with Fig. 3B). This implies that, in our short tasks, subjects are discounting extremely steeply; i.e. they discounted the rewards per second about as much as they discounted the rewards per day. This discrepancy has been reported before [40,53]. We consider three (non-mutually exclusive) explanations for this scaling. First, subjects may ignore units. However, by testing overlapping time-horizons of days and weeks, we confirmed that subjects do pay attention to units. Second, it may be that the cost of waiting [14,53,55] (discussed above) compared to the cost of postponing is, coincidentally, the same as the number of seconds in a day.
We feel this coincidence is unlikely, and thus favor the third explanation: temporal context. When making decisions about seconds, subjects ‘wait’ for seconds and when making decisions about days subjects ‘postpone reward’ for days [55]. Although our experiments were not designed to test whether the strong effect of temporal context was due to normalizing, existence of extra costs for waiting in real time, or both, we did find some evidence for the former (Fig. 5C). Consistent with this idea, several studies have found that there are both systematic and individual level biases that influence how objective time is mapped to subjective time for both short and long delays [80,83]. Thus, subjects may both normalize delays to a reference point and introduce a waiting cost at the individual level that will lead short delays to seem as costly as the long ones.
Materials and Methods
Participants
For the main experiment, participants were recruited from the NYU Shanghai undergraduate student population on two occasions, leading to a total sample of 67 students (45 female, 22 male). Using posted flyers, we initially recruited 35 students but added 32 more to increase statistical power (a power analysis indicates that the total of 63 participants is adequate to detect a medium to strong correlation across subjects, SI Results).
The study was approved by the IRB of NYU Shanghai. The subjects were between 18 and 23 years old; 34 of the 67 subjects were Chinese nationals. They received a 30 CNY (~$5 USD) per hour participation fee as well as up to an additional 50 CNY (~$8 USD) per session based on their individual performance in the task (either in the NV task, or the total in the SV and LV tasks, accounting for the delay of payment in the LV task). The experiment involved five sessions per subject (3 non-verbal sessions followed by 2 verbal sessions), permitting us to perform within-subject analyses. The sessions were scheduled bi-weekly and took place in the NYU Shanghai Behavioral and Experimental Economics Laboratory. In each session, all decisions involved a choice between a later (delayed by seconds or days) option and an immediate (now) option. Three subjects did not pass the learning stages of the NV task. One subject did not participate in all of the sessions. These four subjects were excluded from all analyses.
Experimental Design
The experiments were constructed to match the design of tasks used for rodent behavior in Prof. Erlich’s lab. For the temporal discounting task, the value of the later option is mapped to the frequency of a pure tone (frequency ∝ reward magnitude) and the delay is mapped to the amplitude modulation (modulation period ∝ delay). The immediate option was the same on all trials within a session and was unrelated to the sound.
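A minimal sketch of such a stimulus mapping, assuming linear interpolation; the frequency and AM-period ranges here are illustrative placeholders, not the values actually used in the experiment.

```python
def lerp(x, x_lo, x_hi, y_lo, y_hi):
    """Linearly map x from [x_lo, x_hi] onto [y_lo, y_hi]."""
    return y_lo + (x - x_lo) * (y_hi - y_lo) / (x_hi - x_lo)

def tone_for_offer(reward, delay):
    """Map a later offer to (carrier frequency in Hz, AM period in s).
    Larger reward -> higher frequency; longer delay -> slower modulation.
    Reward range (4-16 coins) and output ranges are hypothetical."""
    freq = lerp(reward, 4, 16, 500, 2000)       # larger reward, higher pitch
    am_period = lerp(delay, 3, 64, 0.2, 2.0)    # delay range from the task: 3-64 s
    return freq, am_period
```

The key property is monotonicity in both dimensions, so that subjects can learn the cue-to-offer map experientially.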
Through experiential learning, subjects learned the map from visual and sound attributes to values and delays. This was accomplished via 6 learning stages (0-5) that build up to the final non-verbal task (NV) that was used to estimate subjects’ discount factors. Briefly, the stages were designed to: (0) learn that a mouse-click in the middle bottom ‘reward-port’ produced coins (which subjects knew would be exchanged for money); (1) learn to initiate a trial by a mouse-click in a highlighted port; (2) learn ‘fixation’, i.e. to keep the mouse-cursor in the highlighted port; (3) associate a mouse-click in the blue port with the sooner option (a reward of a fixed 4-coin magnitude received instantly); (4) associate varying tone frequencies with varying rewards at the yellow port; and (5) associate varying amplitude modulation frequencies with varying delays at the yellow port. On each trial of stages 3, 4, and 5 there was either a blue port or a yellow port (but not both). The exact reward and delay parameters experienced in the learning stages correspond to values used throughout the experiment. After selecting the yellow port (i.e. the delayed option), a countdown clock appeared on the screen and the subject had to wait for the delay that had been indicated by the amplitude modulation of the sound for that trial. Any violation (i.e. a mouse-click in an incorrect port or moving the mouse-cursor during fixation) was indicated by flashing black circles over the entire “poke” wall, accompanied by an unpleasant sound (for further demonstration of the experimental time flow, please see the videos in SI Movies).
When a subject passed the learning stages (i.e., four successive trials without a violation in each stage, SI Results and Fig. S5), they progressed to the decision stages of the non-verbal task (NV). In the decision stages, a two-choice decision was presented in which the subject could choose between an amount now (blue choice) and a different amount in some number of seconds (yellow choice). During the decision stages the positions of the blue and yellow circles on the poke wall were randomized between left and right and were always symmetrical (Fig. 1). Each of the 3 non-verbal sessions began with learning stages and continued to the decision stages. In the 2nd and 3rd non-verbal sessions, the learning stages were shorter in duration.
The final two sessions involved verbal stimuli. During each session, subjects experienced an alternating set of tasks: short delay (SV) – long delay (LV) – SV – LV (or LV–SV–LV–SV, counterbalanced across subjects). An example trial from the short time-horizon task (SV) is shown as the sequence of screens in Fig. 1. The long time-horizon verbal task (LV) includes Initiation, Decision (as in Fig. 1) and a screen confirming the choice. These sessions differed from the non-verbal sessions in two ways. First, the actual reward magnitude and delay were written inside the yellow and blue circles presented on the screen, instead of being conveyed by sounds. Second, in the non-verbal and short-verbal sessions, subjects continued to accumulate coins (following the experiential learning stages), and the total earned was paid via electronic payment at the end of each experimental session. In the long-verbal sessions, a single trial was randomly selected for payment (a method of payment commonly used in human studies with long delays, [17]) and shown at the end of the session; the associated payment was made now or later depending on the subject’s choice on the selected trial.
Analysis
For model-based analysis we used hierarchical Bayesian analysis (HBA; brms 2.0.1, [10,12]), which pools data across subjects, captures individual differences, and yields full posterior distributions rather than point estimates of parameters. The means of the HBA posteriors of the individual discount-factors for each task are nearly identical to individual fits performed separately for each experimental task using maximum likelihood estimation (fmincon in Matlab; SI Method, Fig. S1 & Fig. S2). We further validated the HBA method by simulating choices from a population of ‘agents’ with known parameters and demonstrating that we could recover those parameters given the same number of choices per agent as in our actual dataset. Data from the first non-verbal session were excluded from model-fitting because the proportion of first-order violations was higher than in the following two non-verbal sessions (see further discussion in SI Results). A mixed-effects model with 6 population-level and 4 subject-level parameters (M6p,4s; SI Method) was used to estimate discount-factors and decision-noise from choices. At the subject level, this model transforms the stimulus (rewards and delays of the sooner and later options) and individual preferences on each trial into a probability distribution over the subject’s choice; for example, for a given set of parameters the model might predict an 80% chance that the subject chooses the later option on a given trial. For the non-verbal task, we assumed that subjects had an unbiased estimate of the meaning of the frequency and amplitude modulation of the sound. First, rewards and delays are converted into the subjective value of each choice option using the hyperbolic utility model (Eq. 1). Then, Eq. S3 (a logit, or softmax, function) translates the difference between the subjective value of the later option and the subjective value of the sooner option (estimated using Eq. 1) into a probability of choosing the later option for each subject. The two functions below rely on four parameters: ki,s ∈ (ki,NV, ki,SV, ki,LV), the discount-factor per subject and task, and τi, the individual decision noise.
Hyperbolic utility model (Eq. 1): V = R / (1 + ki,s·T), where V is the present (subjective) value of the delayed asset, R is its reward magnitude, and T is the delay time.
Softmax rule (Eq. S3): P(later) = 1 / (1 + exp(−(VL − VS)/τi)), where VL and VS are the subjective values of the later (L) and sooner (S) offers and τi is the individual decision noise.
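As a concrete illustration, the two-step transformation from offers to choice probability can be sketched in Python (a minimal sketch assuming the standard hyperbolic form V = R/(1 + kT) and a logistic softmax on the value difference; function names and the example numbers are illustrative, not the fitting code used in the paper):

```python
import math

def subjective_value(reward, delay, k):
    """Hyperbolic utility (Eq. 1): discounted value of `reward` after `delay`."""
    return reward / (1.0 + k * delay)

def p_choose_later(r_later, d_later, r_sooner, d_sooner, k, tau):
    """Softmax rule (Eq. S3): probability of choosing the later option."""
    v_later = subjective_value(r_later, d_later, k)
    v_sooner = subjective_value(r_sooner, d_sooner, k)
    return 1.0 / (1.0 + math.exp(-(v_later - v_sooner) / tau))

# e.g. 8 coins in 10 s vs. the fixed 4 coins now, with k = 0.1 per second
p = p_choose_later(8, 10, 4, 0, k=0.1, tau=1.0)
```

With these numbers the discounted later value equals the sooner value (8/2 = 4), so the model predicts indifference (p = 0.5); a steeper discount-factor k pushes the probability toward the sooner option.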
For plotting posteriors of log(k), we calculated probability density estimates (for smoothing) using the ksdensity function in Matlab. The estimate is based on a normal kernel function and is evaluated at 100 equally spaced points, xi, covering the range of the data in x.
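A rough Python analogue of this smoothing step, assuming scipy's gaussian_kde as a stand-in for Matlab's ksdensity (the default bandwidth rules differ slightly, so this is a sketch rather than a reproduction; the posterior draws here are simulated for illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde

def smooth_density(x, n_points=100):
    """Gaussian-kernel density estimate evaluated at n_points equally
    spaced locations spanning the range of x (analogous to ksdensity
    with a normal kernel)."""
    grid = np.linspace(x.min(), x.max(), n_points)
    kde = gaussian_kde(x)  # bandwidth chosen automatically (Scott's rule)
    return grid, kde(grid)

# hypothetical posterior draws of log(k)
log_k = np.random.default_rng(0).normal(-2.0, 0.5, size=500)
grid, density = smooth_density(log_k)
```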
To test for differences across tasks we examined the HBA fits using the brms::hypothesis function, which directly tests the posterior probability that log(k) is shifted and/or scaled between treatments. The function returns an “evidence ratio”, which quantifies how strongly the posterior favors the hypothesis over its inverse, and we used Bayesian confidence intervals as a threshold (p < 0.05) to assist frequentist readers in assessing statistical significance.
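For intuition, the evidence ratio that brms::hypothesis reports for a one-sided hypothesis is simply the posterior odds in favor of the hypothesis, which can be computed directly from posterior draws. A Python sketch using hypothetical draws of a log(k) difference between two tasks:

```python
import numpy as np

def evidence_ratio(posterior_diff):
    """Evidence ratio for the one-sided hypothesis diff > 0:
    posterior mass for the hypothesis over the mass against it."""
    p = np.mean(posterior_diff > 0)
    return p / (1.0 - p)

# hypothetical posterior draws of log(k_SV) - log(k_LV)
rng = np.random.default_rng(1)
diff = rng.normal(1.0, 0.5, size=4000)
er = evidence_ratio(diff)  # > 1 favors the hypothesis
```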
The bootstrapped tests (of the mean, median and variance) were done by sampling with replacement and computing the sample statistic for each of 10,000 bootstrap samples, thereby creating a distribution of bootstrap statistics, and then (i) testing where 0 falls in this distribution for unpaired tests or (ii) performing a permutation test of whether the means differ significantly for paired tests.
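A minimal Python sketch of the unpaired version of this procedure (hypothetical data; 10,000 resamples as in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_stat(x, stat=np.mean, n_boot=10000):
    """Distribution of `stat` over n_boot resamples (with replacement) of x."""
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    return stat(x[idx], axis=1)

def two_sided_p(boot_dist, null_value=0.0):
    """Where the null value falls in the bootstrap distribution."""
    tail = min(np.mean(boot_dist <= null_value),
               np.mean(boot_dist >= null_value))
    return 2 * tail

# hypothetical per-subject differences between two conditions
x = rng.normal(0.5, 1.0, size=200)
boots = bootstrap_stat(x)
p = two_sided_p(boots)  # small p: 0 lies in the tail of the bootstrap distribution
```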
Simulations done for both model-based and model-free analyses are described in detail in SI Results.
Software
Tasks were written in Python using the PsychoPy toolbox (1.83.04, [58]). All analyses and statistics were performed either in Matlab (version 8.6 or higher; The MathWorks, MA) or in R (3.3.1 or higher; R Foundation for Statistical Computing, Vienna, Austria). The R package brms (2.0.1) was used as a wrapper for RStan [32] for Bayesian nonlinear multilevel modeling [10]; shinystan [27] was used to diagnose and develop the brms models. The package lme4 was used for linear mixed-effects modeling [6].
Data Availability
Software for running the task, as well as the data and analysis code for regenerating our results, are available on GitHub.
Acknowledgments
This work was supported by the NYU Shanghai Research Challenge Fund (to S.F.L. and J.C.E.), the National Science Foundation of China (NSFC-31750110461) and the Shanghai Eastern Scholar Program (SESP) (to E.L.). J.C.E. was additionally supported by (1) the Program of Shanghai Academic/Technology Research Leader (15XD1503000) and (2) the Science and Technology Commission of Shanghai Municipality (15JC1400104). We thank Paul Glimcher, Ming Hsu, & Joseph Kable for fruitful discussions. We thank NYU Shanghai undergraduate students Stephen Mathew, Xirui Zhao, Wanning Fu and Jonathan Lin, who helped collect data for control experiments.
Footnotes
* jerlich{at}nyu.edu
References