ABSTRACT
While an increased impact of cues on decision-making has been associated with substance dependence, it remains unclear whether this is also a phenotype of non-substance-related addictive disorders, such as gambling disorder. To better understand the basic mechanisms of impaired decision-making in addiction, we investigated whether cue-induced changes in decision-making could distinguish gambling disorder (GD) from healthy control (HC) subjects. We expected that cue-induced changes in gamble acceptance, and specifically in loss aversion, would distinguish GD from HC subjects.
30 GD subjects and 30 matched HC subjects completed a mixed gambles task in which gambling and other emotional cues were shown in the background. We used machine learning and classification to assess the importance of cue-dependency of decision-making and of loss aversion in distinguishing GD from HC subjects.
Cross-validated classification yielded an area under the receiver operating curve (AUC-ROC) of 68.9% (p = 0.002). Applying the classifier to an independent sample yielded an AUC-ROC of 65.0% (p = 0.047). As expected, the classifier used cue-induced changes in gamble acceptance to distinguish GD from HC. In particular, increased gambling during the presentation of gambling cues was characteristic of GD subjects. Unexpectedly, however, cue-induced changes in loss aversion were irrelevant for distinguishing GD from HC subjects. To our knowledge, this is the first study to investigate the classificatory power of addiction-relevant behavioral task parameters in distinguishing GD from HC subjects. The results indicate that cue-induced changes in decision-making are a characteristic feature of addictive disorders, independent of a substance of abuse.
INTRODUCTION
Gambling disorder (GD) is characterized by continued gambling for money despite severe negative consequences (American Psychiatric Association & DSM-5 Task Force, 2013). Burdens of GD include financial ruin, loss of social structures, as well as development of psychiatric comorbidities (Bischof et al., 2015; Grinols & Mustard, 2001; Ladouceur, Boisvert, Pépin, Loranger, & Sylvain, 1994; Raylu & Oei, 2002). In line with this clinical picture of impaired decision making, GD subjects have also displayed impaired decision making in laboratory experiments (Clark et al., 2013; Dixon, Marley, & Jacobs, 2003; Glimcher & Rustichini, 2004; MacKillop et al., 2011; Madden, Petry, & Johnson, 2009; Miedl, Peters, & Büchel, 2012; N. M. Petry, 2012; Platt & Huettel, 2008; Romanczuk-Seiferth, van den Brink, & Goudriaan, 2014; Wiehler & Peters, 2015).
Besides impaired decision making, cue reactivity has been a crucial concept in understanding addictive disorders including GD (Beck et al., 2012; Leyton & Vezina, 2013; Schacht, Anton, & Myrick, 2013; Vezina & Leyton, 2009; Wölfling et al., 2011). Through Pavlovian conditioning, any neutral stimulus can become a conditioned stimulus (i.e. a cue) if it has been paired with the effects of the addictive behavior (Mucha, Geier, Stuhlinger, & Mundle, 2000; Potenza, Steinberg, Skudlarski, et al., 2003). In addictive disorders, including GD, cues may induce attentional bias, arousal, and craving for the addictive behavior in periods of abstinence (Carter & Tiffany, 1999; Field, Munafò, & Franken, 2009; Goudriaan, de Ruiter, van den Brink, Oosterlaan, & Veltman, 2010; Heinz et al., 2003; Potenza et al., 2003; Wölfling et al., 2011). Treatment of addictive disorders may focus on identifying and coping with individual cues that induce craving for the addictive behavior (Bowen et al., 2014; Courtney, Schacht, Hutchison, Roche, & Ray, 2016; Turner, Welches, & Conti, 2014). A better understanding of how cues exert control over instrumental behavior and decision making could improve treatment tools and even public health policy for GD and perhaps other addictive disorders. In the present study, we were thus interested in broadening our understanding of the basic mechanisms of impaired decision making in addictions, especially with respect to cue-induced effects on value-based decision making.
The effect of cues exhibiting a facilitating or inhibiting influence on instrumental behavior and decision making is known as Pavlovian-to-Instrumental Transfer (PIT) (Cartoni, Balleine, & Baldassarre, 2016; Cartoni, Puglisi-Allegra, & Baldassarre, 2013; Genauck, Huys, Heinz, & Rapp, 2013; Glasner, Overmier, & Balleine, 2005; Talmi, Seymour, Dayan, & Dolan, 2008). PIT experiments usually have three phases: a first phase where subjects learn an instrumental behavior to gain rewards or avoid punishments, a second phase where subjects learn about the value of arbitrary stimuli through classical conditioning, and a third phase (the PIT phase), where subjects are supposed to perform the instrumental task while stimuli from the second phase (changing from trial to trial) are presented in the background. The PIT phase measures the effect of value-charged cues on instrumental behavior even though the background cues have no objective relation to the instrumental task in the foreground. In the current study we focus only on the PIT phase. PIT has recently drawn attention in the study of substance use disorders (SUDs) (Dayan, 2009; Everitt et al., 2008; Genauck et al., 2013; Hogarth & Chase, 2012). This is because PIT effects can persist even when the outcome of the instrumental behavior has been devalued (De Tommaso, Mastropasqua, & Turatto, 2018), and further because increased PIT has been associated with a marker for impulsivity (Garofalo & Robbins, 2017) and with decreased model-based behavior (Sebold et al., 2016). Lastly, PIT effects tend to be stronger in subjects with a substance use disorder than in healthy subjects (Corbit, Janak, & Balleine, 2007; Garbusow et al., 2016; Krank, O’Neill, Squarey, & Jacob, 2008), and increased PIT has been associated with the probability of relapse (Garbusow et al., 2016).
Increased PIT effects are based on Pavlovian and instrumental conditioning and on their interaction. This highlights how addictive disorders rely on learning mechanisms (Heinz, 2017, p. 113 ff.; Heinz, Schlagenhauf, Beck, & Wackerhagen, 2016). GD is an addictive disorder independent of any influence of a neurotropic substance of abuse. The study of PIT in GD may thus further shed light on whether increased PIT in addictive disorders is a result of learning, independent of any substance of abuse, or even a congenital vulnerability (Barker, Torregrossa, & Taylor, 2012).
We are aware of four studies that have observed increased PIT effects on decision making in GD. In three single-group studies, GD subjects have shown higher delay discounting (preferring immediate rewards over rewards in the future) in response to emotional cues vs. neutral cues (Dixon & Holton, 2009), to a casino environment vs. a laboratory environment (Dixon, Jacobs, & Sanders, 2006), and to high-craving vs. low-craving gambling cues (Miedl, Büchel, & Peters, 2014). In a fourth study, GD subjects have been more influenced than HC subjects by gambling stimuli in a response inhibition task (van Holst, van Holstein, van den Brink, Veltman, & Goudriaan, 2012). To our knowledge, however, there are no studies yet that have investigated the effect of cue reactivity on loss aversion in GD.
Loss aversion (LA) is, besides delay discounting, another facet of value-based decision-making. It is the phenomenon wherein people assign a greater value to potential losses than to an equal amount of possible gains (Kahneman & Tversky, 1979). For example, healthy subjects tend to agree to a coin toss gamble (win/loss probability of 0.5) only if the amount of possible gain is at least twice the amount of possible loss. In GD subjects, LA seems to be reduced (Brevers, Cleeremans, Goudriaan, et al., 2012; Genauck et al., 2017; Lorains et al., 2014), but there are also studies that have found no difference in LA between GD and HC subjects (Gelskov, Madsen, Ramsøy, & Siebner, 2016; Takeuchi et al., 2015). LA reduction may stem from altered neural reward and loss anticipation processes in GD subjects (Genauck et al., 2017; Luijten, Schellekens, Kühn, Machielse, & Sescousse, 2017; Romanczuk-Seiferth, Koehler, Dreesen, Wüstenberg, & Heinz, 2015; Sescousse, Barbalat, Domenech, & Dreher, 2013; van Holst, Veltman, Büchel, van den Brink, & Goudriaan, 2012).
High LA protects against disadvantageous gambling decisions. However, it has been observed that LA can be transiently modulated by experimentally controlled cues (Mitchell, Gao, Hallett, & Voon, 2016; Schulreich, Gerhardt, & Heekeren, 2016) and that this LA modulation varies considerably across subjects (Charpentier, Martino, Sim, Sharot, & Roiser, 2015). In GD subjects, loss aversion might be particularly cue-dependent, leading to reckless gambling especially in casino contexts or at slot machines. In the current study, we thus hypothesized that GD subjects should show stronger PIT effects in their gambling decisions than HC subjects, and especially stronger drops in LA when, e.g., gambling-related cues are present (i.e. higher “loss aversion PIT”).
So far, we have mentioned studies that have used group-mean difference analyses to investigate decision making or cue reactivity in addictive disorders. This approach is faithful to the desire to explain human behavior rather than predict it (Shmueli, 2010; Yarkoni & Westfall, 2017). However, the desire to only explain human behavior may lead to overly complicated (i.e. overfitted) models, which do not correctly predict human behavior in new samples (Yarkoni & Westfall, 2017). Thus, in the current study we wanted to avoid overfitting and isolate a model with not only explanatory but also predictive value (Gilbert, King, Pettigrew, & Wilson, 2016; Kriegeskorte, Simmons, Bellgowan, & Baker, 2009; Open Science Collaboration, 2015; Simmons, Nelson, & Simonsohn, 2011; Yarkoni & Westfall, 2017). We did so by disentangling the specific benefits of “loss aversion PIT” parameters when distinguishing GD from HC subjects. To this end, we used machine learning methods in addition to classical mean-difference statistics to test our hypotheses. This approach has drawn increasing attention in the field of clinical psychology and psychiatry (Bzdok & Meyer-Lindenberg, 2018; Connor, Symons, Feeney, Young, & Wiles, 2007; Kaplan et al., 2017; Walsh, Ribeiro, & Franklin, 2017). We built and tested an algorithm that decides between various loss aversion models and different models with and without PIT to classify subjects into HC vs. GD groups. Importantly, to avoid overfitting, we used out-of-sample classification (Ahn, Ramesh, Moeller, & Vassileva, 2016; Ahn & Vassileva, 2016; Cerasa et al., 2018; Quentin J. M. Huys, Maia, & Frank, 2016; Seo et al., 2018; Yarkoni & Westfall, 2017). Our results allowed us to disentangle which PIT effects are relevant to distinguish GD from HC subjects.
When selecting cues for this study, we aimed at expanding on existing studies investigating cue-effects in GD (Dixon et al., 2006; Miedl et al., 2014; van Holst, van Holstein, et al., 2012). Besides gambling-related cues, we thus selected additional cues from different motivational and emotional categories (Garbusow et al., 2016) related to GD. These categories comprised images used in gambling advertisements as well as for advertisement of GD therapy and prevention (positive and negative cues).
We expected that our classifier would select models that incorporate the modulation of loss aversion by gambling and other emotional cues (“loss aversion PIT”) to distinguish between HC and GD subjects.
METHODS AND MATERIALS
Samples
GD subjects were diagnosed using the German short questionnaire for gambling behavior (KFG) (J. Petry & Baulig, 1996). The KFG diagnoses subjects according to DSM-IV criteria for pathological gambling; a score of 16 points or more indicates “likely suffering from pathological gambling”. Here, however, we use the DSM-5 term “gambling disorder” interchangeably, because the DSM-IV and DSM-5 criteria largely overlap (Rodríguez-Testal, Senín-Calderón, & Perona-Garcelán, 2014). The GD group consisted of active gamblers who were not in therapy. The HC group consisted of subjects with little to no experience with gambling, reflecting the healthy general population as in other addiction studies (Beck et al., 2012; Schacht et al., 2013). We recruited GD subjects via eBay classifieds and notices in Berlin casinos and gambling halls. Any known history of a neurological disorder or a current psychological disorder (except tobacco dependence), as assessed by the Screening of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) (First, Spitzer, Gibbon, & Williams, 2002), led to exclusion from the study. There were five subject dropouts (two due to technical error, one who rejected all gambles, two to improve matching). The final sample consisted of 30 GD and 30 HC subjects (Tab. 1). According to the South Oaks Gambling Screen (Lesieur & Blume, 1987; Stinchfield, 2002) (3-point Likert scales), GD subjects differed from HC in gambling habits mainly in the frequency of playing slot machines (most frequent answer of GD: “3: once a week or more”, HC: “1: not at all”) (t = 7.30, p < 0.001) and visiting casinos (most frequent answer of GD: “3: once a week or more”, HC: “1: not at all”) (t = 3.99, p = 0.001). GD and HC were matched on relevant variables (education, net personal income, age, alcohol use), except for smoking severity. We thus included smoking severity in the classifier and tested it against classifying based on smoking severity alone.
For final validation of the fitted classifier we used a sample from another study where subjects performed the affective mixed gambles task in a functional magnetic resonance imaging (fMRI) scanner (see Tab. S2) (Genauck et al., 2018).
Procedure and data acquisition
Subjects completed the task at the General Psychology lab of the Department of Psychology of Humboldt-Universität zu Berlin. They sat upright in front of a computer screen, using their dominant hand’s fingers to indicate choices on a keyboard. Five passive facial electrodes were attached to each subject: two above the musculus corrugator, two above the musculus zygomaticus, and one on the upper forehead. We recorded electrodermal activity (EDA) from the non-dominant hand. Subjects of the validation sample completed the task head-first supine in a 3-Tesla SIEMENS Trio MRI scanner at the BCAN - Berlin Center of Advanced Neuroimaging. Results of the fMRI and peripheral-physiological recordings will be reported elsewhere.
Affective mixed gambles task
We were inspired by established tasks to measure general LA and LA under the influence of affective cues (Charpentier et al., 2015; Tom, Fox, Trepel, & Poldrack, 2007). As affective cues, four sets of images were assembled: 1) 67 gambling images, showing a variety of gambling scenes and paraphernalia (gambling cues); 2) 31 images showing negative consequences of gambling (negative cues); 3) 31 images showing positive effects of abstinence from gambling (positive cues); 4) 24 neutral IAPS images (neutral cues). For further information on the validation of the cue categories and on access to the stimuli, please see Supplements (1.1). We presented cues of all categories in random order and each gambling cue once. For the negative, positive, and neutral cue categories, we randomly drew images from each pool until we had presented 45 images of each category and each image at least once. Hence, we ran 202 trials in each subject. Subjects were each given 20€ for wagering. On every trial, subjects saw a cue that they were instructed to memorize for a paid recognition task after the actual experiment. After 4 s (jittered), a mixed gamble, involving a possible gain and a possible loss with probability P = 0.5 each, was superimposed on the cue. Subjects had to indicate how willing they were to accept the gamble (Fig. 1A) on a 4-point Likert scale, to ensure task engagement (Tom et al., 2007). Subjects of an independent validation sample completed the task in an fMRI scanner and had an additional wait period before deciding on the gamble (Fig. 1B). Gambles were created by randomly drawing with replacement from a matrix of possible gambles consisting of 12 levels of gains (14, 16, …, 36) and 12 levels of losses (-7, -8, …, -18) (Fig. S2). This matrix is apt to elicit LA in healthy subjects (Genauck et al., 2017; Tom et al., 2007; Tversky & Kahneman, 1992).
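For illustration, the gamble matrix and the trial-wise sampling described above can be sketched as follows (a minimal Python sketch, not the authors' implementation; variable names are ours):

```python
import itertools
import random

# 12 gain levels (14, 16, ..., 36) crossed with 12 loss levels (-7, -8, ..., -18),
# as described in the text.
gains = list(range(14, 37, 2))
losses = list(range(-7, -19, -1))
gamble_matrix = list(itertools.product(gains, losses))  # 144 possible gambles

# Gambles are drawn randomly with replacement for the 202 trials.
random.seed(0)
trial_gambles = random.choices(gamble_matrix, k=202)
```

Drawing with replacement means individual gain/loss pairs can recur across trials, which is what allows the gambles to be balanced across cue categories.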
Gambles were balanced across cue categories according to expected value, variance, gamble simplicity, as well as mean and variance of gain and loss, respectively. Gamble simplicity is defined as the Euclidean distance from the diagonal of the gamble matrix (ed) (Tom et al., 2007) (Fig. S2). Subjects were informed that after the experiment five of their gamble decisions with ratings of “somewhat yes” or “yes” would be randomly chosen and played for real money. HC showed on average 1.00 missed trials, GD 1.05 (no significant group difference, F = 0.022, p = 0.882); in the fMRI validation study, HC: 3.13, GD: 4.10 (no significant group difference, F = 0.557, p = 0.457).
Subjective cue ratings
After the task, subjects rated all cues using the Self-Assessment Manikin (SAM) assessment technique (Bradley & Lang, 1994) (reporting on valence: happy vs. unhappy, arousal: energized vs. sleepy, dominance: in control vs. being controlled) and additional visual analogue scales: 1) “How strongly does this image trigger craving for gambling?” 2) “How appropriately does this image represent one or more gambling games?” 3) “How appropriately does this image represent possible negative effects of gambling?” 4) “How appropriately does this image represent possible positive effects of gambling abstinence?”. All scales were operated via a slider from 0 to 100.
All cue ratings were z-standardized within subject. Ratings were analyzed one-by-one using linear mixed-effects regression, using lmer from the lme4 package in R (Bates, Mächler, Bolker, & Walker, 2015), where cue category (and clinical group) denoted the fixed effects and subjects and cues denoted the sources of random effects.
Estimating subject-specific parameters from behavioral choice data
We modeled each subject’s behavioral data by submitting dichotomized choices (somewhat no, no: 0; somewhat yes, yes: 1) to logistic regressions. We dichotomized choices to increase the precision when estimating behavioral parameters, in line with previous studies using the mixed gambles task (Barkley-Levenson, Van Leijenhorst, & Galván, 2013; Genauck et al., 2017; Tom et al., 2007). Regressors for the subject-wise logistic regressions were gain (mean-centered) and absolute loss (mean-centered) from the mixed gamble, as well as gamble simplicity (ed), loss-gain ratio (Gelskov et al., 2016), and cue category of the stimulus in the background of the mixed gamble. We defined different logistic regressions by using different trial-based definitions of gamble value (Q) (see Tab. S1), submitted to the logistic function P(accept) = 1 / (1 + exp(−Q)).
Different trial-based definitions of gamble value (Q) reflected two things:
First, different ways of modeling LA may be adequate to distinguish a GD from a HC subject. For different models of LA, see e.g. Gelskov et al. (2016) for a ratio model of LA, and Charpentier et al. (2015), Genauck et al. (2017), and Tom et al. (2007) for different additive models (also see Tab. S1).
Second, different ways of incorporating cue effects on decision-making (PIT effects) may be adequate to distinguish a GD from a HC subject. For example, the model lac assumes Q = β0 + βgain·xgain + βloss·xloss + βcᵀc, where β0 is the intercept, xgain the objective gain value of the gamble, βgain the regression weight for xgain (the same holds for xloss and βloss, respectively), c the dummy-coded column vector indicating the category of the current cue, and βc a column vector holding the regression weights for the categories. The lac model is thus a weighted linear combination of objective gain and objective loss with an additive influence of cue category; that is, some influence of cue category on decision-making (PIT) is modeled. Note that there are multiple PIT effects here, because βc is a vector of length three, reflecting the three affective categories (gambling, negative, positive) coded against neutral. There were also models that did not incorporate any influence of loss aversion or category (intercept-only, a), only of category (ac), only of loss aversion (la), or models that additionally included their interaction (laci).
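The decision rule of the lac model can be sketched in Python (a hedged sketch with made-up parameter values; the original analyses used R's glm, and loss enters as an absolute value):

```python
import numpy as np

def lac_value(gain, abs_loss, cue_dummies, b0, b_gain, b_loss, b_cue):
    """Gamble value Q for the lac model: intercept plus weighted gain,
    weighted absolute loss, and a dummy-coded cue category
    (gambling/negative/positive vs. neutral)."""
    return b0 + b_gain * gain + b_loss * abs_loss + float(np.dot(b_cue, cue_dummies))

def p_accept(q):
    """Logistic function mapping gamble value Q to acceptance probability."""
    return 1.0 / (1.0 + np.exp(-q))

# Made-up parameters: a positive weight on the gambling-cue dummy raises
# acceptance probability relative to a neutral cue.
b0, b_gain, b_loss = 0.1, 0.25, -0.40
b_cue = np.array([0.8, -0.2, 0.1])  # gambling, negative, positive vs. neutral
q_neutral = lac_value(24, 12, np.array([0, 0, 0]), b0, b_gain, b_loss, b_cue)
q_gambling = lac_value(24, 12, np.array([1, 0, 0]), b0, b_gain, b_loss, b_cue)
```

With these illustrative weights, the same gamble is accepted with a higher probability when a gambling cue is in the background, which is exactly the additive PIT effect the lac model encodes.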
A model selection procedure could thus choose whether cue-induced effects on loss aversion (“loss aversion PIT”, i.e. the laci model) were important for distinguishing between GD and HC subjects. Logistic regressions were fit using maximum likelihood estimation with the glm function in R (R Core Team, 2015). The resulting regression parameters were extracted per model (e.g. β0, βgain, βloss for model la) and subject. We then appended the loss aversion parameter (λ) to the estimated coefficients by computing, for each subject and pair of βgain, βloss: λ = −βloss / βgain.
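The λ computation amounts to a one-liner (a sketch assuming the standard ratio definition λ = −βloss/βgain, with absolute loss as regressor so that βloss is negative; not the authors' code):

```python
def loss_aversion_lambda(b_gain, b_loss):
    """Loss aversion as the (negated) ratio of loss to gain sensitivity.

    With absolute loss as a regressor, b_loss is typically negative, so
    lambda > 1 means losses are weighted more strongly than equal gains.
    """
    return -b_loss / b_gain

# Example: gain weight 0.25 and loss weight -0.50 give lambda = 2,
# i.e. a loss looms twice as large as an equal-sized gain.
lam = loss_aversion_lambda(0.25, -0.50)
```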
Models with names containing a “c” (e.g. lac or laci) are those that assume some influence of the cues (i.e. PIT effects). Models laCh and laChci are taken from Charpentier et al. (2015). There were 21 different ways (i.e. models) to extract the behavioral parameters per subject (Tab. S1). Note that per model each subject thus had a characteristic parameter vector (the estimated regression weights), and all subjects’ parameter vectors belonging to a certain model constituted that model’s parameter set.
Classification
Our machine learning approach is based on regularized regression and cross-validation as used in other machine learning studies in addiction and psychological research (Ahn et al., 2016; Ahn & Vassileva, 2016; Cerasa et al., 2018; Kaplan et al., 2017; Ryali, Supekar, Abrams, & Menon, 2010; Whelan et al., 2014).
Overall reasoning in building the classifier
The main interest of our study was to assess whether cue-induced changes in decision-making during an affective mixed gambles task can be used to distinguish GD from HC subjects. We hypothesized that shifts in loss aversion that depend on which cues are shown in the background (“loss aversion PIT”) should best distinguish between GD and HC subjects. This means that the laci model’s parameter set should have been the most effective in distinguishing between GD and HC subjects. To test this hypothesis, we used a machine learning algorithm based on regularized logistic regression that selected, among the competing parameter sets (from the 21 different models, la, lac, laci, etc.), the set that best distinguished HC and GD subjects. To assess the generalizability of the resulting classifier, we used cross-validation (CV) (Ahn et al., 2016; Guggenmos et al., 2018; Seo et al., 2018; Whelan et al., 2014). Generalizability estimates the predictive power, and hence replicability, of a classifier in new samples (Gilbert et al., 2016; Munafò et al., 2017; Open Science Collaboration, 2015; Simmons et al., 2011; Yarkoni & Westfall, 2017). Note that machine learning algorithms are designed to generalize well to new samples by inherently avoiding overfitting to the training data (Bishop, 2006, p. 9 ff.).
Beyond cross-validation, which uses only one data set (splitting it repeatedly into training and test data set), validation of a classifier on a completely independent sample is the gold-standard in machine learning to assess the quality of an estimated model (Yarkoni & Westfall, 2017). Hence, we estimated the generalization performance also via application of our classifier to a completely independent sample of HC and GD subjects, who had performed a slightly adapted version of the task in an fMRI scanner (Genauck et al., 2018).
Detailed description of algorithm to build classifier
From 21 different parameter sets (Tab. S1), representing different “loss aversion PIT” (e.g. the laci model) and respective control models, we wanted to find the best parameter set to build a classifier (here a logistic regression model) to distinguish between GD and HC subjects in out-of-sample test data (Guggenmos et al., 2018; Whelan et al., 2014) (Fig. S3). We expected to see the laci model winning, because it assumes an interaction between loss aversion and cue categories.
In a first step we used model selection based on cross-validation to find the parameter set that best distinguished between GD and HC (Arlot & Celisse, 2010; Bratu, Muresan, & Potolea, 2008; Varma & Simon, 2006). Using cross-validation for model selection ensures that the selected model will be the one that best generalizes to out-of-sample data (and hence overfitting is avoided). The algorithm used the different parameter sets, one by one, to predict group membership of subjects using logistic ridge regression (Le Cessie & Van Houwelingen, 1992). Ridge regression has one hyperparameter that is tuned to optimize the cross-validated classification power of each parameter set according to the area under the receiver operating curve (AUC-ROC) (Ahn et al., 2016; Ahn & Vassileva, 2016; Whelan et al., 2014; Zacharaki et al., 2009). AUC-ROC ranges from 0.5 (chance) to 1 (perfect sensitivity and specificity) (Provost, Fawcett, & Kohavi, 1998). The parameter set with the highest cross-validated AUC-ROC was selected.
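The first step can be sketched with scikit-learn (a Python stand-in for the R-based pipeline; the data here are synthetic and the parameter set is a made-up 60 × 4 matrix):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for one parameter set: 60 subjects x 4 behavioral parameters.
X = rng.normal(size=(60, 4))
y = np.repeat([0, 1], 30)          # 0 = HC, 1 = GD
X[y == 1, 0] += 1.0                # inject a group difference in one parameter

# Ridge-penalized (L2) logistic regression; the single hyperparameter C
# (inverse penalty strength) is tuned by cross-validated AUC-ROC.
search = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", max_iter=1000)),
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=10,
)
search.fit(X, y)
best_auc = search.best_score_      # compared across parameter sets in the paper
```

In the paper this cross-validated AUC-ROC is computed for each of the 21 parameter sets, and the set with the highest score is selected.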
In a second step, the classifier was completed. Because smoking severity was not entirely matched between the groups, the algorithm added smoking severity to the parameter set with the best cross-validated AUC-ROC score from the first step and submitted the combined set to logistic elastic net regression (Zou & Hastie, 2005), optimizing the AUC-ROC by tuning its two hyperparameters (Whelan et al., 2014), again via cross-validation (Fig. S3). We did not use elastic net regression during model selection because it can force regression parameters to zero (sparse models) (Zou & Hastie, 2005), which would have blurred the interpretable differences between the behavioral models. However, we did use elastic net regression in this last step of classifier building, because we were interested in whether the algorithm would force parameters of the winning model from the first model selection step to zero, e.g. because a parameter did not add any further information to the classification.
We assessed the generalizability of the above algorithm 1000 times via 10-fold cross-validation (Arlot & Celisse, 2010), which yielded a distribution of classifiers and thus of AUC-ROCs. Note that the cross-validation used to estimate generalizability made the cross-validations used for hyperparameter tuning in the first and second steps nested within it, which is necessary to avoid contamination between training and test data (Arlot & Celisse, 2010; Bratu et al., 2008; Varma & Simon, 2006; Whelan et al., 2014). We computed the mean of the obtained AUC-ROCs and estimated its p-value by performing the exact same 1000 CV rounds, but each time with only smoking severity as predictor (baseline classifier). We then subtracted the AUC-ROCs of the baseline classifiers one-by-one from the 1000 AUC-ROCs of the full classifiers. This yielded a distribution of classification improvement (i.e., improvement of the AUC-ROC due to using the full classifier instead of the baseline classifier). We tested this distribution against the value of classification improvement under the null hypothesis (i.e. zero improvement) to obtain a p-value for the significance of the classification improvement.
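The improvement test can be sketched as follows (synthetic data; the paper used 1000 CV rounds, here only 20 for brevity, and the smoking-severity baseline is simulated as a single uninformative predictor):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)                # 0 = HC, 1 = GD
X_full = rng.normal(size=(60, 4))
X_full[y == 1, 0] += 1.0                 # informative behavioral parameters
X_base = rng.normal(size=(60, 1))        # stand-in for smoking severity alone

def cv_auc(X, y, seed):
    """Mean AUC-ROC over one 10-fold CV round with a given fold split."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, scoring="roc_auc", cv=cv).mean()

# Repeat CV with different fold splits and take the paired AUC difference:
# full classifier minus baseline (smoking-only) classifier.
improvement = np.array([cv_auc(X_full, y, s) - cv_auc(X_base, y, s)
                        for s in range(20)])
p_boot = float((improvement <= 0).mean())  # share of rounds with no improvement
```

Testing the improvement distribution against zero corresponds to the paper's p-value for the significance of classification improvement.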
To build the final interpretable and reportable classifier, one would usually apply the algorithm once to the complete data set. However, since application to the complete data set still entails cross-validation for tuning the ridge and elastic net regressions’ hyperparameters, which leads to slightly varying classifiers, the algorithm was run not once but 1000 times on the complete data set. We plotted the ensuing distribution of selected parameter sets and the distribution of the respective regression weights as per-parameter means with 95% percentile bounds. For a graphical illustration of the algorithm see Supplements (Sections 1.3 and 1.6). For the R code and the data please see https://github.com/pransito/PIT_GD_by_release.
Validating the classifier on an independent sample
We applied all 1000 classifiers estimated on the full data set to each of the 60 subjects of the validation sample, yielding 1000 decision values per subject (real-valued scalars). To incorporate the complete distribution of the classifiers, we summed up, for each subject, the decision responses of all 1000 estimated classifiers, yielding one decision value per subject. Using the known true labels of all subjects, the decision values of all subjects were used to compute the AUC-ROC. We compared this obtained AUC-ROC to its distribution under a null-model (10,000 repetitions of random, i.e. coin-flip, classification), to compute a p-value.
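This validation step can be sketched as follows (made-up decision values; only the logic of comparing the observed AUC-ROC against a coin-flip null is illustrated):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Made-up summed decision values for the 60 validation subjects.
y_true = np.repeat([0, 1], 30)                      # 0 = HC, 1 = GD
decision_values = rng.normal(size=60) + 0.8 * y_true

observed_auc = roc_auc_score(y_true, decision_values)

# Null distribution: random (coin-flip) classification, 10,000 repetitions.
null_aucs = np.array([roc_auc_score(y_true, rng.normal(size=60))
                      for _ in range(10_000)])
p_value = float((null_aucs >= observed_auc).mean())
```

The p-value is simply the share of null classifications that reach or exceed the observed AUC-ROC.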
Group comparisons regarding acceptance rate and loss aversion parameters
To provide a fuller overview of the data, we also performed classical mean-difference analyses of the choice data, fitting logistic linear mixed-effects models using R’s lme4 package (Bates et al., 2015). We present results of model comparisons. We report the p-value of the chi-square difference test (i.e. log-likelihood ratio test) comparing the respective generalized linear models with random effects, and their difference in Akaike Information Criterion, ΔAIC, where positive/negative ΔAIC means model improvement/worsening given the reduction in degrees of freedom (see Supplements 1.6).
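Both reported statistics follow from the models' log-likelihoods; a minimal sketch with made-up log-likelihood values (not values from the study):

```python
from scipy import stats

def lr_test_p(loglik_reduced, loglik_full, df_diff):
    """Chi-square (log-likelihood ratio) test for nested models."""
    chi2 = 2.0 * (loglik_full - loglik_reduced)
    return stats.chi2.sf(chi2, df_diff)

def delta_aic(loglik_reduced, loglik_full, k_reduced, k_full):
    """AIC(reduced) - AIC(full); positive values favor the full model."""
    return (2 * k_reduced - 2 * loglik_reduced) - (2 * k_full - 2 * loglik_full)

# Made-up example: the full model gains 10 log-likelihood points at the
# cost of 3 extra parameters.
p = lr_test_p(-520.0, -510.0, df_diff=3)
d_aic = delta_aic(-520.0, -510.0, k_reduced=5, k_full=8)
```

Here the likelihood gain outweighs the parameter penalty, so ΔAIC is positive and the chi-square test is significant, mirroring how the nested model comparisons are reported in the Results.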
Ethics
Subjects gave written informed consent. The study was conducted in accordance with the Declaration of Helsinki and approved by the ethics committee of Charité – Universitätsmedizin Berlin.
RESULTS
Cue ratings
Gambling cues were seen as more appropriately representing one or more gambling games than any other cue category: gambling > neutral (β = 1.589, p < 0.001), gambling > negative (β = 1.197, p < 0.001), gambling > positive (β = 1.472, p < 0.001). They elicited more craving in GD subjects (β = 0.71, p < 0.001). Negative cues evoked more negative feelings in both groups (β = −0.775, p < 0.001) and were seen as representing negative effects of gambling more than any other category (see Supplements 2.1). Positive cues were indeed seen as more representative of positive effects of gambling abstinence than any other category. Neutral cues were indeed rated as neutral in valence and as eliciting low arousal (Lang, Greenwald, Bradley, & Hamm, 1993) (Fig. S4).
Prediction of group using behavioral data
The classification algorithm yielded an AUC-ROC of 68.9% (under the null hypothesis, i.e. with only smoking as predictor: 55.1%, pboot = 0.002) (Fig. 2B; Fig. S6). The most often selected model was the “acceptance rate per category” (ac) model (90.7% of the rounds). Together with the laec and lac models, a model incorporating PIT, i.e. an effect of cue category on decisions, was used in 95.8% of the rounds (Fig. S7). In only 9.3% of the rounds was a model selected that incorporated loss aversion (i.e. gain and loss sensitivities). Validating the estimated classifier in the independent sample yielded an AUC-ROC of 65.0% (under random classification: 55.3%, p = 0.047) (Fig. 2C).
Inspection of classifier
Inspecting the classifier’s logistic regression weights, we saw that the classifier placed the greatest importance on the shift in gambling acceptance during gambling cues (see Fig. 2D). Note further that the classifier also placed some importance on the sensitivity to negative cues but deselected the sensitivity to positive cues.
Acceptance rate and loss aversion under cue conditions
Overall acceptance rate between groups was not significantly different (HC: 53%, GD: 58%, p = 0.169, ΔAIC = 0). Overall there was a significant effect of cue category on acceptance rate (p < 0.001, ΔAIC = 648) and further there was a significant interaction with group (p = 0.002, ΔAIC = 9). GD subjects showed significantly higher acceptance rate during gambling cues than HC subjects (HC: 49%, GD: 68%, pWaldApprox = 0.003) (Fig. 2A).
The fixed effects for gain sensitivity, absolute loss sensitivity, and LA over all trials were descriptively larger for HC (0.26, 0.42, and 1.64) than for GD (0.19, 0.22, and 1.13). Testing the interaction between group, gain, and loss (i.e. testing for a difference in LA between groups) via nested model comparison yielded p < 0.001, ΔAIC = 93, with sensitivity to loss being significantly smaller in GD subjects (pWaldApprox = 0.011). Loss aversion was significantly smaller in GD than in HC subjects (pperm < 0.001). Loss aversion shifts due to cue category did not differ between groups (see Supplements 2.2).
DISCUSSION
Gambling disorder (GD) is characterized by impaired decision making (Wiehler & Peters, 2015) and craving in response to gambling-associated images (Crockford, Goodyear, Edwards, Quickfall, & el-Guebaly, 2005; Goudriaan et al., 2010). However, it is unclear whether specific cue-induced changes in loss aversion exist that distinguish GD from HC subjects. The influence of cues on instrumental decision-making may be termed Pavlovian-to-instrumental transfer (PIT). Increased PIT has been associated with various substance use disorders (Corbit et al., 2007; Garbusow et al., 2016; Genauck et al., 2013), but there are hardly any studies investigating whether PIT is relevant in characterizing GD, an addictive disorder independent of substance abuse. To better understand the basic mechanisms of impaired decision-making in addiction, we thus used a machine-learning algorithm with estimation of generalizability to determine the relevance of cue-induced changes in loss aversion (“loss aversion PIT”) for distinguishing GD from HC subjects. We hypothesized that cue-induced changes in gamble acceptance, and especially a strong shift of loss aversion by gambling and other affective cues, should distinguish GD from HC subjects (i.e. the model representing this effect should be chosen most often by the algorithm to distinguish GD from HC subjects). To our knowledge, our study is the first to investigate the classificatory power of addiction-relevant behavioral task parameters when distinguishing GD from HC subjects. Moreover, we are not aware of any study specifically investigating the relevance of behavioral PIT effects in characterizing addicted subjects using predictive modeling.
Our algorithm distinguished GD from HC subjects significantly better than the control model, which used only smoking severity as a predictor variable (cross-validated AUC-ROC of 68.9% vs. 55.1%, pboot = 0.002). In the independent validation sample the classifier was almost as accurate (AUC-ROC of 65.0% vs. 55.3%, p = 0.047). When classifying subjects, in 95.8% of the estimation rounds our algorithm chose a model with some influence of the cue categories on choices. The most frequently chosen model was the ac model (90.7%), i.e. a model accounting only for mean shifts in acceptance rate depending on cue category. PIT-related variables could therefore successfully discriminate between GD and HC subjects. We saw that especially the tendency of subjects to gamble more during the presentation of gambling cues was indicative of the subject belonging to the GD group. Contrary to what we expected, “loss aversion PIT” was not useful in distinguishing between GD and HC subjects: the algorithm never selected the laci model, which included the modulation of gain and loss sensitivity by cue categories. We also did not see this effect in univariate group comparisons. “Loss aversion PIT” might thus not play a role in distinguishing GD from HC subjects. However, small sample size, as in the present study, may limit the possible complexity of a classifier (Hastie, Tibshirani, & Friedman, 2009, p. 237). It cannot be ruled out that larger and more diverse samples in future studies may produce classifiers allocating at least minor importance to “loss aversion PIT”. Surprisingly, the algorithm also only rarely (9.3% of model estimations) selected models that included overall LA-related parameters at all. Instead, the mere shift of gamble acceptance, irrespective of the gains and losses at stake, best distinguished GD and HC subjects on a single-case basis.
Absolute loss sensitivity and LA did, however, differ significantly between groups in classical mean-difference analyses, in line with previous studies (Genauck et al., 2017; Lorains et al., 2014). This group-mean difference was presumably not informative above and beyond the cue-induced changes in general gamble acceptance, and thus did not warrant a classifier stably incorporating LA. Note, however, that this does not mean that the mixed gambles task in the foreground was unnecessary. After all, the choice data stem from exactly this task, and with a different task in the foreground we would perhaps not have recorded choice data relevant for characterizing GD subjects.
We observed that both GD and HC subjects perceived the cues as intended. GD subjects reported higher craving for gambling in response to gambling stimuli, as seen in other studies (Crockford et al., 2005; Goudriaan et al., 2010; Limbrick-Oldfield et al., 2017; Potenza et al., 2003; Sodano & Wulfert, 2010; Wulfert, Maxson, & Jardin, 2009). Our results may thus be interpreted as cue reactivity leading to more automatic decision-making in GD subjects. Note that this does not mean that GD subjects simply showed higher vigor or more disinhibition in pressing a button, as in some PIT designs (Prévost, Liljeholm, Tyszka, & O’Doherty, 2012; Talmi et al., 2008). Instead, since the motor response required to accept or reject a gamble changed randomly, gamblers seemed indeed more inclined to decide in favor of gambling when gambling cues were shown in the background. Because it was the cue influence on general acceptance rate, rather than on LA, that was relevant for distinguishing GD from HC subjects, this may be seen as GD subjects responding more habitually and in a less goal-directed manner (Sebold et al., 2016) when gambling cues are visible.
In the current study, the classifier also put some weight on behavior under negative cues, and, descriptively but not significantly, GD subjects tended to reduce gambling more in the face of negative cues than HC subjects. GD subjects in our study may well know the negative consequences of gambling from their own experience, and cues representing those consequences might thus exert an inhibiting influence on gambling in GD, as has been observed in abstinent alcohol-dependent patients (Garbusow et al., 2016). This may be the case even though our GD subjects were neither in GD treatment nor abstinent, unlike in other studies (Brevers, Cleeremans, Goudriaan, et al., 2012; Brevers, Cleeremans, Verbruggen, et al., 2012; Giorgetta et al., 2014; Lorains et al., 2014; Takeuchi et al., 2015). Future studies should explore the possible power of negative images to inhibit gambling in larger and more heterogeneous GD samples.
Our results show the gambling-promoting effect of gambling cues in GD subjects. Alcohol and tobacco advertising promotes alcohol and tobacco use (DiFranza et al., 2006; Lovato, Linn, Stead, & Best, 2003), and advertising bans and counteractive labels on alcohol and tobacco products help reduce consumption (Hammond, 2011; Monaghan, Derevensky, & Sklar, 2008; Nelson, 2001). Our results suggest that, much like advertising for these substances, visual stimuli in gambling halls and on slot machines may promote the learning of increased PIT effects. Policy makers may consider our results as another piece of evidence that gambling advertising is no different from alcohol and tobacco advertising and extend existing bans to cover all addictive disorders.
We are not aware of any machine learning studies that have assessed the relevance of a behavioral task measure in characterizing GD. Using this approach, we observed a cross-validated classification performance of AUC-ROC = 0.69. We are aware of one machine learning study that built and tested a classifier in 160 GD patients and matched controls based on personality questionnaire self-report, reaching an AUC-ROC of 0.77 (Cerasa et al., 2018). Studies in the field of substance-based addiction using behavioral markers and machine learning for classification report cross-validated AUC-ROCs of 0.71 to 0.90 (Ahn et al., 2016; Ahn & Vassileva, 2016; Whelan et al., 2014). However, the current study differs from those studies in some important ways that make the observed classification performance, albeit not clinically applicable, highly relevant.
Namely, the machine learning studies cited above used whole arrays of behavioral tasks and/or personality questionnaires to distinguish addicted from HC subjects. Using such a “broadband approach” might also have yielded superior classification performance in the current study. However, we did not strive for maximum classification performance by using the whole range of characteristic information available. Instead, we wanted to assess and delineate the relevance of PIT processes alone during gamble decisions in characterizing GD, while ruling out and testing against the classificatory power of covariates of no interest. Our results thus show that cue-induced effects on general gamble acceptance rate, but not on loss aversion, are important features of GD that do not depend on a substance of abuse. Our results may be a first building block in creating more advanced and more multivariate diagnostic tools for GD and other addictive disorders, especially when combined with other high-performing discriminating features, such as personality profiles and scores from other decision-making tasks. Further, our results invite more in-depth scrutiny of decision-making in GD subjects in the presence of cues, e.g. at the neural level (Genauck et al., 2018). Moreover, the above machine learning studies did not use an independent validation sample to corroborate their results. Our independent validation yielded an AUC-ROC of 0.65, supporting the validity of our finding of increased PIT in GD.
STRENGTHS AND LIMITATIONS
Our study and many other studies in psychiatry and psychology suffer from limited sample and effect sizes and a high number of explanatory variables (Button et al., 2013; Kühberger, Fritz, & Scherndl, 2014). Overfitting and failure to replicate are possible risks of such studies (Ioannidis, 2005; Kriegeskorte et al., 2009; Maxwell, Lau, & Howard, 2015; Munafò et al., 2017). We thus deliberately used a machine learning approach with out-of-sample validation to estimate how well the current PIT findings generalize to new GD samples (Whelan et al., 2014; Yarkoni & Westfall, 2017).
However, when carving out the relevance of PIT, we did not match for depression score (BDI) because, epidemiologically, GD is associated with high depression scores (Kessler et al., 2008; N. M. Petry, 2005), meaning it could be seen as a feature of GD. This is why some (Genauck et al., 2017; Goudriaan et al., 2010; Mathar, Wiehler, Chakroun, Goltz, & Peters, 2018), but not all (Sescousse et al., 2013; van Holst, van Holstein, et al., 2012) GD studies use samples unmatched on BDI. In the current study we also did not match for depression because the evidence on the association of PIT and depression is inconclusive (Gotlib, Krasnoperova, Yue, & Joormann, 2004; Q. J. M. Huys et al., 2016; Koster, De Raedt, Goeleven, Franck, & Crombez, 2005; Nord, Lawson, Huys, Pilling, & Roiser, 2018; Rottenberg, Gross, & Gotlib, 2005). However, PIT might play some role in depression and thus also in GD subjects. Future studies should thus address the modulatory effect of depressive symptoms in GD on PIT (Fauth-Bühler et al., 2014).
Further, our classifier was slightly less effective in the independent validation sample than estimated via cross-validation (AUC-ROC of 65.0% vs. 68.9%). This might be due to our use of an fMRI version of the affective mixed gambles task in the validation sample, which included an additional decision-making period during which subjects could not yet respond. This may have led to slight changes in responses with respect to the cue categories. However, this is not necessarily a limitation but a strength, since our classifier still performed better than chance, and classifiers that are robust to slight changes in the experimental set-up arguably allow more general conclusions than classifiers that only work with data from the same experimental set-up. Future studies should likewise use validation samples (Guggenmos et al., 2018).
Lastly, we did not explore the predictive power of PIT variables and clinical symptom scales within GD due to our limited and fairly homogeneous GD sample. Future studies should explore the relationship between PIT variables and clinical variables in larger, more diverse, and longitudinal GD samples and test their predictive power for treatment outcome variables, such as relapse and debt.
CONCLUSION
Our results suggest that GD subjects’ acceptance of mixed gambles is cue-dependent and that this cue-dependency even lends itself to distinguishing GD from HC subjects in out-of-sample data. However, we did not observe that cues specifically shift loss aversion, either on average or in a way relevant to classification. We saw that gambling cues especially led to increased gambling in GD subjects. Observing increased PIT in GD suggests that PIT related to an addictive disorder might not depend on the direct effect of a substance of abuse, but on related learning processes (Heinz, 2017, p. 113 ff.) or on innate traits (Barker et al., 2012). The effects reported here should be explored further in larger, more diverse, and longitudinal GD samples, as they could inform diagnostics, therapy (Bouchard, Loranger, Giroux, Jacques, & Robillard, 2014; Hone-Blanchet, Wensing, & Fecteau, 2014; Hone-Blanchet et al., 2014; Lee, Kwon, Choi, & Yang, 2007; Marlatt & George, 1984) and public health policy (Monaghan et al., 2008; Nelson, 2001).
ONLINE MATERIAL
You can find the data and R Code to reproduce the analyses here: https://github.com/pransito/PIT_GD_by_release
Authors’ contribution:
AG designed the experiment, collected the data, analyzed the data, and wrote the manuscript. MA implemented the ratings and questionnaires electronically, analyzed the ratings data, and revised the manuscript. KB collected data and revised the manuscript. CM reviewed the machine-learning algorithm and revised the manuscript. AH revised the manuscript and oversaw manuscript drafting and data analyses. AW revised the manuscript and oversaw implementation of the experiment in the lab. NK revised the manuscript and advised the first author. NRS designed and supervised the study and experiment, and oversaw manuscript drafting and data analyses.
Footnotes
Funding Sources: This study was funded by a research grant by the Senatsverwaltung für Gesundheit, Pflege und Gleichstellung, Berlin. A.G. was funded by Deutsche Forschungsgemeinschaft (DFG) HE2597/15-1, HE2597/15-2, and DFG Graduiertenkolleg 1519 “Sensory Computation in Neural Systems”.
Conflict of interest: The authors declare no conflict of interest.
Co-Authors’ email addresses: Milan Andrejevic: milan.andrejevic{at}unimelb.edu.au, Katharina Brehm: kathabrehm{at}gmail.com, Caroline Matthis: matthis{at}ni.tu-berlin.de, Andreas Heinz: andreas.heinz{at}charite.de, André Weinreich: a.weinreich{at}psychologie.hu-berlin.de, Norbert Kathmann: kathmann{at}hu-berlin.de, Nina Romanczuk-Seiferth: nina.seiferth{at}charite.de
Remarks: To ensure a more convenient reviewing process, we positioned figures and tables at their intended locations.