Summary
Social decision-making is driven by normative influence (leading to public compliance) and informational influence (overwriting private beliefs), but how the brain encodes these modulating forces in probabilistic environments remains unanswered. Using a novel goal-directed learning paradigm in 185 participants, we observed opposite effects of group consensus on choice and confidence: people succumbed to the group when confronted with dissenting information, but increased their confidence when observing confirming information. Leveraging computational modeling and functional neuroimaging, we captured the nuanced distinction between normative and informational influence and identified their unique but interacting neural representations in the right temporoparietal junction (processing social information) and in prefrontal cortices (representing value computations), whose functional coupling instantiates a reward prediction error and a novel social prediction error that modulate behavioral adjustment. These results suggest that a closed-loop network between the brain’s reward hub and social hub supports social influence in human decision-making.
INTRODUCTION
Most of our everyday decisions are made in a social context. This affects big and small decisions alike: we care about what our family and friends think of which major we choose in college, and we also monitor other people’s choices at the lunch counter to obtain guidance for our own menu selection. Behavioral studies have examined social influence as expressed by conformity (Asch, 1956) and have identified two major sources of social influence: normative and informational influence (Cialdini and Goldstein, 2004; Toelch and Dolan, 2015; Fehr and Schurtenberger, 2018). Normative influence leads to public compliance while individuals may maintain their private beliefs, whereas informational influence posits that social information is integrated into one’s own valuation process. Neuroscience studies have recently attempted to assess the neurobiological underpinnings of both types of influence (Klucharev et al., 2009; Campbell-Meiklejohn et al., 2010; Edelson et al., 2011; Zaki et al., 2011; Izuma and Adolphs, 2013; Campbell-Meiklejohn et al., 2017; De Martino et al., 2017; Park et al., 2017). However, results are controversial (Toelch and Dolan, 2015), and more importantly, none of these studies have addressed the neurocomputational distinction and interaction between normative and informational influence in conjunction with individuals’ own valuation processes. This is largely because most studies (Klucharev et al., 2009; Campbell-Meiklejohn et al., 2010; Zaki et al., 2011; Izuma and Adolphs, 2013) relied on preference judgment tasks in which no feedback was given, which hindered the investigation of private beliefs, and because a comprehensive computational model that quantifies and isolates the latent determinants of behavioral change has been lacking.
Furthermore, confidence is crucial alongside individuals’ actions in decision-making (De Martino et al., 2012); however, only a few studies have examined both action and confidence when social influence is present (Campbell-Meiklejohn et al., 2017; De Martino et al., 2017; Park et al., 2017).
Here we establish a comprehensive account of social influence in decision-making at the behavioral, computational, and neurobiological levels, identifying distinct yet interacting brain regions that instantiate social decision-making in humans. We ask whether social influence has a distinct neurocomputational representation and how it is integrated with an individual’s own value computation. To test this, we measured learning performance in combination with computational modeling and functional magnetic resonance imaging (fMRI).
Computational models, especially models rooted in reinforcement learning (Sutton and Barto, 1998), offer a generative framework for approximating the hidden processes underlying decision-making, and have hence brought considerable advances in studying its neurocomputational mechanisms (e.g., Daw et al., 2006; Gläscher et al., 2010; den Ouden et al., 2013). Although the specific neural circuitry recruited during reward learning through direct experience (Cooper et al., 2012) also contributes to decision-making in social contexts, additional brain networks dedicated to representing other people’s knowledge and mental states are required to facilitate learning in social contexts (e.g., Behrens, 2008; Hampton, 2008; Boorman, 2012). Given these findings, our computational model integrates direct learning, instantiated by individuals’ trial-and-error experience, with observational learning, instantiated by tracking the other players’ performance. This way, our model recapitulates the crucial decision variables associated with behavioral adjustment, allowing us to directly probe the network of interacting brain regions.
We hypothesize that normative influence has its basis in mentalizing processes encoded in the right temporoparietal junction (rTPJ), given its functional role in representing others in relation to the self (Frith and Frith, 1999; Saxe and Kanwisher, 2003; Hampton et al., 2008). Furthermore, we hypothesize that informational influence involves modulation of social learning signals by the anterior cingulate cortex (ACC), given its relevance to vicarious learning (Behrens et al., 2008; Suzuki et al., 2012). In addition, we anticipate that an individual’s own valuation is computed via direct reinforcement learning (RL; Sutton and Barto, 1998) encoded in the ventromedial prefrontal cortex (vmPFC; Bartra et al., 2013). We further propose an interaction of two brain networks related to processing social information (e.g., rTPJ) and reward information (e.g., striatum), whose coupling is modulated by behavioral adjustment (Hare et al., 2010).
We tested these hypotheses by employing a novel paradigm that allows multiple players to interact with each other in real-time while engaging in a probabilistic reversal learning task (PRL; e.g., Gläscher et al., 2009). Actions as well as confidence ratings were recorded before and after receiving social information, and both were altered by social influence. We report evidence that direct valuation is integrated with the vicarious valuation resulting from informational influence to make decisions, and that this process is instantaneously affected by normative influence. We further identify two distinct networks that separately process reward information and social information, whose functional coupling substantiates a reward prediction error and a social prediction error.
RESULTS
Participants (N = 185) performed the “social influence task” in groups of five; 39 of them were scanned with fMRI. The task design utilized a multi-phase paradigm, enabling us to tease apart each crucial behavior under social influence (Figure 1A). Participants began each trial with their initial choice between two abstract fractals with complementary reward probabilities, followed by their first post-decision wager (an incentivized confidence rating, referred to as “bet”; De Martino et al., 2012; Persaud et al., 2007; Dotan et al., 2018; see also Star Methods). After sequentially uncovering their peers’ first decisions in order of their subjective preference, participants had the opportunity to adjust both their choice and bet. The final choice and bet were then multiplied to determine the outcome on that trial. It is worth noting that participants’ actual choices were communicated to every other participant via a real-time connection, thus maintaining a high ecological validity of the task. The core of this task is a probabilistic reversal learning paradigm (Gläscher et al., 2009; Figure 1B). This implementation requires participants to learn and continuously update action-outcome associations, thus creating enough uncertainty such that group decisions are likely to influence the choice and bet of the 2nd decision (i.e., inferring normative influence), and allowing us to examine whether the others’ learning behavior at the end of the trial was integrated into participants’ own learning (i.e., implying informational influence; see Star Methods). These dynamically evolving group decisions also allowed us to parametrically test the effect of group consensus (Figure 1C). Importantly, participants were aware that outcomes depended only on their own choice and not on that of the group, which prevented cooperative and competitive motives.
Social Influence Alters Both Action and Confidence in Goal-directed Learning
Model-free analyses showed that the 185 healthy participants indeed altered both their first choice and bet after observing the group decision, but in opposite directions. Both second choices and bets were modulated by a significant interaction between the relative direction of the group (with vs. against the participant’s 1st choice) and the group consensus (2:2, 3:1, 4:0, seen from the view of each participant; Figure 1C). Participants showed an increasing tendency to switch their choice toward the group decision when faced with more dissenting social information, whereas they were more likely to persist when observing agreement with the group (direction × consensus interaction, F1,574 = 55.82, P < 0.001) (Figure 1D and Table S1). Conversely, participants tended to increase their bets as a function of group consensus when observing confirming opinions, but sustained their bets when being contradicted by the group (F1,734 = 4.67, P < 0.05) (Figure 1E and Table S1).
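The interaction pattern reported above can be illustrated with a minimal sketch that tabulates switch rates by group direction and consensus. The data layout (arrays of direction labels, consensus counts, and switch indicators) and the toy numbers are illustrative assumptions, not the study data.

```python
import numpy as np

def switch_rates(direction, consensus, switched):
    """Mean switch probability per (direction, consensus) cell.

    direction: array of 'with'/'against' (group relative to 1st choice)
    consensus: majority size seen by the participant (2, 3, or 4
               co-players on one side, i.e. 2:2, 3:1, 4:0)
    switched:  boolean array, True if the 2nd choice differed from the 1st
    """
    direction = np.asarray(direction)
    consensus = np.asarray(consensus)
    switched = np.asarray(switched, dtype=float)
    rates = {}
    for d in ("with", "against"):
        for c in (2, 3, 4):
            mask = (direction == d) & (consensus == c)
            rates[(d, c)] = switched[mask].mean() if mask.any() else np.nan
    return rates

# Toy data mimicking the reported pattern: switching rises with dissent,
# stays low under agreement
direction = ["against"] * 6 + ["with"] * 6
consensus = [2, 2, 3, 3, 4, 4] * 2
switched  = [0, 1, 1, 1, 1, 1,   0, 0, 0, 0, 0, 1]
rates = switch_rates(direction, consensus, switched)
```

In the full analysis these cell means enter a mixed-effects model with a direction × consensus interaction term; the tabulation above only visualizes the raw pattern.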
We further verified the benefit of this behavioral adjustment: social information facilitated learning. Participants’ choice accuracy at the second decision was indeed significantly higher than at the first (F1,2392 = 4.45, p < 0.05; see Figure S1A and Table S2). Similarly, participants’ second bet was significantly higher than their first (F1,184 = 7.10, p < 0.01; Figure S1B and Table S2). Together, we identified the effect of social influence on behavioral adjustments and demonstrated that the adjustment is not a result of perceptual salience.
Computational mechanisms of social influence in goal-directed learning
Using computational modeling, we aimed to formally quantify the latent mechanisms by dissociating the two types of social influence at the computational level and, in particular, by unraveling how informational influence was incorporated into one’s own learning process. Going beyond existing RL models of social influence (Biele et al., 2011; Diaconescu et al., 2014), our model accommodates multiple players and simultaneously estimates participants’ two choices and two bets under a hierarchical Bayesian analysis workflow (Gelman et al., 2013; Carpenter et al., 2017). Our efforts to construct these models were guided by two design principles: (1) separation of the individual’s own value (Vself) and the vicarious value of others (Vother) during learning, which were then combined into a choice value for the 1st choice (Vcombined) using linear weighting, and (2) separation of the instantaneous normative social influence on the second choice from social learning through observing the performance of the other players (i.e., informational influence). Crucially, we modeled the second choice as a function of two counteracting influences: (1) the group dissension (Nagainst), representing the instantaneous normative influence, and (2) the difference between the participants’ action values at the 1st choice (Vchosen − Vunchosen), representing the distinctiveness of the current value estimates.
Following this construction, when the value difference at the first choice is large, participants are less likely to succumb to social influence from dissenting information at their second choice, and vice versa. Lastly, when all outcomes were delivered at the end of the trial, both own and vicarious values were updated on a trial-by-trial basis: Vself was updated with a reward prediction error (RPE; Schultz et al., 1997), whereas Vother was updated by tracking a preference-weighted, discounted reward history (i.e., their performance in the recent past) of all four co-players (Figure 2A; see also Star Methods).
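The two update rules described above can be sketched as follows. This is a schematic re-implementation of the verbal description, not the authors’ fitted model: the learning rate, discount factor, linear weight, and the softmax form of the 1st-choice rule are illustrative assumptions.

```python
import numpy as np

def softmax2(v, beta):
    """Softmax choice probabilities for a 2-option value vector."""
    ev = np.exp(beta * (v - v.max()))
    return ev / ev.sum()

def update_values(v_self, v_other, choice, reward, others_rewards, pref_w,
                  alpha=0.3, gamma=0.9):
    """One trial of the dual update scheme described in the text.

    v_self        : own action values, shape (2,)
    v_other       : vicarious values, shape (2,)
    choice        : own chosen option (0 or 1)
    reward        : own outcome (0 or 1)
    others_rewards: list of (chosen_option, reward) for the 4 co-players
    pref_w        : subjective preference weights for the 4 co-players
    """
    # Direct learning: reward prediction error on the chosen option
    rpe = reward - v_self[choice]
    v_self = v_self.copy()
    v_self[choice] += alpha * rpe

    # Observational learning: discount the history, then add each
    # co-player's preference-weighted reward to the option they chose
    v_other = gamma * v_other.copy()
    for (opt, r), w in zip(others_rewards, pref_w):
        v_other[opt] += w * r
    return v_self, v_other, rpe

def combined_value(v_self, v_other, w=0.5):
    """Linear weighting of direct and vicarious values for the 1st choice."""
    return w * v_self + (1 - w) * v_other

# Example trial: own choice (option 0) rewarded; co-players split 2:2
v_s, v_o, rpe = update_values(
    v_self=np.array([0.5, 0.5]), v_other=np.array([0.0, 0.0]),
    choice=0, reward=1.0,
    others_rewards=[(0, 1.0), (1, 0.0), (1, 1.0), (0, 0.0)],
    pref_w=[0.4, 0.3, 0.2, 0.1])
p_first = softmax2(combined_value(v_s, v_o, w=0.5), beta=3.0)
```

The second-choice switch tendency could then be modeled as a logistic function of Nagainst counteracted by Vchosen − Vunchosen, again an assumed functional form.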
Arguably, instead of tracking each co-player’s performance, individuals may simulate an RL-like algorithm to update this vicarious value through observational learning from the co-players – effectively, learning “for” the others. However, running four independent RL algorithms to update learning signals for the four co-players is cognitively demanding – participants would have to track and update each co-player’s individual learning process together with their own valuation to make further decisions. Given that an RL update requires both action and reward, a simpler vicarious learning mechanism may rely on either of them alone. In other words, participants may utilize either the others’ choice preference history or their performance history to approximate the value update. We tested all these hypotheses by constructing learning models with the corresponding value update rules. Model comparison first verified the necessity of the social learning component, then ruled out these alternative learning processes, and thereby confirmed that vicarious values were updated by maintaining the others’ discounted reward history (Table 1; see also Star Methods). Additionally, Bayesian model averaging using the Bayesian bootstrap (Yao et al., 2018) indicated that the probability of this winning model being the best model over the others was 99.8%, which substantiated the model comparison result.
We further verified our winning model using two rigorous validation approaches. First, we carried out a parameter recovery analysis. Although the hierarchical Bayesian approach increases the complexity of the parameter space, all parameters of our winning model could be accurately and selectively recovered (Figure S2). Second, as model comparison only provides relative model performance, we noted the importance of conducting a posterior predictive check (e.g., Frank et al., 2015). Indeed, our winning model provided the best out-of-sample predictive power, and its posterior prediction captured the behavioral findings of our model-free analyses well (Figure 1D).
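The logic of a parameter recovery analysis can be illustrated on a deliberately simplified model: simulate choices from a basic Rescorla-Wagner learner with known parameters, then re-estimate them — here via grid-search maximum likelihood rather than the hierarchical Bayesian fit used in the study. All parameter values and the bandit structure are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate(alpha, beta, n_trials=400, p_reward=(0.7, 0.3)):
    """Simulate a 2-armed bandit played by a Rescorla-Wagner agent."""
    v = np.zeros(2)
    choices, rewards = [], []
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + np.exp(-beta * (v[0] - v[1])))
        c = 0 if rng.random() < p0 else 1
        r = float(rng.random() < p_reward[c])
        v[c] += alpha * (r - v[c])
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def log_lik(alpha, beta, choices, rewards):
    """Log-likelihood of the observed choices under (alpha, beta)."""
    v = np.zeros(2)
    ll = 0.0
    for c, r in zip(choices, rewards):
        p0 = 1.0 / (1.0 + np.exp(-beta * (v[0] - v[1])))
        ll += np.log(p0 if c == 0 else 1.0 - p0)
        v[c] += alpha * (r - v[c])
    return ll

true_alpha, true_beta = 0.3, 4.0
choices, rewards = simulate(true_alpha, true_beta)

# Grid-search maximum likelihood over both parameters
alphas = np.linspace(0.05, 0.95, 19)
betas = np.linspace(0.5, 10.0, 20)
lls = np.array([[log_lik(a, b, choices, rewards) for b in betas]
                for a in alphas])
ia, ib = np.unravel_index(lls.argmax(), lls.shape)
alpha_hat, beta_hat = alphas[ia], betas[ib]
```

In a full recovery analysis this simulate-and-refit step is repeated across many synthetic agents, and the correlation between true and recovered parameters is inspected (as in Figure S2 of the paper).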
Next, we sought to establish the functional association between model parameters and model-free behaviors. Parameter results (Figure S1C-F) indicated that the extent to which participants learned from themselves and from the others was on average comparable, suggesting that an integrated value, computed from one’s direct learning and the informational influence, guided future decisions. Furthermore, parameters related to normative influence predicted individual differences in participants’ behavioral adjustment. If a model-derived signal is in high accordance with the corresponding model-free feature, we would anticipate a strong association between them. Indeed, we observed a significant positive correlation between β(w.Nagainst) and the slopes of choice switch probability in the against condition (Pearson’s R = 0.64, p < 0.001; Figure 2B). Similarly, we observed a positive correlation between β(w.Nwith) and the slope derived from the bet difference in the “with” condition (Pearson’s R = 0.33, p < 0.001; Figure 2B).
Together, our computational modeling analyses suggested that (1) participants learned both from their own valuation, using an RPE to update their own values, and from the others, by maintaining the others’ reward history that was subsequently integrated into their own decision process; and (2) participants’ behavioral adjustment was instantaneously affected by the group consensus: the number of co-players who made the opposite choice prompted participants to switch their choice toward the direction of the group, whereas the number of co-players who decided on the same option drove participants to increase their bet. Having uncovered these latent elements of the decision processes under social influence, we were then able to test how they were computed and implemented at the neural level using model-based fMRI (Gläscher and O’Doherty, 2010).
Neural substrates of dissociable self- and other value
The first part of our imaging analyses focused on how the distinctive decision variables (Figure 2A) were represented in the brain (GLM1; see Star Methods). Second-level results were obtained using non-parametric methods with threshold-free cluster enhancement (TFCE; Smith and Nichols, 2009). Our model distinguished between two value signals and suggested that an integrated value signal was associated with participants’ initial action and bet. Consequently, we aimed to test the hypothesis that distinct and dissociable brain regions are recruited to implement these computational signals. Indeed, we observed that vmPFC activity (peak: x = 4, y = 46, z = −14; all coordinates reported in MNI space) scaled positively with Vself, and ACC activity (peak: x = 2, y = 10, z = 36) scaled positively with Vother (Figure 3A; Table S3). To test whether the two value signals (i.e., Vself, Vother) are distinctively associated with the vmPFC and ACC, respectively, we used a double-dissociation approach (e.g., Shamay-Tsoory et al., 2009; Kennerley et al., 2011) and found that Vself was exclusively encoded in the vmPFC but not in the ACC, whereas Vother was exclusively represented in the ACC but not in the vmPFC (Figure 3B). In addition, the medial prefrontal cortex (mPFC; peak: x = 10, y = 40, z = 10) was functionally coupled with both vmPFC and ACC (Figure S5 and Table S5), suggesting a neural encoding of the integrated value signal (e.g., Rouault et al., 2019). Beyond the value signals, an RPE signal was firmly associated with activity in the nucleus accumbens (NAcc; left peak: x = −10, y = 8, z = −10; right peak: x = 12, y = 10, z = −12; Figure 3D; Table S3), a region that is well-studied in the reward learning literature (e.g., Schultz et al., 1997). However, a closer look at the two theoretical subcomponents of the RPE is necessary to assess its neural substrates (e.g., Behrens et al., 2008; Jocham et al., 2014).
To qualify as a region encoding an RPE signal, activity in the NAcc ought to covary positively with the actual outcome (i.e., reward) and negatively with the expectation (i.e., value). Notably, this property provides a common framework for testing the neural correlates of any error-like signal (Behrens et al., 2008). Under this framework, we indeed found that activity in the NAcc showed a positive correlation with the reward outcome (p < 0.0001, permutation test; Figure 3E, green line) and a negative effect of the value signal (p = 0.021, permutation test; Figure 3E, red line).
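This two-part signature can be sketched as a regression of a trial-wise region signal on reward and value, with sign-specific permutation tests on each coefficient. The synthetic signal below is constructed to carry an RPE by design; variable names, sample size, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def regress(y, X):
    """OLS coefficients for y ~ X (X already includes an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def perm_pvalue(y, X, col, n_perm=2000, rng=rng):
    """One-sided permutation p-value for the coefficient of X[:, col]:
    shuffle that regressor, refit, and count how often the permuted
    coefficient is at least as extreme (same sign) as the observed one."""
    obs = regress(y, X)[col]
    count = 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, col] = rng.permutation(X[:, col])
        if np.sign(obs) * regress(y, Xp)[col] >= np.sign(obs) * obs:
            count += 1
    return (count + 1) / (n_perm + 1)

# Synthetic "NAcc" signal built as an RPE plus noise:
# positive weight on reward, negative weight on value
n = 200
reward = rng.integers(0, 2, n).astype(float)
value = rng.uniform(0, 1, n)
signal = (reward - value) + 0.3 * rng.normal(size=n)

X = np.column_stack([np.ones(n), reward, value])
coefs = regress(signal, X)
p_reward = perm_pvalue(signal, X, col=1)
p_value = perm_pvalue(signal, X, col=2)
```

A region whose signal merely mirrors the outcome would yield a significant reward coefficient but no negative value coefficient, which is exactly the dissociation this test targets.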
Neural correlates of dissenting social information and behavioral adjustment
We next turned to disentangling the neural substrates of the instantaneous social influence (GLM1; see Star Methods) and the subsequent behavioral adjustment (GLM2; see Star Methods). Having validated that such social information enhanced learning (Figure S1), we reasoned that participants might process the other co-players’ intentions relative to their own first decision to make subsequent adjustments, and that this might be related to the mentalizing network. Based on this reasoning, we assessed the parametric modulation of preference-weighted normative influence (w.Nagainst), and indeed found that activity in the TPJ (left peak: x = −48, y = −62, z = 30; right peak: x = 50, y = −60, z = 34), among other regions (e.g., ACC, anterior insula; see Table S4), was positively correlated with the dissenting social information (Figure S3; Table S4). In addition, the resulting choice adjustment (i.e., switch vs. stay) covaried with activity in the bilateral dorsolateral prefrontal cortex (dlPFC; left peak: x = −32, y = 48, z = 16; right peak: x = 26, y = 42, z = 32; Figure S4; Table S4), commonly associated with executive control and behavioral flexibility (Gläscher et al., 2009; Burke et al., 2010). In contrast, the vmPFC (peak: x = 6, y = 44, z = −16) was more active during stay trials (i.e., stay > switch), reminiscent of its representation of one’s own valuation (Bartra et al., 2013; Gläscher et al., 2009; Figure S4; Table S4). In summary, our model-based fMRI analyses (Gläscher and O’Doherty, 2010; Cohen et al., 2017) revealed two distinct brain networks representing social information on the one hand and reward and value processing on the other.
A network between Brain’s Reward Circuit and Social Circuit
So far, we have shown how key decision variables are implemented at distinct nodes at the neural level. In a next step, we sought to establish how these network nodes are functionally connected to bring about socially-induced behavioral change and to uncover additional latent computational signals that would otherwise be undetectable by conventional general linear models. We first conducted a psycho-physiological interaction (PPI, Friston et al., 1997; O’Reilly et al., 2012) to examine the context-dependent connectivity, and then we performed a physio-physiological interaction (PhiPI; Friston et al., 1997) to further interrogate the functional coupling at the physiological level.
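The core of a PPI regressor can be sketched as follows: the design matrix contains the seed timecourse (physiological term), the task context (psychological term), and their product (the PPI term), whose coefficient indexes context-dependent coupling. Real PPI analyses deconvolve the seed to the neural level and re-convolve the product with the hemodynamic response function; this simplified sketch omits that step, and all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def ppi_design(seed_ts, psych):
    """Build a PPI design matrix: intercept, mean-centered seed
    (physiological), mean-centered psychological regressor, and their
    product (the PPI interaction term)."""
    seed_c = seed_ts - seed_ts.mean()
    psych_c = psych - psych.mean()
    return np.column_stack(
        [np.ones_like(seed_ts), seed_c, psych_c, seed_c * psych_c])

# Synthetic example: the target region couples with the seed ONLY on
# "switch" trials (psych = +1), mimicking context-dependent connectivity
n = 300
seed = rng.normal(size=n)
psych = np.where(rng.random(n) < 0.5, 1.0, -1.0)  # switch vs stay
target = 0.8 * seed * (psych > 0) + 0.3 * rng.normal(size=n)

X = ppi_design(seed, psych)
beta = np.linalg.lstsq(X, target, rcond=None)[0]
ppi_beta = beta[3]  # interaction coefficient: context-dependent coupling
```

A PhiPI follows the same logic but replaces the psychological regressor with a second physiological timecourse, so the interaction term indexes coupling that depends on activity in the second seed.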
Using a PPI seeded in the rTPJ (see Star Methods), we investigated how behavioral change at the 2nd decision modulated the functional coupling between the social information represented in the rTPJ and other brain regions. This analysis identified the left putamen (lPut; peak: x = −20, y = 12, z = −4; Figure 4A and 4B; Table S5). Closer investigation into the computational role of the lPut revealed that it did not correlate with both components of an RPE: activity in the lPut correlated positively with reward (p < 0.0001, permutation test) but did not correlate negatively with value (p = 0.4854, permutation test). Instead, as the choice adjustment resulted from social information, we reasoned that the lPut might encode a social prediction error (SPE). Following this reasoning, we conducted an analysis analogous to that for the RPE and found that activity in the lPut was positively correlated with the actual agreement (approximated by 1 − Nagainst%; p = 0.040, permutation test) and negatively correlated with the expected agreement (approximated by the value difference Vchosen − Vunchosen, as individuals who maintain a larger value difference may expect more agreement; Zhu et al., 2012; p = 0.014, permutation test) (Figure 4C). This pattern confirmed that the lPut was effectively encoding a hitherto uncharacterized social prediction error. Taken together, these analyses demonstrate that the functional coupling between neural representations of social information and an SPE is enhanced when this social information leads to a behavioral change.
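The SPE described here can be written compactly as the difference between actual and expected agreement. The actual-agreement term (1 − Nagainst/4) follows the text; the logistic mapping from the value difference to expected agreement is an illustrative assumption, since the text only states that a larger value difference implies expecting more agreement.

```python
import numpy as np

def social_prediction_error(n_against, v_chosen, v_unchosen, scale=2.0):
    """SPE = actual agreement - expected agreement.

    Actual agreement: fraction of the 4 co-players who chose the same
    option, i.e. 1 - n_against / 4.
    Expected agreement: increasing in the value difference of the 1st
    choice; the logistic form and `scale` are illustrative assumptions.
    """
    actual = 1.0 - n_against / 4.0
    expected = 1.0 / (1.0 + np.exp(-scale * (v_chosen - v_unchosen)))
    return actual - expected

# Confident in the choice (large value difference) but facing 3:1
# dissent -> a strongly negative SPE, pushing toward adjustment
spe = social_prediction_error(n_against=3, v_chosen=0.8, v_unchosen=0.2)
```

Like the RPE, this signal is signed: unexpected agreement yields a positive error, unexpected dissent a negative one, which is what makes it testable with the same two-regressor framework.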
In a last step, using a PhiPI, we investigated how the neural representation of choice switching at the 2nd decision in the left dlPFC modulated the functional coupling between the rTPJ and other brain regions. This analysis revealed that activity in the rTPJ positively modulated the coupling between the vmPFC (peak: x = 0, y = 48, z = −12) and the ACC (peak: x = 0, y = 0, z = 40), which strikingly overlapped with the regions representing the two value signals (Figure 4D-F; Table S5). Therefore, it seems that the interplay of the neural representations of social information and of the propensity for behavioral change leads to the updating of both value signals obtained via direct and observational learning.
DISCUSSION
Social influence is a powerful modulator of individual choices (Ruff and Fehr, 2014). Although accumulating studies have investigated the neural representations of social influence and attempted to identify potential mechanisms, there is little direct evidence for the dissociation between normative and informational influence, for how their distinct computations are represented in the brain, and for how the involved brain regions interact with one another. Here, we addressed this gap with a novel social decision-making paradigm that allowed us to dissociate the two sources of social influence. In a comprehensive neurocomputational approach to social decision-making, we were not only able to identify a network of brain regions that represents and integrates social information from others, but also to characterize the computational role of each node in this network in detail (Figure 5), suggesting the following process model: one’s own decision is guided by a combination of value signals from direct learning (Vself), represented in the vmPFC (Figure 3A-B; Bartra et al., 2013), and from observational learning (Vother), represented in a section of the ACC (Figure 3A-B) that is also closely related to estimates of the volatility of others’ choices (Behrens et al., 2008) and to error detection and response conflict resolution (e.g., Carter, 1998). The decisions of others are encoded with respect to one’s own choice in the rTPJ (Figure S4), an area linked to, but not limited to, representations of social information and agents in a variety of tasks (Saxe and Kanwisher, 2003; Hampton et al., 2008; Suzuki et al., 2015). In fact, the rTPJ is also related to Theory of Mind (Frith and Frith, 1999) and to other integrative computations such as multisensory integration (Tsakiris et al., 2010) and attentional processing (Corbetta and Shulman, 2002).
Moreover, dissenting social information gives rise to a novel and hitherto uncharacterized social prediction error (the difference between actual and expected agreement with the group decision) represented in the lPut (Figure 4A, 4C; Figure S5), unlike the more medial NAcc, which exhibits the neural signature of a classic RPE (Figure 3D-E; O’Doherty et al., 2003; O’Doherty et al., 2004). Notably, the interplay of the lPut and the rTPJ affects behavioral change toward the group decision (Figure 4A-B), in combination with the neural representation of choice switching in the left dlPFC (Figure 4D-F). These functionally connected neural activities trigger the update of direct learning in the vmPFC (Vself) and of observational learning in the ACC (Vother), thus closing the loop of decision-related computations in social contexts.
Our finding that self-valuation is encoded in the vmPFC is firmly in line with previous evidence from learning and decision-making in non-social contexts (Plassmann et al., 2007; Levy and Glimcher, 2012; Bartra et al., 2013), and extends it into a social context. Beyond individuals’ own value update, we further show that the ACC responds to the value signals updated through observational learning, which aligns with previous studies implicating the ACC in tracking the volatility of social information (Behrens et al., 2008; Behrens et al., 2009). In particular, given that the social information in the current study is represented by the cumulative reward history of the others, as inferred by our computational model, the dynamics of how well the others were performing in the recent past partly reflects their volatility in the same learning environment, as in Behrens et al. (2008). Moreover, this distinct neural coding of direct and vicarious values fundamentally differs from previous studies of social influence in decision-making. In a recent study, for instance, Apps and Ramnani (2017) reported that neural activity in the vmPFC and ACC was associated with subjective values and normative values, respectively, in an intertemporal economic game. It should be noted that participants in that study were asked to separately and explicitly make intertemporal decisions either for themselves or for another group. In the current study, by contrast, the two value signals were modeled at the same time point and no instruction was given to track self and other differently; we therefore argue that the learning processes based on one’s own valuation and on the others’ reward history were implemented in parallel. Indeed, our winning model indicated that the extent to which individuals relied on their own valuation and on that of the others was effectively comparable (Figure S1C).
Collectively, these results demonstrate concurrent yet distinct value computations in vmPFC and ACC when social information is presented during goal-directed learning.
Apart from the value dissociation, we were interested in how direct and vicarious values were integrated to guide future decisions. As shown by our functional connectivity analyses, activity in the mPFC covaried with activity in both the vmPFC and the ACC. According to a recent meta-analysis (Bartra et al., 2013), this region is particularly engaged during the decision stage, when individuals are representing the choice options and selecting actions, especially in value-based and goal-directed decision-making (Rangel and Hare, 2010). This suggests that, beyond their dissociable neural underpinnings, the direct value and the vicarious value are further combined to make subsequent decisions (e.g., Rouault et al., 2019).
Furthermore, we replicated previous reports identifying the NAcc with RPE computation rather than mere outcome representation (Behrens et al., 2008; Jocham et al., 2014; Klein et al., 2017). That is, if a brain region encodes the RPE, its activity should correlate positively with the actual outcome (e.g., reward) and negatively with the expected outcome (e.g., value). Using this property of the RPE signal, our data identify a hitherto uncharacterized social prediction error (SPE) encoded in a section of the putamen, resulting from a psychophysiological interaction seeded at the rTPJ. This suggests that the SPE signal may trigger a re-computation of expected values and give rise to the subsequent behavioral adjustment, which is partially in line with previous reports showing that an SPE is signaled by increased striatal activity (Behrens et al., 2008; Meshi et al., 2012). In addition, these functional connectivity results concur with previous reports demonstrating that the rTPJ has functional links with the reward network, of which the striatum is a central hub (Hare et al., 2010).
In addition, our results complement and extend previous neuroimaging work on social influence. Consistent with the large body of studies on social influence and conformity (Klucharev et al., 2009; Berns et al., 2010; Tomlin et al., 2013), the ACC and the aINS were more activated when participants observed conflicting social information, with the ACC being relevant to general error monitoring and conflict detection (Ridderinkhof et al., 2004; Diedrichsen et al., 2005) and the aINS being associated with affective processing and negative arousal (Craig, 2002, 2003). This body of evidence suggests that when the other co-players choose the alternative option and thus contradict an individual’s own first choice, a conflict monitoring process may be initiated, and such conflict between the individual’s prior decision and the group opinion may be accompanied by increased affective arousal, such as worry and anxiety. However, this interpretation remains speculative, as we did not collect psychophysiological measures of arousal (such as skin conductance responses). Nevertheless, it should be noted that conflict monitoring is not necessarily triggered by dissenting social information; other forms of perceptual mismatch may provoke a similar neural response in the ACC and aINS. Yet in the current study, our behavioral results have shown that switching toward the direction of the group was not due to perceptual mismatch; instead, social information was utilized to facilitate learning (Figure S1–2).
It is perhaps surprising that we did not find significant neural correlates of post-decision confidence (i.e., the “bet” in the current study). This might be because the events in our design (i.e., first choice and first bet, second choice and second bet) were not spaced far apart in time, such that even carefully specified GLMs could not capture the variance related to the bets. More importantly, bets in the current design were closely tied to the corresponding choice valuation. In other words, when individuals were sure that one option would lead to a reward, they tended to place a high bet. In fact, this relationship was well reflected by our winning model and the related model parameters (Figure S1E). Accordingly, the bet was positively correlated with the value signals, inevitably resulting in collinear regressors and diminishing statistical power. These caveats aside, our results nonetheless shed light on the change in confidence after incorporating social information into decision-making, which extends evidence from previous studies that neither directly addressed the difference in confidence before and after exposure to social information, nor examined the interface between choice and confidence (De Martino et al., 2017; Park et al., 2017; Campbell-Meiklejohn et al., 2017).
In summary, our results provide behavioral and computational evidence that normative social influence alters individuals’ actions and confidence, whereas informational social influence is incorporated into their own valuation processes. Moreover, we found a network of distinct, yet interacting brain regions substantiating specific computational variables. Such a network is in a prime position to process decisions of the sort mentioned at the beginning, where – as in the example of a lunch order – we have to balance our own experience-based reward expectations against the expectation of congruency with others and use the resulting error signals to flexibly adapt our choice behavior in social contexts.
STAR METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and MRI data should be directed to and will be fulfilled by the Lead Contact, Jan P. Gläscher (glaescher{at}uke.de), Institute for Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Forty-one groups of five healthy, right-handed participants were invited to participate in the study. No participant had a history of neurological or psychiatric disease, was taking medication (except contraceptives), or had any MR-incompatible foreign object in the body. To avoid gender bias, each group consisted of only same-gender participants. Forty-one of the 205 participants (i.e., one per group) were scanned with fMRI while undergoing the experimental task. The remaining 164 participants performed the same task via an intranet connection while seated in an adjacent behavioral testing room outside the scanner. Twenty of the 205 participants who had switched only once or not at all were excluded, including two fMRI participants. This was to ensure that the analysis was not biased by these non-responders (Tomlin et al., 2013). The final sample consisted of 185 participants (95 females; mean age: 25.56 ± 3.98 years; age range: 18-37 years), and among them, 39 participants belonged to the fMRI group (20 females; mean age: 25.59 ± 3.51 years; age range: 20-37 years). All participants gave written informed consent before the experiment. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Medical Association of Hamburg (PV3661).
METHOD DETAILS
Task
Underlying probabilistic reversal learning paradigm
The task structure of our social influence task was based on a probabilistic reversal learning (PRL) task. In our two-alternative forced-choice PRL (Figure 1B), each choice option was associated with a particular reward probability (i.e., 70% and 30%). After a variable number of trials (i.e., 8-12), the reward contingencies reversed, such that individuals performing the task needed to re-adapt to the new contingencies in order to maximize their outcome. Given that there was always a “correct” option, which led to more reward than punishment, alongside an “incorrect” option, which led to more punishment than reward, a higher-order anticorrelation structure existed in the underlying reward dynamics (Gläscher et al., 2009).
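As a hedged illustration of this design, the reversal schedule can be sketched in a few lines. All names and parameter defaults below are ours for illustration; the paper does not describe the trial generator at this level of detail.

```python
import random

def make_reversal_schedule(n_trials=100, p_good=0.7, min_len=8, max_len=12, seed=0):
    """Sketch of a 2-option PRL reward schedule: the 'good' option pays off
    with p_good (here 70%), and the contingencies reverse after a variable
    block of 8-12 trials, as described in the text above."""
    rng = random.Random(seed)
    probs = []
    good = 0  # index of the currently 'correct' option
    block_len = rng.randint(min_len, max_len)
    for _ in range(n_trials):
        if block_len == 0:
            good = 1 - good                      # reversal: options swap roles
            block_len = rng.randint(min_len, max_len)
        p = [1 - p_good, 1 - p_good]
        p[good] = p_good                         # exactly one option is 'good'
        probs.append(tuple(p))
        block_len -= 1
    return probs

schedule = make_reversal_schedule()
```

Because the two probabilities always sum to one, the anticorrelation structure mentioned above is built into every trial of the schedule.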
We used the PRL task rather than a task with constant reward probabilities (e.g., always 70%) because the PRL structure requires participants to pay continuous attention to the reward contingency, in order to adapt to potentially new states of the reward structure and to ignore the (rare) probabilistic punishment from the “correct” option. As a result, the PRL task ensures constant learning throughout the entire experiment (Figure S1A-B). In fact, one of our early pilot studies used a fixed reward probability; participants quickly learned the reward contingency and neglected the social information, and in that version we could not tease apart the contributions of reward-based and socially based influence.
Breakdown of the social influence task
For each experimental session, a group of five participants engaged in the same PRL task via an intranet connection, without experimental deception. For each participant, portrait photos of the four other same-gender co-players were displayed throughout the trials. This manipulation increased the ecological validity of the task and at the same time created a more engaging situation for the participants.
The social influence task contained six phases. Phase 1. Initial choice (1st choice). Upon the presentation of two choice options depicted as abstract fractals, participants were asked to make their 1st choice. A yellow frame was then presented to highlight the chosen option. Phase 2. Initial bet (1st bet). After making the 1st choice, participants were asked to indicate how confident they were in their choice: 1 (not confident), 2 (reasonably confident), or 3 (very confident). Notably, the confidence ratings also served as a post-decision wagering metric (an incentivized confidence rating; Persaud et al., 2007); namely, the ratings were multiplied by the potential outcome on each trial. For instance, if a participant won on a particular trial, the reward unit (20 cents in the current setting) was multiplied by the bet (e.g., 2) to obtain the final outcome (20 × 2 = 40 cents). Therefore, the confidence rating in the current paradigm is referred to as the “bet”. A yellow frame was presented to highlight the chosen bet. Phase 3. Preference giving. Once all participants had provided their choices and bets, the choices (but not the bets) of the other co-players were revealed. Crucially, instead of seeing all four other choices at the same time, participants had the opportunity to sequentially uncover their peers’ decisions. In particular, participants could decide whom to uncover first and whom to uncover second, depending on their preference. The remaining two choices were then displayed automatically. This manipulation is essential because, in studies of decision-making, individuals tend to assign different credibility to their social peers based on their performance (e.g., Behrens et al., 2008; Boorman et al., 2013). In this study, as there were four other co-players in the same learning environment, they likely had various performance levels and should therefore receive different credibility. Phase 4. Choice adjustment (2nd choice).
When all four other choices were presented, participants were able to adjust their choices given the social information. The yellow frame was shifted accordingly to highlight the adjusted choice. Phase 5. Bet adjustment (2nd bet). After the choice adjustment, participants could adjust their bet as well. Additionally, participants observed the other co-players’ second choices (on top of the first choices) once they had submitted their adjusted bets. Presenting the other co-players’ choices after the bet adjustment rather than after the choice adjustment prevented the additional social information from biasing the bet adjustment. The yellow frame was shifted accordingly to highlight the adjusted bet. Phase 6. Outcome delivery. Finally, the outcome was determined by the combination of the participants’ 2nd choice and 2nd bet. Outcomes of the other co-players were also displayed, but only as the single reward unit (i.e., 20 cents) without multiplication by their 2nd bet. This provided participants with sufficient, yet not overwhelming, information about their peers’ performance.
Procedure
To ensure a complete understanding of the task procedure, the study comprised a two-day procedure: pre-scanning training (Day1) and the main experiment (Day2).
Pre-scanning training (Day1)
One to two days prior to the MRI scanning (Day2), five participants came to the behavioral lab to participate in the pre-scanning training. Upon arrival, they received the written task instruction and the consent form. After returning the written consent, participants were taken through a step-by-step task instruction by the experimenter. Notably, participants were explicitly informed (1) that an intranet connection was established so that they would observe real responses from the others, (2) what probabilistic reward meant, illustrated with examples, (3) that there was neither cooperation nor competition in this experiment, and (4) that the reward probability could reverse multiple times over the course of the experiment, although they were not informed about when and how often these reversals would take place. Importantly, to shift the focus of the study away from social influence and conformity, we framed the experiment as a multi-player decision game, in which the goal was to detect the “good option” so as to maximize the personal payoff in the end. Given this uncertainty, participants were instructed that they could either trust their own learning experience through trial and error, or take their peers’ decisions into consideration, as some peers might learn faster than others. Participants’ explicit awareness of all possible alternatives was crucial for the implementation of our social influence task. To further enhance participants’ motivation, we informed them that the amount they gained in the experiment would be added to their base payment (see Reward payment below). After participants had fully understood the task, we took portrait photos of them. To avoid emotional arousal, we asked participants to maintain a neutral facial expression, as in typical passport photos. To prevent potential confusion before the training task, we further informed participants that they would only see the photos of the four other co-players, without their own.
The training task contained 10 trials and differed from the main experiment in two respects. First, it used a different set of stimuli than the main experiment, to avoid any learning effect. Second, participants were given a longer response window to fully understand every step of the task. Specifically, each trial began with the stimulus presentation of the two choice alternatives (4000ms), followed by the 1st bet (3000ms). After the two sequential preference ratings (3000ms each), all 1st choices from the others were displayed below their corresponding photos (3000ms). Participants were then able to adjust their choice (4000ms) and their bet (3000ms). Finally, outcomes of all participants were released (3000ms), followed by a jittered inter-trial interval (ITI, 2000 – 4000ms). To help participants familiarize themselves with the task, we orally instructed them on what to expect and what to do in each phase for the first two to three trials. The procedure on Day1 lasted about one hour.
Main experiment (Day2)
On the testing day, the five participants came to the MRI building. After a recap of all important aspects of the task instruction, the fMRI participant gave MRI consent and entered the scanner to perform the main social influence task, while the remaining four participants were seated in a room adjacent to the scanner to perform the same task. All computers were interconnected via the intranet. Participants were further instructed not to make any verbal or gestural communication with one another during the experiment.
The main experiment contained 100 trials and used a different pair of stimuli from the training task. It followed the exact procedure detailed above (see Breakdown of the social influence task). Specifically, each trial began with the stimulus presentation of the two choice alternatives (2500ms), followed by the 1st bet (2000ms). After the two sequential preference ratings (2000ms each), all 1st choices from the others were displayed below their corresponding photos (3000ms). Participants were then able to adjust their choice (3000ms) and their bet (2000ms). Finally, outcomes of all participants were released (3000ms), followed by a jittered inter-trial interval (ITI, 2000 – 4000ms). The procedure on Day2 lasted about 1.5 hours.
Reward payment
All participants were compensated with a base payment of 35 Euros plus the reward they had achieved during the main experiment. To prevent careless responses on the 1st choice, participants were explicitly instructed that on each trial either the 1st choice or the 2nd choice would be used to determine the final payoff. However, this did not affect the outcome delivery on the screen: although on some trials the 1st choice was used to determine the payment, only outcomes corresponding to the 2nd choice appeared on the screen. Additionally, when the total outcome was negative, no money was deducted from the final payment. Overall, participants gained 4.48 ± 4.41 Euros after completing the experiment. Finally, the experiment ended with an informal debriefing session.
Behavioral data acquisition
Stimulus presentation, MRI pulse triggering, and response recording were accomplished with Matlab R2014b (www.mathworks.com) and Cogent2000 (www.vislab.ucl.ac.uk/cogent.php). In the behavioral group (as well as during the pre-scanning training), buttons “V” and “B” on the keyboard corresponded to the left and right choice options, respectively; and “V”, “B”, and “N” corresponded to the bets “1”, “2”, and “3”, respectively. For the MRI group, a four-button MRI-compatible button box with a horizontal button arrangement was used to record behavioral responses. To avoid motor artifacts, the position of the two choice options was counterbalanced across all participants.
FMRI data acquisition and pre-processing
MRI data collection was conducted on a Siemens Trio 3T scanner (Siemens, Erlangen, Germany) with a 32-channel head coil. Each brain volume consisted of 42 axial slices (voxel size, 2 x 2 x 2 mm, with 1 mm spacing between slices) acquired using a T2*-weighted echoplanar imaging (EPI) protocol (TR, 2510ms; TE, 25ms; flip angle, 40°; FOV, 216mm) in descending order. Orientation of the slice was tilted at 30° to the anterior commissure-posterior commissure (AC-PC) axis to improve signal quality in the orbitofrontal cortex (Deichmann et al., 2003). Data for each participant were collected in three runs with total volumes ranging from 1210 to 1230, and the first 3 volumes of each run were discarded to obtain a steady-state magnetization. In addition, a gradient echo field map was acquired before EPI scanning to measure the magnetic field inhomogeneity (TE1 = 5.00ms, TE2 = 7.46ms), and a high-resolution anatomical image (voxel size, 1 x 1 x 1 mm) was acquired after the experiment using a T1-weighted MPRAGE protocol.
fMRI data preprocessing was performed using SPM12 (Statistical Parametric Mapping; Wellcome Trust Center for Neuroimaging, University College London, London, UK). After converting raw DICOM images to NIfTI format, image preprocessing continued with slice timing correction using the middle slice of the volume as the reference. Next, a voxel displacement map (VDM) was calculated from the field map to account for the spatial distortion resulting from the magnetic field inhomogeneity (Jezzard and Balaban, 1995; Andersson et al., 2001; Hutton et al., 2002). Incorporating this VDM, the EPI images were then corrected for motion and spatial distortions through realignment and unwarping (Andersson et al., 2001). The participants’ anatomical images were manually checked and their origin corrected by resetting it to the AC-PC. The EPI images were then coregistered to this origin-corrected anatomical image. The anatomical image was skull-stripped and segmented into gray matter, white matter, and CSF using the “Segment” tool in SPM12. The gray and white matter images were used in the SPM12 DARTEL toolbox to create individual flow fields as well as a group anatomical template (Ashburner, 2007). The EPI images were then normalized to MNI space using the respective flow fields through the DARTEL toolbox normalization tool. A Gaussian kernel of 6 mm full-width at half-maximum (FWHM) was used to smooth the EPI images.
After preprocessing, we further detected brain volumes that (1) excessively deviated from the global mean of the BOLD signals (> 1 SD), (2) showed excessive head movement (movement parameter / TR > 0.4), or (3) were largely correlated with the movement parameters and their first derivatives (R2 > 0.95). This procedure was implemented with the “Spike Analyzer” tool (https://jan-glaescher.squarespace.com/s/spike_analyzer.m), which returned the indices of the detected volumes. We then entered these as additional participant-specific nuisance regressors of no interest in all our first-level analyses. This procedure flagged 3.41 ± 4.79% of all volumes. As it was performed per participant, the total number of nuisance regressors could differ between participants.
QUANTIFICATION AND STATISTICAL ANALYSIS
Behavioral data analysis
We tested for behavioral adjustment after observing social information in Phase 3 by assessing the choice switch probability in Phase 4 (how likely participants were to switch to the opposite option) and the bet difference in Phase 5 (2nd bet magnitude minus 1st bet magnitude) as measures of how choice and confidence were modulated by the social information. Neither a group difference (fMRI vs. behavioral) nor a gender difference (male vs. female) was observed for choice switch probability (group: F1,914 = 0.14, p = 0.71; gender: F1,914 = 0.24, p = 0.63) or bet difference (group: F1,914 = 0.09, p = 0.76; gender: F1,914 = 1.20, p = 0.27). Thus, we pooled the data for the subsequent analyses. Additionally, trials on which participants did not give valid responses in time for either the 1st choice or the 1st bet were excluded. On average, 7.9 ± 7.3% of trials were excluded.
We first tested how choice switch probability (Figure 1D, left) and bet difference (Figure 1D, right) varied as a function of the direction of the group (with and against, with respect to each participant’s 1st choice) and the consensus of the group (2:2, 3:1, 4:0, from the view of each participant, Figure 1C). To this end, we submitted the choice switch probability and the bet difference to unbalanced 2 (direction) × 3 (consensus) repeated-measures ANOVAs. The imbalance was due to the fact that data in the 2:2 condition could only be used once; we grouped it into the “against” condition, resulting in three consensus levels in the “against” condition and two in the “with” condition. Grouping it into the “with” condition did not alter the results. We also sought to account for random effects in this analysis. We constructed five mixed-effect models (Table S1) with different random-effect specifications and selected the best one for the subsequent statistical analysis.
We then tested whether there was a linear trend within each direction condition as a function of the group consensus; that is, whether the choice switch probability in the “against” (or “with”) condition showed a significantly increasing (or decreasing) trend with the group consensus. To this aim, we first dummy coded the consensus levels 2:2, 3:1, and 4:0 as 1, 2, and 3, and then performed a simple 1st-order polynomial fit of the choice switch probability as a function of the newly coded consensus. We concluded that a linear trend was present when the slope term was significant. Similarly, the linear trend in the bet difference was tested as a function of the group consensus for each direction.
Given that participants’ interest lay solely in maximizing their personal payoffs, we then tested whether it was beneficial for participants to adjust their choice after receiving the social information. If so, participants were expected to perform better (i.e., choose the “good” option more often) on their 2nd choices than on their 1st choices. To this aim, we assessed the accuracy of both choices (whether the more rewarding option was selected) as well as the magnitude of both bets (i.e., 1, 2, and 3). We selected a window of trials around each reversal to perform this analysis: three trials before and three trials after the reversal, with the reversal trial included. We then stacked the data with respect to the reversal (i.e., time-locked) and averaged them per participant. Similar to the above analysis, we submitted the data to 2 (1st accuracy / 1st bet vs. 2nd accuracy / 2nd bet) × 7 (relative trial position: −3, −2, −1, 0, +1, +2, +3) ANOVAs with five different random-effect specifications, respectively (Table S2). If the main effect of position was significant, we then submitted the data to post-hoc comparisons with Tukey’s HSD correction.
All repeated measures ANOVA mixed-effect models were analyzed with the “lme4” package (Bates et al., 2014) in R (v3.3.1; www.r-project.org). The 1st-order polynomial fit was performed with Matlab R2014b. Results were considered statistically significant at the level p < 0.05.
Computational modeling
We developed three categories of models to uncover the latent computational mechanisms (Figure 2A) underlying participants’ behavior in the social influence task. We based all our computational models on simple reinforcement learning (RL; Sutton and Barto, 1998) and progressively added components (Table 1).
First, given the structure of the PRL task, we sought to evaluate whether a fictitious update RL model that incorporates the anticorrelation structure (see Underlying probabilistic reversal learning paradigm) outperformed the simple RL model that only updated the value of the chosen option. Thus, we constructed both the simple RL model and the fictitious update RL model, neither of which considered social information (Category 1: M1a and M1b). On top of the Category 1 models, we then included the instantaneous social influence (reflecting the normative influence) to construct social models (Category 2: M2a and M2b). Finally, we further considered the component of observational learning (reflecting the informational influence) with competing predictions (Category 3: M3, M4, M5, M6a, M6b). In all models, we simultaneously estimated participants’ choices and bets using hierarchical Bayesian analysis. The remainder of this section explains the technical details of the model specifications.
Choice model specifications
In all models, the 1st choice was estimated using a softmax function (Sutton and Barto, 1998) that converted action values into action probabilities. On trial t, the action probability of choosing the option A (between A and B) was defined as follows:
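As an illustration of this specification, a minimal sketch of a temperature-free softmax over two action values (the function name and the restriction to two options are ours; the effect weights that enter the values are described below):

```python
import math

def softmax_prob(v_a, v_b):
    """Probability of choosing option A between A and B under a softmax
    choice rule. Note that the models above fold effect weights into the
    values themselves and omit a separate inverse-temperature parameter."""
    return math.exp(v_a) / (math.exp(v_a) + math.exp(v_b))
```

By construction, the two action probabilities sum to one, and equal values yield indifference (probability 0.5).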
For the 2nd choice, because we coded it as a “switch” (1) or a “stay” (0), it was modeled as a logistic regression on a switch value (V(switch)). On trial t, the probability of a switch given the switch value was defined as follows: where Φ was the inverse logistic linking function:
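The typeset equations are not reproduced in the text above; a standard form consistent with this description (a reconstruction, not the original rendering) is:

```latex
P(\mathrm{switch}_t = 1) \;=\; \Phi\!\left(V(\mathrm{switch})_t\right),
\qquad
\Phi(x) \;=\; \frac{1}{1 + e^{-x}}
```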
It is worth noting that, in both action probability specifications, we did not include the commonly used inverse softmax temperature parameter τ. This was because we explicitly constructed both the option values for the 1st choice and the switch value for the 2nd choice in a design-matrix fashion (e.g., Eq. 5; see the text below). Including an inverse softmax temperature parameter would therefore inevitably give rise to a multiplicative term, which, as a consequence, would render parameter estimation unidentifiable (Gelman et al., 2013). For completeness, we also assessed models with the τ parameter, and they performed consistently worse than the models specified here.
The Category 1 models (M1a and M1b) did not consider any social information. In the simplest model (M1a), a Rescorla-Wagner model (Rescorla and Wagner, 1972) was used to model the 1st choice, with only the chosen value being updated via the RPE (δ), and the unchosen value remaining the same as the last trial.
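The Rescorla-Wagner update can be sketched in a few lines (the function name and the reward coding in the example are ours, for illustration only):

```python
def rescorla_wagner_update(v_chosen, reward, eta):
    """One-trial Rescorla-Wagner update of the chosen option's value.
    delta (the RPE) is the difference between the received reward and the
    current expectation; eta is the learning rate in [0, 1]. In M1a the
    unchosen option's value is left untouched."""
    delta = reward - v_chosen        # reward prediction error
    return v_chosen + eta * delta

# e.g., expectation 0.0, reward 1.0, learning rate 0.5 -> updated value 0.5
```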
An effect weight was then multiplied by the values before being submitted to Eq. 1, as in:
Because there was no social information in M1a, the switch value of 2nd choice was comprised merely of the value difference of the 1st choice and a switch bias:
In M1b we tested whether the fictitious update could improve the model performance, as the fictitious update has been successful in PRL tasks in non-social contexts (e.g., Hampton et al., 2007; Gläscher et al., 2009). In M1b, both the chosen value and the unchosen value were updated, as in:
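A common formulation of the fictitious update, shown here as a sketch under the assumption that rewards are coded as R ∈ {+1, −1} (the typeset equation is not reproduced above, so this is a reconstruction consistent with the cited literature rather than the paper’s exact notation):

```latex
V_{\mathrm{chosen},\,t+1} \;=\; V_{\mathrm{chosen},\,t} + \eta\,\bigl(R_t - V_{\mathrm{chosen},\,t}\bigr),
\qquad
V_{\mathrm{unchosen},\,t+1} \;=\; V_{\mathrm{unchosen},\,t} + \eta\,\bigl(-R_t - V_{\mathrm{unchosen},\,t}\bigr)
```

Under this scheme the unchosen option is updated toward the outcome it would have delivered, which captures the anticorrelated reward structure of the PRL task.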
Our Category 2 models (M2a and M2b) tested the role of instantaneous social influence on the 2nd choice, namely, whether observing the choices of the other co-players in the same learning environment contributed to choice switching. Compared with M1 (M1a and M1b), only the switch value of the 2nd choice was modified, as follows: where w.Nagainst denoted the preference-weighted number of dissenting choices relative to the participant’s 1st choice. This reflects the ordering effect based on participants’ preferences. Note that the preference weights were fixed parameters based on each participant’s preference towards the others when uncovering their choices (see Experimental design): the 1st favored co-player received a weight of 0.75, the 2nd favored co-player a weight of 0.5, and the remaining two co-players a weight of 0.25 each. Of note, estimating these preference weights as free parameters would have rendered the model unidentifiable and is thus beyond the scope of this paper. Moreover, this term (w.Nagainst) was normalized to lie between 0 and 1 before entering Eq. 8. All other specifications of M2a and M2b were identical to M1a and M1b, respectively.
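As an illustration, the preference weighting and [0, 1] normalization can be sketched as follows. The normalization by the maximum possible weighted sum is our assumption; the paper only states that the term was normalized to lie between 0 and 1.

```python
def weighted_dissent(against_flags, weights=(0.75, 0.5, 0.25, 0.25)):
    """Preference-weighted number of dissenting co-players (w.N_against).
    against_flags[i] is 1 if co-player i chose against the participant's
    1st choice. Weights follow the uncovering preference described above
    (1st favored: 0.75, 2nd favored: 0.5, remaining two: 0.25 each); the
    raw sum is divided by the maximum possible value to lie in [0, 1]."""
    raw = sum(w * f for w, f in zip(weights, against_flags))
    return raw / sum(weights)

# all four co-players dissent -> 1.0; none dissent -> 0.0
```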
Next, we assessed whether participants learned from their social peers and updated vicarious action values through observational learning, using the Category 3 models (M3, M4, M5, M6a, M6b). It is worth noting that models belonging to Category 2 solely considered the instantaneous social effect on the 2nd choice, whereas models in Category 3 tested several competing hypotheses about how the observational learning effect may contribute to the 1st choice on the following trial, in combination with individuals’ own valuation processes. In all models within this category, the choice value of the 1st choice was specified as a weighted sum of Vself, updated via direct learning, and Vother, updated via observational learning:
M3 tested whether individuals attributed an RL algorithm similar to their own to the other co-players, and therefore modeled the other co-players as independent RL agents whose action values were updated separately. Specifically, under M3, participants were assumed to update values “for” the others using the fictitious update described above (Eq. 7), and the action value at each 1st choice was determined by a preference-weighted sum of one’s own value updated via direct learning (Vself) and the vicarious value updated through observational learning (Vother). That is, observing the performance of the other group members also influenced the learning (i.e., updating) of expected values from trial to trial. The values of each choice option from each co-player were weighted (by the preference weight w) and then summed to form Vother, as follows: where s denoted the index of the four other co-players. Vother was afterwards normalized to lie between −1 and 1, using Eq. 3:
This normalization ensured that the numerical magnitude of Vother was comparable to that of Vself, and it therefore made sense to compare the sizes of the corresponding value-related parameters (βself and βother in Eq. 3).
One may argue that having four independent RL agents, as in M3, is cognitively demanding: to accomplish this, participants would have to track and update each co-player’s individual learning process together with their own valuation. We therefore constructed three additional models that employed simpler but distinct valuation pathways to update values through observational learning. In essence, M3 considered both choices and outcomes to determine the action value. We then asked whether using either choices or outcomes alone might perform as well as, or even better than, M3. Following this logic, M4 updated Vother using only the others’ action preferences, whereas M5 used only the others’ current outcomes for the value update via observational learning.
In M4, the other players’ action preferences were derived from the choice sequence over the last three trials, using the cumulative distribution function of the beta distribution evaluated at 0.5. For instance, if one co-player chose option A twice and option B once in the last three trials, then the action preference of choosing A for him/her was: betacdf(0.5, frequency of B + 1, frequency of A + 1) = betacdf(0.5, 1 + 1, 2 + 1) = 0.6875. These action preferences (ρ) were then used to update Vother: where C2 denoted the 2nd choice. Note that, in this specification, only when C2s,t-1 = A was the action preference ρs,t-1 used to update Vother(A). Vother(B) was updated in the same fashion. The values were then normalized using Eq. 11.
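The worked example can be verified with a short sketch; for integer shape parameters the beta CDF reduces to a binomial sum, so no statistics library is needed (the function names are ours):

```python
from math import comb

def betacdf_int(x, a, b):
    """Beta CDF at x for integer shape parameters a and b, via the identity
    I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def action_preference(freq_A, freq_B):
    """Action preference for option A over the last three trials, as in M4:
    betacdf(0.5, freq_B + 1, freq_A + 1)."""
    return betacdf_int(0.5, freq_B + 1, freq_A + 1)

# co-player chose A twice and B once over the last three trials:
# action_preference(2, 1) -> 0.6875, matching the worked example above
```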
Likewise, M5 tested whether participants updated Vother using only each other’s reward (R):
These values were then normalized using Eq. 11.
Moreover, we did not rule out the possibility that participants maintained a cumulative reward history over the last few trials instead of monitoring only the most recent outcomes of the others. In fact, a discounted reward history over the recent past (e.g., the last three trials) is a relatively common implementation in other RL studies in non-social contexts (e.g., Kennerley et al., 2006; Scholl et al., 2017). By testing three window lengths (3, 4, and 5 trials) and using a nested model comparison, we decided on a window of three past trials to accumulate the other co-players’ performance, and constructed such a model as M6a: where i denoted the trial index from T−3 to T−1, and γ denoted the decay factor. The values were then normalized using Eq. 11.
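A discounted reward history of this kind can be sketched as follows. The exact discounting scheme of M6a is not fully specified in the text above, so applying γ to a power of the trial lag is our assumption for illustration.

```python
def discounted_history(rewards, gamma):
    """Discounted sum of a co-player's rewards over the last three trials
    (T-3 .. T-1), with more recent outcomes weighted more heavily: the
    reward j steps before the most recent one is scaled by gamma**j."""
    last3 = rewards[-3:]
    k = len(last3)
    return sum(gamma ** (k - 1 - j) * r for j, r in enumerate(last3))

# gamma = 1 reduces to a plain sum over the last three rewards
```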
Lastly, given that M6a was the winning model among all the models above (M1 – M6a) indicated by model comparison (see below Model selection and posterior predictive check), we assessed in M6b whether the 1st bet contributed to the choice switching on the 2nd choice as well, as follows:
Bet model specifications
In all models, both the 1st bet and the 2nd bet were modeled with an ordered-logistic regression, which is often used for discrete ordinal variables such as Likert-scale questionnaire data (Greene, 2003; Greene and Hensher, 2010). We applied the ordered-logistic model because the bets in our study indeed implied an ordering: betting 3 was higher than betting 2, and betting 2 was higher than betting 1. However, the difference between bets 3 and 1 (i.e., a difference of 2) was not necessarily twice the difference between bets 3 and 2 (i.e., a difference of 1). Hence, we needed to model the distance (decision boundary) between them. Moreover, although the bets in our study could only be 1, 2, or 3, we hypothesized a continuous mental process underlying the placement of bets, which satisfied the general assumption of the ordered-logistic regression model (Greene, 2003).
There were two key components in the ordered-logistic model: the continuous utility U and the set of thresholds θ. As discussed above, we hypothesized a continuous bet utility, Ubet, which varied between the thresholds to predict the bets. In addition, a set of K−1 thresholds (θ1, θ2, …, θK−1) was introduced to quantify the decision boundaries, where K was the number of discrete categories. As there were three bet levels (K = 3), we introduced two decision thresholds, θ1 and θ2 (θ2 > θ1). The predicted bets on trial t were then represented as follows: where i indicated either the 1st or the 2nd bet. Because there were only two threshold levels, for simplicity we set θ1 = 0 and θ2 = θ (θ > 0). To model the actual bets, a logistic function (Eq. 3) was used to obtain the action probability of each bet, as follows:
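With θ1 = 0 and θ2 = θ, the resulting category probabilities can be sketched as follows (a hedged illustration of the ordered-logit structure described above; function names are ours):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def bet_probabilities(u, theta):
    """Ordered-logistic probabilities of bets 1, 2, and 3 given utility u,
    with thresholds theta_1 = 0 and theta_2 = theta (> 0):
    P(bet=1) = 1 - logit^-1(u), P(bet=3) = logit^-1(u - theta),
    and P(bet=2) is the remaining probability mass between the thresholds."""
    p1 = 1.0 - inv_logit(u)        # utility falls below the lower threshold
    p3 = inv_logit(u - theta)      # utility exceeds the upper threshold
    p2 = 1.0 - p1 - p3             # mass between the two thresholds
    return p1, p2, p3
```

A larger utility shifts mass from bet 1 toward bet 3, which mirrors the intuition that a larger value difference yields a higher bet.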
In our model specification of the 1st bet, the utility Ubet1 comprised a bet bias and the value difference between the chosen and the unchosen option. The rationale was that the larger the value difference, the more confident individuals were expected to be, and hence the higher the bet they would place. This utility Ubet1 was kept identical across all models (M1a – M6b), as follows:
Note that although the formula was the same as Eq. 6, the βs were independent of each other. To model the 2nd bet, we were interested in the bet change relative to the 1st bet. Therefore, the utility Ubet2 was constructed on top of Ubet1. In all non-social models (M1a, M1b), the change term was represented by an intercept parameter, as follows:
Moreover, in all social models (M2a – M6b), regardless of the observational learning effect, the change term was specified by the instantaneous social information, as follows:
It should be noted, however, that despite the anticorrelation between w.Nwith and w.Nagainst, the parameter estimation results showed that the corresponding effects (i.e., βwith and βagainst) did not depend on each other (Pearson’s R = 0.04, p > 0.05). In fact, as shown in Figure S1F, w.Nwith predicted bet increases, whereas w.Nagainst predicted bet decreases, suggesting independent contributions to the bet change during the adjustment. Additionally, we constructed two other models using only w.Nwith or only w.Nagainst, but their performance was dramatically worse than that of the model including both (ΔLOOIC > 1000).
Model estimation with hierarchical Bayesian analysis
In all models, we simultaneously estimated both choices (C1, C2) and bets (B1, B2). Model estimation for all aforementioned candidate models was performed with hierarchical Bayesian analysis (HBA) (Gelman et al., 2013) using the probabilistic programming language Stan (Carpenter et al., 2017) in R. Stan utilizes a Markov chain Monte Carlo (MCMC) sampling scheme to perform full Bayesian inference and obtain the actual posterior distribution. We performed HBA rather than maximum likelihood estimation (MLE) because HBA provides much more stable and accurate estimates than MLE (Ahn et al., 2011). Following the approach of the “hBayesDM” package (Ahn et al., 2017), we assumed, for instance, that a generic individual-level parameter φ was drawn from a group-level normal distribution, namely, φ ~ Normal(μφ, σφ), with μφ and σφ being the group-level mean and standard deviation, respectively. Both group-level parameters were specified with weakly-informative priors (Gelman et al., 2013): μφ ~ Normal(0, 1) and σφ ~ half-Cauchy(0, 5). This ensured that the MCMC sampler traveled over a sufficiently wide range to sample the entire parameter space. All parameters were unconstrained except for η / γ (both constrained to [0, 1] via the inverse probit transform) and θ (constrained to be positive via the exponential transform).
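The hierarchical prior structure described above (group-level Normal and half-Cauchy hyperpriors generating individual-level parameters) can be sketched by forward sampling. This is for illustration only; the actual inference was performed in Stan, not by forward simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hierarchical_prior(n_subjects):
    """Draw one realization of the hierarchical prior for a generic
    parameter phi: mu_phi ~ Normal(0, 1), sigma_phi ~ half-Cauchy(0, 5),
    then phi_i ~ Normal(mu_phi, sigma_phi) for each subject."""
    mu_phi = rng.normal(0.0, 1.0)
    # half-Cauchy(0, 5): absolute value of a scaled standard Cauchy draw
    sigma_phi = abs(5.0 * rng.standard_cauchy())
    # individual-level parameters drawn from the group distribution
    phi = rng.normal(mu_phi, sigma_phi, size=n_subjects)
    return mu_phi, sigma_phi, phi
```

In the actual HBA, these draws are not sampled forward but constrained jointly by the behavioral data through Bayes' rule.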
In HBA, all group-level and individual-level parameters were estimated simultaneously through Bayes’ rule by incorporating the behavioral data. We fit each candidate model with four independent MCMC chains, using 1000 sampling iterations after 1000 warmup iterations per chain, which resulted in 4000 valid posterior samples. Convergence of the MCMC chains was assessed both visually (from the trace plots) and through the Gelman-Rubin statistic R̂ (Gelman and Rubin, 1992). R̂ values of all parameters were close to 1.0 (all below 1.1 in the current study), indicating adequate convergence.
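The Gelman-Rubin diagnostic can be sketched as follows; this is a simplified (non-split) version of the R̂ statistic, shown only to illustrate the between- vs. within-chain variance comparison:

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Potential scale reduction factor R-hat for one parameter.

    chains: array of shape (n_chains, n_samples).
    Values near 1.0 indicate that the chains have mixed well.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # between-chain variance B and mean within-chain variance W
    b = n * chain_means.var(ddof=1)
    w = chains.var(axis=1, ddof=1).mean()
    # pooled posterior variance estimate
    var_plus = (n - 1) / n * w + b / n
    return np.sqrt(var_plus / w)
```

Well-mixed chains yield R̂ ≈ 1.0, whereas chains stuck in different regions inflate the between-chain variance and push R̂ above 1.1.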
Model selection and posterior predictive check
For model comparison and model selection, we computed the leave-one-out information criterion (LOOIC) score for each candidate model (Vehtari et al., 2016). The LOOIC score provides a point-wise estimate of out-of-sample predictive accuracy in a fully Bayesian way, which is more reliable than point-estimate information criteria (e.g., the Akaike information criterion, AIC, or the deviance information criterion, DIC). By convention, a lower LOOIC score indicates better out-of-sample predictive accuracy of the candidate model. Moreover, a difference of 10 on the information criterion scale is considered decisive (Burnham and Anderson, 2004). We selected the model with the lowest LOOIC as the winning model. We additionally performed Bayesian model averaging (BMA) with Bayesian bootstrap (Yao et al., 2018) to compute the probability of each candidate model being the best model. Conventionally, a BMA probability of 0.9 (or higher) is a decisive indication.
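The selection rule above (lowest LOOIC wins; a margin of 10 is decisive) can be sketched as follows; model names and scores here are hypothetical:

```python
def select_winning_model(looic_scores):
    """Pick the model with the lowest LOOIC score and report whether its
    margin over the runner-up exceeds the conventional decisive
    threshold of 10 on the information criterion scale.

    looic_scores: dict mapping model name -> LOOIC score.
    Returns (winning model name, decisive: bool).
    """
    ranked = sorted(looic_scores.items(), key=lambda kv: kv[1])
    (best, best_score), (_, runner_score) = ranked[0], ranked[1]
    return best, (runner_score - best_score) >= 10
```

For example, with hypothetical scores {"M1a": 5200, "M6a": 5020, "M6b": 5000}, M6b would be selected decisively.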
Given that model comparison provides merely relative performance among candidate models (Palminteri et al., 2017), we then tested how well the winning model’s posterior prediction was able to replicate the key features of the observed data (a.k.a., posterior predictive check, PPC). To this end, we applied a post-hoc absolute-fit approach (Steingroever et al., 2014) that factored in participants’ actual action and outcome sequences to generate predictions with the entire set of posterior MCMC samples. Namely, we let the model generate choices and bets as many times as the number of posterior samples (i.e., 4000 times) per trial per participant, and asked whether the generated data could reproduce the behavioral patterns from our behavioral analysis.
Lastly, we tested how specific model parameters linked with model-free behavior to assess individual differences (Figure 2B). In the choice model, we tested the simple Pearson’s correlation between β(w.Nagainst) and the 1st-order polynomial slope derived from the choice switch probability as a function of the group consensus in the “against” condition (see above Behavioral analysis). Likewise, in the bet model, we tested the simple Pearson’s correlation between β(w.Nwith) and the 1st-order polynomial slope derived from the bet difference as a function of the group consensus in the “with” condition (see above Behavioral analysis).
Parameter recovery
Considering that there were multiple free parameters in the winning model, we verified whether the parameters were identifiable using parameter recovery after the model fitting. In the first step, we randomly drew a set of group-level parameters from the joint posterior group-distribution of M6b. Next, we simulated 80 synthetic participants, whose parameters were randomly drawn from this set of group-level parameters. Then, we used the model (M6b) as a generative tool to simulate behavioral data for our social influence task, namely, the 1st choice, 2nd choice, 1st bet, and 2nd bet for 100 trials per participant. Once the behavioral data were simulated, we fit M6b to them in the same way as for the real data. Finally, we assessed whether the posterior group-distribution given the simulated data recovered the actual group-level parameters that were used to simulate those data (Figure S2).
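The recovery logic (simulate with known parameters, refit, compare) can be illustrated with a stripped-down example for a single threshold parameter of an ordered-logistic bet model. The grid-search MLE here is a toy stand-in for the full hierarchical Stan fit, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def bet_probs(u, theta):
    # ordered-logistic probabilities for bets 1-3 (theta_1 = 0, theta_2 = theta)
    c1 = logistic(0.0 - u)
    c2 = logistic(theta - u)
    return np.stack([c1, c2 - c1, 1.0 - c2], axis=-1)

def recover_theta(n_trials=2000, theta_true=1.5):
    # 1) simulate synthetic bets with a known generative threshold
    u = rng.normal(0.0, 1.5, size=n_trials)
    probs = bet_probs(u, theta_true)
    bets = np.array([rng.choice(3, p=p) for p in probs])
    # 2) refit theta by maximizing the likelihood over a grid
    grid = np.linspace(0.1, 3.0, 120)
    loglik = [np.log(bet_probs(u, th)[np.arange(n_trials), bets]).sum()
              for th in grid]
    # 3) return the recovered estimate for comparison with theta_true
    return grid[int(np.argmax(loglik))]
```

If the recovered estimate lands close to the generative value, the parameter is identifiable from data of this size; the paper applies the same logic to all free parameters of M6b jointly.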
MRI data analysis
Deriving internal computational signals
Based on the winning model (Table 1) and its parameter estimation (Figure S2C-F), we derived the trial-by-trial computational signals for each MRI participant using the mean of the posterior distribution of the parameters. We used the mean rather than the mode (i.e., the peak) because in MCMC, especially the Hamiltonian Monte Carlo (HMC) implemented in Stan, the mean is much more stable than the mode as a point estimate of the entire posterior distribution (Carpenter et al., 2017). In fact, as we modeled all parameters with normal distributions, the posterior mean and the posterior mode were highly correlated (Pearson’s R = 0.99, p < 0.001).
First-level analysis
fMRI data analysis was performed using SPM12. We conducted model-based fMRI analysis (Gläscher et al., 2009; Gläscher and O’Doherty, 2010) containing the computational signals described above (Table S6). We set up two event-related general linear models (GLM1 and GLM2) to test our hypotheses.
GLM1 assessed the neural representations of valuation resulting from participants’ direct learning and observational learning in Phase 1, as well as the instantaneous social influence in Phase 3. The first-level design matrix in GLM1 consisted of constant terms, nuisance regressors detected by the “Spike Analyzer”, plus the following 22 regressors: 5 experimentally measured onset regressors, one for each cue (cue of the 1st choice, cue of the 2nd choice, cue of the 1st bet, cue of the 2nd bet, and cue of the outcome); 6 parametric modulators (PMs) of the corresponding cues (the two value signals from direct learning and observational learning, belonging to the cue of the 1st choice; w.Nagainst, belonging to the cue of the 2nd choice; Ubet1 and Ubet2, belonging to the cues of the 1st bet and the 2nd bet, respectively; and the RPE, belonging to the cue of the outcome); 5 nuisance regressors accounting for all of the “no-response” trials for each cue; and 6 movement parameters. Note that one of the two value signals was orthogonalized with respect to the other. This allowed the un-orthogonalized regressor to capture as much variance as possible, with any additional (explainable) variance accounted for by the orthogonalized regressor (Mumford et al., 2015; Norbury et al., 2018). Also, we intentionally did not include the reward outcome at the outcome cue. This was because (1) the RPE and the reward outcome are known to be correlated in goal-directed learning studies using model-based fMRI (e.g., Chien et al., 2016), and (2) we sought to explicitly verify RPE signals by their hallmarks using the ROI time series extracted from each participant given the second-level RPE contrast (see ROI time series analysis below).
GLM2 was set up to examine the neural correlates of choice adjustment in Phase 4. To this end, GLM2 was identical to GLM1, except that the PM regressor w.Nagainst at the cue of the 2nd choice was replaced by the PM regressor SwSt (switch vs. stay).
Second-level analysis
The resulting β images from each participant’s first-level GLM were then used in a random-effects group analysis at the second level, using one-sample two-tailed t-tests to assess significant effects across participants. To correct for multiple comparisons, we employed threshold-free cluster enhancement (TFCE; Smith and Nichols, 2009) implemented in the TFCE Toolbox (dbm.neuro.uni-jena.de/tfce/). TFCE is a cluster-based thresholding method that overcomes the arbitrariness of choosing a cluster-forming threshold (e.g., p < 0.001, cluster size k = 20). TFCE takes the raw statistics from the second-level analysis and performs a permutation-based non-parametric test (5000 permutations in the current study) to obtain robust results. Based on previous work on direct value signals in the vmPFC (Bartra et al., 2013) and the vicarious value of social information in the ACC (e.g., Behrens et al., 2008; Boorman et al., 2013), we performed small-volume corrections (SVC) using 10-mm search volumes around the peak MNI coordinates of the vmPFC and the ACC from the corresponding studies, with TFCE correction at p < 0.05, FWE (family-wise error) corrected. All other analyses were corrected with whole-brain TFCE at p < 0.05, FWE corrected.
Follow-up ROI analysis
Depending on the hypotheses, the research question, and the corresponding PM regressors, we employed two types of follow-up ROI analyses, the time series estimates and percent signal change (PSC) estimates. In both types of ROI analyses, participant-specific masks were created from the second-level contrast. For each participant, we first defined a 10-mm search volume around the peak coordinate of the second level contrast (threshold: p < 0.001, uncorrected); within this search volume, we then searched for each participant’s individual peak and created a new 10-mm sphere around this individual peak as the ROI mask. Finally, supra-threshold voxels in the new participant-specific ROI were used for the ROI analyses.
First, the ROI time series estimates were applied when at least two PMs were associated with an ROI. Namely, we were particularly interested in how the time series within a specific ROI correlated with all the PM regressors. In the current study, we defined 3 ROIs for the time series estimates: the vmPFC, the ACC, and the VS/NAcc.
We followed the procedure established by previous studies (Behrens et al., 2008; Jocham et al., 2014; Klein et al., 2017) to perform the ROI time series estimates. We first extracted raw BOLD time series from the ROIs. The time series of each participant was then time-locked to the beginning of each trial with a duration of 30s, where the cue of the 1st choice was presented at 0s, the cue of the 1st bet was presented at 2.92s, the cue of the 2nd choice was displayed at 12.82s, the cue of the 2nd bet was displayed at 16.25s, and the outcome was presented at 21.71s. All these time points corresponded to the mean onsets for each cue across trials and participants. Afterward, time series were up-sampled to a resolution of 250ms (1/10 of TR) using 2D cubic spline interpolation, resulting in a data matrix of size m x n, where m is the number of trials, and n is the number of the up-sampled time points (i.e., 30s / 250ms = 120 time points). A linear regression model containing the PMs was then estimated at each time point (across trials) for each participant. It should be noted that, although the linear regression here took a similar formulation as the first-level GLM, it did not model any specific onset; instead, this regression was fitted at each time point in the entire trial across all the trials. The resulting time courses of effect sizes (regression coefficients) were finally averaged across participants. Because both the time series and the PMs were normalized, these time courses of effect sizes, in fact, reflected the partial correlation between the ROI time series and PMs.
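The per-timepoint regression across trials can be sketched as follows, assuming the up-sampled time series and the PMs have already been z-scored as described; the function and variable names are illustrative:

```python
import numpy as np

def timepoint_betas(ts_matrix, pm_matrix):
    """Fit a linear regression across trials at each up-sampled time point.

    ts_matrix: (n_trials, n_timepoints) z-scored BOLD time series,
               time-locked to trial onset.
    pm_matrix: (n_trials, n_pms) z-scored parametric modulators.
    Returns a (n_timepoints, n_pms) matrix of effect sizes, i.e., one
    time course of regression coefficients per PM.
    """
    n_trials, n_timepoints = ts_matrix.shape
    # design matrix with an intercept column plus all PMs
    X = np.column_stack([np.ones(n_trials), pm_matrix])
    betas = np.empty((n_timepoints, pm_matrix.shape[1]))
    for t in range(n_timepoints):
        coef, *_ = np.linalg.lstsq(X, ts_matrix[:, t], rcond=None)
        betas[t] = coef[1:]  # drop the intercept
    return betas
```

Averaging these per-participant coefficient time courses across participants yields the group-level effect-size time courses described in the text.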
To test group-level significance, we employed a permutation procedure. For the time courses of effect sizes of each ROI, we defined a time window of 3-7 s after the corresponding event onset, during which the BOLD response was expected to peak. Within this window, we randomly flipped the signs of the time courses of effect sizes 5000 times to generate a null distribution, and asked whether the mean of the empirical data fell below the 2.5th percentile or above the 97.5th percentile of this null distribution.
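A minimal sketch of this sign-flip permutation test, assuming as input the subject-level effect sizes averaged over the 3-7 s window (names and defaults are illustrative):

```python
import numpy as np

def sign_flip_test(effect_sizes, n_perm=5000, seed=0):
    """Two-tailed sign-flip permutation test on subject-level effect sizes.

    Under the null hypothesis of no effect, the sign of each subject's
    effect size is exchangeable, so randomly flipping signs generates
    a null distribution of group means.
    Returns True if the observed mean lies outside the central 95% of
    the null distribution.
    """
    rng = np.random.default_rng(seed)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    observed = effect_sizes.mean()
    flips = rng.choice([-1.0, 1.0], size=(n_perm, len(effect_sizes)))
    null = (flips * effect_sizes).mean(axis=1)
    lo, hi = np.percentile(null, [2.5, 97.5])
    return bool(observed < lo or observed > hi)
```

This non-parametric approach avoids distributional assumptions about the effect sizes while controlling the two-tailed false-positive rate at 5%.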
Second, percent signal change (PSC) estimates were applied when only one PM was associated with an ROI. In particular, we asked whether there was a linear trend in the PSC of each ROI as a function of the PM. In the current study, we defined 7 ROIs for the PSC estimates. Among them, four ROIs were associated with the PM regressor w.Nagainst (the rTPJ, the ACC/pMFC, the right aINS, and the FPC); two ROIs were associated with the PM regressor SwSt (the left dlPFC and the ACC); and one ROI was associated with the inverse contrast of SwSt (i.e., StSw, stay vs. switch; the vmPFC).
To compute the PSC, we used the “rfxplot” toolbox (Gläscher, 2009) to extract the time series from the above ROIs. The “rfxplot” toolbox further divided the corresponding PMs into different bins (e.g., 2 bins, the 1st 50% of the PM and the 2nd 50% of the PM) and computed the PSC for each bin, which resulted in a p x q PSC matrix, where p is the number of participants, and q is the number of bins. To test for significance, we performed a simple 1st-order polynomial fit using the PSC as a function of the binned PM, and asked whether the slope of this polynomial fit was significantly different from zero.
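The slope test on binned PSC values can be sketched as follows; the group-level t statistic here is an illustrative stand-in for the significance test of the slope against zero:

```python
import numpy as np

def psc_slope_test(psc_matrix):
    """Test for a linear trend of PSC across PM bins.

    psc_matrix: (n_participants, n_bins) percent signal change per bin,
                as produced by binning the PM (e.g., rfxplot output).
    Fits a 1st-order polynomial (slope) per participant against the bin
    index, then computes a one-sample t statistic of the slopes vs. zero.
    Returns (per-participant slopes, group-level t statistic).
    """
    p, q = psc_matrix.shape
    bins = np.arange(q)
    slopes = np.array([np.polyfit(bins, row, 1)[0] for row in psc_matrix])
    t = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(p))
    return slopes, t
```

A significantly positive (or negative) group-level slope indicates that the ROI's signal scales monotonically with the PM, as tested in the text.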
Connectivity analysis
We employed two types of connectivity analyses (Friston et al., 1997) in the current study, the psychophysiological interaction (PPI) and the physiophysiological interaction (PhiPI), to test the functional network using fMRI (O’Reilly et al., 2012).
The psychophysiological interaction (PPI) analysis aims to uncover how the functional connectivity between BOLD signals in a particular ROI (seed region) and BOLD signals in the (to-be-detected) target region(s) is modulated by a psychological variable. We used as a seed the entire BOLD time series from a 10-mm spherical ROI in the rTPJ, centered at the peak coordinates for w.Nagainst (threshold: p < 0.001, uncorrected), which was detected at the onset cue of the 2nd choice. Next, we constructed the PPI regressor by combining the rTPJ ROI signals with the SwSt variable that took place after the cue of the 2nd choice (Figure 4A-B). The first-level PPI design matrix consisted of three PPI regressors (the BOLD time series of the seed region, the modulating psychological variable, and their interaction) and all the same nuisance regressors as the above first-level GLMs. The first-level interaction regressor was then submitted to a second-level t-test to establish the group-level connectivity results, with TFCE correction at p < 0.05, FWE corrected.
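The construction of the three PPI regressors can be sketched as follows. Note that this is a simplified version that forms the interaction directly on the BOLD signal, whereas SPM's PPI machinery first deconvolves the seed time series to the neuronal level before forming the interaction and re-convolving with the HRF:

```python
import numpy as np

def build_ppi_regressors(seed_ts, psych):
    """Assemble the three PPI regressors as columns of a design matrix.

    seed_ts: 1-D BOLD time series of the seed ROI (e.g., rTPJ).
    psych:   1-D psychological variable (e.g., SwSt) sampled at the
             same time points.
    Returns an (n_timepoints, 3) matrix: [seed, psych, interaction],
    with seed and psych mean-centered before forming the product so the
    interaction is not confounded by the main effects.
    """
    seed = np.asarray(seed_ts, float) - np.mean(seed_ts)
    psy = np.asarray(psych, float) - np.mean(psych)
    return np.column_stack([seed, psy, seed * psy])
```

Only the third (interaction) column carries the connectivity-modulation effect of interest; the first two columns serve as covariates for the main effects.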
The physiophysiological interaction (PhiPI) analysis follows the same principles as the PPI analysis, except that the psychological variable in the PPI regressors is replaced by the BOLD time series from a second seed ROI. We performed two PhiPI analyses. In the first PhiPI, we used as seeds the entire BOLD time series in two 10-mm spherical ROIs in the vmPFC and the ACC, both of which were detected at the cue of the 1st choice (Figure S6). In the second PhiPI, we seeded with the entire BOLD time series from the same 10-mm spherical ROI in the rTPJ as in the PPI analysis, and from a 10-mm spherical ROI in the left dlPFC, which was detected at the cue of the 2nd choice (Figure 4D-F). The setup of the first-level PhiPI design matrix and the statistical test procedure at the second level were the same as for the PPI analysis.
DATA AND SOFTWARE AVAILABILITY
Raw behavioral data and custom code to perform analyses can be accessed on the GitHub repository: https://github.com/lei-zhang/zhang_glaescher_socialinfluence.
AUTHOR CONTRIBUTIONS
J.G. conceived the research idea. L.Z. and J.G. designed and programmed experiments. L.Z. acquired data. L.Z. and J.G. designed computational models. L.Z. and J.G. performed analyses and wrote the paper. J.G. supervised the project.
DECLARATION OF INTERESTS
The authors declare no competing financial interests.
ACKNOWLEDGMENTS
We thank Anne Bert, Kiona Weisel, Julia Spilcke-Liss, Julia Majewski, and all radiographers for help with data acquisition; Nathaniel Daw for help in developing the computational models; and Christian Büchel for helpful feedback on earlier versions of the manuscript. J.G. was supported by the Bernstein Award for Computational Neuroscience (BMBF 01GQ1006), the Collaborative Research Center “Cross-modal learning” (DFG TRR 169), and the Collaborative Research in Computational Neuroscience (CRCNS) grant (BMBF 01GQ1603). L.Z. was supported by the International Research Training Groups “CINACS” (DFG GRK 1247), and the Research Promotion Fund (FFM) for young scientists of the University Medical Center Hamburg-Eppendorf.
Footnotes
3 Lead contact