Abstract
Why groups of individuals sometimes exhibit collective ‘wisdom’ and other times maladaptive ‘herding’ is an enduring conundrum. Here we show that this conflict is regulated by the social learning strategies deployed. We examined the patterns of human social learning through an interactive online experiment with 699 participants, varying both task uncertainty and group size, then used hierarchical Bayesian model-fitting to identify the individual learning strategies exhibited by participants. Challenging tasks elicit greater conformity amongst individuals, with rates of copying increasing with group size, leading to high probabilities of maladaptive herding amongst large groups confronted with uncertainty. Conversely, the reduced social learning of small groups, and the greater probability that social information would be accurate for less-challenging tasks, generated ‘wisdom of the crowd’ effects in other circumstances. Our model-based approach provides novel evidence that the likelihood of swarm intelligence versus herding can be predicted, resolving a longstanding puzzle in the literature.
Understanding the mechanisms that account for accurate collective decision-making amongst groups of animals has been a central focus of animal behaviour research (Bonabeau et al., 1999; Camazine et al., 2001; Sumpter, 2010). There are a large number of biological examples showing that collectives of poorly informed individuals can achieve high performance in solving cognitive problems under uncertainty (Krause et al., 2010). Examples of such ‘swarm intelligence’ - the emergent wisdom of interactive crowds - have been found in a broad range of biological systems (Table 1). Although these findings suggest fundamental cognitive benefits of grouping (Krause and Ruxton, 2002), there is also a long-standing recognition, especially for humans, that interacting individuals may sometimes be overwhelmed by the ‘extraordinary popular delusions and madness of crowds’ (Mackay, 1841). Herd behaviour (i.e. an alignment of thoughts or behaviours of individuals in a group) occurs because individuals imitate each other (Kameda and Hastie, 2015; Le Bon, 1896; Raafat et al., 2009), and it is thought to be a cause of financial bubbles (Chari and Kehoe, 2004; Mackay, 1841), ‘groupthink’ (Janis, 1972) and volatility in cultural markets (Muchnik et al., 2013; Salganik et al., 2006). More generally, herding is known to undermine the wisdom of crowds effect (Lorenz et al., 2011), whilst maladaptive aspects of information transfer are well recognised in the biological literature (e.g. Giraldeau et al., 2002). It seems that transmitting information among individuals, and making decisions collectively, is a double-edged sword: combining decisions may provide the benefits of swarm intelligence but, at the same time, increase the risk of maladaptive herding. Yet an understanding of whether, and if so how, it is possible to prevent or reduce the risk of maladaptive herd behaviour, while concurrently maintaining or enhancing swarm intelligence, is largely lacking.
A balance between using individual and social information may play a key role in determining the trade-off between collective wisdom and maladaptive herding (List et al., 2009). If individuals are too reliant on copying others’ behaviour, any idea, even a maladaptive one, can propagate through the social group (i.e. the ‘informational cascade’; Bikhchandani et al., 1992; Giraldeau et al., 2002; Richerson and Boyd, 2005). If, on the other hand, individuals completely ignore social information so as to remain independent, they will fail to exploit the benefits of aggregating information through social interactions. The optimal extent of social information use presumably falls between these two extremes. Theoretical models predict that the balance between independence and interdependence in collective decision-making may be changeable, contingent upon the individual-level flexibility and inter-individual variability associated with the social learning strategies deployed in diverse environmental states (e.g. Arbilly et al., 2011; Boyd and Richerson, 1985; Feldman et al., 1996; Laland, 2004).
Animals (including humans) are reported to increase their use of social information as returns from asocial learning become more unreliable (e.g. Kameda and Nakanishi, 2002; Kendal et al., 2004; Morgan et al., 2012; Toyokawa et al., 2017; Webster and Laland, 2008, 2011). In addition, individuals are predicted to rely more heavily on social learning the larger the number of individuals sharing information (Boyd and Richerson, 1989; Bond, 2005; Kline and Boyd, 2010; Morgan et al., 2012; Muthukrishna et al., 2014; Street et al., 2017). This selectivity in the predicted use of social information may have a substantial impact on collective decision-making, because even a slight difference in the parameter values governing social information use can qualitatively alter the collective behavioural dynamics (e.g. Bonabeau et al., 1999; Camazine et al., 2001; Nicolis and Deneubourg, 1999; Pratt and Sumpter, 2006). Therefore, researchers should expect populations to exhibit a higher risk of being trapped in maladaptive behaviour with increasing group size and decreasing reliability of asocial learning (and concomitant increased reliance on social learning).
From the viewpoint of classic wisdom of crowds theory, increasing group size may increase collective accuracy (List, 2004; King and Cowlishaw, 2007; Wolf et al., 2013; Becker et al., 2017; Laan et al., 2017). The relative advantage of the collective over solitary individuals may also be accentuated by increased task difficulty, because there is more room for performance to improve than in easier tasks, where high accuracy can already be achieved through asocial learning alone (Cronin, 2016). Understanding the potential conflict between swarm intelligence and the risk of maladaptive herding requires fine-grained quantitative studies of social learning strategies and their relations to collective dynamics, linked to sophisticated computational analysis.
The aims of this study were twofold. First, we set out to test the hypothesis that the circumstances under which collective decision-making will generate ‘wisdom’ can be predicted with knowledge of the precise learning strategies individuals deploy, through a combination of experimentation and theoretical modelling. The choice of an abstract decision-making task allowed us to implement a computational modelling approach, which has been increasingly deployed in quantitative studies of animal social learning strategies (Ahn et al., 2014; Aplin et al., 2017; Barrett et al., 2017; McElreath et al., 2005, 2008; Toyokawa et al., 2017). In particular, computational modelling allowed us to provide a parametric description of different information-gathering processes and to estimate these parameter values at an individual-level resolution. This approach allowed us to characterize the complex relationship between individual-level decision-making and learning strategies, and collective-level behavioural dynamics. Second, we added resolution to our analyses by manipulating both task uncertainty and group size in our experiments with adult human subjects, predicting that these factors would induce heavier use of social information in humans, and thereby alter the balance between swarm intelligence and the risk of maladaptive herding. To do this, we focused on human groups exposed to a simple gambling task, where both asocial and social sources of information were available. Through development of an interactive, web-based collective decision-making task (i.e. a multi-player multi-armed bandit), and use of hierarchical Bayesian statistical methods in fitting our computational model to the experimental data, we identify the individual-level learning strategies of participants as well as quantify variation in different learning parameters, allowing us to conduct an informed exploration of the population-level outcomes.
The results provide clear evidence that the conflict between swarm intelligence and maladaptive herding can be predicted with knowledge of human social learning strategies.
Below, we first describe our experimental task and summarise the computational model. Then, we deploy agent-based simulations to illustrate how the model parameters relating to social learning can in principle affect the collective-level behavioural dynamics. The simulations provide us with precise, quantitative predictions about the complex relationship between individual behaviour and group dynamics. Finally, we present the findings of a multi-player web-based experiment with human participants that utilises the gambling task framework. Applying a hierarchical Bayesian statistical method, we estimated the model’s parameters for each of 699 different individuals, allowing us to (i) examine whether and, if so, how social information use is affected by group size and task uncertainty, and (ii) examine whether and how social information use affects the balance between swarm intelligence and maladaptive herding.
Task overview: To study the relationship between social information use and collective behavioural dynamics, we focused on a well-established learning-and-decision problem called a ‘multi-armed bandit’ task, represented here as repeated choices between three slot machines (Figure S1, Video 1; for detail see Materials and methods). Individuals played the task for 70 rounds. The slots paid off money noisily, varying around two different means during the first 40 rounds, such that there was one ‘good’ slot and two other options giving poorer average returns. From round 41, however, one of the ‘poor’ slots abruptly increased its mean payoff to become ‘excellent’ (i.e. superior to ‘good’). The purpose of this environmental change was to observe the effects of maladaptive herding by potentially trapping groups on the out-of-date suboptimal (good) slot, as individuals did not know whether or how an environmental change would occur. Through making choices and earning a reward from each choice, individuals could gradually learn which slot generated the highest rewards.
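To make the task structure concrete, a minimal sketch of such a payoff environment might look like the following. The mean payoffs, the noise level and the function name are illustrative assumptions, not the experiment's actual values:

```python
import random

def payoff(option, round_num, rng=random):
    """Sample a noisy payoff from one of three slots.

    Assumed structure (for illustration only): slot 0 is 'good' and
    slots 1-2 are 'poor' during rounds 1-40; from round 41, slot 1
    becomes 'excellent', i.e. superior to the formerly best slot.
    """
    if round_num <= 40:
        means = (1.5, 1.0, 1.0)
    else:
        means = (1.5, 2.0, 1.0)  # the abrupt environmental change
    return rng.gauss(means[option], 0.3)  # Gaussian noise around the mean
```

Because individuals do not know whether or when the change occurs, a group that has converged on slot 0 can only discover the new best option through continued exploration.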
In addition to this asocial learning, we provided social information for each member of the group specifying the frequency with which group members chose each slot. All group members played the same task with the same conditions simultaneously, and all individuals had been instructed that this was the case, and hence understood that the social information would be informative.
Task uncertainty was experimentally manipulated by changing the difference between the mean payoffs of the slot machines. In the task with the least uncertainty, the distributions of payoffs barely overlapped, whilst in the task with the greatest uncertainty the distributions of payoffs overlapped considerably (Figure S3).
Overview of the computational learning-and-decision-making model: We modelled individual behavioural processes by assuming that individual i makes a choice for option m at round t in accordance with the choice probability Pi,t(m), a weighted average of social and asocial influences:

Pi,t(m) = σi,t Fi,t(m) + (1 − σi,t) Ai,t(m),

where σi,t is the social learning weight (0 ≤ σi,t ≤ 1), Fi,t(m) is the social influence on choosing option m and Ai,t(m) is the asocial influence.
For the social influence, we assumed a frequency-dependent copying strategy by which an individual copies others’ behaviour in accordance with the distribution of social frequency information (McElreath et al., 2005, 2008; Aplin et al., 2017; Barrett et al., 2017):

Fi,t(m) = (frequencym,t−1 + 0.1)^θi / Σm′ (frequencym′,t−1 + 0.1)^θi,

where frequencym,t−1 is the number of choices made by other individuals for option m in the preceding round t − 1 (t ≥ 2), and the small constant 0.1 keeps the expression defined when an option was chosen by no one. The exponent θi is individual i’s conformity exponent (−∞ < θi < +∞). When this exponent is larger than zero (θi > 0), greater social influence is afforded to an option chosen by more individuals (i.e. positive frequency bias), with conformity bias arising when θi > 1, such that disproportionately more social influence is given to the most common option (Boyd and Richerson, 1985). When θi < 0, on the other hand, greater social influence is afforded to the option that fewest individuals chose in the preceding round t − 1 (i.e. negative frequency bias). Note that there is no social influence when θi = 0, because in this case the ‘social influence’ favours a uniformly random choice, i.e. Fi,t(m) = 1/3 for every option, independent of the social frequency distribution.
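As a concrete illustration, this frequency-dependent rule can be sketched as a short function. This is a minimal sketch; the function name and the small constant eps added to each count (which keeps zero counts finite when theta is negative) are our own choices for illustration:

```python
def social_influence(counts, theta, eps=0.1):
    """Frequency-dependent social influence over the options.

    counts: how many other group members chose each option last round.
    theta:  conformity exponent; theta > 1 amplifies the majority
            (conformity bias), theta < 0 favours rare options
            (negative frequency bias), theta = 0 yields a uniform
            distribution, i.e. no social influence.
    eps:    small constant (an assumption here) so that options chosen
            by no one still get a defined, non-zero weight.
    """
    weights = [(c + eps) ** theta for c in counts]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with counts [4, 1, 1], theta = 2 gives the majority option a choice weight well above its raw frequency share of 4/6, whereas theta = −1 shifts most of the weight onto the two minority options.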
For the asocial influence, we used a standard ‘softmax’ choice rule well-established in the reinforcement-learning literature (Sutton and Barto, 1998) and widely applied in human social learning studies (e.g. McElreath et al., 2005, 2008; Toyokawa et al., 2017).
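The asocial component can likewise be sketched with a standard softmax over learned action values. In this minimal sketch, beta plays the role of the inverse-temperature ('exploitation') parameter; the exact parameterisation of the fitted model may differ:

```python
import math

def softmax(q_values, beta):
    """Softmax (asocial) choice probabilities over learned Q-values.

    Larger beta concentrates choice on the highest-valued option;
    beta = 0 yields a uniformly random choice.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]
```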
In summary, the model has two key social learning parameters, the social learning weight σi,t and the conformity exponent θi, with σi,t a time-dependent variable (i.e. individuals could modify their reliance on social learning as the task proceeded). Varying these parameters systematically, we conducted an individual-based simulation so as to establish quantitative predictions concerning the relationship between social information use and collective behaviour. We then fitted this model to our experimental data using a hierarchical Bayesian approach. This method allows us to specify with precision how each individual subject learns (i.e. which learning strategy or strategies they deploy), and thereby to describe the range and distribution of learning strategies deployed across the sample, and to investigate their population-level consequences.
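To make the simulation logic concrete, a stripped-down version of such an individual-based simulation might look like the following. This is a sketch under assumed payoff means and default parameter values; the fitted model in this study contains additional parameters, and for simplicity sigma and theta are held fixed here rather than varying over time or between individuals:

```python
import math
import random

def run_group(n_agents, sigma, theta, alpha=0.5, beta=4.0,
              n_rounds=70, rng=None):
    """Minimal agent-based sketch of the learning-and-decision model.

    Each agent mixes an asocial softmax over learned Q-values with a
    frequency-dependent social influence, weighted by sigma.  Payoff
    means, noise and default parameter values are illustrative
    assumptions.  Returns the fraction of post-change choices that
    hit the new best option.
    """
    rng = rng or random.Random()
    means_pre, means_post = (1.5, 1.0, 1.0), (1.5, 2.0, 1.0)
    q = [[0.0, 0.0, 0.0] for _ in range(n_agents)]
    prev_choices = None
    correct_post = 0
    for t in range(1, n_rounds + 1):
        means = means_pre if t <= 40 else means_post
        best = means.index(max(means))
        choices = []
        for i in range(n_agents):
            # asocial softmax over current Q-values
            m = max(q[i])
            exps = [math.exp(beta * (v - m)) for v in q[i]]
            asocial = [e / sum(exps) for e in exps]
            if prev_choices is None or n_agents == 1:
                probs = asocial  # no social information available
            else:
                # frequency-dependent social influence (others' choices only)
                counts = [sum(1 for j, c in enumerate(prev_choices)
                              if c == k and j != i) for k in range(3)]
                w = [(c + 0.1) ** theta for c in counts]
                social = [x / sum(w) for x in w]
                probs = [sigma * s + (1 - sigma) * a
                         for s, a in zip(social, asocial)]
            choice = rng.choices(range(3), weights=probs)[0]
            reward = rng.gauss(means[choice], 0.3)
            # Rescorla-Wagner style value update
            q[i][choice] += alpha * (reward - q[i][choice])
            choices.append(choice)
            if t > 40 and choice == best:
                correct_post += 1
        prev_choices = choices
    return correct_post / (n_agents * (n_rounds - 40))
```

Sweeping sigma and theta over many replicate runs can then be used to probe the regimes discussed in the Results below, such as how post-change accuracy depends on the combination of social learning weight, conformity exponent and group size.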
1 Results
1.1 The relationship between social information use and collective behaviour
Figure 1 shows the relationship between the average decision accuracy and individual-level social information use obtained from our individual-based model simulations. Figure 1a and 1c show that individuals in larger groups perform better both before and after the environmental change when the mean conformity exponent is small. In the absence of conformity, even when the average social learning weight is very high, larger groups are still able to recover their decision accuracy after the location of the optimal option has been switched.
On the other hand, when the mean conformity exponent is large (i.e. strong conformity bias), the group dynamics become less flexible and vulnerable to getting stuck on a suboptimal option after environmental change. Here, the recovery of performance after environmental change takes more time in larger than in smaller groups (Figure 1b). When both the conformity exponent and the social learning weight are large (Figure 1d), performance no longer improves monotonically with increasing group size, and it is under these circumstances that the strong herding effect becomes prominent. Figure 2c and 2d indicate that when both the mean social learning weight and the mean conformity exponent are large, the collective choices converged either on the good option or on one of the poor options almost at random, regardless of the option’s quality, and that once individuals started converging on an option the population got stuck. As a result, the distribution of the groups’ average performance over the replications becomes a bimodal ‘U-shape’. Interestingly, however, the maladaptive herding effect remains relatively weak in smaller groups (see Figure 2c; the black histograms). This is because the majority of individuals in smaller groups (i.e. two individuals out of three) are more likely to break the cultural inertia by simultaneously exploring another option than the majority in larger groups (e.g. six out of ten). As expected, herding does not occur in the absence of conformity (Figure 2a, 2b).
In summary, the model simulations suggest an interaction between social learning weight and conformity exponent on decision accuracy and the risk of maladaptive herding: when the conformity exponent is not too large, swarm intelligence is prominent across a broad range of mean social learning weights (i.e. increasing group size can increase decision accuracy while concurrently retaining decision flexibility). When the conformity bias becomes large, however, the risk of maladaptive herding arises, and, when both social learning parameters are large, swarm intelligence is rare and maladaptive herding dominates.
1.2 Estimation of human social information use
Table 2 reveals how the social learning weight σi,t and conformity exponent θi were influenced by task uncertainty in our behavioural experiment. It gives posterior estimates for each of the global means of the learning model parameters, obtained by the hierarchical Bayesian model-fitting method applied to the experimental data (see the Materials and methods). The fitted global variance parameters (i.e. v) are shown in the Supporting Table S1.
We were able to categorize the participants into three different learning strategies based on their fitted conformity exponent values; namely, the ‘positive frequency-dependent copying’ strategy (θi ≫ 0), the ‘negative frequency-dependent copying’ strategy (θi ≪ 0) and the ‘random choice’ strategy (θi ≈ 0). Note that we could not reliably detect the ‘weak positive’ frequency-dependent strategy (0 < θi ≤ 1) due to limited statistical power (Figure S10 and S17). Some individuals whose ‘true’ conformity exponent fell between zero and one would have been categorised as exhibiting a random choice strategy (Figure S10). Individuals identified as positive frequency-dependent copiers were mainly those whose conformity exponent was larger than one (θi > 1).
Figure 3a-c show the estimated frequencies of different learning strategies. Generally speaking, participants were more likely to utilize a positive frequency-dependent copying strategy than the other two strategies (the 95% Bayesian CI of the intercept of the GLMM predicting the probability of using the positive frequency-dependent copying strategy is above zero, [1.05, 2.50]; Table S2). We found that positive frequency-dependent copying decreased with increasing task uncertainty (the 95% Bayesian CI of the task uncertainty effect is below zero, [−1.88, −0.25]; Table S2). We found no clear effects of group size, age or gender on adoption of the positive frequency-dependent copying strategy, except for a negative interaction between age and task uncertainty (the 95% Bayesian CI of the age × uncertainty interaction = [−1.46, −0.15]; Table S2).
We also investigated the effects of group size and task uncertainty on the fitted individual parameter values. We found that the individual mean social learning weight parameter increased with group size (the 95% Bayesian CI = [0.15, 0.93]; Figure 3d-f; Table S3), and decreased with uncertainty (the 95% Bayesian CI = [−0.98, −0.22]) and with age of subject (the 95% Bayesian CI = [−0.36, −0.02]). However, the negative effects of task uncertainty and age disappeared when we focused only on the positive frequency-dependent copying individuals, and only the positive effect of group size was confirmed (Table S4; Figure S16). It is worth noting that the meaning of the social learning weight differs between these three strategies: the social learning weight regulates positive reactions to the majority’s behaviour for positive frequency-dependent copiers, whereas it regulates avoidance of the majority for negative frequency-dependent copiers, and determines the probability of random decision-making for the random choice strategists.
The individual conformity exponent parameter θi increased with task uncertainty (the 95% Bayesian CI = [0.38, 1.41]), but we found no significant effects of group size, age, gender or interactions (Figure 3g-i; Table S5). These results were qualitatively unchanged when we focused only on the positive frequency-dependent copying individuals (Table S6; Figure S16).
We observed extensive individual variation in social information use. The greater the task’s uncertainty, the larger were the individual variances in both the mean social learning weight and the conformity exponent (the 95% Bayesian CI of the GLMM’s variation parameter for the mean social learning weight was [1.11, 1.62] (Table S3) and for θi was [1.07, 1.54] (Table S5)). This was confirmed when focusing only on the positive frequency-dependent copying individuals: the Bayesian 95% CIs were [1.14, 1.80] (Table S4) and [0.71, 1.10] (Table S6), respectively.
The manner in which individual variation in the social information use of positive frequency-dependent copying individuals changed over time is visualised in Figure 4a-c. The social learning weights generally decreased with experimental round. However, some individuals in the Moderate- and the High-uncertain conditions increased rather than decreased their reliance on social learning over time. Interestingly, these individuals tended to have a larger conformity exponent (Figure S18). In addition, the time-dependent θi,t in our alternative model generally increased with experimental round in the Moderate- and the High-uncertainty conditions (see the appendix; Figure S26), although the fitting of θi,t in the alternative model was relatively unreliable (Figure S20). These findings suggest that conformists tended to use asocial learning at the outset but increasingly started to conform as the task proceeded.
Extensive variation in the temporal dynamics of the social learning weight σi,t was also found for the negative frequency-dependent copying individuals, but not for the random choice individuals (Figure S14). Individuals deploying a random choice strategy exhibited a σi,t that approached zero, indicating that their decision-making increasingly relied exclusively on asocial reinforcement learning as the task proceeded.
No significant fixed effects were found in other asocial learning parameters such as the learning rate αi and the mean inverse temperature (Table S7, Table S8 and Figure S15).
In summary, our experiments on adult humans revealed asymmetric influences of increasing task uncertainty and increasing group size on the social learning parameters. The conformity exponent increased with task uncertainty on average but the proportion of positive frequency-dependent copying individuals showed a corresponding decrease, due to the extensive individual variation emerging in the High-uncertain condition. Conversely, group size had a positive effect on the mean social learning weight, but did not affect conformity (Figure 3, 4a-c).
1.3 A balance between collective decision accuracy and the herding effect
Figure 4d-f show the change over time in performance with different group sizes and different uncertainty conditions, generated by post-hoc simulations of the parameter-fitted model. The mean decision accuracies of the experimental groups are shown in the inner windows. Because the post-hoc simulations were run for 5,000 replications for each group size, which should generate a more robust pattern than the raw experimental data, which are based on only a limited number of experimental replications, and given the correspondence between simulations and data, below we concentrate our interpretation on the simulated results.
Prior to the environmental change (Rounds 1 to 40), larger groups performed better on average than did both smaller groups and lone individuals across all the uncertainty levels, suggesting swarm intelligence was operating. However, after the environmental change (i.e. from Round 41) performance differed between the conditions. In the Low-uncertain condition, where we found that the participants were most likely to have a relatively weak positive frequency-dependence, large groups again made more accurate decisions than small groups (Figure 4d, from Round 41). However, in the Moderate- and the High-uncertain conditions, where we found that participants were most likely to have strong positive frequency dependence (mean conformity exponents of 3.0 and 2.67, cf. 1.65 in the Low-uncertainty condition), the large groups seemed to get stuck on the suboptimal option after the change (Figure 4e and 4f, from Round 41), although the decision accuracy did not substantially differ with group size in the High-uncertain condition.
Lone individuals in the Low-uncertain condition recovered performance more quickly than did both the small and large groups, even though the lone individuals performed worse in the first half of the task (Figure 4d), suggesting that asocial learners are more capable of detecting the environmental change than individuals in groups. This might be due to the higher exploration rate of lone individuals (both the mean exploitation parameter and μϵ of solitary individuals were smaller than those of grouping individuals; Table 2).
Overall, the pattern of results was broadly consistent with our predictions (Figure 1). We confirmed that in the Low-uncertainty condition, where individuals have a weaker positive frequency bias, larger groups were more accurate than smaller groups while retaining flexibility in their decision-making (i.e. swarm intelligence dominates). However, in the Moderate- and the High-uncertain conditions, larger groups performed better prior to environmental change but were vulnerable to getting stuck with an outdated maladaptive option due to the larger estimated conformity exponent, thereby generating the conflict between swarm intelligence and maladaptive herding.
2 Discussion
We investigated whether and how human social learning strategies regulate the conflict between swarm intelligence and herding behaviour, using a collective learning-and-decision-making task combined with simulation and model fitting. We examined whether manipulating the reliability of asocial learning and group size would affect the use of social information, and thereby alter the collective decision dynamics, as suggested by our computational model simulations. Although a theoretical study has suggested that reliance on social learning and conformity bias play a role in collective dynamics (Kandler and Laland, 2013), thus far no empirical studies have quantitatively investigated the population-level consequences of these two different social learning processes. Our high-resolution, model-based behavioural analysis using hierarchical Bayesian statistics enabled us to identify individual-level patterns and variation in different learning parameters, and to explore their population-level outcomes. The results provide strong support for our hypothesis that the conflict between the swarm intelligence effect and maladaptive herding can be predicted with knowledge of human social learning strategies.
Consistent with previous empirical findings (e.g. Morgan et al., 2012; Muthukrishna et al., 2014), adult human participants were increasingly likely to make a conformity-biased choice as the uncertainty of the task went up (i.e. as it became more difficult to determine the best option; Figure 3g-i). The fitted global mean values of the conformity exponent parameter were 3.0 and 2.7 in the Moderate- and the High-uncertain conditions, respectively (Table 2), and these values were sufficiently high to cause larger populations to get stuck on a suboptimal option following environmental change (Figure 1b; Figure 4e, 4f). Conversely, in the Low-uncertain condition individuals exhibited relatively weak conformity, allowing larger groups to escape the suboptimal option and retain their swarm intelligence (Figure 1a; Figure 4d). Although the social learning weight was also found to be contingent upon the environmental factors, the estimated mean value remained small (Figure 3d-f; Figure S14). This implies a weaker social than asocial influence on decision-making, as reported in several other experimental studies (e.g. Efferson et al., 2008; McElreath et al., 2005; Mesoudi, 2011; Toyokawa et al., 2017). Thanks to this relatively weak reliance on social learning, the kind of herding that would have blindly led a group to any option regardless of its quality (akin to the ‘symmetry breaking’ known in social insect collective foraging systems; Figure 2c,d; Camazine et al., 2001; Sumpter, 2010) did not occur. Research that explores the factors that can induce higher social learning weights in humans, in order to understand under which circumstances herd behaviour would dominate, would be valuable.
Individual differences in exploration might also play a crucial role in shaping collective decision dynamics. Although a majority of participants adopted a positive frequency-dependent copying strategy, some individuals exhibited negative frequency-dependent or random decision-making strategies (Figure 3a-c). It is worth noting that the random choice strategy was associated with more exploration than the other strategies, because it led to an almost random choice at a rate σi irrespective of the options’ quality. In addition, negative frequency-dependent copying individuals could also be highly exploratory. These individuals tended to avoid choosing an option upon which other people had converged, and would explore the other two ‘unpopular’ options. Interestingly, in the High-uncertain condition the mean social learning weights of the negative frequency-dependent copying individuals were larger than those of the other two strategies (Figure S14), indicating that these individuals engaged in such majority-avoiding exploration relatively frequently. Such high exploratory tendencies would prevent individuals from converging on a better option, leading to a diminishing of swarm intelligence in high-uncertainty circumstances (Figure 4f).
Individual differences have received increasing attention in both collective behaviour and animal social learning studies (e.g. Jolles et al., 2018; Michelena et al., 2010; Planas-Sitja et al., 2015), and across the human behavioural sciences (e.g. Gray et al., 2017; Mesoudi et al., 2016). Our finding that the effects of individual variation depend on uncertainty implies that human subjects deploy social learning strategies plastically, rather than as a fixed propensity (i.e. a personality trait) that differs rigidly between individuals (Dingemanse et al., 2010; Toyokawa et al., 2017). Our approach of combining individual-based simulation with experimentation could potentially prove a powerful tool with which to explore decision-making in other animals.
Another methodological advantage of using computational models to study social learning strategies is the explicitness of their assumptions about the temporal dynamics of behaviour. It has been argued that merely observing the final frequencies of learned behaviour does not provide enough information to determine what asocial and/or social learning processes might have been used, because multiple learning-and-decision mechanisms are equally likely to produce the same population-level patterns (Barrett, 2018; Hoppitt and Laland, 2013). For example, very exploitative asocial reinforcement learners (i.e. the exploitation parameter βi,t is large and the social learning weight σi,t is nearly zero) and conformity-biased social learners (the conformity exponent θi is large and σi,t is positive) would eventually converge on the same option, resulting in the same final behavioural steady state. However, how they explore the environment, as well as how they react to the other individuals in the same group, differ significantly, and they could produce qualitatively different collective temporal dynamics. A time-depth perspective is crucially important in order to model the relationship between individual behavioural mechanisms and group behavioural dynamics (Biro et al., 2016).
Internet-based experimentation allowed us to conduct a real-time interactive behavioural task with larger subject pools than a conventional laboratory-based experiment. This enabled us not only to quantify the individual-level learning-and-decision processes (e.g. Ahn et al., 2014; Daw et al., 2006) but also to map these individual-level processes onto the larger-scale collective behaviour (Raafat et al., 2009; Salganik et al., 2006; Sumpter, 2010). Although there are always questions about the validity of participants’ behaviour when deploying web-based methods, we believe that the computational modelling approach, coupled with the higher statistical power afforded by the large sample size, compensates for any drawbacks. The fact that our learning model could effectively approximate the participants’ decision trajectories suggests that most of the participants engaged seriously with solving the task. An increasing body of evidence supports the argument that web-based behavioural experiments are as reliable as those from the laboratory (e.g. Dandurand et al., 2008; Hergueux and Jacquemet, 2015).
The diverse effects of social influence on the collective wisdom of a group have drawn substantial attention (e.g. Becker et al., 2017; Jayles et al., 2017; Lorenz et al., 2011; Lorge et al., 1958; Muchnik et al., 2013). The bulk of this literature, including many jury models and election models (Hastie and Kameda, 2005; List, 2004), has focused primarily on static estimation problems, where the ‘truth’ is fixed from the outset. However, in reality, there are many situations in which the true state changes over time, so that monitoring and tracking the pattern of change is a crucial determinant of decision performance (Payzan-Lenestour and Bossaerts, 2011). In such temporally dynamic environments, decision-making and learning are coordinated to affect future behavioural outcomes recursively (Sutton and Barto, 1998). Our experimental task provides a simple vehicle for exploring collective intelligence in a dynamic situation, which encompasses this learning-and-decision-making feedback loop. Potentially, integrating wisdom-of-crowds research with social learning and collective dynamics research will facilitate the more tractable use of swarm intelligence in a temporally changing world.
In summary, a powerful combination of experimentation and theoretical modelling sheds new light on when groups of individuals will exhibit the wisdom of crowds and when they will display maladaptive herding. Our analysis implies that herding is most likely amongst individuals in large groups exposed to challenging tasks. That is because challenging tasks lead to greater uncertainty and thereby elicit greater conformist learning amongst individuals, whilst rates of copying increase with group size. Difficult tasks, by definition, render identification of the optimal behaviour more challenging, allowing groups sometimes to converge on maladaptive outcomes. Conversely, the reduced conformity levels of individuals in small groups, and the greater probability that social information would be accurate for less-challenging tasks, generated ‘wisdom of the crowd’ effects in most other circumstances. Our findings provide clear evidence that the conflict between swarm intelligence and maladaptive herding can be predicted with knowledge of human social learning strategies.
3 Material and methods
3.1 Computational learning-and-decision model
We modelled the learning and decision process based on standard reinforcement-learning theory (Sutton and Barto, 1998). Following previous empirical studies of social learning strategies in humans (e.g. McElreath et al., 2005, 2008; Toyokawa et al., 2017), our model consists of two steps. First, an individual i updates the estimated average reward associated with an option m at round t, namely the Q-value (Qi,t(m)), according to the Rescorla-Wagner rule (Trimmer et al., 2012) as follows:
\[ Q_{i,t+1}(m) = Q_{i,t}(m) + \alpha_i \mathbb{1}_{i,t}(m) \left( r_{i,t}(m) - Q_{i,t}(m) \right), \]
where αi (0 ≤ αi ≤ 1) is a learning rate parameter of individual i determining the weight given to new experience, and ri,t(m) is the amount of monetary reward obtained from choosing option m in round t. \( \mathbb{1}_{i,t}(m) \) is the binary action-indicator function of individual i, given by
\[ \mathbb{1}_{i,t}(m) = \begin{cases} 1 & \text{if individual } i \text{ chose option } m \text{ in round } t, \\ 0 & \text{otherwise.} \end{cases} \]
Therefore, Qi,t(m) is updated only when option m was chosen; when option m was not chosen, Qi,t(m) is not updated (i.e. Qi,t+1(m) = Qi,t(m)). Note that, only in the first round t = 1, all Q-values are updated using the chosen option’s reward ri,1(m), so that the individual sets a naive ‘intuition’ about the magnitude of reward values she/he would expect to earn from a choice in the task; namely, Qi,t=2(1) = Qi,t=2(2) = Qi,t=2(3) = αi ri,t=1(m). In practical terms, this prevents the model from being overly sensitive to the first experience. Before the first choice, individuals had no prior preference for any option (i.e. Qi,1(1) = Qi,1(2) = Qi,1(3) = 0).
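The update rule above can be sketched in a few lines of Python. This is a minimal illustration: the function names (`update_q`, `initialise_q`) and the example parameter values are ours, not the authors'.

```python
def update_q(q, chosen, reward, alpha):
    """Rescorla-Wagner update: only the chosen option's Q-value moves
    toward the observed reward; unchosen options are left unchanged."""
    q = list(q)
    q[chosen] += alpha * (reward - q[chosen])
    return q

def initialise_q(first_reward, alpha, n_options=3):
    """First-round special case: all Q-values are set to alpha times the
    first reward, giving the learner a naive 'intuition' about the task's
    reward scale."""
    return [alpha * first_reward] * n_options

q = initialise_q(first_reward=3.0, alpha=0.5)    # [1.5, 1.5, 1.5]
q = update_q(q, chosen=0, reward=4.0, alpha=0.5)
print(q)  # [2.75, 1.5, 1.5]
```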
Second, a choice of option m is made by individual i with choice probability Pi,t(m), determined by a weighted average of social and asocial influences:
\[ P_{i,t}(m) = \sigma_{i,t} S_{i,t}(m) + (1 - \sigma_{i,t}) A_{i,t}(m), \]
where σi,t is the social learning weight (0 ≤ σi,t ≤ 1), and Si,t(m) and Ai,t(m) are the social and asocial influences on the choice probability, respectively (0 ≤ Si,t(m) ≤ 1 and 0 ≤ Ai,t(m) ≤ 1). Note that the sum of the choice probabilities, the sum of the social influences and the sum of the asocial influences are all equal to 1; namely, ∑k∈options Pi,t(k) = 1, ∑k Si,t(k) = 1 and ∑k Ai,t(k) = 1.
As for the asocial influence Ai,t, we assumed the so-called softmax (or logit choice) function, which is widely used in the reinforcement-learning literature:
\[ A_{i,t}(m) = \frac{\exp\left(\beta_{i,t} Q_{i,t}(m)\right)}{\sum_{k \in \text{options}} \exp\left(\beta_{i,t} Q_{i,t}(k)\right)}, \]
where βi,t, called the inverse temperature, manipulates individual i’s sensitivity to the Q-values (in other words, controlling the proneness to explore). As βi,t goes to zero, the asocial influence approximates a random choice (i.e. highly explorative). Conversely, if βi,t → +∞, the asocial influence leads to a deterministic choice in favour of the option with the highest Q-value (i.e. highly exploitative). For intermediate values of βi,t, individual i exhibits a balance between exploration and exploitation (Daw et al., 2006; Toyokawa et al., 2017). We allowed for the possibility that the balance between exploration and exploitation could change as the task proceeded. To capture such time dependence in exploration, we used the equation
\[ \beta_{i,t} = \beta_{i,0} + \epsilon_i \frac{t}{70}. \]
If the slope ϵi is positive (negative), the asocial influence Ai,t becomes increasingly exploitative (explorative) as round t increases. For model-fitting purposes, the time-dependent term ϵi t is scaled by the total number of rounds, 70.
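The softmax rule and the time-dependent inverse temperature can be sketched as follows. The helper names (`softmax_asocial`, `beta_t`) and the intercept notation βi,0 are our own illustrative choices.

```python
import math

def softmax_asocial(q, beta):
    """Softmax (logit choice): A(m) = exp(beta*Q(m)) / sum_k exp(beta*Q(k))."""
    exps = [math.exp(beta * v) for v in q]
    z = sum(exps)
    return [e / z for e in exps]

def beta_t(beta0, eps, t, total_rounds=70):
    """Time-dependent inverse temperature: eps > 0 makes the agent
    increasingly exploitative over rounds, eps < 0 increasingly
    explorative."""
    return beta0 + eps * t / total_rounds

# beta -> 0 approximates a uniform random choice over the three options:
print(softmax_asocial([1.0, 2.0, 3.0], beta=0.0))
```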
We modelled the social influence (i.e. frequency-dependent copying) on the probability that individual i chooses option m at round t as follows (McElreath et al., 2005, 2008; Aplin et al., 2017; Barrett et al., 2017):
\[ S_{i,t}(m) = \frac{\left(F_{t-1}(m) + 0.1\right)^{\theta_i}}{\sum_{k \in \text{options}} \left(F_{t-1}(k) + 0.1\right)^{\theta_i}}, \]
where Ft−1(m) is the number of choices made by other individuals (excluding her/his own choice) for option m in the preceding round t − 1 (t ≥ 2), and θi is individual i’s conformity exponent, −∞ ≤ θi ≤ +∞. When this exponent is larger than zero, higher social influence is given to an option that was chosen by more individuals (i.e. positive frequency bias). When θi < 0, on the other hand, higher social influence is given to an option that fewer individuals chose in the preceding round t − 1 (i.e. negative frequency bias). To implement the negative frequency dependence, we added a small number (0.1) to F so that an option chosen by no one (i.e. Ft−1 = 0) could yield the highest social influence when θi < 0. Note that there is no social influence when θi = 0, because in this case the ‘social influence’ reduces to a uniformly random choice, Si,t(m) = 1/(1 + 1 + 1) = 1/3, independent of the social frequency distribution. Note also that, in the first round (t = 1), we assumed that the choice was determined solely by the asocial softmax function because no social information was yet available.
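The frequency-dependent copying rule can be illustrated compactly. This is a sketch: `social_influence` is a hypothetical helper name, and the frequencies below are invented for the example.

```python
def social_influence(freq, theta, offset=0.1):
    """Frequency-dependent copying: S(m) is proportional to
    (F(m) + 0.1)^theta. theta > 0 favours the majority option
    (conformity); theta < 0 favours rarely chosen options
    (anti-conformity); theta = 0 reduces to a uniform choice."""
    weights = [(f + offset) ** theta for f in freq]
    z = sum(weights)
    return [w / z for w in weights]

freq = [5, 2, 0]                          # last round's choice frequencies
print(social_influence(freq, theta=2))    # majority option dominates
print(social_influence(freq, theta=0))    # uniform 1/3 each: no social influence
print(social_influence(freq, theta=-2))   # option chosen by no one dominates
```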
We considered that the social learning weight σi,t could change over time, as was assumed for the inverse temperature βi,t. To let σi,t satisfy the constraint 0 < σi,t < 1, we used the following sigmoidal function:
\[ \sigma_{i,t} = \frac{1}{1 + \exp\left(-\left(\sigma_{i,0} + \delta_i \frac{t}{70}\right)\right)}, \]
where σi,0 determines the initial weight on the logit scale. If the slope δi is positive (negative), the social influence increases (decreases) over time. We set the social learning weight to zero when the group size was one (i.e. when an individual participated in the task alone and/or when ∑k∈options Ft−1(k) = 0).
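Putting the pieces together, the weighted combination of social and asocial influences might look like the sketch below. It is illustrative only: treating σi,0 as a logit-scale intercept is our assumption, and the function names are ours.

```python
import math

def sigma_t(sigma0, delta, t, total_rounds=70, group_size=3):
    """Sigmoidal, time-varying social learning weight; solitary
    individuals (group size 1) have no social influence."""
    if group_size <= 1:
        return 0.0
    return 1.0 / (1.0 + math.exp(-(sigma0 + delta * t / total_rounds)))

def choice_probs(asocial, social, sigma):
    """P(m) = sigma * S(m) + (1 - sigma) * A(m)."""
    return [sigma * s + (1 - sigma) * a for s, a in zip(social, asocial)]

p = choice_probs(asocial=[0.2, 0.3, 0.5], social=[0.6, 0.3, 0.1], sigma=0.5)
print(p)  # [0.4, 0.3, 0.3] -- still a proper probability distribution
```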
We modelled both the inverse temperature βi,t and the social learning weight σi,t as functions of time because otherwise it would be challenging to distinguish different patterns of learning in this social learning task (Barrett, 2018). The parameter recovery test confirmed that we were able to differentiate such processes under these assumptions (Figures S8-S12). While we also considered the possibility of the conformity exponent being time-dependent (i.e. θi,t), the parameter recovery test suggested that the individual slope parameter γi was not reliably recovered (Figures S20 and S21), and hence we concentrated our analysis on the time-independent θi model. We confirmed that using the alternative model in which both social learning parameters were time-dependent (i.e. both σi,t and θi,t) did not qualitatively change our results (Figures S25 and S26).
In summary, the model has six free parameters that were estimated for each individual human participant; namely, αi, βi,0, ϵi, σi,0, δi and θi. To fit the model, we used a hierarchical Bayesian method (HBM), estimating the global means (μα, μβ, μϵ, μσ, μδ and μθ) and the global variations (vα, vβ, vϵ, vσ, vδ and vθ) for each of the three experimental conditions (i.e. the Low-, Moderate- and High-uncertainty conditions), which govern the overall distributions of individual parameter values. It has become recognised that the HBM can provide more robust and reliable parameter estimation than conventional maximum likelihood point estimation in complex cognitive models (e.g. Ahn et al., 2014), a conclusion with which our parameter recovery test agreed (Figures S10-S12).
3.2 Agent-based model simulation
We ran a series of individual-based model simulations assuming that a group of individuals play our three-armed bandit task (under the Moderate-uncertainty condition) and that individuals behave in accordance with the computational learning-and-decision model. We varied the group size (n ∈ {3,10,30}), the mean social learning weight (μσ) and the mean conformity exponent (μθ), running 10,000 replications for each of the possible parameter × group size combinations. As for the other parameter values (e.g. the asocial reinforcement-learning parameters α, β0 and ϵ), we used the experimentally fitted global means (Table 2 and Table S1). Relaxation of this assumption (i.e. using a different set of asocial learning parameters) does not qualitatively change our conclusions (e.g. Figures S4-S7). Note that each individual’s parameter values were randomly drawn from distributions centred on the global mean parameter values fixed for each simulation run. Therefore, the actual individual parameter values differed between individuals even within the same social group.
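A stripped-down version of such an agent-based simulation is sketched below. The parameter values are illustrative, not the fitted global means; for simplicity, each agent's frequency information here includes its own previous choice, whereas the paper's model excludes the focal individual's own choice.

```python
import math
import random

def simulate_group(n_agents=3, n_rounds=70, alpha=0.5, beta=4.0,
                   sigma=0.3, theta=2.0, means=(1.0, 1.5, 2.0), sd=0.55,
                   rng=None):
    """Minimal sketch: reinforcement learners with frequency-dependent
    copying play a three-armed bandit; returns final-round choice counts."""
    rng = rng or random.Random(0)
    n_opts = len(means)
    q = [[0.0] * n_opts for _ in range(n_agents)]
    prev_freq = None
    for t in range(n_rounds):
        choices = []
        for i in range(n_agents):
            exps = [math.exp(beta * v) for v in q[i]]
            asocial = [e / sum(exps) for e in exps]
            if prev_freq is None:  # round 1: asocial softmax only
                probs = asocial
            else:
                w = [(prev_freq[m] + 0.1) ** theta for m in range(n_opts)]
                social = [x / sum(w) for x in w]
                probs = [sigma * s + (1 - sigma) * a
                         for s, a in zip(social, asocial)]
            choices.append(rng.choices(range(n_opts), weights=probs)[0])
        for i, m in enumerate(choices):
            reward = max(0.0, rng.gauss(means[m], sd))
            if t == 0:
                q[i] = [alpha * reward] * n_opts  # first-round initialisation
            else:
                q[i][m] += alpha * (reward - q[i][m])
        prev_freq = [choices.count(m) for m in range(n_opts)]
    return prev_freq

print(simulate_group())
```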
3.3 Participants in the online experiment
A total of 755 subjects (354 females, 377 males, 2 others and 22 unspecified; mean age (1 s.d.) = 34.33 (10.9)) participated in our incentivised economic behavioural experiment (Figure S2). The experimental sessions were conducted in December 2015 and January 2016. We excluded from our learning-model-fitting analysis subjects who disconnected from the online task before completing at least the first 30 rounds, resulting in 699 subjects (573 subjects entered the group (i.e. n ≥ 2) condition and 126 entered the solitary (i.e. n = 1) condition). The task was advertised using Amazon’s Mechanical Turk (AMT; https://www.mturk.com; see Video S1; Video S2), so that participants could enter anonymously through their own internet browser window. Upon connecting to the experimental game web page, participants might be required to wait for other participants in a virtual ‘waiting room’ for up to 5 minutes or until the requisite number of participants arrived, whichever was sooner, before the task started. Participants were paid a 25-cent show-up fee plus a waiting bonus at a rate of 12 cents per minute (i.e. pro rata to 7.2 USD per hour) and a game bonus (mean ± 1 s.d. = 1.7 ± 0.79 USD) depending on their performance in the task. The total time, including net time spent in the waiting room, tended to be less than 10 minutes.
3.4 The online three-armed bandit task
The participants performed a three-armed bandit task for 70 rounds. Each round started with the choice stage, in which three slot machines appeared on the screen (Figure S1; Video 1). Participants chose a slot by clicking it with the mouse pointer (or tapping it if they used a tablet computer). Participants had a maximum of 8 seconds to make their choice. If no choice was made during the choice stage, a ‘TIME OUT’ message appeared in the centre of the screen and no monetary reward was given (the average number of missed rounds per participant was 0.18 out of 70 rounds). Participants could monitor the remaining choice time via a ‘count-down bar’ shown at the top of the experimental screen.
Each option yielded monetary rewards randomly drawn from a normal probability distribution unique to each slot, rounded up to the next integer, or truncated to zero if the draw would have been negative (Figure S3). The standard deviations of the probabilistic payoff distributions were identical for all slots and did not change during the task (s.d. = 0.55; although it was actually slightly smaller than 0.55 due to the zero-truncation). The mean payoffs differed between the options: ‘poor’, ‘good’ and ‘excellent’ slots generated the lowest, intermediate and highest rewards on average, respectively. In the first 40 rounds, there were two poor options and one good option. After the 40th round, one of the poor options abruptly changed to an excellent option (i.e. environmental change), so that from the 41st round there were poor, good and excellent options.
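The payoff-generation rule described above (normal draw, rounded up to the next integer, truncated at zero) can be expressed compactly; `draw_payoff` is our own illustrative helper name.

```python
import math
import random

def draw_payoff(mean, sd=0.55, rng=random):
    """Draw a payoff from a normal distribution, round it up to the next
    integer, and truncate negative draws to zero, as in the task rules."""
    x = rng.gauss(mean, sd)
    return 0 if x < 0 else math.ceil(x)

rng = random.Random(42)
# Five example payoffs for the 'excellent' slot (mean 3.1 cents):
print([draw_payoff(3.1, rng=rng) for _ in range(5)])
```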
Once all participants in the group had made a choice (or had timed out), they proceeded to the feedback stage, in which they could see their own payoff from the current choice for two seconds (‘0’ was shown if they had timed out); they could not see others’ reward values. After this feedback stage, subjects proceeded to the next round’s choice stage. From the second round onwards, the distribution of choices made by all participants in the group in the preceding round (i.e. the social frequency information) was shown below each slot.
Before the task started, participants read an illustrated instruction informing them: that they would play 70 rounds of the task; that the payoff from each choice would be randomly generated from a probability distribution unique to each slot machine (i.e. the slots might differ in profitability); that the environment might change during the task, such that the mean payoffs of the slots might secretly change; and that their total payout would be based on the sum of all earnings achieved in the task. We also explicitly informed subjects that all participants in the same group played the identical task, so that they could infer that the social information was informative. However, we did not specify either the true mean payoff values associated with each option, or when and how the mean payoffs would actually change. After reading these instructions, participants proceeded to a ‘tutorial task’, with no monetary reward and no social frequency information, so as to become familiar with the task.
After they completed the behavioural task, or were excluded from it due to a bad internet connection or due to opening another browser window during the task (see the ‘Reducing the risk of cheating’ section in the appendix), subjects proceeded to a brief questionnaire page asking for demographic information, which could be skipped. Finally, a result screen was shown, informing them of the total monetary reward they had earned as well as a confirmation code unique to each participant. Participants could collect their monetary reward through AMT by entering the confirmation code into the form on the AMT task page.
3.5 Manipulating the group size and uncertainty
To manipulate the size of each group, we varied the capacity of the waiting room from 10 to 30. Because the task was advertised on the Worker website at AMT for approximately 2 hours, some participants occasionally arrived after the earlier groups had already started. In that case, the participant entered a newly opened waiting room, which stayed open for the next 5 minutes. The number of participants arriving declined over time because newly posted alternative tasks were advertised at the top of the task list, which decreased our task’s visibility. This meant that later-starting sessions tended to begin before reaching maximum room capacity, resulting in smaller group sizes. Therefore, the actual size differed between groups.
To investigate the effect of task uncertainty, we manipulated the closeness of each option’s mean payoff value, setting three different conditions in a between-group design. The three conditions were: a Low-uncertainty condition (differences between mean payoffs were 1.264; N = 113), a Moderate-uncertainty condition (differences between mean payoffs were 0.742; N = 132) and a High-uncertainty condition (differences between mean payoffs were 0.3; N = 454). The mean payoff associated with the ‘excellent’ slot was fixed at 3.1 cents in all three conditions (Figure S3). These conditions were randomly assigned to each experimental session. However, we recruited more participants in the High-uncertainty condition than in the other two because we expected that larger group sizes would be needed to generate collective wisdom in noisier environments.
3.6 Statistical analysis
We used a hierarchical Bayesian method (HBM) to estimate the free parameters of our statistical models, including the computational learning-and-decision-making model. The HBM allows us to estimate individual differences while ensuring that these individual variations are bounded by the group-level global parameters. The HBM was performed using Stan 2.16.2 (http://mc-stan.org) in R 3.4.1 (https://www.r-project.org). The models were run with at least 4 parallel chains, and we confirmed convergence of the MCMC using both the Gelman-Rubin statistic and the effective sample sizes. Full details of the model-fitting procedure and prior assumptions are given in the appendix.
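For illustration, the basic (non-split) Gelman-Rubin statistic can be computed from raw chains as follows. Stan itself reports a more refined split R-hat, so this is only a conceptual sketch with our own function name.

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter, given
    several same-length MCMC chains; values near 1 indicate convergence.
    This is the basic between/within-chain variance formula."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

# Four well-mixed chains drawn from the same distribution:
rng = random.Random(1)
chains = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
print(round(gelman_rubin(chains), 2))  # close to 1.0 for well-mixed chains
```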
3.6.1 Parameter recovery test
To check the validity of our model-fitting method, we conducted a ‘parameter recovery test’ to examine how well our model-fitting procedure could reveal true individual parameter values. To do this, we generated synthetic data by running a simulation with the empirically fitted global parameter values, and then re-fitted the model to this synthetic data using the same procedure. The parameter recovery test showed that all true global parameter values fell within the 95% Bayesian credible intervals (Figure S8), and at least 93% of the true individual parameter values were correctly recovered (i.e. 96% of αi, 93% of βi,0, 95% of ϵi, 97% of σi,0, 96% of δi and 97% of θi values fell within the 95% Bayesian CI; Figures S9-S12).
3.6.2 Categorisation of individual learning strategies
Based on the 50% CI of the individual conformity exponent parameter values θi, we divided the participants into three different social learning strategy categories. If her/his 50% CI of θi fell entirely above zero (θlower > 0), entirely below zero (θupper < 0), or included zero (θlower ≤ 0 ≤ θupper), she/he was categorised as a ‘positive frequency-dependent copier’, a ‘negative frequency-dependent copier’, or a ‘random choice individual’, respectively. We used the 50% Bayesian CI for this categorisation rather than the more conservative 95% CI because the latter would cause much higher rates of ‘false negatives’, by which an individual who applied either a positive or a negative frequency-dependent copying strategy would be falsely labelled as an asocial random choice individual (Figure S10d). Four hundred agents out of 572 (≈ 70%) were falsely categorised as random choice learners in the recovery test when we used the 95% criterion (Figure S10d). The 50% CI criterion, on the other hand, performed much better in terms of the false negative rate, which was only 18.5% (i.e. 106 agents), although it was slightly worse in terms of ‘false positives’: 37 agents (6.5%) were falsely labelled as either positive or negative frequency-dependent copiers under the 50% CI, whereas the false positive rate of the 95% CI was only 0.2% (Figure S10e). To balance the risks of false positives and false negatives, we decided to use the 50% CI, which had greater power to detect strategies.
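The CI-based categorisation rule can be sketched as follows: a simplified illustration that computes a central credible interval directly from posterior samples (`categorise` is a hypothetical helper; the paper works from fitted posterior distributions rather than raw sample lists like these).

```python
def categorise(theta_samples, ci=0.50):
    """Classify a participant from posterior samples of the conformity
    exponent theta, using a central credible interval of width `ci`."""
    s = sorted(theta_samples)
    lo_idx = int(len(s) * (1 - ci) / 2)
    hi_idx = int(len(s) * (1 + ci) / 2) - 1
    lower, upper = s[lo_idx], s[hi_idx]
    if lower > 0:
        return "positive frequency-dependent copier"
    if upper < 0:
        return "negative frequency-dependent copier"
    return "random choice individual"

print(categorise([0.8, 1.0, 1.2, 1.5]))    # positive frequency-dependent copier
print(categorise([-0.5, -0.1, 0.2, 0.6]))  # random choice individual
```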
3.6.3 Generalised linear mixed models
To examine whether increasing group size and increasing task uncertainty affected individual use of the positive frequency-dependent copying strategy, we used a hierarchical Bayesian logistic regression model with a random effect of groups. The dependent variable was whether the participant used positive frequency-dependent copying (1) or not (0). The model included fixed effects of group size (standardised), task uncertainty (0: Low, 0.5: Moderate, 1: High), age (standardised), gender (0: male, 1: female, NA: others or unspecified), and possible two-way interactions between these fixed effects.
We also investigated the effects of both group size and task uncertainty on the fitted values of the learning parameters. We used a hierarchical Bayesian Gaussian regression model predicting the individual fitted parameter values. The model included effects of group size (standardised), task uncertainty (0: Low, 0.5: Moderate, 1: High), age (standardised), gender (0: male, 1: female, NA: others or unspecified), and two-way interactions between these fixed effects. We assumed that the variance of the individual parameter values might be contingent upon task uncertainty, because the computational model-fitting results showed that the fitted global variance parameters (i.e. vδ and vθ, among others) were larger in more uncertain conditions (Table S1).
3.6.4 Post-hoc model simulation for Figure 4d-f
To evaluate how accurately our model generates the observed decision patterns in our task setting, we ran a series of individual-based model simulations using the fitted individual parameter values (i.e. means of the individual posterior distributions) for each group size and each uncertainty condition. In the first step of the simulation, we assigned each simulated agent the set of fitted parameters of an experimental subject chosen at random from the same group size and uncertainty condition, until the number of agents reached the simulated group size. Subjects could be chosen more than once in this parameter assignment (i.e. sampling with replacement). In the second step, we let this synthetic group of agents play the bandit task. We repeated these steps 5,000 times for each combination of group size and task uncertainty.
3.7 Code and data availability
The browser-based online task was built with Node.js (https://nodejs.org/en/) and socket.io (https://socket.io), and the code is available in a GitHub repository (https://github.com/WataruToyokawa/MultiPlayerThreeArmedBanditGame). Analyses were conducted in R (https://www.r-project.org) and individual-based model simulations in Mathematica (https://www.wolfram.com); both, along with the data, are available in an online repository (https://github.com/WataruToyokawa/ToyokawaWhalenLaland2018).
4 Ethics statement
This study was approved by the University of St Andrews (BL10808).
5 Competing interest
We have no competing interests.
6 Authors’ contributions
WT, AW and KNL planned the study and built the computational model. WT ran the simulations. WT and AW made the experimental material, ran the web-based experiment, and collected the experimental data. WT, AW and KNL analysed the data and wrote the manuscript.
Funding
This research was supported by The John Templeton Foundation (KNL; 40128) and Suntory Foundation Research Support (WT; 2015-311). WT was supported by JSPS Overseas Research Fellowships (H27-11).
Footnotes
↵† e-mail address: wt25{at}st-andrews.ac.uk