Abstract
Real-world agents, such as humans, animals and robots, observe each other during interactions and choose their own actions taking the partners’ ongoing behaviour into account. Yet, classical game theory assumes that players act either strictly sequentially or strictly simultaneously (without knowing the choices of each other). To account for action visibility and provide a more realistic model of interactions under time constraints, we introduce a new game-theoretic setting called transparent game, where each player has a certain probability to observe the choice of the partner before deciding on its own action. Using evolutionary simulations, we demonstrate that even a small probability of seeing the partner’s choice before one’s own decision substantially changes evolutionarily successful strategies. Action visibility enhances cooperation in a Bach-or-Stravinsky game, but disrupts cooperation in a more competitive iterated Prisoner’s Dilemma. In both games, strategies based on the “Win–stay, lose–shift” and “Tit-for-tat” principles are predominant for moderate transparency, while for high transparency strategies of “Leader-Follower” type emerge. Our results have implications for studies of human and animal social behaviour, especially for the analysis of dyadic and group interactions.
One of the most interesting questions in the economic, biological, and social sciences is the emergence and maintenance of cooperation. A popular framework for studying cooperation (or the lack thereof) is Game Theory, which is frequently used to model interactions between “rational” decision-makers. In particular, a model for repeated interactions is provided by iterated games; two settings were previously used [1]:
Simultaneous games: players act at the same time without having any information about the current choice of the partners. Consequently, all players must make a decision under uncertainty concerning the choices of others.
Sequential games: players act in a certain order and the player acting later in the sequence is guaranteed to see the choices of the preceding players. Here the burden of uncertainty only applies to the first player or – if there are more than two players – becomes lighter with every turn in the sequence.
Both settings place a simplifying restriction on the decisional context: either all players have no information about the choices of the partners (simultaneous game), or some players always have more information than others (sequential game). This simplification might be disadvantageous for modelling certain behaviours, since humans and animals usually act neither strictly simultaneously nor sequentially, but observe the choices of each other and adjust their actions accordingly [2]. Indeed, the visibility of the partner’s actions plays a crucial role in social interactions, both in laboratory experiments [3–6] and in natural environments [7–11].
For example, in soccer the penalty kicker must decide where to place the ball and the goalkeeper must decide whether to jump to one of the sides or to stay in the centre. Since the ball reaches the target in 0.2-0.3 s [12], the goalkeeper cannot postpone the decision until the trajectory of the ball is clear, and must make the choice while the opponent is preparing the shot. Thus, a simultaneous game could be used as a crude model for such interactions (see, for instance, [13, 14]). However, in practice, both players observe each other’s behaviour and try to anticipate the direction of the kick or of the goalkeeper’s jump from subtle preparatory cues [6]. Thanks to these observations, professional goalkeepers manage to use their tiny temporal advantage and predict the direction of the shot better than chance [12–14]. The advantage of a professional goalkeeper over an amateur kicker would result in even better prediction of the shooting direction. Similar considerations might apply to a wide range of interactions in real life; however, a framework for the treatment of such cases is missing in classical game theory.
To better predict and explain the outcomes of interactions between agents by taking the visibility factor into account, we introduce the concept of transparent games, where players can monitor actions of each other. The access to the information about choices of other players is therefore probabilistic; in particular, for a game between two players at each round three cases are possible:
Player 1 knows the choice of Player 2 before making its own choice.
Player 2 knows the choice of Player 1 before making its own choice.
Neither player knows the choice of the partner.
Which of these cases applies depends on the reaction times of the players. If they act nearly at the same time, neither is able to use the information about the partner’s action; but a player who waits before making the choice has a higher probability to see the choice of the partner. Setting a time constraint (which is always present, either explicitly or implicitly, both in natural and in experimental situations) prevents players from waiting indefinitely for the partner’s choice. Then, given the reaction time distributions for the players, one can infer the probability $p_{\text{see}}^i$ of Player i to see the choice of the partner before making its own choice.
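Transparent games provide a general framework that also includes classical game-theoretical settings. In terms of the probabilities $p_{\text{see}}^1$ and $p_{\text{see}}^2$ of Players 1 and 2 seeing the partner’s choice, simultaneous games correspond to $p_{\text{see}}^1 = p_{\text{see}}^2 = 0$, while sequential games result in $p_{\text{see}}^1 = 0$, $p_{\text{see}}^2 = 1$ for a fixed order of turns in each round (Player 1 always moves first, Player 2 second), and in $p_{\text{see}}^1 = p_{\text{see}}^2 = 0.5$ for a random order of turns.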
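The step from reaction time distributions to visibility probabilities can be illustrated with a small Monte Carlo sketch. It is not the procedure used in the paper: the function names, the parameter values and, in particular, the perception delay `delta` (the minimal time by which a player must act after the partner in order to see the partner’s choice) are assumptions made for illustration. Reaction times are drawn from an exponentially modified Gaussian distribution, that is, the sum of a Gaussian and an exponential random variable.

```python
import random

def sample_ex_gaussian(rng, mu, sigma, tau):
    """Draw one reaction time: Gaussian(mu, sigma) plus Exponential(mean tau)."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)

def estimate_p_see(params1, params2, delta, n=100_000, seed=0):
    """Monte Carlo estimate of (p_see^1, p_see^2).

    Assumption (not from the paper): a player sees the partner's choice
    if it acts at least `delta` seconds after the partner.
    """
    rng = random.Random(seed)
    seen_by_1 = seen_by_2 = 0
    for _ in range(n):
        t1 = sample_ex_gaussian(rng, *params1)
        t2 = sample_ex_gaussian(rng, *params2)
        if t1 - t2 >= delta:      # Player 1 acts later and sees Player 2
            seen_by_1 += 1
        elif t2 - t1 >= delta:    # Player 2 acts later and sees Player 1
            seen_by_2 += 1
        # otherwise the players act too close in time: neither sees the other
    return seen_by_1 / n, seen_by_2 / n

# Hypothetical parameters (mu, sigma, tau) in seconds, identical for both players.
p1, p2 = estimate_p_see((0.5, 0.05, 0.1), (0.5, 0.05, 0.1), delta=0.05)
print(p1, p2, 1 - p1 - p2)
```

With identical distributions the two estimates are close to each other and each stays below 0.5; setting `delta=0` makes them sum to one, which corresponds to the sequential game with a random order of turns.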
The main question is whether the probabilistic access to the information in transparent games leads to the success of same or different behavioural strategies as compared to classic games. In other words, the possibility to see the choice of the partner on some occasions, to be observed by the partner on others, or to act under mutual uncertainty, may favour behaviours qualitatively different from those that yield the best performance in games with either full unidirectional transparency (sequential games) or with no transparency (simultaneous games).
To answer this question, here we study transparent versions of two classical two-player games: the iterated Prisoner’s dilemma (iPD) [15] and the iterated Bach-or-Stravinsky game (iBoS, also known as Battle of the Sexes and as Hero) [16]. We selected iPD and iBoS because they are counted among the most interesting games where cooperation is possible (non-zero-sum games) [16, 17], and because they require two distinct types of cooperative behaviour [18, 19]. While iPD is traditionally used for studying cooperation [15], iBoS is sometimes considered as a more suitable model [20, 21]. We employ evolutionary simulations, which allow evaluating optimal strategies using principles of natural selection, and consider memory-one strategies [22,23] that take into account own and partner’s choices at the previous round of the game.
We find that even a small probability of seeing the choice of the partner before one’s own decision changes the optimal behaviour in the iPD and iBoS games. The possibility to see the partner’s choice enhances cooperation in the generally cooperative iBoS, but disrupts cooperation in the more competitive iPD. Different transparency levels also bring qualitatively different strategies to success. In particular, we show that strategies based on the “Win–stay, lose–shift” and “Tit-for-tat” principles are the most successful in both games for low and moderate transparency, while for high transparency a new class of strategies, which we term “Leader-Follower” strategies, evolves. Although frequently observed in humans and animals (see, for instance, [24]), these strategies have up to now remained beyond the scope of game-theoretical studies, but naturally emerge in our transparent games framework.
Results
Evolutionary simulations for transparent games
We used evolutionary simulations [23] to investigate strategies evolving in transparent versions of iPD and iBoS. Payoff matrices for these games are shown in Fig. 1. In both games, evolution results in equal mean reaction times for all players (see “Methods” section). Then the probability psee to see the choice of the partner is equal for all players, which in a dyadic game results in psee ≤ 0.5.
We studied an infinite population of players using the methods described in [22, 23]. The population consists of “species” of players, each defined by a strategy vector si and frequency xi(t) in the population with $\sum_i x_i(t) = 1$. A strategy determines the probability of a player to choose one of two actions, A1 and A2 (corresponding to cooperation and defection in iPD or to insisting and accommodating in iBoS, respectively). For each species i the strategy is represented by a vector $s_i = (s_i^k)_{k=1}^{12}$, where k enumerates the 12 different situations in which the player can be when making the choice. These depend on the outcome of the previous round, whether or not the player can see the current choice of the partner, and what the choice is if it is visible. The entries $s_i^k$ thus represent the conditional probabilities to select action A1, specifically:
$s_i^1, \dots, s_i^4$ are probabilities to select A1 without seeing the partner’s choice, given that in the previous round the joint choice of the player and the partner was A1A1, A1A2, A2A1, and A2A2 respectively;
$s_i^5, \dots, s_i^8$ are probabilities to select A1, seeing the partner selecting A1 and given the outcome of the previous round (as before);
$s_i^9, \dots, s_i^{12}$ are probabilities to select A1, seeing the partner selecting A2 and given the outcome of the previous round.
Probabilities to select A2 are represented by $1 - s_i^k$, respectively. To ensure numerical stability of the simulations, it is common to introduce a minimal possible error ε in the strategies such that $\varepsilon \le s_i^k \le 1 - \varepsilon$, with ε = 0.001, see [22, 23]. The fact that players cannot have pure strategies and are prone to errors is also closely related to the “trembling hand” effect [22]. Note that in iPD no rational player would cooperate seeing that the partner defects; thus we simplify iPD strategies by fixing the entries $s_i^9, \dots, s_i^{12}$ at the minimal value (effectively zero).
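To make the indexing of the 12 strategy entries concrete, the following sketch looks up the relevant entry and plays a single round of a transparent game. It is a minimal illustration under stated assumptions rather than the simulation code of the paper: the names `prob_a1` and `play_round` are hypothetical, both players are assumed to share the same probability `p_see`, and the example strategy uses idealised entries 0 and 1 instead of values bounded by ε.

```python
import random

# Previous-round outcomes from the focal player's view: (own, partner).
OUTCOMES = [("A1", "A1"), ("A1", "A2"), ("A2", "A1"), ("A2", "A2")]

def prob_a1(strategy, prev_outcome, seen=None):
    """Probability to choose A1 for a 12-entry memory-one strategy.

    `seen` is None (partner's choice not visible), "A1" or "A2".
    """
    block = {None: 0, "A1": 4, "A2": 8}[seen]
    return strategy[block + OUTCOMES.index(prev_outcome)]

def play_round(s1, s2, prev, p_see, rng):
    """One round of a symmetric transparent game; `prev` is from Player 1's view."""
    prev2 = (prev[1], prev[0])  # the same outcome from Player 2's view

    def choose(p):
        return "A1" if rng.random() < p else "A2"

    u = rng.random()
    if u < p_see:                  # Player 1 sees Player 2's choice
        c2 = choose(prob_a1(s2, prev2))
        c1 = choose(prob_a1(s1, prev, seen=c2))
    elif u < 2 * p_see:            # Player 2 sees Player 1's choice
        c1 = choose(prob_a1(s1, prev))
        c2 = choose(prob_a1(s2, prev2, seen=c1))
    else:                          # neither player sees the other
        c1 = choose(prob_a1(s1, prev))
        c2 = choose(prob_a1(s2, prev2))
    return c1, c2

# Idealised strategy: choose A1 when blind, A2 whenever the partner is visible.
blind_a1 = [1.0] * 4 + [0.0] * 8
print(play_round(blind_a1, blind_a1, ("A1", "A1"), 0.5, random.Random(0)))
```

For `p_see = 0.5` one of the two players always sees the other, so two players using the idealised example strategy end every round with exactly one A1 and one A2.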
For every value of psee = 0.0, 0.1,…,0.5 we performed 80 runs of evolutionary simulations tracing $10^9$ generations in each run. We began each run of simulations with five species having equal initial frequencies x1(1) = … = x5(1) = 0.2 and random strategies si. The frequency of the strategies xi(t) evolved in time according to the replicator dynamics equation (see “Methods” section). If xi(t) dropped below 0.001, the species was assumed to die out. On average every 100 generations new species with random strategies emerged in the population. Details of our simulations can be found in the “Methods” section.
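One generation of such a simulation can be sketched as follows. The exact replicator equation is given in the “Methods” section and is not reproduced here, so the sketch assumes the standard discrete-time replicator update, in which each frequency is rescaled by the ratio of the species’ fitness to the mean fitness of the population. The function names and the payoff values in the example are hypothetical, payoffs are assumed to be positive, and the introduction of new random species is omitted.

```python
def replicator_step(freqs, payoff):
    """Standard discrete-time replicator update (an assumption, see text).

    payoff[i][j] is the mean payoff of species i against species j.
    """
    n = len(freqs)
    fitness = [sum(payoff[i][j] * freqs[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(f * x for f, x in zip(fitness, freqs))
    return [x * f / mean_fitness for x, f in zip(freqs, fitness)]

def prune(freqs, threshold=0.001):
    """Drop species whose frequency fell below the threshold and renormalise."""
    alive = [i for i, x in enumerate(freqs) if x >= threshold]
    total = sum(freqs[i] for i in alive)
    return alive, [freqs[i] / total for i in alive]

# Hypothetical mean payoffs between two species.
freqs = replicator_step([0.5, 0.5], [[3, 0], [5, 1]])
alive, freqs = prune(freqs)
print(alive, freqs)
```

The update keeps the frequencies normalised, and pruning mirrors the rule that a species dies out once its frequency drops below 0.001.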
Since the strategies in the evolutionary simulations were generated randomly, convergence to the theoretical optimum may take many generations and the observed successful strategies may deviate from the optimum. Therefore, we provide a coarse-grained description of strategies using the following notation: symbol 0 for $s_i^k \le 0.1$, symbol 1 for $s_i^k \ge 0.9$, and symbol * as a wildcard character denoting an arbitrary probability.
Let us exemplify this notation for the well-known strategies in the iPD. For instance, the Generous tit-for-tat (GTFT) strategy is encoded by (1a1b;1***;0000), where 0.1 < a, b < 0.9. Indeed, GTFT cooperates with cooperators and forgives defectors. To satisfy the first property, the probability to cooperate after the partner cooperated in the previous round should be rather high, say above 0.9, thus the corresponding entries of the strategy are encoded by 1. To satisfy the second property, the probability to cooperate after the partner defected should be somewhere between zero and one with the optimal value 1/3 [22]. Since evolving towards this optimum may take long, we allow a broad range of values for $s_i^2$ and $s_i^4$, for instance [0.1, 0.9]. We leave $s_i^6$, $s_i^7$ and $s_i^8$ arbitrary since for low values of psee these entries have little influence on the strategy performance, meaning that their evolution towards optimal values may take especially long. Finally, as stated above, no sensible agent would cooperate in the iPD if aware that the partner is defecting, leading us to encode $s_i^9$ to $s_i^{12}$ by 0. Further we omit these predefined zero entries when referring to the iPD strategies. Thus, we encode GTFT by (1a1b;1***), where 0.1 < a, b < 0.9. The Always Defect strategy (AllD) is encoded by (0000;****), meaning that the probability to cooperate when not seeing the partner’s choice is below 0.1, and behaviour when seeing the partner’s choice is not specified. Win – stay, lose – shift (WSLS) is encoded by (1001;1***), and Firm-but-fair (FbF) by (101b;1***), where 0.1 < b < 0.9.
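The coarse-grained notation can be turned into a simple classifier. The sketch below is an illustration under assumptions, not the classification code of the paper: the function names are hypothetical, intermediate probabilities are marked with the auxiliary symbol x, and letters such as a and b in a pattern are taken to mean “strictly between 0.1 and 0.9”. Patterns with other bounds (for example b ≤ 1/3) are not covered.

```python
def coarse_grain(strategy, low=0.1, high=0.9):
    """Map each probability to '0' (<= low), '1' (>= high) or 'x' (in between)."""
    symbols = []
    for p in strategy:
        symbols.append("0" if p <= low else "1" if p >= high else "x")
    return "".join(symbols)

def matches(strategy, pattern):
    """Check a strategy against a pattern such as '1001;1***'.

    '0' and '1' must match exactly, '*' matches anything, and any other
    letter (a, b, ...) stands for an intermediate probability. Entries
    beyond the end of the pattern are left unconstrained.
    """
    coded = coarse_grain(strategy)
    pattern = pattern.replace(";", "")
    return all(
        sym == "*" or (sym in "01" and sym == c) or (sym not in "01*" and c == "x")
        for sym, c in zip(pattern, coded)
    )

# Hypothetical evolved strategies (first eight entries of iPD strategies).
wsls = [0.999, 0.001, 0.001, 0.999, 0.999, 0.5, 0.5, 0.5]
gtft = [0.999, 0.33, 0.999, 0.33, 0.999, 0.2, 0.7, 0.4]
print(coarse_grain(wsls), matches(wsls, "1001;1***"), matches(gtft, "1a1b;1***"))
```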
Transparency suppresses cooperation in Prisoner’s Dilemma
Simulation results for the transparent iPD are presented in Table 1. Most of the effective strategies were known from earlier studies on non-transparent games, but for high transparency (psee → 0.5) a new previously unknown strategy emerged. We dub this strategy “Leader-Follower” (L-F); theoretically it is represented by s = (1, 1, 1, 1; 0, 0, 0, 0), that is the player cooperates when it does not see the choice of the partner and defects otherwise. In the simultaneous iPD (psee = 0) L-F behaves as unconditional cooperator and is easily beaten, but it becomes predominant for psee = 0.5. In the latter case, when two L-F players meet, the player acting first (the Leader) makes a “self-sacrificing” decision to cooperate, while the second player (the Follower) sees this and defects (note that for the next round the roles of the individuals may switch, thus ensuring a certain balance when reaping the benefits of exploiting a sacrificial first move). We classified as L-F all strategies with profile (*11*; *00*) since behaviour after mutual cooperation or mutual defection is only relevant when L-F is playing against another strategy, and success for different types of behaviour depends on the composition of the population. For instance, (111*; 000*) is optimal in a cooperative population, while (*110; *000) is more robust against defectors. Note that L-F did not emerge for sequential iPD in [25–27], since in these studies, players were bound to the same strategy regardless of whether they made their choice before or after the partner. In contrast, transparent games allow different sub-strategies (s1, …, s4) and (s5, …, s8) for these situations.
Similar to the classic simultaneous iPD, WSLS was predominant in the transparent iPD for low and moderate psee, which is reflected by the clearly visible WSLS profiles in the final strategies of the population (Fig. 2). Note that GTFT, another successful strategy in the simultaneous iPD, disappeared completely for psee > 0. For psee ≥ 0.4, the game resembled the sequential iPD and the results changed accordingly. Similar to the sequential iPD [25–27], the frequency of WSLS waned, the FbF strategy emerged, cooperation became less frequent and took longer to establish itself (Fig. 3a). For psee = 0.5 the population was taken over either by L-F, WSLS or (rarely) by FbF, which is reflected by the mixed profile in Fig. 2. Pairwise comparison of strategies in iPD (Supplementary Fig. 1) helps to explain the superiority of WSLS for psee < 0.5, the disappearance of GTFT for psee > 0.0, and the drastic increase of L-F frequency for psee = 0.5.
For psee ≤ 0.3 cooperation evolved relatively quickly thanks to the predominance of WSLS. Fig. 3a shows that a further increase of psee undermined cooperation in the iPD; this is why a face-to-face interrogation would be used in a realistic iPD prototype. However, Leader-Follower is in a sense a cooperative strategy for iPD: it alternates between cooperation and defection instead of using synchronized cooperation.
Cooperation emergence in the transparent Bach-or-Stravinsky game
Our simulations revealed that four memory-one strategies are most effective in iBoS for various levels of transparency. In contrast to iPD, there exist only a few studies of iBoS strategies; therefore we describe the observed strategies in detail.
Turn-taker aims to enter a fair coordination regime, where players alternate between IA (Player 1 insists and Player 2 accommodates) and AI (Player 1 accommodates and Player 2 insists) states. In the simultaneous iBoS, this strategy takes the form (q, 0, 1, q), where q = 5/8 guarantees maximal reward in a non-coordinated play against a partner with the same strategy for the payoff matrix in Fig. 1b. We classify as Turn-takers all strategies encoded by (*01*;*0**;**1*). Turn-taking was shown to be successful in the simultaneous iBoS for a finite population of agents with pure strategies (i.e., having 0 or 1 entries only, with no account for mistakes) and a memory spanning three previous rounds [19].
Challenger takes the form (1, 1, 0, 1) in the simultaneous iBoS. When two players with this strategy meet, they initiate a “challenge”: both insist until one of the players makes a mistake (that is, accommodates). Then, the player making the mistake (the loser) submits and continues accommodating, while the winner continues insisting. This period of unfair coordination beneficial for the winner ends when the next mistake of either player (the winner accommodating or the loser insisting) triggers a new “challenge”. This strategy is encoded by (11b*;****;*1**) and has two variants: Challenger “obeys the rules” and does not initiate the challenge after losing (b ≤ 0.1), while Aggressive Challenger may switch to insisting (0.1 < b ≤ 1/3). Challenging strategies were theoretically predicted to be successful in simultaneous iBoS [28, 29].
The Leader-Follower (L-F) strategy s = (1, 1, 1, 1; 0, 0, 0, 0; 1, 1, 1, 1) was not considered previously. In a game between two players with this strategy, the faster player insists and the slower player accommodates. In the simultaneous game, this strategy lapses into inefficient stubborn insisting since all players consider themselves leaders, but in transparent settings with high psee this strategy provides effective and fair cooperation (because of the, on average, equal reaction times). When the whole population adopts an L-F strategy, most entries of the strategy vector become irrelevant since (i) only IA and AI states are visited and (ii) the faster player never accommodates. Therefore, we classify all strategies encoded by (*11*;*00*;****) as L-F.
Challenging Leader-Follower is simply a hybrid of Challenger and L-F strategies encoded by (11b*;0c0*;*1**), where b > 1/3, c ≤ 1/3.
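The value q = 5/8 quoted above for the Turn-taker can be checked with a short calculation. The sketch assumes the iBoS payoffs of 4 for insisting against an accommodating partner, 3 for accommodating against an insisting partner and 2 for mutual insisting, as stated in the “Methods” section, plus a payoff of 1 for mutual accommodation; the last value is an assumption here, since Fig. 1b is not reproduced in the text. It also assumes that “maximal reward in a non-coordinated play” refers to the expected payoff of a single round in which both players insist independently with probability q.

```latex
\begin{align*}
E(q) &= 2q^2 + 4q(1-q) + 3(1-q)q + (1-q)^2 \\
     &= -4q^2 + 5q + 1, \\
\frac{dE}{dq} &= -8q + 5 = 0
  \quad\Longrightarrow\quad q = \frac{5}{8}, \\
E\!\left(\tfrac{5}{8}\right) &= -\frac{25}{16} + \frac{50}{16} + \frac{16}{16}
  = \frac{41}{16} \approx 2.56 .
\end{align*}
```

Under these assumptions the expected payoff is a concave quadratic in q, so q = 5/8 is its unique maximum.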
The results of the simulations are presented in Table 2. The entries of the final population strategy, averaged over all runs (Fig. 4), show considerably different profiles for various values of psee. Challengers, Turn-takers, and Leader-Followers succeeded for low, medium and high probabilities to see the partner’s choice, respectively.
To provide additional insight into the results of the iBoS simulations, we studied how various strategies perform against each other (Supplementary Fig. 2). As with the iPD, this analysis helps to understand why different strategies were successful at different transparency levels.
In contrast to iPD, for iBoS high visibility results in a more effective cooperation, which is consistent with the notion that cooperation in the iBoS game rests on effective coordination (rather than trust in the good intentions of the partner). Indeed, for psee ≥ 0.3 non-cooperative Challengers no longer constituted the majority of the population. The break of cooperation at psee = 0.4 was caused by a transition from turn-taking to leader-following. Note that for psee = 0.5 cooperation thrives and is established much faster than for lower transparency (Fig. 3b) thanks to the Leader-Follower strategy.
Discussion
In this paper, we introduced the concept of transparent games, which integrates the visibility of the partner’s actions into a game-theoretic setting. Specifically, we considered iterated dyadic games where players have probabilistic access to the information about the partner’s choice in the current round. When reaction times for both players are equal on average, the probability psee of accessing this information can vary from psee = 0.0 (corresponding to the canonical simultaneous games) to psee = 0.5 (corresponding to sequential games with random order of choices).
The value of psee strongly affects the evolutionary success of strategies. In particular, for the iterated Prisoner’s Dilemma (iPD) we have shown that for psee > 0 the Generous tit-for-tat strategy is unsuccessful and Win–stay, lose–shift becomes an unquestionable evolutionary winner. For psee = 0.5, a new strategy, Leader-Follower, triumphs. In the Bach-or-Stravinsky game (iBoS) even moderate psee helps to establish cooperative turn-taking, while high psee again brings the Leader-Follower strategy to success.
Despite the clear differences between the two games, predominant strategies evolving in iPD and iBoS for various levels of transparency have some striking similarities. First of all, in both games, Leader-Follower appears to be the most successful strategy for high psee. This can be explained as follows: in a group where the behaviour of each agent is visible to the others and can be correctly interpreted, group actions hinge upon agents initiating these actions. The exact role of the initiators can vary: in some cases, these agents reap special benefits (for instance, dominant male baboons despotically initiate group movements to the foraging locations that are beneficial for themselves [30]), but in other cases they also carry the burden. Accordingly, in our study, Leaders enjoy maximal payoffs in the transparent iBoS game, but have to sacrifice their own payoff for the mutual success in the transparent iPD. Although counter-intuitive at first glance, the cooperativeness of Leaders in the L-F strategy corresponds to the behaviour of individuals that agree to do a necessary but risky or unpleasant job without immediate benefit. Examples include volunteering in human societies and acting as sentries in animal groups – watching out for predators while conspecifics forage for food [31, 32], see [33] for further examples. Note, however, that it is still debated how altruistic sentinel behaviour actually is [31, 32, 34, 35]. Such situations are formalized in game theory by a Volunteer’s Dilemma [33, 36, 37], but here we emphasize the aspect of visibility: the L-F strategy becomes dominant only when the probability that one of the players sees the choice of the other is close to one (that is for psee close to 0.5). Thus self-sacrificing behaviour is only useful when others can interpret and utilize it, which is the case both for sentry animals and for human volunteers. 
Our results for the transparent iPD demonstrate that altruistic behaviour for the sake of the species success may evolve in a population even without direct reciprocity.
For low and moderate values of psee the similarities of the two games are less obvious. However, the Challenger strategy in iBoS follows the same principle of “Win – stay, lose – shift” as the predominant strategy WSLS in iPD, but with modified definitions of “win” and “lose”. For Challenger, winning is associated with any outcome better than the minimal payoff corresponding to mutual accommodation. Indeed, Challenger accommodates until mutual accommodation takes place and then switches to insisting. Such behaviour is described as “modest WSLS” in [29, 38] and is in line with the interpretation of the “Win – stay, lose – shift” principle observed in animals [39].
The third successful principle in the transparent iPD is “Tit-for-tat”, embodied in Generous tit-for-tat (GTFT) and Firm-but-fair (FbF) strategies. This principle also works in both games since turn-taking in iBoS is nothing else but giving tit for tat. In particular, the FbF strategy, which occurs frequently in iPD for psee ≥ 0.4, is partially based on taking turns and is similar to the Turn-Taker strategy in iBoS. The same holds to a lesser extent for the GTFT strategy.
The success of specific strategies for different levels of psee makes sense if we understand psee as a species’ ability to signal intentions and to interpret these signals when trying to coordinate (or compete). The higher psee, the better (more probable) is the explicit coordination. This could mean that a high ability to explicitly coordinate actions leads to coordination based on observing the leader’s behaviour. In contrast, moderate coordination ability results in some form of turn-taking, while low ability leads to simple strategies of WSLS-type. In fact, an agent utilizing the WSLS principle does not even need to comprehend the existence of the second player, since WSLS “embodies an almost reflex-like response to the pay-off” [22]. The ability to cooperate may also depend on the circumstances, for example, on the physical visibility of the partner’s actions. In a relatively clear situation, following the leader can be the best strategy. Moderate uncertainty requires some (implicit) rules of reciprocity embodied in turn-taking. High uncertainty makes coordination difficult or even impossible, and may result in a seemingly irrational “challenging behaviour” as we have shown for the transparent iBoS. However, when players can succeed without coordination (which was the case in iPD), high uncertainty about the other players’ actions does not cause a problem.
By taking the visibility of agents’ actions into account, transparent games can provide a simple explanation for certain biological, sociological and psychological phenomena. Here, we illustrate the potential of this approach with two examples. The first concerns authoritarianism, a personality trait that manifests as uncritical acceptance of authority and is often associated with conformity. The most prominent example of how it manifests is the Milgram experiment [40]. In a series of studies presented as learning experiments, participants were tasked with punishing mistakes of another participant with increasingly painful electric shocks, under the premise of helping to learn more effectively. Some participants were willing to essentially electrocute the learner (who was a confederate of the experimenter) by applying shocks of up to 400 V. In one particular version of this experiment, where participants were urged to continue applying electric shocks by a perceived authority figure (i.e., the experimenter), the proportion of participants who were willing to go to maximal voltage rose to about two thirds. Importantly, this conformity with authority occurred in a similar fashion across gender and ages, suggesting that it may be a universal human trait. Most people wonder why so many individuals show uncritical obedience to authorities, especially when considering how it can lead to unethical behaviour. The transparent iBoS results hint towards a provocative answer: a disposition for conformity might provide an evolutionary advantage because it allows for effective coordination. Thus, the sometimes extreme conformity observed in social psychology [41, 42] might – at least partially – rest on the evolutionary superiority of a Leader-Follower strategy.
Another application of transparent games is related to the burgeoning experimental research of social interactions, including the emergent field of social neuroscience that seeks to uncover the neural basis of social signalling and decision-making using neuroimaging and electrophysiology in humans and animals [43–46]. So far, most studies have focused on sequential [47, 48] or simultaneous games [49]. One of the main challenges in this field is extending these studies to direct real-time interactions that would entail a broad spectrum of dynamic competitive and cooperative behaviours. In line with this, several recent studies also considered direct social interactions in humans and non-human primates [3–5, 50–55] during dyadic games where players can monitor actions and outcomes of each other. Transparent games allow modelling the players’ access to social cues, which is essential for the analysis of experimental data in the studies of this kind [21]. This might be especially useful when behaviour is explicitly compared between “simultaneous” and “transparent” game settings, as in [3, 5, 50, 55]. In particular, the enhanced cooperation in the transparent iBoS for high psee provides a theoretical explanation for the empirical observations in [5], where humans playing an iBoS-type game demonstrated a higher level of cooperation and a fairer payoff distribution when they were able to observe the actions of the partner while making their own choice. In view of the argument that true cooperation should benefit from enhanced communication [21], the transparent iBoS can in certain cases be a more suitable model for studying cooperation than the iPD (see also [56,57] for a discussion of studying cooperation by means of iBoS-type games).
In summary, transparent games provide a theoretically attractive link between classical concepts of simultaneous and sequential games, as well as a computational tool for modelling real-world interactions. We thus expect that the transparent games framework can help to establish a deeper understanding of social behaviour in humans and animals.
Methods
Transparent games between two players
In this study, we focus on iterated two-player two-action games: in every round both players choose one of two possible actions and get a payoff depending on the mutual choice according to the payoff matrix (Fig. 1). A new game setting, the transparent game, is defined by a payoff matrix and the probabilities $p_{\text{see}}^i$ (i = 1, 2) of Player i to see the choice of the other player. Note that $p_{\text{see}}^1 + p_{\text{see}}^2 \le 1$, and $1 - p_{\text{see}}^1 - p_{\text{see}}^2$ is the probability that neither player knows the choice of the partner because they act sufficiently close in time that neither can infer the other’s action prior to making their own choice. The probabilities $p_{\text{see}}^i$ can be computed from the distributions of reaction times for the two players, as shown in Supplementary Fig. 3 for reaction times modelled by an exponentially modified Gaussian distribution [58, 59]. In this figure, reaction times for both players have the same mean, which results in a symmetric distribution of the reaction time difference (Supplementary Fig. 3b) and $p_{\text{see}}^1 = p_{\text{see}}^2$. Here we focus only on this case since for both games considered in this study, unequal reaction times provide a strong advantage to one of the players (see below). However, in general $p_{\text{see}}^1 \ne p_{\text{see}}^2$.
To illustrate how transparent, simultaneous and sequential games differ, let us consider three setups for an iterated Prisoner’s Dilemma (iPD):
If prisoners write their statements and put them into envelopes, this case is described by simultaneous iPD.
If prisoners are questioned in the same room in a random or pre-defined order, this case is described by sequential iPD.
Finally, in a case of a face-to-face interrogation where prisoners are allowed to answer the questions of prosecutors in any order (or even to talk simultaneously) the transparent iPD comes into play. Here prisoners are able to monitor each other and interpret inclinations of the partner in order to adjust their own choice accordingly.
While the transparent setting can be used both in zero-sum and non-zero-sum games, here we concentrate on the latter class where players can cooperate to increase their joint payoff. For the purposes of this work, we define cooperation simply as joint actions towards mutually beneficial outcomes. In various areas more specific definitions of cooperation are used (see, for example, [7,21] for a discussion of cooperation in animals). We consider the transparent versions of two classical games, the iPD and the iterated Bach-or-Stravinsky game (iBoS). We have selected iPD and iBoS as representatives of two distinct types of symmetric non-zero-sum games [18, 19]: maximal joint payoff is awarded when players select the same action (cooperate) in iPD, but different actions in iBoS (one insists, and the other accommodates). The games of iPD type are known as synchronization games; other examples of synchronization games include Stag Hunt and Game of Chicken [19]. Games with two optimal mutual choices are called alternation games [18,19]; as one of these choices is more beneficial for Player 1, and the other for Player 2, to achieve fair cooperation players should alternate between these two states.
Another important difference between the two considered games is that in iBoS it is better to act before the partner, while in iPD it is better to act after the partner. Indeed, in iPD defection is less beneficial if it can be discovered by the opponent. Meanwhile, in iBoS the player acting first has a good chance of getting the maximal payoff of 4 by insisting: when the second player knows that the partner insists, it is better to accommodate and get a payoff of 3 than to insist and get 2. Therefore, the optimal behaviour in iPD is to wait as long as possible, while in iBoS a player should react as quickly as possible. Consequently, evolution in these games favours species with marginal mean reaction times: the maximal allowed reaction time in iPD and the minimal allowed reaction time in iBoS. Species with different behaviour are easily invaded. Therefore we assumed in all simulations that the mean reaction times are constant, that is, the probability of seeing the partner’s choice is the same for all species and all players have equal chances to see each other’s choices.
Evolutionary simulations for transparent games
For our evolutionary simulations we adopt the methods described in [22, 23]. Consider an infinite population of players evolving in generations. For any generation t = 1, 2, … the population consists of n(t) “species” defined by their strategies and their frequencies xi(t) in the population, $\sum_{i=1}^{n(t)} x_i(t) = 1$. In addition, the probability of a player from species i to see the choice of a partner from species j is given by $p^{ij}_{\mathrm{see}}$ (in our case $p^{ij}_{\mathrm{see}} = p_{\mathrm{see}}$ for all species i and j, but in this section we use the general notation).
Consider a player from species i playing an infinitely long iterated game against a player from species j. Since both players use memory-one strategies, this game can be formalized as a Markov chain with states being the mutual choices of the two players and a transition matrix M given by
$$M = \left(1 - p^{ij}_{\mathrm{see}} - p^{ji}_{\mathrm{see}}\right) M_0 + p^{ij}_{\mathrm{see}} M_1 + p^{ji}_{\mathrm{see}} M_2, \qquad (1)$$
where the matrices M0, M1 and M2 describe the cases when neither player sees the choice of the partner, when Player 1 sees the choice of the partner before making its own choice, and when Player 2 sees the choice of the partner, respectively. These matrices are given by
The gain of species i when playing against species j is given by the expected payoff Eij, defined by
$$E_{ij} = y_1 P_{11} + y_2 P_{12} + y_3 P_{21} + y_4 P_{22}, \qquad (2)$$
where Pij are the entries of the payoff matrix (P11 = 3, P12 = 0, P21 = 5, P22 = 1 for iPD and P11 = 2, P12 = 4, P21 = 3, P22 = 1 for iBoS, see Fig. 1), and y1, y2, y3, y4 represent the probabilities of getting to the states associated with the corresponding payoffs by playing si against sj. This vector is computed as the unique left-hand eigenvector of the matrix M associated with eigenvalue one [23]:
$$(y_1, y_2, y_3, y_4)\, M = (y_1, y_2, y_3, y_4).$$
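The computation of the expected payoff can be sketched in a few lines of Python. This is a hedged illustration rather than the original implementation: the layout of the 12 strategy entries (entries 1 to 4 used when acting without seeing the partner, 5 to 8 after seeing the partner choose action 1, 9 to 12 after seeing action 2, each indexed by the previous outcome) is an assumption, as are all function names. The stationary vector is obtained by solving the linear system yM = y together with the normalisation of y.

```python
import numpy as np

# Payoffs by state (1,1), (1,2), (2,1), (2,2), seen from Player 1 (values from the text).
PAYOFF_IPD = np.array([3.0, 0.0, 5.0, 1.0])
PAYOFF_IBOS = np.array([2.0, 4.0, 3.0, 1.0])

# The same state seen from Player 2: (1,2) and (2,1) swap places.
MIRROR = (0, 2, 1, 3)

def transition_matrix(s_i, s_j, p_ij, p_ji):
    """M = (1 - p_ij - p_ji) * M0 + p_ij * M1 + p_ji * M2 (assumed entry layout)."""
    s_i = np.asarray(s_i, dtype=float)
    s_j = np.asarray(s_j, dtype=float)
    M = np.zeros((4, 4))
    for k in range(4):
        m = MIRROR[k]
        a, b = s_i[k], s_j[m]            # P(action 1) without seeing the partner
        a1, a2 = s_i[4 + k], s_i[8 + k]  # Player 1 after seeing action 1 / action 2
        b1, b2 = s_j[4 + m], s_j[8 + m]  # Player 2 after seeing action 1 / action 2
        row0 = np.array([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)])
        row1 = np.array([b * a1, (1 - b) * a2, b * (1 - a1), (1 - b) * (1 - a2)])
        row2 = np.array([a * b1, a * (1 - b1), (1 - a) * b2, (1 - a) * (1 - b2)])
        M[k] = (1 - p_ij - p_ji) * row0 + p_ij * row1 + p_ji * row2
    return M

def stationary_distribution(M):
    """Solve y M = y with sum(y) = 1 (assumes the stationary vector is unique)."""
    A = M.T - np.eye(4)
    A[-1, :] = 1.0  # replace one redundant equation by the normalisation
    return np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))

def expected_payoff(s_i, s_j, p_ij, p_ji, payoff):
    y = stationary_distribution(transition_matrix(s_i, s_j, p_ij, p_ji))
    return float(y @ payoff)

eps = 0.001
# With p_see = 0 only the first four entries matter; the rest are placeholders.
wsls = [1 - eps, eps, eps, 1 - eps] * 3
alld = [eps] * 12
E11 = expected_payoff(wsls, wsls, 0.0, 0.0, PAYOFF_IPD)
E12 = expected_payoff(wsls, alld, 0.0, 0.0, PAYOFF_IPD)
print(round(E11, 3), round(E12, 3))
```

For psee = 0 only the simultaneous part M0 contributes, so the result does not depend on the assumed layout of the remaining entries; the values obtained for WSLS against WSLS and against AllD are close to the 2.995 and 0.504 quoted in the two-strategy example later in this section.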
The evolutionary success of species i is encoded by its fitness fi(t): if species i has higher fitness than the average fitness of the population then xi(t) increases with time, otherwise xi(t) decreases and the species is dying out. This evolutionary process is formalized by the replicator dynamics equation, which in discrete time takes the form
The fitness fi(t) is computed as the expected payoff for a player of species i when playing with a random player from the current population:
$$f_i(t) = \sum_{j=1}^{n(t)} x_j(t)\, E_{ij},$$
where Eij is given by (2).
Each run of simulations starts with five species having equal initial frequencies: n(1) = 5, x1(1) =… = x5(1) = 0.2. Following [22], the strategy probabilities $s_k$ with k = 1, …, 12 for these species are randomly drawn from the distribution with U-shaped probability density: for y ∈ (0, 1). Additionally, we require $s_k \in [\varepsilon, 1 - \varepsilon]$, where ε = 0.001 accounts for the minimal possible error in the strategies [22]. The frequencies of strategies xi(t) change according to the replicator dynamics equation (3). If xi(t) < ∊, the species is assumed to die out and is removed from the population; we follow [22,23] in taking ∊ = 0.001. Occasionally (every 100 generations on average), new species emerge in the population. The strategies for the new species are drawn from (4) and the initial frequencies are set to xi(t0) = 1.1∊ [22].
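The building blocks of one simulation run can be sketched as follows. This is an illustrative sketch under several assumptions, not the original code: the arcsine distribution Beta(1/2, 1/2) stands in for the U-shaped density, clipping is used to enforce the error bounds, the replicator update is taken in the common form x_i(t+1) = x_i(t) f_i(t) / (mean fitness), and the remaining frequencies are rescaled when a species is added. All function names are hypothetical.

```python
import numpy as np

EPS = 0.001      # minimal error in strategy entries (from the text)
EXTINCT = 0.001  # extinction threshold for frequencies (from the text)

def draw_strategy(rng, n_entries=12):
    """Draw strategy entries from a U-shaped density on (0, 1).

    Assumption: Beta(1/2, 1/2) stands in for the U-shaped density, and
    clipping enforces the [EPS, 1 - EPS] bounds.
    """
    return np.clip(rng.beta(0.5, 0.5, size=n_entries), EPS, 1 - EPS)

def fitness(x, E):
    """f_i = sum_j x_j * E_ij: expected payoff against a random partner."""
    return E @ x

def replicator_step(x, E):
    """Assumed discrete-time replicator update: x_i <- x_i * f_i / mean fitness."""
    f = fitness(x, E)
    return x * f / (x @ f)

def remove_extinct(x, E):
    """Drop species with frequency below EXTINCT and renormalise the rest."""
    alive = x >= EXTINCT
    x_alive = x[alive]
    return x_alive / x_alive.sum(), E[np.ix_(alive, alive)], alive

def add_species(x, E, payoff_vs_old, payoff_of_old, payoff_self):
    """Append a new species at frequency 1.1 * EXTINCT (others rescaled)."""
    x0 = 1.1 * EXTINCT
    n = len(x)
    x_new = np.append(x * (1 - x0), x0)
    E_new = np.empty((n + 1, n + 1))
    E_new[:n, :n] = E
    E_new[n, :n] = payoff_vs_old  # newcomer's payoff against each resident
    E_new[:n, n] = payoff_of_old  # each resident's payoff against the newcomer
    E_new[n, n] = payoff_self
    return x_new, E_new

# Two-species demo with illustrative payoffs (not taken from the text).
E = np.array([[3.0, 0.5], [3.1, 1.0]])
x = np.array([0.5, 0.5])
for _ in range(200):
    x = replicator_step(x, E)
x, E, alive = remove_extinct(x, E)
print(x, alive)
```

In a full run, a new species would be introduced in each generation with probability 0.01, which gives one new species per 100 generations on average, and the pairwise payoffs passed to `add_species` would come from the Markov chain computation described above.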
Evolutionary dynamics of two strategies
To provide an example of evolutionary dynamics and introduce some useful notation, we consider a population consisting of two species playing iPD with strategies s1 = (1−ε, ε, ε, 1−ε; 1−ε, ε, ε, 1−ε), s2 = (ε, ε, ε, ε; ε, ε, ε, ε) (recall that for iPD ) and initial conditions x1(1) = x2(1) = 0.5. That is, the first species plays WSLS, and the second uses AllD. We set $p^{ij}_{\mathrm{see}} = p_{\mathrm{see}}$ for all i, j. Note that since $p^{12}_{\mathrm{see}} = p^{21}_{\mathrm{see}}$ and $p^{12}_{\mathrm{see}} + p^{21}_{\mathrm{see}} \le 1$, it holds $p_{\mathrm{see}} \le 0.5$. Given psee we can compute a transition matrix of the game using (1) and then calculate the expected payoffs for all possible pairs of players ij using (2). For instance, for psee = 0 we have
This means that a player from the WSLS-species on average gets a payoff E11 = 2.995 when playing against a conspecific partner, and only E12 = 0.504, when playing against an AllD-player. The fitness for each species is given by
Since f2(t) > f1(t) for any 0 < x1(t), x2(t) < 1, the AllD-players take over the whole population after several generations. Dynamics of the species frequencies xi(t) computed using (3) shows that this is indeed the case (Fig. 5a). Note that since E21 > E11 and E22 > E12, AllD is guaranteed to win over WSLS for any initial frequency of WSLS-players x1(1). In this case one says that AllD dominates WSLS and can invade it for any x1(1).
As we increase psee, the population dynamics change. While for psee = 0.2 AllD still takes over the population, for psee = 0.4 WSLS wins (Fig. 5a). This can be explained by computing the expected payoff for psee = 0.4:
Hence f1(t) > f2(t) for 0 ≤ x2(t) ≤ 0.5 ≤ x1(t) ≤ 1, which explains the observed dynamics. Note that here E11 > E21, while E12 < E22; that is, when playing against WSLS- and AllD-players alike, a conspecific partner wins more than a partner from a different species. In this case one says that WSLS and AllD are bistable, and there is an unstable equilibrium fraction of WSLS players given by
$$h_1 = \frac{E_{22} - E_{12}}{E_{11} - E_{12} - E_{21} + E_{22}}. \qquad (5)$$
We call hi an invasion threshold for species i, since it takes over the whole population for xi(t) > hi, but dies out for xi(t) < hi. To illustrate this concept, we plot in Fig. 5b the invasion threshold h1 for WSLS species playing against AllD as a function of psee.
One more possible type of two-species dynamics is coexistence, which takes place when E11 < E21, E12 > E22, that is when playing against a player from any of the species is less beneficial for a conspecific partner than for a partner from a different species. In this case the fraction of a species given by (5) corresponds to a stable equilibrium meaning that the frequency of the first species x1(t) increases for x1(t) < h1, but decreases for x1(t) > h1. We refer to [23] for more details.
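The three types of two-species dynamics described above (dominance, bistability and coexistence) can be told apart directly from the four pairwise payoffs. The sketch below is illustrative: the function name is hypothetical, and the equilibrium fraction is obtained by equating the two fitness values, x1·E11 + (1 − x1)·E12 = x1·E21 + (1 − x1)·E22, under the assumption that fitness is the frequency-weighted expected payoff. The payoff values in the example calls are made up for demonstration.

```python
def classify_pair(E11, E12, E21, E22):
    """Classify two-species dynamics; return (type, equilibrium fraction of species 1)."""
    if E11 > E21 and E12 > E22:
        return "species 1 dominates", None
    if E21 > E11 and E22 > E12:
        return "species 2 dominates", None
    if E11 > E21 and E12 < E22:
        # Each species does best against itself: unstable interior equilibrium.
        h1 = (E22 - E12) / (E11 - E12 - E21 + E22)
        return "bistable", h1
    if E11 < E21 and E12 > E22:
        # Each species does best against the other: stable interior equilibrium.
        h1 = (E22 - E12) / (E11 - E12 - E21 + E22)
        return "coexistence", h1
    return "degenerate (payoff ties)", None

# Illustrative payoffs, not taken from the text.
print(classify_pair(3.0, 0.5, 3.1, 1.0))  # species 2 dominates
print(classify_pair(3.0, 0.5, 2.0, 1.0))  # bistable
print(classify_pair(1.0, 3.0, 2.0, 1.0))  # coexistence
```

In the bistable case the returned fraction plays the role of the invasion threshold: species 1 takes over when its frequency is above it and dies out below it, whereas in the coexistence case the same expression gives the stable mixture.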
Data availability
The empirical datasets generated during the current study and the source code used for this are available from the corresponding author on reasonable request.
Contributions
A.U. conceived the original idea and performed simulations with the help and advice of S.E. and F.W.; T.S., I.K. and S.M. contributed to the interpretation of the results. All authors contributed to writing and revision of the manuscript.
Competing interests
The authors declare no competing financial interests.
Acknowledgements
We acknowledge funding from the Ministry for Science and Education of Lower Saxony and the Volkswagen Foundation through the program “Niedersächsisches Vorab”. Additional support was provided by the Leibniz Association through funding for the Leibniz ScienceCampus Primate Cognition and the Max Planck Society.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].