Abstract
The robustness of scholarly peer review has been challenged by evidence of disparities in publication outcomes based on author gender and nationality. To address this, we examined the peer review outcomes of 23,876 initial submissions and 7,192 full submissions that were submitted to the biosciences journal eLife between 2012 and 2017. Women and authors from nations outside of North America and Europe were underrepresented both as gatekeepers (editors and peer reviewers) and authors. We found evidence of a homophilic relationship between the demographics of the gatekeepers and authors in determining the outcome of peer review; that is, gatekeepers favored manuscripts from authors of the same gender and from the same country. The acceptance rate for manuscripts with male last authors was higher than for female last authors, and this gender inequity was greatest when the team of reviewers was all male; mixed-gender gatekeeper teams led to more equitable peer review outcomes. Homogeny between the country of affiliation of the gatekeeper and the corresponding author was also associated with higher acceptance rates for many countries. We conclude with a discussion of mechanisms that could contribute to this effect, directions for future research, and policy implications. Code and anonymized data have been made available at https://github.com/murrayds/elife-analysis.
Author summary
Peer review, the primary method by which scientific work is evaluated and developed, is ideally a fair and equitable process in which scientific work is judged solely on its own merit. However, the integrity of peer review has been called into question based on evidence that outcomes often differ between male and female authors, and for authors in different countries. We investigated such disparities at the biosciences journal eLife by analyzing the demographics of authors and gatekeepers (editors and peer reviewers), and the peer review outcomes of all submissions between 2012 and 2017. We found evidence of disparity in outcomes that favored men and authors affiliated with institutions in North America and Europe, and that these groups were over-represented among authors and gatekeepers. The gender disparity was greatest when reviewers were all male; mixed-gender reviewer teams led to more equitable outcomes. Similarly, for some countries manuscripts were more likely to be accepted when reviewed by a gatekeeper from the same country as the author. Our results indicate that author and gatekeeper characteristics are associated with the outcomes of scientific peer review. We discuss mechanisms that could contribute to this effect, directions for future research, and policy implications.
Introduction
Peer review is foundational to the development, gatekeeping, and dissemination of research, while also underpinning the professional hierarchies of academia. Normatively, peer review is expected to follow the ideal of “universalism” [1], whereby scholarship is judged solely on its intellectual merit. However, confidence in the extent to which peer review accomplishes the goal of promoting the best scholarship has been eroded by questions about whether social biases [2], based on or correlated with the characteristics of the scholar, could also influence outcomes of peer review [3–5]. This challenge to the integrity of peer review has prompted an increasing number of funding agencies and journals to assess the disparities and potential influence of bias in their peer review processes.
Several terms are often conflated in the discussion of bias in peer review. We use the term disparities to refer to unequal composition between groups, inequities to characterize unequal outcomes, and bias to refer to the degree of impartiality in judgment. Disparities and inequities have been widely studied in scientific publishing, most notably with regard to gender and country of affiliation. Globally, women account for only about 30 percent of scientific authorship [6] and are underrepresented in the scientific workforce, even when compared to the pool of earned degrees [7, 8]. Articles authored by women are most underrepresented in the most prestigious and high-profile scientific journals [9–14]. Moreover, developed countries dominate the production of highly-cited publications [15, 16].
The under-representation of authors from certain groups may reflect differences in submission rates, or it may reflect differences in success rates during peer review (percent of submissions accepted). Analyses of success rates have yielded mixed results in terms of the presence and magnitude of such inequities. Some analyses have found lower success rates for female-authored papers [17, 18] and grant applications [19, 20], while other studies have found no gender differences in review outcomes (for examples, see [21–25]). Inequities in journal success rates based on authors’ nationalities have also been documented, with reports that authors from English-speaking and scientifically-advanced countries have higher success rates [26, 27]; however, other studies found no evidence that the language or country of affiliation of an author influences peer review outcomes [27–29]. These inconsistencies could be explained by several factors, such as the contextual characteristics of the studies (e.g., country, discipline) and variations in research design and sample size.
The nature of bias and its contribution to inequities in scientific publishing is highly controversial. Implicit bias—the macro-level social and cultural stereotypes that can subtly influence everyday interpersonal judgments and thereby produce and perpetuate status inequalities and hierarchies [30, 31]—has been suggested as a possible mechanism to explain differences in peer review outcomes based on socio-demographic and professional characteristics [3]. When faced with uncertainty—which is quite common in peer review—people often weight the social status and other ascriptive characteristics of others to help make decisions [32]. Hence, scholars are more likely to consider particularistic characteristics (e.g., gender, institutional prestige) of an author under conditions of uncertainty [33, 34], such as at the frontier of new scientific knowledge [35]. However, given the demographic stratification of scholars within institutions and across countries, it can be difficult to pinpoint the nature of a potential bias. For example, women are underrepresented in prestigious educational institutions [36–38], which conflates gender and prestige biases. These institutional differences can be compounded by gendered differences in age, professional seniority, research topic, and access to top mentors [39]. Another potential source of bias is what [40] dubbed cognitive particularism, whereby scholars harbor preferences for work and ideas similar to their own [41]. Evidence of this process has been reported in peer review in the reciprocity (i.e., correspondences between patterns of recommendations received by authors and patterns of recommendations given by reviewers in the same social group) between authors and reviewers of the same race and gender [42] (see also [43, 44]). Reciprocity can exacerbate or mitigate existing inequalities in science.
If the work and ideas favored by gatekeepers are unevenly distributed across author demographics, this could be conducive to Matthew Effects [1], whereby scholars accrue accumulative advantages via a priori status privileges. Consistent with this, inclusion of more female reviewers was reported to attenuate biases that favor men in the awarding of R01 grants at the National Institutes of Health [18]. However, an inverse relationship was found by [45] in the evaluation of candidates for professorships: when female evaluators were present, male evaluators became less favorable toward female candidates. Thus the nature and potential impact of cognitive biases during peer review are multiple and complex.
Another challenge is to disentangle the contribution of bias during peer review from factors external to the review process that could influence success rates. For example, there are gendered differences in access to funding, domestic responsibilities, and cultural expectations of career preferences and ability [46, 47] that may adversely impact manuscript preparation and submission. On the other hand, women have been found to hold themselves to higher standards [48] and be less likely to compete [49], hence they may self-select a higher quality of work for submission to prestigious journals. At the country level, disparities in peer review outcomes could reflect structural factors related to a nation’s scientific investment [15, 50], publication incentives [51, 52], local challenges [53], and research culture [54], all of which could influence the actual and perceived quality of submissions from different nations. There are also several intersectional issues: there are, for example, differences in sociodemographic characteristics across countries—e.g., more women in some countries and disproportionately fewer professionally-senior women in others [6]. Because multiple factors external to the peer review process can influence peer review outcomes, unequal success rates for authors with particular characteristics do not necessarily reflect bias in the peer review process itself; conversely, equal success rates do not necessarily reflect a lack of bias.
Here, we use an alternative approach to assess the extent to which gender and national disparities manifest in peer review outcomes at eLife—an open-access journal in the life and biomedical sciences. In particular, we study the extent to which the magnitude of these disparities vary across different gender and national compositions of gatekeeper teams, focusing on the notion of homophily between the reviewers and authors. Peer review at eLife differs from other traditional forms of peer review used in the life sciences in that it is done through deliberation between reviewers (usually three in total) on an online platform. Previous studies have shown that deliberative scientific evaluation is influenced by social dynamics between evaluators [55, 56]. We examine how such social dynamics manifest in eLife’s deliberative peer review by assessing the extent to which the composition of reviewer teams relates to peer review outcomes. Using all research papers (Research Articles, Short Reports, and Tools and Resources) submitted between 2012 and 2017 (n=23,876), we investigate the extent to which a relationship emerges between the gender and nationality of authors (first, last, and corresponding) and gatekeepers (editors and invited peer reviewers), extending the approach used by [2]. Inequity in success rates could result from a variety of factors unrelated to the peer review process (e.g., authors from certain groups having more funding). Such external factors should yield peer review outcome inequities that are consistent, regardless of who is conducting the peer review. In contrast, if inequities based on author characteristics vary based on the demographic characteristics of the reviewers, this would suggest potential bias in the peer review process.
Consultative peer review and eLife
Founded in 2012 by the Howard Hughes Medical Institute (United States), the Max Planck Society (Germany), and the Wellcome Trust (United Kingdom), eLife is an open-access journal that publishes research in the life and biomedical sciences. Manuscripts submitted to eLife progress through several stages. In the first stage, the manuscript is assessed by a Senior Editor, who may confer with one or more Reviewing Editors and decide whether to reject the manuscript or encourage the authors to provide a full submission. When a full manuscript is submitted, the Reviewing Editor recruits a small number of peer reviewers (typically two or three) to write reports on the manuscript. The Reviewing Editor is encouraged to serve as one of the peer reviewers. When all individual reports have been submitted, both the Reviewing Editor and peer reviewers discuss the manuscript and their reports using a private online discussion system hosted by eLife. At this stage the identities of the Reviewing Editor and peer reviewers are known to one another. If the consensus of this group is to reject the manuscript, all the reports are usually sent to the authors. If the consensus is that the manuscript requires revision, the Reviewing Editor and additional peer reviewers agree on the essential points that need to be addressed before the paper can be accepted. In this case, a decision letter outlining these points is sent to the authors (the original reports are not usually released in their entirety to the authors). When a manuscript is accepted, the decision letter and the authors’ response are published along with the manuscript. The name of the Reviewing Editor is also published. Peer reviewers can also choose to have their name published. This process has been referred to as consultative peer review (see [57, 58] for a more in-depth description of the eLife peer-review process).
Data and methods
Data
Metadata for research papers submitted to eLife between its inception in 2012 and mid-September, 2017 (n=23,876) were provided to us by eLife for analysis. As such, these data were considered a convenience sample. Submissions fell into three main categories: 20,948 Research Articles (87.7 percent), 2,186 Short Reports (9.2 percent), and 742 Tools and Resources (3.1 percent). Not included in this total were six Scientific Correspondence articles, which were excluded because they followed a distinct and separate review process. Each record potentially listed four submissions—an initial submission, full submission, and up to two revision submissions (though in some cases manuscripts remained in revision even after two revised submissions). Fig 1 depicts the flow of all 23,876 manuscripts through each review stage. The majority, 70.0 percent, of initial submissions for which a decision was made were rejected. Only 7,111 manuscripts were encouraged to submit a full submission. A total of 7,192 manuscripts were submitted as a full submission; this number was slightly larger than the number of encouraged initial submissions due to appeals of initial decisions and other special circumstances. Most full submissions, 52.4 percent (n = 3,767), received a decision of revise, while 43.9 percent (n = 3,154) were rejected. A small number of full submissions (n = 54) were accepted without any revisions. On average, full submissions that were ultimately accepted underwent 1.23 revisions and, within our dataset, 3,426 full submissions were eventually accepted for publication. A breakdown of the number of revisions requested before a final decision was made, by gender and nationality of the last author, is provided in S1 Fig. On the date that data were collected (mid-September, 2017), a portion of initial submissions (n = 147) and full submissions (n = 602) remained in various stages of processing and deliberation (without final decisions).
Authors of another portion of initial and full submissions (n = 619) appealed the decision, causing some movement from decisions of “Reject” to decisions of “Accept” or “Revise”; counts of revisions by the gender of authors and gatekeepers are shown in S2 Fig.
The review process at eLife is highly selective, and became more selective over time. Fig 2 shows that while the total count of manuscripts submitted to eLife has rapidly increased since the journal’s inception, the count of encouraged initial submissions and accepted full submissions has grown more slowly. The encourage rate (the percentage of initial submissions encouraged to submit full manuscripts) was 44.6 percent in 2012 and dropped to 26.6 percent in 2016. The overall acceptance rate (the percentage of initial submissions eventually accepted) began at 27.0 percent in 2012 and decreased to 14.0 percent in 2016. The acceptance rate (the percentage of full submissions accepted) was 62.4 percent in 2012 and decreased to 53.0 percent in 2016. While eLife garnered only 307 submissions in 2012, it accrued 8,061 submissions in 2016.
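The three selectivity metrics defined above reduce to simple ratios. The sketch below illustrates the definitions (Python is used here purely for illustration; the paper's analysis was conducted in R, and the counts are hypothetical, not eLife's actual yearly totals).

```python
def encourage_rate(encouraged, initial):
    """Percentage of initial submissions encouraged to submit a full manuscript."""
    return 100.0 * encouraged / initial

def acceptance_rate(accepted, full):
    """Percentage of full submissions that were eventually accepted."""
    return 100.0 * accepted / full

def overall_acceptance_rate(accepted, initial):
    """Percentage of initial submissions that were eventually accepted."""
    return 100.0 * accepted / initial

# Hypothetical year: 1,000 initial submissions, 446 encouraged,
# 433 full submissions received, 270 eventually accepted.
print(round(encourage_rate(446, 1000), 1))           # 44.6
print(round(overall_acceptance_rate(270, 1000), 1))  # 27.0
```

Note that the overall acceptance rate uses initial submissions as its denominator, while the acceptance rate uses full submissions, which is why the two figures differ so sharply.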
In addition to authorship data, we obtained information about the gatekeepers involved in the processing of each submission. In our study, we defined gatekeepers as any Senior Editor or Reviewing Editor at eLife, or any invited peer reviewer, involved in the review of at least one initial or full submission between 2012 and mid-September 2017. Gatekeepers at eLife often served in multiple roles; for example, acting as both a Reviewing Editor and peer reviewer on a given manuscript, or serving as a Senior Editor on one manuscript, but an invited peer reviewer on another. In our sample, the Reviewing Editor was listed as a peer reviewer for 58.9 percent of full submissions. For initial submissions, we had data on only the corresponding author of the manuscript and the Senior Editor tasked with making the decision. For full submissions we had data on the corresponding author, first author, last author, Senior Editor, Reviewing Editor, and members of the team of invited peer reviewers. Data for each individual included their stated name, institutional affiliation, and country of affiliation. A small number of submissions were removed, such as those that had a first but no last author and those that did not have a valid submission type. Country names were manually disambiguated (for example, “USA” was normalized to “United States” and “Viet Nam” to “Vietnam”). To simplify continent-level comparisons, we also excluded one submission for which the corresponding author listed their affiliation as Antarctica.
Full submissions included 6,669 distinct gatekeepers, 5,694 distinct corresponding authors, 6,691 distinct first authors, and 5,581 distinct last authors. Authors were also likely to appear on multiple manuscripts and may have held a different authorship role in each: whereas our data included 17,966 distinct combinations of author name and role, this number comprised only 12,059 distinct authors. For 26.5 percent of full submissions the corresponding author was also the first author, whereas for 71.2 percent of submissions the corresponding author was the last author. We did not have access to the full authorship list that included middle authors. Note that in the biosciences, the last author is typically the most senior researcher involved [59] and responsible for more conceptual work, whereas the first author is typically less senior and performs more of the scientific labor (such as lab work, analysis, etc.) to produce the study [60–62].
Gender assignment
Gender variables for authors and gatekeepers were coded using an updated version of the algorithm developed in [6]. This algorithm used a combination of the first name and country of affiliation to assign each author’s gender on the basis of several universal and country-specific name-gender lists (e.g., United States Census). This list of names was complemented with an algorithm that searched Wikipedia for pronouns associated with names.
We validated this new list by applying it to a dataset of names with known gender. We used data collected from RateMyProfessor.com, a website containing anonymous student-submitted ratings and comments for professors, lecturers, and teachers at universities in the United States, United Kingdom, and Canada. We limited the dataset to individuals with at least five comments, and counted the total number of gendered pronouns that appeared in their text; if the total of one gendered-pronoun type was at least the square of the other, then we assigned the gender of the majority pronoun to the individual. To compare with pronoun-based assignment, we assigned gender using the previously detailed first-name based algorithm. In total, there were 384,127 profiles on RateMyProfessor.com that had at least five comments and for whom pronouns indicated a gender. Our first name-based algorithm assigned a gender of male or female to 91.26 percent of these profiles. The raw match rate between these two assignments was 88.6 percent. Of those that were assigned a gender, our first name-based assignment matched the pronoun assignment in 97.1 percent of cases, and for 90.3 percent of distinct first names. While RateMyProfessor.com and the authors submitting to eLife represent different populations (RateMyProfessor.com being biased towards teachers in the United States, United Kingdom, and Canada), the results of this validation lend some credibility to the first name-based gender assignment used here.
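The pronoun-threshold rule described above can be sketched as follows. This is a minimal illustrative Python sketch (the actual implementation, and its handling of ties or empty counts, is not specified in the paper):

```python
def pronoun_gender(male_count, female_count):
    """Assign a gender from pronoun counts in a profile's comments.
    The majority pronoun wins only if its count is at least the square
    of the minority count (the threshold rule described above);
    otherwise the counts are considered too close to call."""
    if male_count > female_count and male_count >= female_count ** 2:
        return "male"
    if female_count > male_count and female_count >= male_count ** 2:
        return "female"
    return None  # ambiguous: no gender assigned
```

For example, a profile with 10 male-pronoun and 3 female-pronoun mentions is assigned male (10 ≥ 3² = 9), whereas one with 5 and 3 is left unassigned (5 < 9).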
We also attempted to manually identify gender for all Senior Editors, Reviewing Editors, invited peer reviewers, and last authors for whom our algorithm did not assign a gender. We used Google to search for their name and institutional affiliation, and inspected the resulting photos and text in order to make a subjective judgment as to whether they were presenting as male or female.
Through the combination of manual efforts and our first name-based gender-assignment algorithm, we assigned a gender of male or female to 95.5 percent (n = 35,511) of the 37,198 name/role combinations that appeared in our dataset. 26.7 percent (n = 9,910) were assigned a gender of female and 68.8 percent (n = 25,601) were assigned a gender of male, while no gender could be assigned for the remaining 4.5 percent (n = 1,687). This gender distribution roughly matches the gender distribution observed globally across scientific publications [6]. A breakdown of these gender demographics by role can be found in S1 Table and S2 Table.
Gender composition of reviewers
To examine the relationship between author-gatekeeper gender homogeny and review outcomes, we analyzed the gender composition of the gatekeepers and authors of full submissions. Each manuscript was assigned a reviewer-composition category of all-male, all-female, mixed, or uncertain. Reviewer teams labeled all-male and all-female were teams for which we could identify a gender for every member, and for which all members were identified as male or as female, respectively. Teams labeled as mixed were those for which we could identify a gender for at least two members, and which had at least one male and at least one female peer reviewer. Teams labeled as uncertain were those for which we could not assign a gender to every member and which were not mixed. A full submission was typically reviewed by two to three peer reviewers, who may or may not have included the Reviewing Editor. However, the Reviewing Editor was always involved in the review process of a manuscript, and so we always considered the Reviewing Editor a member of the reviewing team. Of the 7,192 full submissions, a final decision of accept or reject was given for 6,590 during the dates analyzed; of these, 47.7 percent (n = 3,144) were reviewed by all-male teams, 1.4 percent (n = 93) by all-female teams, and 50.8 percent (n = 3,347) by mixed-gender teams; the remaining six manuscripts had reviewer teams classified as uncertain and were excluded from further analysis.
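The four team-composition categories can be illustrated with a short sketch (illustrative Python; the function name is hypothetical, and None marks a member whose gender could not be identified):

```python
def team_composition(genders):
    """Classify a reviewing team (Reviewing Editor plus peer reviewers)
    by the genders of its members, following the categories above:
    'mixed' requires at least one identified man and one identified
    woman (even if other members are unresolved); 'all-male' and
    'all-female' require every member to be identified; any other
    team is 'uncertain'."""
    known = [g for g in genders if g is not None]
    if "male" in known and "female" in known:
        return "mixed"
    if known and len(known) == len(genders):  # every member identified
        return "all-male" if known[0] == "male" else "all-female"
    return "uncertain"
```

Note the asymmetry this encodes: a team of one identified man, one identified woman, and one unresolved member is still mixed, but a team of two identified men and one unresolved member is uncertain rather than all-male.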
Institutional Prestige
Institutional names for each author were entered manually by the submitting authors and were thus highly idiosyncratic. Many institutions appeared with multiple name variants (e.g., “UCLA”, “University of California, Los Angeles”, and “UC at Los Angeles”). In total, there were nearly 8,000 unique strings in the affiliation field. We performed several pre-processing steps on these names, including converting characters to lower case, removing stop words, removing punctuation, and reducing common words to abbreviated alternatives (e.g., “university” to “univ”). We used fuzzy-string matching with the Jaro-Winkler distance measure [63] to match institutional affiliations from eLife to institutional rankings in the 2016 Times Higher Education World Rankings. A match was established for 15,641 corresponding authors of initial submissions (around 66 percent). Matches for last authors were more frequent: 5,118 (79 percent) were matched.
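The normalization and matching pipeline described above can be sketched as follows. This is an illustrative Python sketch with hypothetical stop-word and abbreviation lists; the paper used the Jaro-Winkler distance, for which the standard library's difflib ratio is substituted here purely as a stand-in for the same fuzzy-matching idea.

```python
import re
from difflib import SequenceMatcher

STOP = {"of", "the", "at", "and"}                      # illustrative stop words
ABBREV = {"university": "univ", "institute": "inst"}   # illustrative abbreviations

def normalize(name):
    """Apply the pre-processing steps described above: lower-case,
    strip punctuation, drop stop words, abbreviate common words."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(ABBREV.get(t, t) for t in tokens if t not in STOP)

def best_match(affiliation, ranked_names, threshold=0.85):
    """Return the ranked institution most similar to the affiliation,
    or None if no candidate clears the similarity threshold."""
    target = normalize(affiliation)
    score, best = max(
        (SequenceMatcher(None, target, normalize(r)).ratio(), r)
        for r in ranked_names
    )
    return best if score >= threshold else None
```

With this normalization, “University of California, Los Angeles” and “Univ of California Los Angeles” reduce to the same string and match exactly, while an unrelated affiliation falls below the threshold and is left unmatched.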
Institutions were classed into two levels of prestige: “top” institutions were those within the top 50 universities of the Times Higher Education World Rankings. Institutions ranked below the top 50, unranked institutions, and institutions that could not be matched to a Times Higher Education ranking were labeled as “non-top”. One limitation of the Times Higher Education ranking as a proxy for institutional prestige is that it covers only universities, excluding many prestigious research institutes. To mitigate this limitation, we mapped a small number of well-known and prestigious biomedical research institutes to the “top” category, including the Max Planck Institutes, the National Institutes of Health, the UT Southwestern Medical Center, the Memorial Sloan Kettering Cancer Center, the Ragon Institute, and the Broad Institute.
Geographic distance
Latitude and longitude of country centroids were taken from Harvard WorldMap [64]; country names in the eLife and Harvard WorldMap dataset were manually disambiguated and then mapped to the country of affiliation listed for each author from eLife (for example, “Czech Republic” from the eLife data was mapped to “Czech Rep.” in the Harvard WorldMap data). For each initial submission, we calculated the geographic distance between the centroids of the countries of the corresponding author and Senior Editor; we call this the corresponding author-editor geographic distance. For each full submission, we calculated the sum of the geographic distances between the centroid of the last author’s country and the country of each of the reviewers. All distances were calculated in thousands of kilometers; we call this the last author-reviewers geographic distance.
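One plausible way to compute these centroid-to-centroid distances is the haversine great-circle formula; the paper does not state which distance formula was used, so the Python sketch below is illustrative only, and the second function's name is hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon)
    points, via the haversine formula on a spherical Earth."""
    EARTH_RADIUS_KM = 6371.0
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def last_author_reviewers_distance(author_centroid, reviewer_centroids):
    """Sum of distances (in thousands of km) between the last author's
    country centroid and each reviewer's country centroid, matching
    the definition above."""
    return sum(haversine_km(*author_centroid, *c)
               for c in reviewer_centroids) / 1000.0
```

An author and reviewer in the same country have a distance of zero, which is why the models described later include a separate dummy variable for zero distance.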
Analysis
We conducted a series of χ2 tests of equal proportion, as well as multiple logistic regression models, to assess the relationship between author and gatekeeper characteristics and the likelihood that an initial submission is encouraged and that a full submission is accepted. We supply p-values and confidence intervals as a tool for interpretation; we generally maintain the convention of 0.05 as the threshold for statistical significance, though we also report and interpret values just outside of this range. When visualizing proportions, 95% confidence intervals are calculated as p ± 1.96√(p(1 − p)/n), where p is the proportion and n is the number of observations in the group. When conducting χ2 tests comparing groups based on gender, we excluded submissions for which no gender could be identified. When conducting tests for gender and country homogeny, we report 95% confidence intervals of the difference in proportions—we do not report confidence intervals for tests involving more than two groups. Odds ratios and associated 95% confidence intervals are reported for logistic regression models. Data processing, statistical testing, and visualization were performed using R version 3.4.2 and RStudio version 1.1.383.
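The per-group proportion interval can be sketched as follows, assuming the standard normal-approximation (Wald) form p ± z√(p(1 − p)/n) (illustrative Python; the paper's computations were done in R):

```python
from math import sqrt

def proportion_ci(p, n, z=1.96):
    """95% normal-approximation (Wald) confidence interval for a
    proportion p observed in a group of n observations."""
    half_width = z * sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width
```

For example, a proportion of 0.5 in a group of 100 observations yields an interval of roughly (0.402, 0.598); the interval narrows as n grows.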
Having demonstrated gender and national inequities in peer review with this exploratory univariate analysis, we built a series of logistic regression models to investigate whether these differences could be explained by other factors. In each model, we used the submission’s outcome as the response variable, whether that be encouragement (for initial submissions) or acceptance (for full submissions). For both initial and full submissions, we added control variables for the year of submission (measured from 0 to 5, representing 2012 to 2017, accordingly), the type of the submission (Research Article, Short Report, or Tools and Resources), and the institutional prestige of the author (top vs. non-top). For full submissions, we also controlled for the gender of the first author. Mirroring our univariate analysis, we constructed two sets of models. The first set of models investigates the extent of peer review inequities based on author characteristics. We considered predictor variables for the gender and continent of affiliation of the corresponding author (for initial submissions), and the last author (for full submissions). For the second set of models, we investigated whether these inequities differed based on gender or national homogeny between the author and the reviewer or editor. In addition to variables from the first model, we considered several approaches to capture the effect of gender homogeny between the author and reviewers on peer review inequity (see below). We also included variables for the corresponding author-editor geographic distance (for initial submissions) and the last author-reviewers geographic distance (for full submissions), and a dummy variable indicating whether this distance was zero; these variables serve as proxies for the degree of national homogeny between the author and the editor or reviewers.
There were a small number of Senior Editors in our data—in order to protect their identity we did not include their gender or specific continent of affiliation in any models; we maintained a variable for corresponding author-editor geographic distance.
Several approaches were considered for modeling the relationship between peer review equity and the composition of the reviewer team. The simplest approach—examining the interaction between an author characteristic and a reviewer characteristic—does not adequately address the research question, as it focuses on individual interactions rather than on compositional effects of the reviewer team. Interacting author gender with reviewer-team composition categories (e.g., all-male, mixed, all-female) also fails to test whether the various interactions differ from one another: this would require manual comparison and statistical testing of parameter estimates from each interaction, which does not provide a parsimonious interpretation of the model outcomes. Therefore, we took two complementary approaches. The first involved the construction of two separate models—one including only submissions reviewed by all men and another including only those reviewed by mixed-gender teams. We then compared the effect of last author gender between the models. A model for all-female reviewer teams was excluded due to the small sample size (representing less than 2 percent of all submissions). This approach simplifies interpretation compared to a simple interaction model, but still fails to provide a single test of the interaction between author demographics and reviewer team demographics. The second approach therefore used a full model containing a categorical variable with all six combinations of last author gender (male, female) and reviewer team composition (all-male, all-female, mixed).
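As an illustration, the combined categorical variable for the full model could be constructed as below; the level labels are hypothetical, since the paper does not specify how the levels were encoded.

```python
from itertools import product

GENDERS = ("male", "female")
COMPOSITIONS = ("all-male", "all-female", "mixed")

# The six levels of the combined categorical variable: every pairing
# of last-author gender with reviewer-team composition.
LEVELS = [f"{g}:{c}" for g, c in product(GENDERS, COMPOSITIONS)]

def author_team_level(last_author_gender, composition):
    """Map a submission to one level of the combined variable
    (hypothetical label format, e.g. 'female:mixed')."""
    level = f"{last_author_gender}:{composition}"
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return level
```

Entering this single six-level factor into the regression allows any pair of author/team combinations to be compared within one model, which is the "single test" the two-model approach lacks.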
Results
Gatekeeper representation
We first analyzed whether the gender and national affiliations of the population of gatekeepers at eLife were similar to those of the authors of initial and full submissions. The population of gatekeepers was primarily comprised of invited peer reviewers, as there were far fewer Senior and Reviewing Editors. A gender and national breakdown by gatekeeper type is provided in S2 Table and S3 Table.
Fig 3 illustrates the gender and national demographics of authors and gatekeepers at eLife. The population of gatekeepers at eLife was largely male. Only 21.6 percent (n = 1,440) of gatekeepers were identified as female, compared with 26.6 percent (n = 4,857) of corresponding authors (including authors of initial submissions), 33.9 percent (n = 2,272) of first authors, and 24.0 percent (n = 1,341) of last authors. For initial submissions, we observed a strong difference between the gender composition of gatekeepers and corresponding authors, χ2(df = 1, n = 17,119) = 453.9, p ≤ 0.00001. The same held for full submissions, with a strong difference for first authorship, χ2(df = 1, n = 6,153) = 844.4, p ≤ 0.0001; corresponding authorship, χ2(df = 1, n = 6,647) = 330.04, p ≤ 0.0001; and last authorship, χ2(df = 1, n = 5,292) = 17.7, p ≤ 0.00003. Thus, the gender composition of gatekeepers at eLife was male-skewed in comparison to the authorship profile.
The population of gatekeepers at eLife was heavily dominated by those from North America, who constituted 59.9 percent (n = 3,992) of the total. Gatekeepers from Europe were the next most represented, constituting 32.4 percent (n = 2,162), followed by Asia with 5.7 percent (n = 378). Individuals from South America, Africa, and Oceania each made up less than two percent of the population of gatekeepers. As with gender, we observed differences between the international composition of gatekeepers and that of the authors. Gatekeepers from North America were over-represented whereas gatekeepers from Asia and Europe were under-represented for all authorship roles. For initial submissions, there was a significant difference in the distribution of corresponding authors compared to gatekeepers, χ2(df = 5, n = 18,195) = 6738.5, p ≤ 0.00001. The same held for full submissions, with a significant difference for first authors, χ2(df = 5, n = 6,674) = 473.3, p ≤ 0.00001; corresponding authors, χ2(df = 5, n = 6,669) = 330.04, p ≤ 0.00001; and last authors, χ2(df = 5, n = 5,595) = 417.2, p ≤ 0.0001. The international representation of gatekeepers was most similar to first and last authorship (full submissions), and least similar to corresponding authorship (initial submissions) due to country-level differences in acceptance rates (see Fig 4). We also note that the geographic composition of submissions to eLife has changed over time, attracting more submissions from authors in Asia in later years of analysis (see S4 Fig).
Peer review outcomes by author gender and nationality
Male authorship dominated eLife submissions: men accounted for 76.9 percent (n = 5,529) of gender-identified last authorships and 70.7 percent (n = 5,083) of gender-identified corresponding authorships of full submissions (see S3 Fig). First authorship of full submissions was closest to gender parity, although still skewed towards male authorship at 58.1 percent (n = 4,179).
We observed a gender inequity favoring men in the outcomes of each stage of the review process. The percentage of initial submissions encouraged was 2.1 percentage points higher for male corresponding authors—30.83 versus 28.75 percent, χ2(df = 1, n = 22,319) = 8.95, 95% CI = [0.7, 3.4], p = 0.0028 (see S3 Fig). Likewise, the percentage of full submissions accepted was higher for male corresponding authors—53.7 versus 50.8 percent, χ2(df = 1, n = 6,188) = 3.95, 95% CI = [0.03, 5.8], p = 0.047. The gender inequity at each stage of the review process yielded a higher overall acceptance rate (the percentage of initial submissions eventually accepted) for male corresponding authors (15.6 percent) compared with female corresponding authors (13.8 percent), χ2(df = 1, n = 21,670) = 10.96, 95% CI = [0.8, 2.9], p = 0.0009.
Fig 4.A shows the gendered acceptance rates of full submissions for corresponding, first, and last authors. We observed little to no relationship between the gender of the first author and the percentage of full submissions accepted, χ2(df = 1, n = 5,971) = 0.34, 95% CI = [−1.8, 3.5], p = 0.56. There was, however, a significant gender inequity in full submission outcomes for last authors: the acceptance rate of full submissions was 3.5 percentage points higher for male than for female last authors—53.5 versus 50.0 percent, χ2(df = 1, n = 6,505) = 5.55, 95% CI = [0.5, 6.4], p = 0.018.
Fig 4.B shows the proportion of manuscripts submitted to, encouraged by, and accepted at eLife with corresponding authors from the eight most prolific countries (in terms of initial submissions). Manuscripts with corresponding authors from these eight countries accounted for 73.9 percent of all initial submissions, 81.2 percent of all full submissions, and 86.5 percent of all accepted publications. Many countries were underrepresented in full and accepted manuscripts compared to their initial submissions. For example, whereas papers with Chinese corresponding authors accounted for 6.9 percent of initial submissions, they comprised only 3.0 percent of full and 2.4 percent of accepted submissions. The only countries that were over-represented—making up a greater portion of full and accepted submissions than expected given their initial submissions—were the United States, United Kingdom, and Germany. In particular, corresponding authors from the United States made up 35.8 percent of initial submissions, yet constituted 48.5 percent of full submissions and the majority (54.9 percent) of accepted submissions.
Each stage of review contributed to the disparity in national representation between initial, full, and accepted submissions, with manuscripts from the United States, United Kingdom, and Germany more often encouraged as initial submissions and accepted as full submissions. Fig 4.C shows that initial submissions with a corresponding author from the United States were the most likely to be encouraged (39.2 percent), followed by the United Kingdom (31.7 percent) and Germany (29.3 percent). By contrast, manuscripts with corresponding authors from Japan, Spain, and China were comparatively less likely to be encouraged (21.4, 16.7, and 12.6 percent, respectively). These differences narrowed somewhat for full submissions: the acceptance rate for full submissions with corresponding authors from the U.S. remained the highest (57.6 percent), though the gap relative to the United Kingdom and France was narrower than for encouragement rates.
There were gendered differences in submissions by nationality (S5 Fig), but there were insufficient data to test whether gender and national affiliation interacted to affect the probability of acceptance.
Peer review outcomes by author-gatekeeper homogeny
Fig 4 illustrated higher acceptance rates for full submissions from male corresponding and last authors (submissions with authors of unidentified gender excluded). Fig 5.A and Fig 5.B show that this disparity manifested largely in instances when the reviewer team was all male. When all reviewers were male, the acceptance rate of full submissions was about 4.7 percentage points higher for male compared to female last authors (χ2(df = 1, n = 3,110) = 4.48, 95% CI = [0.3, 9.1], p = 0.034) and about 4.4 points higher for male compared to female corresponding authors (S6 Fig; χ2(df = 1, n = 2,974) = 3.97, 95% CI = [0.1, 8.7], p = 0.046). For mixed-gender reviewer teams, the disparity in author success rates by gender was smaller and not statistically significant. All-female reviewer teams were rare (only 81 of 6,509 processed full submissions). In these few cases, acceptance rates were higher for female last, corresponding, and first authors; however, these differences were not statistically significant, and the number of observations was too small to draw firm conclusions. There was no significant relationship between first author gender and acceptance rates, regardless of the gender composition of the reviewer team. In summary, full submissions with male corresponding and last authors were more often accepted when reviewed by a team of gatekeepers consisting only of men; greater parity in outcomes was observed when gatekeeper teams contained both men and women. We refer to this favoring by reviewers of authors sharing their gender as homophily.
We also investigated the relationship between peer review outcomes and the presence of national homogeny between the last author and reviewers. We defined last author-reviewer national homogeny as a condition in which at least one member of the reviewer team (Reviewing Editor and peer reviewers) listed the same national affiliation as the last author. We considered only the nationality of the last author, since it matched that of the first and corresponding authors for 98.4 and 94.9 percent of full submissions, respectively. Outside of the United States, the presence of country homogeny during review was rare. Whereas 88.4 percent of full submissions with last authors from the U.S. were reviewed by at least one gatekeeper from their country, homogeny was present for only 29.3 percent of full submissions with last authors from the United Kingdom and 26.2 percent of those with a last author from Germany. The likelihood of reviewer homogeny fell sharply for Japan and China, which had geographic homogeny for only 10.3 and 9.9 percent of full submissions, respectively. More extensive details on the rate of author-reviewer homogeny for each country can be found in S5 Table.
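The homogeny indicator defined above reduces to a simple membership test; a minimal sketch (not the authors' code; variable names and country codes are illustrative):

```python
# Sketch: flag last author-reviewer national homogeny, defined as at
# least one gatekeeper (Reviewing Editor or peer reviewer) sharing the
# last author's country of affiliation.

def has_country_homogeny(last_author_country, gatekeeper_countries):
    """True if any gatekeeper shares the last author's country."""
    return last_author_country in gatekeeper_countries

print(has_country_homogeny("US", ["GB", "US", "DE"]))  # True
print(has_country_homogeny("JP", ["GB", "US", "DE"]))  # False
```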
We examined whether last author-reviewer country homogeny tended to result in the favoring of submissions from authors of the same country as the reviewer. We first pooled together authors from all countries (n = 6,508 full submissions with a final decision) and found that the presence of homogeny during review was associated with a 10.0 percentage point higher acceptance rate (Fig 5.C; χ2(df = 1, n = 6,508) = 65.07, 95% CI = [7.58, 12.47], p ≤ 0.00001). However, most cases of homogeny occurred for authors from the United States, so this result could simply reflect the higher acceptance rate for these authors (see Fig 4) rather than homophily overall. We therefore repeated the test excluding all full submissions with last authors from the United States, and again found a significant, though weaker, homophilic effect, χ2(df = 1, n = 3,236) = 4.74, 95% CI = [0.52, 10.1], p = 0.029. Repeating this procedure while excluding authors from both the United States and the United Kingdom (the two nations with the highest acceptance rates; see Fig 4), we identified no homophilic effect, χ2(df = 1, n = 1,920) = 0.016, 95% CI = [−4.6, 7.7], p = 0.65. At the level of all countries, the effects of last author-reviewer country homophily were largely driven by the United States and United Kingdom.
We also examined the effects of homogeny within individual nations and tested for the presence of homophilic effects. Fig 5.D shows acceptance rates for last authors affiliated within the eight most prolific nations submitting to eLife. For the United States, the presence of homogeny was associated with a 6.9 percentage point higher likelihood of acceptance compared to no homogeny, χ2(df = 1, n = 3,270) = 6.25, 95% CI = [1.4, 12.4], p = 0.0124. Similarly, papers from the United Kingdom were 8.0 percentage points more likely to be accepted in the presence of last author-reviewer homogeny, χ2(df = 1, n = 739) = 3.65, 95% CI = [−0.1, 16.2], p = 0.056. In contrast, submissions with last authors from France were 23 percentage points less likely to be accepted in the presence of national homogeny, χ2(df = 1, n = 204) = 4.34, 95% CI = [−42.8, −3.4], p = 0.037. There were similar, though non-significant, effects for Canada and Switzerland (both partly French-speaking countries). In summary, the presence of national homogeny was rare unless an author was from the United States, and the effect of last author-reviewer national homogeny was heterogeneous across countries. However, due to the rarity of national homogeny outside of the U.S., more data are needed to draw firm conclusions on a per-country basis.
Peer review outcomes by author characteristics
Having observed evidence of gender and national inequities in peer review outcomes in our univariate analysis, we further investigated whether these inequities were the result of confounding factors. We first attempted to confirm the results from Fig 4 using logistic regression to model peer review outcomes based on the gender and continent of affiliation of the corresponding author (for initial submissions) and the last author (for full submissions). We controlled for the prestige of the author’s institutional affiliation, the year in which the manuscript was submitted, and the submission type (Research Article, Short Report, or Tools and Resources). For full submissions, we also controlled for the gender of the first author. The results of this regression for initial and full submissions are shown in Fig 6.
For initial submissions, institutional prestige had the largest positive effect on peer review outcomes (see Fig 6.A; β = 1.726, 95% CI = [1.663, 1.789], p ≤ 0.0001). An increase in the year of submission was associated with lower odds of encouragement (β = 0.918, 95% CI = [0.894, 0.942], p ≤ 0.0001), reflecting the increasing selectivity of eLife. We also found that, compared to Research Articles, both Short Reports (β = 0.742, 95% CI = [0.638, 0.847], p ≤ 0.0001) and Tools and Resources submissions (β = 0.740, 95% CI = [0.567, 0.913], p ≤ 0.0001) were less likely to be encouraged. Even when controlling for these variables, there were still inequities by the gender and national affiliation of the corresponding author, affirming the findings from Fig 4. An initial submission with a male corresponding author was associated with 1.12 times the odds of being encouraged (95% CI = [1.048, 1.182], p = 0.0014). We also found that an initial submission with a corresponding author from a country outside of North America was associated with lower odds of being encouraged. A submission with a corresponding author from Europe had 0.68 times the odds of being encouraged relative to one from North America (95% CI = [0.3236, 0.783], p ≤ 0.0001). After Europe, a corresponding author from Oceania had 0.56 times the odds of being encouraged (95% CI = [0.34, 0.78], p ≤ 0.0001), followed by corresponding authors from Africa (β = 0.53, 95% CI = [−0.18, 1.088], p = 0.027), Asia (β = 0.40, 95% CI = [0.30, 0.49], p ≤ 0.0001), and South America (β = 0.21, 95% CI = [−0.269, 0.679], p ≤ 0.0001).
The same effects also held for full submissions (Fig 6.B), though with smaller effect sizes. Institutional prestige again had a strong positive effect on the odds of a full submission being accepted (β = 1.379, 95% CI = [1.272, 1.486], p ≤ 0.0001). The submission year was again associated with lower odds of acceptance (β = 0.888, 95% CI = [0.847, 0.929], p ≤ 0.0001), reflecting that eLife’s increasing selectivity also extended to full submissions. Unlike initial submissions, there were no significant differences between types of submissions. We also controlled for the gender of the first author, though we found no significant difference between submissions with male and female first authors, or between female first authors and those with unknown gender. Controlling for these variables, we used this model (Fig 6.B) to confirm the gender and national inequities in full submission outcomes observed in Fig 4. Full submissions with a male last author were associated with 1.14 times the odds of being accepted compared to submissions with female last authors (95% CI = [1.03, 1.26], p = 0.025)—an effect similar in magnitude to that of corresponding author gender in initial submissions. Geographic inequities were present, though less pronounced than for initial submissions. A full submission with a last author from Africa was associated with higher odds of being accepted than a submission with a North American last author (β = 1.48, 95% CI = [0.46, 2.50], p = 0.45), followed by Oceania (β = 0.91, 95% CI = [0.49, 1.32], p = 0.64), Europe (β = 0.86, 95% CI = [0.75, 0.97], p = 0.008), South America (β = 0.84, 95% CI = [−0.1, 1.78], p = 0.71), and Asia (β = 0.59, 95% CI = [0.41, 0.76], p ≤ 0.0001); however, these differences were significant only for Europe and Asia.
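The odds ratios reported in this section are exponentiated logistic-regression coefficients; the conversion from the log-odds scale can be sketched as follows (the coefficient and standard error below are illustrative values, not estimates from the paper's model):

```python
import math

# Sketch: converting a logistic-regression coefficient (log-odds scale)
# and its standard error into an odds ratio with a 95% Wald CI.
# beta and se are illustrative, not taken from the paper's fitted model.

def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, CI lower bound, CI upper bound)."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

or_, lo, hi = odds_ratio_ci(beta=0.131, se=0.030)
# A coefficient of 0.131 corresponds to an odds ratio of about 1.14
```

An odds ratio above 1 (with a CI excluding 1) indicates higher odds of a favorable outcome for the group coded 1 relative to the reference group, which is how the estimates above should be read.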
Peer review outcomes by the author and gatekeeper characteristics and homogeny
Having observed differences in gender and national equity in peer review based on the composition of the reviewer team (Fig 5), we further investigated whether these patterns of inequity persisted when controlling for potentially confounding factors. Extending Fig 6, we again modelled outcomes of initial and full submissions using logistic regression, but incorporated additional variables for reviewer characteristics and author-reviewer homogeny. We included the corresponding author-editor geographic distance (for initial submissions) and the last author-reviewers geographic distance (for full submissions); the former is the geographic distance between the centroids of the countries of affiliation of the corresponding author and the Senior Editor, whereas the latter is the sum of the geographic distances between the centroid of the last author’s country and the centroids of the countries of all of the peer reviewers. This variable is intended to model the degree of homogeny between the author and the editor or reviewers. All distances were calculated in thousands of kilometers; for example, the geographic distance between the United States and Denmark is 7.53 thousand kilometers. For both initial and full submissions, we included a dummy variable indicating whether the distance was zero. For full submissions, we considered three approaches to model the extent to which gender equity differed based on the gender composition of the reviewer team. One approach used interaction terms between the last author gender and the composition of the reviewer team (S1 Text); another compared parameter estimates for last author gender between separate models (S2 Text); and the third modelled global interactions using a variable combining factor levels for last author gender and reviewer team composition (see Fig 7.B).
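A centroid-to-centroid distance of this kind can be sketched with the haversine (great-circle) formula; the use of haversine here is an assumption about the metric, and the centroid coordinates below are rough illustrative values:

```python
import math

# Sketch: great-circle (haversine) distance between country centroids,
# expressed in thousands of kilometers. Centroid coordinates are
# approximate, illustrative values.

def haversine_thousand_km(lat1, lon1, lat2, lon2, r_km=6371.0):
    """Great-circle distance between two points, in thousands of km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r_km * math.asin(math.sqrt(a)) / 1000.0

# Approximate centroids: United States vs. Denmark; the result is on the
# order of 7.5 thousand km, close to the value quoted in the text
# (the exact figure depends on which centroids are used).
d = haversine_thousand_km(39.8, -98.6, 56.3, 9.5)
```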
In this section, we first affirm results from Fig 5 for initial and full submissions using regression results from S8 Table and Fig 7.A, focusing on those variables not included in Fig 6. Following this, we present an approach to model a generalizable relationship between last author gender and reviewer team gender composition.
S8 Table shows that, for initial submissions, each control variable had effects similar in direction and magnitude to those in Fig 6. We did not consider the relationship between the gender of the corresponding author and the gender of the Senior Editor in order to protect the identity of the small number of Senior Editors. Controlling for other variables, zero distance between the corresponding author and Senior Editor (indicating that they were from the same country) was associated with a 1.56 times increased odds of being encouraged (95% CI = [1.45, 1.67], p ≤ 0.0001). Controlling for the presence of zero corresponding author-editor distance, every additional 1,000 km of corresponding author-editor geographic distance was associated with a 1.02 times increase in the odds of being encouraged (95% CI = [1.01, 1.034], p = 0.0003). We note that these geographic effects may be confounded by the low number of Senior Editors, and the fact that the majority of Senior Editors were affiliated within North America and Europe.
For full submissions, we first modelled peer review outcomes as in Fig 6 but with additional variables for the gender composition of the reviewer team and the last author-reviewers geographic distance (see Fig 7.A). The effects of the control variables—submission year, submission type, author institutional prestige, and first author gender—were similar to those in Fig 6. A full submission with a male last author was 1.14 times more likely to be accepted than a submission with a female last author (95% CI = [1.020, 1.256], p = 0.032), even after controlling for reviewer-team gender composition. Compared to mixed-gender reviewer teams, submissions reviewed by all-male teams were 1.15 times more likely to be accepted (95% CI = [1.051, 1.252], p = 0.0059); there was no significant difference between all-female and mixed-gender teams. After controlling for reviewer characteristics (gender composition and author-reviewer geographic distance), some effects of the author’s continent of affiliation diverged from Fig 6. Compared to affiliation within North America, a last author from Oceania was associated with a 1.494 times increased odds of acceptance, though with wide confidence intervals (95% CI = [1.020, 1.968], p = 0.097); this diverges from the non-significant negative effect observed in Fig 6. Controlling for last author-reviewers geographic distance, affiliation within Asia was associated with 0.779 times the odds of acceptance compared to North America (95% CI = [0.565, 0.992], p = 0.022)—a smaller effect than the 0.585 times reduced odds observed in Fig 6. A last author-reviewers geographic distance of zero (indicating that all reviewers were from the same country as the corresponding author) was not associated with a strong effect. Every 1,000 km of last author-reviewer distance was associated with a 0.988 times decreased odds of acceptance (95% CI = [0.982, 0.994], p ≤ 0.0001).
The negative effect of last author-reviewers geographic distance provides additional evidence for the observations from Fig 5—that homogeny between the author and reviewers was associated with a greater odds of acceptance, even when controlling for the continent of affiliation of the author and other characteristics of the author and submission.
To make use of all data in a single regression, we modelled global interactions between last author gender and reviewer-team composition by combining them into a single categorical variable containing all six combinations of factor levels (Fig 7.B). A full submission with a male last author reviewed by an all-male team was associated with a 1.22 times higher odds of being accepted than a full submission with a female last author reviewed by an all-male team (95% CI = [1.044, 1.40], p = 0.027). No significant differences were observed for other combinations of author gender and reviewer gender composition. The absolute difference in parameter estimates between male and female authors among mixed-gender teams (0.084) was less than half that of all-male reviewer teams (0.198), suggesting greater equity among submissions reviewed by mixed-gender teams than by all-male teams. Taken together, these findings and those discussed in S1 Text and S2 Text suggest that gender inequity in peer review outcomes was in part mitigated by mixed-gender reviewer teams, even when controlling for many potentially confounding factors. These results affirm observations from the univariate analysis in Fig 5.
Discussion
We identified inequities in peer review outcomes at eLife based on the gender and national affiliation of the senior (last and corresponding) authors. We observed a disparity in the acceptance rates of submissions with male and female last authors that favored men. Inequities were also observed by country of affiliation. In particular, submissions from developed countries with high scientific capacities tended to have higher success rates than others. These inequities in peer review outcomes could be attributed, at least in part, to a favorable interaction between gatekeeper and author demographics under conditions of gender or national homogeny; we describe this favoring as homophily, a preference based on shared characteristics. Gatekeepers were more likely to recommend a manuscript for acceptance if they shared demographic characteristics with the authors. In particular, manuscripts with male senior (last or corresponding) authors were more likely to be accepted if reviewed by an all-male reviewer panel rather than a mixed-gender panel. Similarly, manuscripts were more likely to be accepted if at least one of the reviewers was from the same country as the last or corresponding author, though there were exceptions on a per-country basis (such as France and Canada). We followed our univariate analysis with a regression analysis, and observed evidence that these inequities persisted even when controlling for potentially confounding variables. The differential outcomes on the basis of author-reviewer homogeny suggest that peer review at eLife is influenced by some form of bias—be it implicit bias [3, 17], geographic or linguistic bias [26, 65, 66], or cognitive particularism [40]. Specifically, a homophilic interaction suggests that peer review outcomes may sometimes be based on more than the intrinsic quality of the manuscript; the composition of the review team is also related to outcomes in peer review.
The opportunity for homophilous interactions is determined by the demographics of the gatekeeper pool. We found that the demographics of the gatekeepers differed significantly from those of the authors, even for last authors, who tend to be more senior [59–62]. Women were underrepresented among eLife gatekeepers, and gatekeepers tended to come from a small number of highly-developed countries. The underrepresentation of women at eLife mirrors global trends—women comprise a minority of total authorships, yet constitute an even smaller proportion of gatekeepers across many domains [14, 67–74]. Similarly, gatekeepers at eLife were less internationally diverse than their authorship, reflecting the general underrepresentation of the “global south” in leadership positions of international journals [75].
The demographics of the reviewer pool made certain authors more likely to benefit from homophily in the review process than others. U.S. authors were much more likely than not (see S5 Table) to be reviewed by a panel with at least one reviewer from their country. However, the opposite was true for authors from other countries. Fewer opportunities for such homophily may result in a disadvantage for scientists from smaller and less scientifically prolific countries. For gender, male lead authors had a nearly 50 percent chance of being reviewed by a homophilous (all-male) rather than a mixed-gender team. In contrast, because all-female reviewer panels were so rare (accounting for only 81 of 6,509 full submission decisions), female authors were highly unlikely to benefit from homophily in the review process.
Increasing eLife’s editorial representation of women and scientists from a more diverse set of nations may lead to a more diverse pool of peer reviewers and Reviewing Editors, and a more equitable peer review process. Editors often invite peer reviewers from their own professional networks, which likely reflect the characteristics of the editor [76–78]; this can lead editors, who tend to be men [14, 67–74] and from scientifically advanced countries [75], to invite peer reviewers who are cognitively or demographically similar to themselves [44, 79, 80], inadvertently excluding certain groups from the gatekeeping process. Accordingly, we found that male Reviewing Editors at eLife were less likely than female Reviewing Editors to assemble mixed-gender teams of gatekeepers (see S8 Fig). We observed a similar effect based on the nationality of the Reviewing Editor and invited peer reviewers (see S9 Fig). Moreover, in S11 Table we conducted an analysis similar to that in Fig 7, and found that this homophilous relationship may result mostly from the relationship between male last authors and male Reviewing Editors.
The size of the disparities we observed in peer review outcomes may seem modest; however, these small disparities can accumulate through each stage of the review process (initial submission, full submission, revisions) and potentially affect the outcomes of many submissions. For example, the overall acceptance rate (the rate at which initial submissions were eventually accepted) for male and female corresponding authors was 15.6 and 13.8 percent, respectively; in other words, manuscripts submitted to eLife with female lead authors were published at about 88 percent the rate of those with male lead authors. Similarly, manuscripts submitted by lead authors from China were accepted at only 22.0 percent the rate of manuscripts submitted by a lead author from the United States (with overall acceptance rates of 4.9 and 22.3 percent, respectively). Success in peer review is vital for a researcher’s career because successful publication strengthens their professional reputation and makes it easier to attract funding, students, postdocs, and hence further publications. Even small advantages can compound over time and result in pronounced inequalities in science [81–84].
Our finding that the gender of the last author was associated with a significant difference in the rate at which full submissions were accepted at eLife stands in contrast with a number of previous studies of journal peer review; these studies found no significant difference in outcomes of papers submitted by male and female authors [85–87], or in reviewers’ evaluations based on the author’s apparent gender [88]. This discrepancy may be explained in part by eLife’s unique context, policies, or relative selectivity compared to venues where previous studies found gender equity. In addition, our results point to a key feature of study design that may account for some of the differences across studies: the consideration of multiple authorship roles. This is especially important for the biosciences, in which authorship order is strongly associated with contribution [61, 62, 89]. Whereas our study examined the gender of the first, last, and corresponding authors, most previous studies have focused on the gender of the first author (e.g., [2, 90]) or of the corresponding author (e.g., [22, 91]). Like previous studies, we observed no strong relationship between first author gender and review outcomes at eLife. Only when considering lead authorship roles—last authorship and, to a lesser extent, corresponding authorship—did we observe such an effect. Our results may be better compared with studies of grant peer review, where leadership roles are more explicitly defined, and many studies have identified significant disparities in outcomes favoring men [18, 92–95], although many other studies have found no evidence of gender disparity [21, 23, 24, 96–98]. Given that science has grown increasingly collaborative and that average authorship per paper has expanded [99, 100], future studies of disparities would benefit from explicitly accounting for multiple authorship roles and signaling among various leadership positions on the byline [59, 101].
The relationship we found between the gender and nationality of the gatekeepers and peer review outcomes also stands in contrast to the findings from a number of previous studies. One study [102] identified a homophilous relationship between female reviewers and female authors. However, most previous analyses found only procedural differences based on the gender of the gatekeeper [22, 87, 88, 103] and identified no difference in outcomes based on the interaction of author and gatekeeper gender in journal submissions [87, 104, 105] or grant review [23]. Studies of gatekeeper nationality have found no difference in peer review outcomes based on the nationality of the reviewer [104, 106], though there is little research on the correspondence between author and reviewer nationality. One past study examined the interaction between U.S. and non-U.S. authors and gatekeepers, but found an effect opposite to what we observed, such that U.S. reviewers tended to rate submissions of U.S. authors more harshly than those of non-U.S. authors [43]. Our results also contrast with the study most similar to our own, which found no evidence of bias related to gender, and only modest evidence of bias related to geographic region [2]. These discrepancies may result from our analysis of multiple author roles. Alternatively, they may result from the unique nature of eLife’s consultative peer review; the direct communication between peer reviewers, compared to traditional peer review, may render the social characteristics of reviewers more influential.
Limitations
There are limitations of our methodology that must be considered. First, we have no objective measure of the intrinsic quality of manuscripts. Therefore, it is not clear which review condition (homophilic or non-homophilic) more closely approximates the ideal of merit-based peer review outcomes. Second, measuring the relationship between reviewer and author demographics and peer review outcomes cannot readily detect biases that are shared by all reviewers/gatekeepers (e.g., if all reviewers, regardless of gender, favored manuscripts from male authors); hence, our approach could underestimate the influence of bias. Third, our analysis is observational, so we cannot establish causal relationships between success rates and author or gatekeeper demographics—there remain potential confounding factors that we were unable to control for in the present analysis, such as the gender distribution of submissions by country (see S5 Fig). Along these lines, the reliance on statistical tests with arbitrary significance thresholds may provide misleading results (see [107]), or obfuscate statistically weak but potentially important relationships. Fourth, our gender-assignment algorithm is only a proxy for author gender and varies in reliability by continent.
Further studies will be required to determine the extent to which the effects we observed generalize to other peer review contexts. Specific policies at eLife, such as its consultative peer review process, may contribute to the effects we observed. Other characteristics of eLife may also be relevant, including its level of prestige [13] and its disciplinary specialization in the biological sciences, whose culture may differ from that of other scientific and academic disciplines. Determining whether our findings are particular to eLife or generalizable may also help in identifying explanatory models. Future work is necessary to confirm and expand upon our findings, establish causal relationships, and mitigate the effects of these methodological limitations. To aid in this effort, we have made as much of the data and analysis as possible publicly available at: https://github.com/murrayds/elife-analysis.
Conclusion and recommendations
Many factors can contribute to gender, national, and other inequities in scientific publishing [47, 50, 108–111], which can affect the quantity and perceived quality of submitted manuscripts. However, these structural factors do not readily account for the observed relationship between gatekeeper and author demographics and peer review outcomes at eLife; rather, biases related to the personal characteristics of the authors and gatekeepers are likely to play some role in peer review outcomes.
Our results suggest that it is not only the form of peer review that matters, but also the composition of reviewers. Homophilous preferences in evaluation are a potential mechanism underpinning the Matthew Effect [1] in academia. This effect entrenches privileged groups while potentially limiting diversity, which could hinder scientific production, since diversity may lead to better working groups [112] and promote high-quality science [113, 114]. Increasing gender and international representation among scientific gatekeepers may improve fairness and equity in peer review outcomes and accelerate scientific progress. However, this must be carefully balanced to avoid overburdening scholars from minority groups with disproportionate service obligations.
Although some journals and publishers, such as eLife and Frontiers Media, have begun providing peer review data to researchers (see [44, 115]), data on equity in peer review outcomes are currently available for only a small fraction of journals and funders. While many journals collect these data internally, they are not usually standardized or shared publicly. One group, PEERE, has authored a protocol for the open sharing of peer review data [116, 117], though this protocol is recent and the extent to which it will be adopted remains uncertain. To provide better benchmarks and to incentivize better practices, journals should make analyses of author and reviewer demographics publicly available. These data would include, but not be limited to, characteristics such as gender, race, sexual orientation, seniority, and institution and country of affiliation. Privacy and confidentiality concerns will likely limit the full availability of the data, but analyses that are sensitive to the vulnerabilities of smaller populations should be conducted and made available as benchmarking data. As these data become increasingly available, systematic reviews can help identify general patterns across disciplines and countries.
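One simple way a journal could publish demographic benchmarks while remaining sensitive to small populations is to suppress any group whose count falls below a minimum cell size. The sketch below is a minimal illustration of that idea, assuming a hypothetical `benchmark` function and threshold; it is not a description of any journal's actual reporting pipeline.

```python
from collections import Counter

def benchmark(outcomes, min_cell_size=5):
    """Aggregate peer review outcomes by a demographic group.

    outcomes: iterable of (group, accepted: bool) pairs.
    Returns {group: acceptance_rate}, with groups smaller than
    min_cell_size suppressed as None to protect small populations.
    """
    totals, accepts = Counter(), Counter()
    for group, accepted in outcomes:
        totals[group] += 1
        if accepted:
            accepts[group] += 1
    return {
        g: (accepts[g] / totals[g] if totals[g] >= min_cell_size else None)
        for g in totals
    }
```

Suppressing small cells trades completeness for confidentiality: the published benchmark still supports cross-journal comparison for well-represented groups without exposing identifiable individuals in rare categories.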
Some high-profile journals, including Nature [118] and eNeuro [12], have experimented with double-blind peer review as a potential solution to inequities in publishing, though in some cases with low uptake [119]. Our findings of homophilic effects may suggest that single-blind review is not the optimal form of peer review; however, our study did not directly test whether homophily persists under double-blind review. If homophily disappears under double-blind review, it would reinforce the interpretation of bias; if it is maintained, it would suggest that other underlying attributes of the manuscript may be contributing to homophilic effects. Double-blind peer review is viewed positively by the scientific community [120, 121], and some studies have found evidence that it mitigates inequities favoring famous authors, elite institutions [85, 122, 123], and those from high-income and English-speaking nations [28].
There may be a tension, however, in attempting to further blind peer review while other aspects of the scientific system become more open. More than 20 percent of eLife papers that go out for review, for example, are already available as preprints. Several statements required for the responsible conduct of research—e.g., conflicts of interest, funding statements, and other ethical declarations—complicate the possibility of truly blind review. Other options involve making peer review more open—one recent study showed evidence that more open peer review did not compromise the integrity or logistics of the process, so long as reviewers could maintain anonymity [124].
Other alternatives to traditional peer review have also been proposed, including study pre-registration, consultative peer review, and hybrid processes (e.g., [58, 125–129]), as well as alternative forms of dissemination, such as preprint servers (e.g., arXiv, bioRxiv). Currently, there is little empirical evidence to determine whether these formats constitute less biased or more equitable alternatives [3]. In addition, journals are analyzing the demographics of their published authorship and editorial staff in order to identify key problem areas, focus initiatives, and track progress toward diversity goals [14, 79, 86]. More work should be done to study and understand the issues facing peer review and scientific gatekeeping in all their forms, and to promote fair, efficient, and meritocratic scientific cultures and practices. Editorial bodies should craft policies and implement practices to mitigate disparities in peer review; they should also continue to be innovative and reflective about their practices to ensure that papers are accepted on scientific merit rather than on particularistic characteristics of the authors.
Competing interests
Wei Mun Chan and Andrew M. Collings are employed by eLife. Jennifer Raymond and Cassidy R Sugimoto are Reviewing Editors at eLife. Andrew M. Collings was employed by PLOS between 2005 and 2012.
Ethics statement
This research underwent expedited review by the Institutional Review Board at Indiana University Bloomington and was determined to be exempt (Protocol #: 1707327848).
Acknowledgments
We are grateful for the editing and feedback provided by Susanna Richmond (Senior Manager at eLife), Mark Patterson (Executive Director at eLife), Eve Marder, Anna Akhmanova, and Detlef Weigel (Deputy Editors at eLife). We are also grateful for the work of James Gilbert (Production Editor at eLife) for extracting the data used in this analysis. This work was partially supported by a grant from the National Science Foundation (SciSIP #1561299).
References