Appendix S1: Blacklisting referees

We provide the mathematical details of the editorial strategy of blacklisting referees with a high record of disagreements.

Figure S1: Effect of narcissistic referees

Narcissists accept only manuscripts that are similar enough to their own work to fall within the quality interval covering 95\% of their own scientific production. They are meant to represent referees with a (conscious or unconscious) bias towards endorsing the relevance or importance of manuscripts in their own subfield of expertise. Here we plot the effect of narcissistic referees on the quality of accepted (\textbf{A}) and rejected (\textbf{B}, \textbf{C}) papers, as a function of their percentage in the referee pool (the remainder being moving-standard impartial referees). For comparison, we also plot the effect of indifferent selfish referees (described in the main text).
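The narcissist acceptance rule can be illustrated with a minimal sketch. It assumes, for illustration only, that a referee's own output is normally distributed and that the 95\% interval is the symmetric central one. The function name `narcissist_accepts` and its parameters are hypothetical and are not taken from the study's code.

```python
from statistics import NormalDist


def narcissist_accepts(manuscript_quality, referee_mean, referee_sd, coverage=0.95):
    """Return True if the manuscript falls inside the referee's own quality interval.

    ASSUMPTIONS (not stated in the caption): the referee's own work is normally
    distributed with the given mean and standard deviation, and the interval is
    the symmetric central interval with the requested coverage.
    """
    own_work = NormalDist(mu=referee_mean, sigma=referee_sd)
    tail = (1.0 - coverage) / 2.0
    lower = own_work.inv_cdf(tail)        # lower bound of the central interval
    upper = own_work.inv_cdf(1.0 - tail)  # upper bound of the central interval
    return lower <= manuscript_quality <= upper
```

Under these assumptions, a referee with mean 100 and standard deviation 5 accepts manuscripts of quality between roughly 90.2 and 109.8, so manuscripts much better than the referee's own work are rejected along with much worse ones.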

Figure S2: Two versus three referees

Average quality of accepted papers when two (\textbf{A}) and three (\textbf{B}) referees are assigned per manuscript, under each editorial strategy tested in this study. Outcomes are qualitatively similar but quantitatively different. Using three referees leads to better results overall (\textbf{C}), although the improvement is modest, and the advantage declines as the incidence of selfish referees in the pool increases, even reversing in some cases. $Q_{2(3)}$ is the average quality of accepted papers under two (three) referees. Under three referees the editor always honors the majority vote, unless the editorial strategy at hand dictates otherwise.
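The default majority-vote rule for three referees can be sketched as follows. The function name `majority_vote` is hypothetical. Returning `None` on a tie is an assumption of this sketch, since the caption does not say how a split between two referees is resolved, and any override by an editorial strategy is left out.

```python
from typing import Optional, Sequence


def majority_vote(votes: Sequence[bool]) -> Optional[bool]:
    """Combine referee recommendations (True = accept, False = reject).

    Returns True or False when one side has a strict majority.
    ASSUMPTION: a tie (possible with two referees) returns None, because
    the tie-breaking rule is not specified in this caption.
    """
    accepts = sum(1 for vote in votes if vote)
    rejects = len(votes) - accepts
    if accepts > rejects:
        return True
    if rejects > accepts:
        return False
    return None
```

With an odd number of referees the result is always a definite decision, which is the case the caption describes.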

Figure S3: Normal versus lognormal quality distribution

Average quality of accepted and rejected papers under normal (\textbf{A, B, C}) and lognormal (\textbf{D, E, F}) distributions of proficiency across authors and of quality across a given author's works. No editorial action is considered. A normal distribution follows if manuscript quality is the end result of multiple additive random factors, whereas a lognormal distribution arises if the random factors are multiplicative. Comparison between the top and bottom rows indicates that our results are robust to relaxing the assumption of normality. Parameters: mean author proficiency 100 (normal, lognormal); standard deviation of proficiency 10 (normal), 0.5 (lognormal); standard deviation of quality per author's works 5 (normal), 0.5 (lognormal).
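The two-level sampling described in this caption (an author's proficiency, then the quality of one of that author's works) can be sketched as below. The function names are hypothetical. For the lognormal case the sketch assumes that 100 is the median proficiency and that 0.5 is a standard deviation on the log scale. The caption does not state this parameterization, so other readings are possible.

```python
import math
import random


def sample_quality_normal(rng, mean_prof=100.0, sd_prof=10.0, sd_quality=5.0):
    """Draw one manuscript quality under the additive (normal) model."""
    proficiency = rng.gauss(mean_prof, sd_prof)  # author's proficiency
    return rng.gauss(proficiency, sd_quality)    # quality of one work


def sample_quality_lognormal(rng, median_prof=100.0, sigma_prof=0.5, sigma_quality=0.5):
    """Draw one manuscript quality under the multiplicative (lognormal) model.

    ASSUMPTION: 100 is treated as the median proficiency and both 0.5 values
    as log-scale standard deviations; the caption does not specify this.
    """
    proficiency = rng.lognormvariate(math.log(median_prof), sigma_prof)
    return rng.lognormvariate(math.log(proficiency), sigma_quality)


rng = random.Random(1)  # seeded generator so the draws are reproducible
normal_draws = [sample_quality_normal(rng) for _ in range(20000)]
lognormal_draws = [sample_quality_lognormal(rng) for _ in range(20000)]
```

Lognormal draws are strictly positive and right-skewed, whereas the normal draws are symmetric around the mean proficiency.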

The authors declare that they have no competing interests.

Rafael D'Andrea conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

James P O'Dwyer conceived and designed the experiments, reviewed drafts of the paper.

James O'Dwyer is supported by the Simons Foundation Grant #376199 and McDonnell Foundation Grant #220020439. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

