Gender differences and bias in open source: Pull request acceptance of women versus men

Computer Science, California Polytechnic State University - San Luis Obispo, San Luis Obispo, California, United States
Computer Science, North Carolina State University, Raleigh, North Carolina, United States
Statistics, North Carolina State University, Raleigh, North Carolina, United States
DOI: 10.7287/peerj.preprints.1733v2
Subject Areas: Human-Computer Interaction, Social Computing, Programming Languages, Software Engineering
Keywords: gender, bias, open source, software development, software engineering
Copyright: © 2016 Terrell et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article: Terrell J, Kofink A, Middleton J, Rainear C, Murphy-Hill E, Parnin C, Stallings J. 2016. Gender differences and bias in open source: Pull request acceptance of women versus men. PeerJ Preprints 4:e1733v2

Abstract

Biases against women in the workplace have been documented in a variety of studies. This paper presents the largest study to date on gender bias, where we compare acceptance rates of contributions from men versus women in an open source software community. Surprisingly, our results show that women's contributions tend to be accepted more often than men's. However, women's acceptance rates are higher only when they are not identifiable as women. Our results suggest that although women on GitHub may be more competent overall, bias against them exists nonetheless.

Author Comment

This revision addresses community feedback. The most substantial changes are:

(1) controlling covariates using propensity score matching,

(2) providing an interpretation of whether the differences are meaningful,

(3) including raw data used in figures as part of the appendix,

(4) characterizing the authors' own biases,

(5) adding a section examining "Are women focusing their efforts on fewer projects?",

(6) comparing GitHub developers who are on Google+ to those who are not,

(7) adding an analysis restricted to projects that are licensed as open source,

(8) adding statistical tests and corrections for false discovery, as appropriate,

(9) replacing the bar chart representation of pull request acceptance rates,

(10) characterizing missing data, and

(11) adding threats to validity posed by uncaptured covariates and developer aliases.

The paper has a slightly revised title (adding "differences and") and we have added an author, Jon Stallings, who has contributed substantially to the revision.

Additionally, we have substantially revised our data analysis pipeline to use R scripts that extract data from our database and produce LaTeX macros defining the numerical results. We believe this improves the reliability of our analysis. In doing so, we found and fixed the following errors in the prior version:

* The y-axis in Figure 2 was previously truncated, and the means and medians in its caption were incorrect,

* Rounding errors, and a transposition of "files changed" and "commits", in "Are women making smaller changes?",

* Incorrect summation of "without reference" pull requests, and consequently the accompanying percentages, in "Are women making pull requests that are more needed?", and

* One programming language difference (.m) was previously incorrectly reported as statistically significant.