Gender differences and bias in open source: Pull request acceptance of women versus men
A peer-reviewed article of this Preprint also exists.
Abstract
Biases against women in the workplace have been documented in a variety of studies. This paper presents the largest study to date on gender bias, where we compare acceptance rates of contributions from men versus women in an open source software community. Surprisingly, our results show that women's contributions tend to be accepted more often than men's. However, women's acceptance rates are higher only when they are not identifiable as women. Our results suggest that although women on GitHub may be more competent overall, bias against them exists nonetheless.
Cite this as
2016. Gender differences and bias in open source: Pull request acceptance of women versus men. PeerJ Preprints 4:e1733v2 https://doi.org/10.7287/peerj.preprints.1733v2
Author comment
This revision addresses community feedback, specifically and most substantially:
(1) controlling covariates using propensity score matching,
(2) providing an interpretation of whether the differences are meaningful,
(3) including raw data used in figures as part of the appendix,
(4) characterizing the authors' own biases,
(5) adding a section examining "Are women focusing their efforts on fewer projects?",
(6) comparing GitHub developers that are on Google+ to those who are not,
(7) adding an analysis restricted to projects that are licensed as open source,
(8) adding statistical tests and corrections for false discovery, as appropriate,
(9) replacing bar chart representation of pull request acceptance rate,
(10) characterizing missing data, and
(11) adding threats to validity from uncaptured covariates and developer aliases.
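Item (8) mentions corrections for false discovery without naming a specific procedure. As an illustration only, the following minimal Python sketch shows the Benjamini-Hochberg procedure, a common way to control the false discovery rate. The choice of this procedure, the function name `benjamini_hochberg`, and the p-values are assumptions made for the example and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    not necessarily the correction the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values, purely for demonstration.
toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
toy_rejected = benjamini_hochberg(toy_p_values, alpha=0.05)
```

With these toy inputs, only the two smallest p-values pass, even though five of them fall below 0.05 when taken individually. This is the kind of effect that can turn a result previously reported as significant into a non-significant one.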
The paper has a slightly revised title (adding "differences and") and we have added an author, Jon Stallings, who has contributed substantially to the revision.
Additionally, we have substantially revised our data analysis pipeline to use R scripts that extract data from our database and produce LaTeX macros that define numerical results. We believe this improves the reliability of our analysis. In doing so, we found and fixed the following errors in the prior version:
* Y-axis in Figure 2 was previously truncated and means and medians in caption were incorrect,
* Rounding errors and transposition of "files changed" and "commits" in "Are women making smaller changes?",
* Incorrect summation of "without reference" pull requests, and consequently the accompanying percentages, in "Are women making pull requests that are more needed?", and
* One programming language difference (.m) was previously incorrectly reported as statistically significant.
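The idea behind the revised pipeline, in which scripts compute each number and emit it as a LaTeX macro so that no value is typed into the manuscript by hand, can be sketched as follows. The authors used R scripts reading from their database; this Python version, the helper names `acceptance_rate` and `latex_macro`, the macro name `toyAcceptanceRate`, and the counts are all hypothetical and serve only to illustrate the approach.

```python
def acceptance_rate(merged, total):
    """Percentage of pull requests that were merged."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * merged / total


def latex_macro(name, value, digits=1):
    """Format a numeric result as a LaTeX \\newcommand definition.

    LaTeX control words may contain only letters, so the name is checked.
    """
    if not (name.isascii() and name.isalpha()):
        raise ValueError("macro names must contain ASCII letters only")
    return f"\\newcommand{{\\{name}}}{{{value:.{digits}f}}}"


# Toy counts, not data from the study.
toy_rate = acceptance_rate(merged=3, total=4)
toy_macro = latex_macro("toyAcceptanceRate", toy_rate)
print(toy_macro)
```

The manuscript would then reference `\toyAcceptanceRate` wherever the number appears, so rerunning the analysis updates every occurrence consistently and avoids transcription slips such as transposed or misrounded values.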
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Josh Terrell conceived and designed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.
Andrew Kofink conceived and designed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.
Justin Middleton conceived and designed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, reviewed drafts of the paper.
Clarissa Rainear analyzed the data, wrote the paper, reviewed drafts of the paper.
Emerson Murphy-Hill conceived and designed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.
Chris Parnin conceived and designed the experiments, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Jon Stallings conceived and designed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.
Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):
NCSU IRB approved under #6708.
Data Deposition
The following information was supplied regarding data availability:
Data sets from GHTorrent and Google+ are publicly available.
Funding
This material is based in part upon work supported by the National Science Foundation under grant number 1252995. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.