The work conflates "open source" with "GitHub", several times mentioning "open source" communities but then discussing GitHub.
This could be justified if there were some attempt to restrict the analysis to only those projects hosted on GitHub that are open source. For example, it is possible to inspect a GitHub repo's LICENSE file and match it against known open source licenses, as in the rough sketch below.
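A minimal sketch of such a check, using GitHub's license API (the set of SPDX identifiers below is illustrative, not exhaustive):

```python
# Query GitHub's license API for a repo and match the detected SPDX ID
# against a (partial, illustrative) list of OSI-approved licenses.
import requests

OSI_APPROVED = {"MIT", "Apache-2.0", "GPL-2.0", "GPL-3.0",
                "BSD-2-Clause", "BSD-3-Clause", "MPL-2.0", "LGPL-3.0"}

def is_open_source(owner: str, repo: str) -> bool:
    resp = requests.get(f"https://api.github.com/repos/{owner}/{repo}/license")
    if resp.status_code != 200:
        return False  # no LICENSE file detected for this repo
    license_info = resp.json().get("license") or {}
    return license_info.get("spdx_id") in OSI_APPROVED
```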
Without such a check, I would recommend that all use of "open source" with reference to GitHub be removed and replaced with "projects hosted on GitHub" or similar.
Hello, I have some brief feedback regarding the bar charts in this manuscript. I am concerned that your bar charts visually exaggerate effect sizes because they start at 60%. Please read this article, which explains why it is vital to start bar charts at 0: https://flowingdata.com/2015/08/31/bar-chart-baselines-start-at-zero/
Furthermore, I think your manuscript would benefit from providing the exact percentages and CIs in each bar chart, especially in Figure 5, where your final argument seems to rest on said percentages.
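Purely as an illustration (the numbers below are made up, not taken from the paper), a bar chart with a zero baseline, printed percentages, and 95% CI error bars might look like:

```python
# Illustrative only: bar chart with a zero baseline, exact percentages
# printed on the bars, and 95% CI error bars. All numbers are made up.
import matplotlib.pyplot as plt

labels = ["gender-neutral", "gendered"]  # hypothetical conditions
rates = [72.0, 68.0]                     # hypothetical percentages
ci_half_widths = [1.5, 1.8]              # hypothetical 95% CI half-widths

fig, ax = plt.subplots()
bars = ax.bar(labels, rates, yerr=ci_half_widths, capsize=4)
ax.bar_label(bars, fmt="%.1f%%")         # print the exact values
ax.set_ylim(0, 100)                      # baseline at zero
ax.set_ylabel("PR acceptance rate (%)")
plt.show()
```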
I would also like to mirror some concerns regarding Figure 5 that others have raised:
1) It seems the bars should be grouped by gender, so that each gender's gender-neutral and gendered bars sit side-by-side, rather than grouped by profile type. This is because the primary comparison you seem to make (and rest your final conclusion upon) is within-gender, between gender-neutral and gendered profiles. Really, Figure 5 could just as well be presented as a grouped table, because there are so few values (4 within each group).
2) Men also experience a significant drop in PR acceptance rate when their profile is gendered and they are considered outsiders, but this fact is completely glossed over in the text. In fact, outsiders don't seem to experience a statistically significant difference in PR acceptance rate between gender-neutral and gendered profiles, regardless of gender.
As an actual developer who has worked in web design and is currently completing a college-level Computer Science course, I find this report insulting.
Firstly, in the introduction the paper mentions a (presumed) woman named Rachel Nabors and talks about how her pull requests were being rejected and how she felt this was discrimination based on her gender.
Rachel claiming sexism is a cop-out move; it sounds like she can't handle being critiqued and instantly plays the victim card. To confirm this, I looked her up. After checking her pull requests here, I found out that Rachel hasn't actually made any substantial pull requests.
She's only made 8 pull requests (which is low for a developer), 4 of which have been rejected. And of the 4 pull requests that were rejected, two of them were rejected because she was trying to fix an error that had already been fixed by a male coder.
A laughable error such as that reveals that this paper cannot possibly be approved academically.
The second massive problem in this paper is on page 11, where it judges how good a piece of code is by its line count. If you're not familiar with Computer Science, I can tell you without a doubt that judging code by lines of code is completely meaningless.
There's actually a great quote on this by Bill Gates:
Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.
I suggest the authors read up on this here.
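To make the point concrete, here is a toy example of my own (not from the paper): the two functions below compute exactly the same thing, yet one is several times longer than the other.

```python
# Two functionally identical implementations with very different line
# counts; neither count says anything about quality.
def sum_of_squares_verbose(numbers):
    total = 0
    for n in numbers:
        square = n * n
        total = total + square
    return total

def sum_of_squares(numbers):
    return sum(n * n for n in numbers)
```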
In conclusion, this paper does not understand the very thing it's analyzing, and its sample size of only 8,000 women against 150,000 men doesn't help support the claimed 4% higher merge rate for women.
I'd like to see a Bayesian multiple logistic regression done on this data using as predictors all the various explanatory variables that were considered. Could the authors make the data available in a format that allows such alternative analyses?
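In the meantime, a minimal sketch of what such a model could look like in PyMC, assuming a hypothetical CSV export of the data and hypothetical column names (the actual predictors would be whatever the authors release):

```python
# Bayesian multiple logistic regression on PR acceptance.
# File name and column names are hypothetical placeholders.
import pandas as pd
import pymc as pm

df = pd.read_csv("pull_requests.csv")  # hypothetical data export
predictors = ["is_woman", "is_gendered_profile", "is_insider"]  # hypothetical
X = df[predictors].to_numpy(dtype=float)
y = df["merged"].to_numpy(dtype=int)

with pm.Model():
    intercept = pm.Normal("intercept", mu=0.0, sigma=2.5)
    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=len(predictors))
    p = pm.math.sigmoid(intercept + pm.math.dot(X, beta))
    pm.Bernoulli("accepted", p=p, observed=y)
    trace = pm.sample(2000, tune=1000)  # posterior over all coefficients
```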
This is an interesting and novel research agenda. That said, I find the abstract misleading. The reporting of this study has also been very problematic; I think this is partially because of how you have written your abstract.
Your most statistically significant results seem to be that reporting gender has a moderately positive impact on female acceptance (inside group) and that reporting gender has a large negative effect on acceptance for all outsiders, male and female. These two main results should be in the abstract. In your abstract you really should not be making strong claims about this paper showing bias against women, because it doesn't. For the inside group, it looks like the bias moderately favours women. For the outside group, the biggest effect is the drop for both genders. You should hence be stating that it is difficult to understand the implications for bias in the outside group, because the main bias there appears to be against people who report any gender versus people who are gender-neutral.
You also need to report the pooled results as well as the results partitioned into inside and outside groups.
It is also difficult to see from Figure 5 how significant any of the changes after reporting gender are.
I believe with more careful interpretation of these results you would be able to publish this in a credible journal.
Interesting paper, but you haven't reported any measures of (standardised) effect size. With chi-square of 1170 (Table 1), Cohen's d = 0.04, which suggests a trivial effect. Likewise, with chi-square of 7.9 (Table 2), Cohen's d = 0.005, again a trivial effect. With such a large sample size, even very small/trivial effects like this are likely to be statistically significant. Focus on the size of effect and precision of the difference (confidence interval), not statistical significance.
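For context, the conversion presumably underlying those d figures: for a contingency table with N total observations, the chi-square-based effect size is the phi coefficient (equal to Cohen's w), and Cohen's conventional threshold for even a "small" effect is around 0.1:

```latex
% Chi-square-based effect size for a 2x2 table with N observations
\phi = w = \sqrt{\frac{\chi^{2}}{N}}
```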
This entire paper is based on the unsupported assumption that the acceptance rate of pull requests is a reliable proxy for the competence of the developer making the requests. I'd suggest that the requests most likely to be accepted are those that conform to the project's coding style (for better or worse) and do not challenge the egos of the project maintainers.
The study is nice, but the data presentation, interpretation and discussion are very misleading. The introduction primes a clear expectation that women will be discriminated against, while the data of course show the opposite. After a very large amount of data trawling, guided by a clear bias, you found a very small effect when the subjects were divided in two (insiders vs outsiders) and then in two again (gendered vs non-gendered). These manipulations (which some might call "p-hacking") were not statistically compensated for. Furthermore, you present the fall in acceptance for women who are identified by gender, but don't note that men who were identified also had a lower acceptance rate. In fact, the difference between men and women, which you have visually amplified by starting your y-axis at 60% (an egregious practice), is minuscule. The prominence given to this non-effect in the abstract, and the way this imposes an interpretation on the "gender bias" in your title, is therefore unwarranted.
This is a serious issue because the popular press is already reporting your findings as showing bias against women, and word is spreading that the GitHub coding community is misogynist when in fact the data show it to be warmly accepting of women's contributions. This is very damaging to the credibility of academics and to the confidence of women coders.
Having said all that, I thought the study itself was really interesting, and if you removed the part about bias against women it would be a very thought-provoking analysis, raising the question of why women have such a high acceptance rate for their code.
This is bad statistics and very weak evidence of anything. I feel sad that this kind of thing is promoted by the media.
The authors brag about the "largest study to date" in the abstract without recognizing that the effect size is very minor and that the bias, which does not depend on sample size, could be enormous. There are many sources of possible bias: the small percentage of users who could be assigned a gender, the reliability of the gender assignment itself, the representativeness of GitHub, the p-hacking, the lack of controlling for confounders, you name it.
The weaknesses are listed only in the appendix, but hey, the media folks only read the abstract. Morally, if you want to publish such weak evidence at all, you should make that clear at the beginning. But of course, who cares about disseminating bad information when the popularity is worth it?
FYI, when you do causal analysis with observational data (that is, claiming gender causes pull request acceptance or discrimination), you need to do a good job controlling for confounders. Who knows, for example, whether the women are more educated, or have more experience? A sketch of one such adjustment follows.
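As one standard way to adjust an observed acceptance gap for measured confounders, here is a minimal sketch of inverse propensity weighting, with an entirely hypothetical data file and column names:

```python
# Adjust the women-vs-men acceptance gap for confounders via inverse
# propensity weighting. File and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("pull_requests.csv")  # hypothetical export of the data
confounders = df[["education_years", "prior_prs", "account_age_days"]]
is_woman = df["is_woman"].to_numpy()
accepted = df["merged"].to_numpy()

# Propensity: P(author is a woman | confounders)
model = LogisticRegression(max_iter=1000).fit(confounders, is_woman)
p = model.predict_proba(confounders)[:, 1]

# Hajek-weighted acceptance rates, net of the measured confounders
w1 = 1.0 / p[is_woman == 1]
w0 = 1.0 / (1.0 - p[is_woman == 0])
rate_women = (w1 * accepted[is_woman == 1]).sum() / w1.sum()
rate_men = (w0 * accepted[is_woman == 0]).sum() / w0.sum()
print(f"confounder-adjusted gap: {rate_women - rate_men:.3f}")
```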
Showing only the top of the bar in a graph, without starting from zero, is a convenient device for those who can only find very small, irrelevant effects and therefore need to magnify them. Fortunately people are becoming more aware of this, but not the media, nor those without statistical knowledge, who will believe and spread any garbage posing as science.
The small confidence intervals and statistically significant results produced by a large sample size focus attention on irrelevant sampling error and dismiss the other errors and biases already mentioned. If your data cover the population badly or are unreliable, sampling error becomes less important than those other errors, and small effect sizes may be nothing but bias.
Please, don't make yourself more important than the science and the information that others rely on. If you want to report weak evidence, make sure that is clearly stated.
The results of this paper have now morphed into the unambiguous "One recent paper showed that women are considered better computer coders than men". (US News "Female Professors Hardly Brilliant, Certainly Not Genius" March 7, 2016)
The paper showed anything but that. The striking statistic from the paper was not the difference between 74% and 78%, but that only 8K women participated in GitHub compared to 150K men. Papers like this perpetuate the idea that women are mistreated by society. That leads to rewards for the authors but causes hostility between the genders and a worse life for both.