The GHTorrent website warns in its FAQ about reliability issues with the dataset. (See "The Promises and Perils of Mining GitHub", Eirini Kalliamvakou et al., 2014, http://gousios.gr/bibliography/KGBSGD15.html)

For this project specifically: VI. Only a fraction of projects use pull requests. And of those that use them, their use is very skewed. VII. If the commits in a pull-request are reworked (...

What about the effect size in Figure 11? What is Cohen’s d for the gendered outsiders?

I read in your study: “For outsiders, while men and women perform similarly when their genders are neutral, when their genders are apparent, men’s acceptance rate is 1.2% higher than women’s (χ2(df = 1, n = 419,411) = 7, p < .01).” For outsiders, we can see that the acceptance rate for women is 0.61...
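As a rough illustration of the effect-size question, the chi-squared statistic quoted above can be converted into a phi coefficient, which for a 2x2 table (df = 1) is the square root of χ2 divided by n. This is a minimal sketch, not the authors' analysis: it computes phi rather than Cohen's d, the function name `phi_coefficient` is hypothetical, and the inputs are the rounded values as quoted, so the result is only approximate.

```python
import math


def phi_coefficient(chi2: float, n: int) -> float:
    """Phi effect size for a 2x2 chi-squared test (df = 1): sqrt(chi2 / n)."""
    if n <= 0 or chi2 < 0:
        raise ValueError("chi2 must be non-negative and n must be positive")
    return math.sqrt(chi2 / n)


# Values as quoted from the study: chi2(df = 1, n = 419,411) = 7.
# The statistic is reported rounded, so phi is approximate as well.
phi = phi_coefficient(7.0, 419_411)
print(f"phi = {phi:.4f}")
```

Under these assumptions phi comes out at about 0.004, far below the 0.1 that Cohen's conventional benchmarks label a small effect.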

You examine several possible reasons for women having higher acceptance rates than men. One issue that you did not look at, and that I think may be relevant, is as follows:

If it is true that men are more likely than women to take risks, and given that submitting a PR carries the risk of rejection, is it possible that women wait until they have gained more experience...

Emerson Murphy-Hill commented on the pre-print of this paper that they ran an analysis showing that women are more likely to accept pull requests from men than from women, and that men are more likely to accept pull requests from women than from men. He said they left that analysis out of this paper to keep it crisp. I...

Here, as well as later in the article, you are working with data of drastically different sample sizes -- summing the table, you have about 3M PRs from identified men and about 140k from women, making any one pull request about 20x more likely to be from a man than from a woman.

Have you considered the potential impact of this imbalance on your statistical methods?
