What about the effect size in Figure 11? What is Cohen’s d for the gendered outsiders?
I read in your study: “For outsiders, while men and women perform similarly when their genders are neutral, when their genders are apparent, men’s acceptance rate is 1.2% higher than women’s (χ²(df = 1, n = 419,411) = 7, p < .01).”
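An effect size can also be read off the quoted test itself: for a 2×2 table (here gender × accepted/rejected), the phi coefficient is φ = √(χ²/n). The sketch below is only approximate: it assumes that the reported n = 419,411 is the total count of that 2×2 table, and it uses the rounded χ² = 7 from the text.

```python
import math

def phi_from_chi2(chi2: float, n: int) -> float:
    """Phi coefficient of a 2x2 contingency table: sqrt(chi2 / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(chi2 / n)

# Values quoted from the study. Assumption: n is the total count of the
# 2x2 table behind the chi-square test; chi2 is rounded to 7 in the text.
phi = phi_from_chi2(7, 419_411)
print(f"phi = {phi:.4f}")
```

This gives φ of about 0.004. Cohen’s rule of thumb treats φ = 0.1 as a small effect, so under this assumption the effect would be far below “small”.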
For outsiders, the acceptance rate is 0.6188902 for women and 0.6310398 for men, which seems pretty close:
https://dfzljdn9uc3pi.cloudfront.net/2017/cs-111/1/data.pdf
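Note that the “1.2%” in the quoted sentence is a difference in percentage points. Relative to the women’s rate, the gap is closer to 2%. A quick check in Python with the two rates above (the variable names are my own):

```python
women = 0.6188902  # acceptance rate, outsider women (from the linked data)
men = 0.6310398    # acceptance rate, outsider men

absolute_gap = men - women            # difference in proportions
relative_gap = absolute_gap / women   # gap relative to the women's rate

print(f"absolute gap: {absolute_gap * 100:.2f} percentage points")
print(f"relative gap: {relative_gap * 100:.2f}%")
```

This prints an absolute gap of about 1.21 percentage points and a relative gap of about 1.96%.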
You write: “We have demonstrated statistically significant differences between men’s and women’s pull request acceptance rates, such as that, overall, women’s acceptance rates are 4.1% higher than men’s. We caution the reader from interpreting too much from statistical significance; for big data studies such as this one, even small differences can be statistically significant. Instead, we encourage the reader to examine the size of the observed effects.”
In their paper “Using Effect Size—or Why the P Value Is Not Enough”, Gail M. Sullivan, MD, MPH, and Richard Feinn, PhD, also caution the reader against reading too much into statistical significance:
"Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude – not just, does a treatment affect people, but how much does it affect them." – Gene V. Glass [1]
"The primary product of a research inquiry is one or more measures of effect size, not P values." – Jacob Cohen [2]
From the section How to Calculate Effect Size: "Depending upon the type of comparisons under study, effect size is estimated with different indices. The indices fall into two main study categories, those looking at effect sizes between groups and those looking at measures of association between variables (table 1). For two independent groups, effect size can be measured by the standardized difference between two means, or mean (group 1) – mean (group 2) / standard deviation. The denominator standardizes the difference by transforming the absolute difference into standard deviation units. Cohen's term d is an example of this type of effect size index. Cohen classified effect sizes as small (d = 0.2), medium (d = 0.5), and large (d ≥ 0.8).[5] According to Cohen, “a medium effect of .5 is visible to the naked eye of a careful observer. A small effect of .2 is noticeably smaller than medium but not so small as to be trivial. A large effect of .8 is the same distance above the medium as small is below it.” [6] These designations large, medium, and small do not take into account other variables such as the accuracy of the assessment instrument and the diversity of the study population. However these ballpark categories provide a general guide that should also be informed by context. Between group means, the effect size can also be understood as the average percentile distribution of group 1 vs. that of group 2 or the amount of overlap between the distributions of interventions 1 and 2 for the two groups under comparison. For an effect size of 0, the mean of group 2 is at the 50th percentile of group 1, and the distributions overlap completely (100%)—that is, there is no difference. For an effect size of 0.8, the mean of group 2 is at the 79th percentile of group 1; thus, someone from group 2 with an average score (ie, mean) would have a higher score than 79% of the people from group 1. The distributions overlap by only 53% or a non-overlap of 47% in this situation (table 2).[5,6]"
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/
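The percentile and overlap figures in that passage follow from assuming two normal distributions with equal variance: the percentile is Cohen’s U3 = Φ(d), and the 47% non-overlap is Cohen’s U1 = (2Φ(d/2) − 1) / Φ(d/2). A minimal standard-library sketch (function names are my own) of d with a pooled SD, plus the two conversions, reproducing the d = 0.8 numbers:

```python
import math
from statistics import mean, stdev

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cohens_d(group1: list[float], group2: list[float]) -> float:
    """(mean1 - mean2) / pooled SD, pooling the two sample variances."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2
                  + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

def cohens_u3(d: float) -> float:
    """Percentile of the higher group's mean within the lower group."""
    return normal_cdf(d)

def cohens_u1(d: float) -> float:
    """Cohen's U1: proportion of non-overlap of the two distributions."""
    p = normal_cdf(abs(d) / 2.0)
    return (2.0 * p - 1.0) / p

d = 0.8
print(f"U3 = {cohens_u3(d):.2f}")           # 79th percentile
print(f"overlap = {1 - cohens_u1(d):.2f}")  # 53% overlap, i.e. 47% non-overlap
```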
I would like to calculate Cohen’s d, but the standard deviation (SD) is missing from your paper. Could you provide d, or the SDs needed to calculate it? I suspect d is almost zero, which would mean there is practically no difference between the acceptance (merge) rates of gendered outsider women and men.
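For a binary outcome such as accepted/rejected, though, the SD follows from the rate itself: a 0/1 variable with acceptance rate p has SD √(p(1 − p)). The sketch below uses the two outsider rates quoted above. Since the group sizes are not given here, it assumes an unweighted average of the two variances for the pooled SD (my own simplification, not your method). It also computes Cohen’s h, Cohen’s effect size for a difference between two proportions, h = 2·arcsin(√p₁) − 2·arcsin(√p₂).

```python
import math

def bernoulli_sd(p: float) -> float:
    """SD of a 0/1 outcome with success probability p."""
    return math.sqrt(p * (1.0 - p))

def cohens_d_proportions(p1: float, p2: float) -> float:
    """Difference in rates divided by a pooled SD.

    Assumption: both groups weighted equally (group sizes unknown here).
    """
    pooled_sd = math.sqrt((bernoulli_sd(p1) ** 2 + bernoulli_sd(p2) ** 2) / 2.0)
    return (p1 - p2) / pooled_sd

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))

men, women = 0.6310398, 0.6188902  # outsider acceptance rates quoted above
print(f"d = {cohens_d_proportions(men, women):.3f}")
print(f"h = {cohens_h(men, women):.3f}")
```

Both come out at about 0.025, roughly an eighth of Cohen’s “small” benchmark of 0.2.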