Mitigation of GHTorrent problems
The GHTorrent website warns in its FAQ about reliability issues with the dataset. (See "The Promises and Perils of Mining GitHub", Eirini Kalliamvakou et al., 2014, http://gousios.gr/bibliography/KGBSGD15.html )
These perils from the paper apply to this project specifically:
VI. Only a fraction of projects use pull requests, and among those that do, their use is very skewed.
VII. If the commits in a pull request are reworked (in response to comments), GitHub records only the commits that are the result of the peer review, not the original commits.
VIII. Most pull requests appear as non-merged even if they are actually merged.
Was this advice ignored, or were actions taken to mitigate the documented reliability issues with the GHTorrent source data?