Comments on "Researcher bias: The use of machine learning in software defect prediction"

Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada
School of Computing, Queen's University, Kingston, Ontario, Canada
DOI
10.7287/peerj.preprints.1260v1
Subject Areas
Data Mining and Machine Learning, Data Science, Software Engineering
Keywords
Software Engineering, Software Quality Assurance, Software Defect Prediction, Machine Learning, Researcher Bias
Copyright
© 2015 Tantithamthavorn et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K. 2015. Comments on "Researcher bias: The use of machine learning in software defect prediction". PeerJ PrePrints 3:e1260v1.

Abstract

Shepperd et al. (2014) find that the reported performance of a defect prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of the data of Shepperd et al. (2014). We observe that (a) the researcher group shares a strong association with the dataset and metric families that are used to build a model; (b) the strong association among these explanatory variables introduces a large amount of interference when interpreting the impact of the researcher group on model performance; and (c) after mitigating the interference, the researcher group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the researcher group and the performance of a defect prediction model may have more to do with the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat potential bias in their results.
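The strength of association between two categorical experimental factors (such as researcher group and dataset family) can be quantified with Cramér's V, a chi-square-based measure that ranges from 0 (no association) to 1 (perfect association). The sketch below is illustrative only: the contingency table holds hypothetical counts, not the actual data of Shepperd et al. (2014), and this is one possible way to probe such an association rather than the analysis performed in the paper.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical counts: rows = researcher groups, columns = dataset families.
# A near-diagonal table means each group tends to reuse one dataset family,
# i.e., the two factors are strongly associated.
table = [
    [20, 1, 0],
    [2, 18, 1],
    [0, 2, 16],
]
print(round(cramers_v(table), 2))
```

With such a near-diagonal table, V comes out well above the conventional "strong association" threshold of 0.5, which is the situation in which the effects of researcher group and dataset family on model performance become difficult to separate.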
