This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Research has yielded approaches for predicting future changes and defects in software artifacts based on historical information, helping developers allocate their (limited) resources effectively. Developers are unlikely to be able to focus on all predicted software artifacts, so the ordering of predictions is important for choosing the right artifacts to concentrate on. We propose using a Genetic Algorithm (GA) to tailor prediction models so that they prioritize classes with more changes/defects. We evaluate the approach on two models, regression tree and linear regression, predicting changes/defects between multiple releases of eight open source projects. Our results show that regression models calibrated by GA significantly outperform their traditional counterparts, improving the ranking of classes with more changes/defects by up to 48%. In many cases, the top 10% of classes predicted by a GA-calibrated model can contain up to twice as many changes or defects as the top 10% predicted by its traditional counterpart.
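To make the idea of calibrating a regression model for ranking more concrete, the following is a minimal sketch and not the implementation used in the study. It assumes a linear model whose coefficients are evolved by a simple GA, and it assumes a position-weighted fitness that rewards placing classes with more changes/defects earlier in the ranking. The function names (`predict`, `ranking_fitness`, `calibrate`), the GA settings, and the toy data are all hypothetical.

```python
import random

def predict(weights, features):
    """Linear model: weights[0] is the intercept, the rest are coefficients."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def ranking_fitness(weights, classes):
    """Assumed fitness: reward rankings that put change-prone classes first.

    Classes are sorted by predicted score (descending). The class at
    position i contributes actual * (n - i), so the maximum is reached
    when the actual counts appear in non-increasing order.
    """
    ranked = sorted(classes, key=lambda c: predict(weights, c[0]), reverse=True)
    n = len(ranked)
    return sum(actual * (n - i) for i, (_, actual) in enumerate(ranked))

def calibrate(classes, n_features, pop_size=30, generations=40, seed=0):
    """Evolve model weights with an elitist GA (illustrative settings)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_features + 1)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism).
        population.sort(key=lambda w: ranking_fitness(w, classes), reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation applied to each gene with probability 0.2.
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=lambda w: ranking_fitness(w, classes))

# Toy data (made up): ((lines of code, complexity), observed changes).
toy_classes = [((120, 4), 3), ((900, 15), 21), ((40, 1), 0),
               ((450, 9), 10), ((300, 7), 6), ((60, 2), 1)]

best = calibrate(toy_classes, n_features=2)
ranking = sorted(toy_classes, key=lambda c: predict(best, c[0]), reverse=True)
print([actual for _, actual in ranking])
```

The point of the sketch is the choice of objective: the GA searches for coefficients that optimize the ordering of classes rather than the numeric prediction error that ordinary least squares would minimize.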
This article is currently submitted to a software engineering conference for peer review.
A replication package for our study is publicly available for download. In the replication package, we provide: (i) the scripts for the extraction process on a specific dataset, (ii) the datasets used in our experimentation, and (iii) the raw data for the experimented predictors.