Smart learning: A search-based approach to rank change and defect prone classes
- Subject Areas
- Data Mining and Machine Learning, Software Engineering
- Keywords
- defect prediction, code change, genetic algorithm
- Copyright
- © 2015 Alexandru et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Smart learning: A search-based approach to rank change and defect prone classes. PeerJ PrePrints 3:e1160v1 https://doi.org/10.7287/peerj.preprints.1160v1
Abstract
Research has yielded approaches for predicting future changes and defects in software artifacts based on historical information, helping developers allocate their (limited) resources effectively. Developers are unlikely to be able to focus on all predicted software artifacts, hence the ordering of predictions is important for choosing the right artifacts to concentrate on. We propose using a Genetic Algorithm (GA) to tailor prediction models so that they prioritize classes with more changes/defects. We evaluate the approach on two models, regression tree and linear regression, predicting changes/defects between multiple releases of eight open source projects. Our results show that regression models calibrated by GA significantly outperform their traditional counterparts, improving the ranking of classes with more changes/defects by up to 48%. In many cases, the top 10% of predicted classes can contain up to twice as many changes or defects as with the traditional models.
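To make the idea of GA-based calibration concrete, the following is a minimal sketch and not the implementation evaluated in this study. It assumes a linear model whose coefficients form the GA chromosome, and a hypothetical rank-weighted fitness that rewards placing classes with more changes/defects near the top of the ordering. The function names, the GA parameters (population size, mutation rate, one-point crossover) and the toy data are all illustrative assumptions.

```python
import random


def predict(weights, metrics):
    """Linear model: intercept plus weighted sum of a class's metrics."""
    return weights[0] + sum(w * m for w, m in zip(weights[1:], metrics))


def ranking_fitness(weights, rows, actual):
    """Assumed fitness: rank-weighted sum of actual changes/defects.

    Classes are sorted by predicted value (descending). The class at
    position pos contributes (n - pos) * actual, so misplacing a highly
    change-prone class near the bottom is penalized the most.
    """
    order = sorted(range(len(rows)),
                   key=lambda i: predict(weights, rows[i]),
                   reverse=True)
    n = len(order)
    return sum((n - pos) * actual[i] for pos, i in enumerate(order))


def calibrate(rows, actual, pop_size=30, generations=40,
              mutation_rate=0.2, seed=0):
    """Evolve linear-model coefficients that maximize ranking_fitness."""
    rng = random.Random(seed)
    n_weights = len(rows[0]) + 1  # one coefficient per metric + intercept
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]

    def fit(individual):
        return ranking_fitness(individual, rows, actual)

    for _ in range(generations):
        # Elitism: the better half survives unchanged.
        population.sort(key=fit, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_weights)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation applied independently to each gene.
            child = [w + rng.gauss(0, 0.3) if rng.random() < mutation_rate
                     else w for w in child]
            children.append(child)
        population = elite + children
    return max(population, key=fit)


# Hypothetical toy data: two metrics per class, and the number of
# changes observed for each class in the following release.
rows = [[10, 2], [3, 1], [8, 5], [1, 0], [6, 3]]
actual = [9, 2, 7, 0, 5]

best = calibrate(rows, actual)
print("fitness of calibrated model:", ranking_fitness(best, rows, actual))
```

The design choice worth noting is that the fitness depends only on the ordering of the predictions, not on their numeric error, which is what distinguishes this kind of calibration from ordinary least-squares fitting.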
Author Comment
This work is currently under submission at a software engineering conference for peer review.
Supplemental Information
Replication package
A replication package for our study is publicly available for download. In the replication package, we provide: (i) the scripts for the extraction process on a specific dataset, (ii) the datasets used in our experiments, and (iii) the raw data for the predictors we evaluated.