Smart learning: A search-based approach to rank change and defect prone classes

Department of Informatics, University of Zurich, Zurich, Switzerland
SERG, Delft University of Technology, Delft, Netherlands
DOI
10.7287/peerj.preprints.1160v1
Subject Areas
Data Mining and Machine Learning, Software Engineering
Keywords
defect prediction, code change, genetic algorithm
Copyright
© 2015 Alexandru et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Alexandru CV, Panichella A, Panichella S, Bacchelli A, Gall HC. 2015. Smart learning: A search-based approach to rank change and defect prone classes. PeerJ PrePrints 3:e1160v1

Abstract

Research has yielded approaches for predicting future changes and defects in software artifacts based on historical information, helping developers allocate their (limited) resources effectively. Developers are unlikely to be able to focus on all predicted software artifacts, so the ordering of predictions is important for choosing the right artifacts to concentrate on. We propose using a Genetic Algorithm (GA) to tailor prediction models so that they prioritize classes with more changes/defects. We evaluate the approach on two models, regression tree and linear regression, predicting changes/defects between multiple releases of eight open source projects. Our results show that regression models calibrated by the GA significantly outperform their traditional counterparts, improving the ranking of classes with more changes/defects by up to 48%. In many cases, the top 10% of predicted classes can contain up to twice as many changes or defects.
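The abstract describes the approach only at a high level. The following minimal sketch illustrates the general idea under stated assumptions. A linear model's coefficients are evolved by a simple GA, and fitness is the share of all actual changes/defects that fall in the top 10% of classes ranked by the model's predictions. The fitness definition, the operators (truncation selection with elitism, single-point crossover, Gaussian mutation), the parameter values, and the names `top_fraction_share`, `fitness`, and `evolve` are hypothetical choices made for illustration. They are not necessarily those used in the study.

```python
import random


def top_fraction_share(scores, actual, fraction=0.1):
    """Share of all actual changes/defects found in the top `fraction` of classes ranked by score."""
    total = sum(actual)
    if total == 0:
        return 0.0
    k = max(1, int(len(scores) * fraction))  # inspect at least one class
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(actual[i] for i in ranked[:k]) / total


def predict(weights, row):
    # Linear model: intercept followed by one coefficient per class metric.
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def fitness(weights, X, y, fraction=0.1):
    # Hypothetical ranking-oriented fitness: rewards placing change/defect-heavy classes first.
    scores = [predict(weights, row) for row in X]
    return top_fraction_share(scores, y, fraction)


def evolve(X, y, pop_size=30, generations=50, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(X[0]) + 1  # intercept plus one weight per metric
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        ranked = sorted(pop, key=lambda w: fitness(w, X, y), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation applied independently to each coefficient.
            child = [w + rng.gauss(0, 0.5) if rng.random() < mutation_rate else w
                     for w in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, X, y))
```

As a usage example, `evolve(X, y)` takes a list of per-class metric vectors `X` and the corresponding change or defect counts `y`, and returns the best coefficient vector found. Because the fitness depends only on the ordering of predictions, the intercept has no effect on it. The GA therefore optimizes the ranking directly rather than the prediction error that ordinary least squares minimizes.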

Author Comment

This article is currently submitted to a software engineering conference for peer review.

Supplemental Information

Replication package

A replication package for our study is publicly available for download. In the replication package, we provide: (i) the scripts for the extraction process on a specific dataset, (ii) the datasets used in our experimentation, and (iii) the raw data for the predictors used in our experiments.

DOI: 10.7287/peerj.preprints.1160v1/supp-1