Better Software Defect Prediction using Equalized Learning with Machine Learners

K. Kaminsky and G.D. Boetticher (USA)


Defect prediction, equalized learning, machine learning, genetic programs, software engineering, software quality


Many software organizations do not allocate enough resources to software quality. As a consequence, an automated process for predicting software defects is an important research issue for the software engineering community. Researchers seek to build accurate and reliable predictors from legacy data. This is a very difficult task because software defect data is an "implicitly data-starved" domain owing to its skewed distribution. The shortage of data is compounded by the fact that the interesting data (software modules with moderate to high defect rates) occurs relatively infrequently. One potential solution is to "balance" the data to compensate for the skewness. This is accomplished by replicating the instances of interest so that they contribute more to the modeling process. To assess the feasibility of this technique, a series of Genetic Programming experiments is conducted using multiple data sets from various NASA-based repositories of defect data. The results demonstrate the feasibility of balancing the data.
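The balancing step described above can be illustrated with a minimal sketch. The abstract does not say how many copies are made or how "instances of interest" are selected, so the replication factor (enough whole copies to roughly match the remaining instances), the function name balance_by_replication, the record layout, and the threshold of three defects are all assumptions made for illustration, not the authors' procedure.

```python
def balance_by_replication(rows, is_of_interest):
    """Replicate the rows of interest so they roughly match the other rows in count.

    Assumption: each row of interest is copied a whole number of times;
    the source does not specify the replication factor.
    """
    interesting = [r for r in rows if is_of_interest(r)]
    others = [r for r in rows if not is_of_interest(r)]
    # Nothing to do if there are no rows of interest or the data is not skewed.
    if not interesting or len(interesting) >= len(others):
        return list(rows)
    copies = len(others) // len(interesting)  # whole copies per row of interest
    return others + interesting * copies


# Hypothetical module records: (lines_of_code, defect_count)
modules = [
    (120, 0), (80, 0), (300, 0), (45, 0), (210, 0), (500, 0),
    (950, 4), (700, 3),
]

# Hypothetical criterion: three or more defects marks a module as "of interest".
balanced = balance_by_replication(modules, lambda m: m[1] >= 3)
```

In this toy data set the two high-defect modules are each replicated three times, so they carry the same weight in training as the six defect-free modules. The balanced list would then be passed to whichever learner is being trained.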
