Missing Data Imputation in Breast Cancer Prognosis

J.M. Jerez, I. Molina, J.L. Subirats, and L. Franco (Spain)


Missing data imputation, artificial neural networks, breast cancer, prognosis.


Missing data are a common problem in real datasets, and imputation techniques are normally used to alleviate it. In this paper we analyze the performance of two data imputation methods in a task where the aim is to predict the probability of breast cancer relapse. Mean imputation and hot-deck methods were used to replace missing values found in a dataset containing 3679 patient records. Artificial neural network models were trained with the standard dataset (containing no missing data but a restricted number of cases) and also with the data reconstructed using the two imputation methods mentioned above. The results were analyzed in terms of predictive accuracy and of calibration.
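As an illustration of the two imputation methods named above, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say which hot-deck variant was used, so a simple random hot-deck is assumed here: each missing value is replaced by a value drawn at random from the observed values of the same variable. The function names (`mean_impute`, `hot_deck_impute`) and the toy data are hypothetical.

```python
import numpy as np


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)  # column means, ignoring NaN
    return np.where(np.isnan(X), col_means, X)


def hot_deck_impute(X, rng=None):
    """Random hot-deck (assumed variant): fill each NaN with a value drawn
    at random from the observed ("donor") values of the same column."""
    rng = np.random.default_rng(rng)
    X = np.array(X, dtype=float)  # work on a copy; the input is left untouched
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        donors = X[~missing, j]
        if missing.any() and donors.size > 0:
            X[missing, j] = rng.choice(donors, size=missing.sum())
    return X


# Toy records (made-up values, not taken from the paper's dataset).
records = np.array([
    [50.0, 2.0],
    [np.nan, 3.0],
    [60.0, np.nan],
    [70.0, 1.0],
])

mean_filled = mean_impute(records)
hot_deck_filled = hot_deck_impute(records, rng=0)
```

Mean imputation gives every missing entry of a variable the same value, which shrinks that variable's variance. A hot-deck fill reuses values actually observed in other records, so it tends to preserve the observed distribution better, at the cost of some randomness in the reconstructed dataset.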
