High Accuracy Back-Retreat Diffusion-Fuzzy Clustering of Breast Cancer Data for the Detection of Malignancy

Ashutosh Patri, Abhijit Nayak, and Anup Anurag


DifFUZZY, Wisconsin Breast Cancer Data, Fuzzy C-Means, Breast Cancer, Fuzzy Clustering


A novel fuzzy clustering method is proposed for separating breast cancer data. It operates with reasonable accuracy, accommodates different datasets, and has a modest running time. The method can be applied to any cancer dataset that has some initial labels, to classify the unlabeled samples with high accuracy. Further, the curse of dimensionality is not an issue for the proposed scheme, as it can be applied to data with any number of dimensions or attributes. The DifFUZZY unsupervised clustering algorithm is applied in the initial stage, giving an accuracy of 96.28% on the Wisconsin Breast Cancer Dataset (WBCD); the proposed Back-Retreat algorithm then improves this to 98.14%. The resulting clusters are assessed using three internal cluster validation indices, and the performance of the method is evaluated using receiver operating characteristic (ROC) curves. The clustering algorithm is compared with the Fuzzy C-Means (FCM) algorithm, and the results are compared with those of other classifiers and clustering techniques.
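For readers unfamiliar with the FCM baseline named in the abstract, the following is a minimal sketch of the standard Fuzzy C-Means update rules in plain Python. It is not the authors' DifFUZZY or Back-Retreat method, which the abstract does not specify. The function name `fuzzy_c_means` and its parameters (`n_clusters`, fuzzifier `m`, `n_iter`, `tol`, `seed`) are illustrative assumptions, not taken from the paper.

```python
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means (hypothetical helper, not the paper's method).

    points: list of equal-length lists of floats.
    Returns (centers, memberships); each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)  # requires m > 1
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from Euclidean distances to the new centers.
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)) ** 0.5,
                    1e-12)  # guard against division by zero
                for c in centers
            ]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** exponent
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u
```

A hard label for each sample can be read off as the index of its largest membership value. On WBCD, the two clusters would then be matched against the benign and malignant labels to compute an accuracy figure comparable to those quoted above.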
