A Multi-Level Biomedical Classification Model Using Aggregation and Abstraction Techniques

Abstract

Data mining on biomedical data usually faces two challenges: preserving privacy and finding associations among the attributes. Because biomedical datasets comprise various meticulous clinical measurements, they often carry many attributes. When all of these attributes are used to construct a classification model, the model may suffer from over-fitting, a well-known problem in data mining that results in poor prediction accuracy. At the same time, the high resolution (level of detail) of the attributes may compromise the privacy of the patients’ identities. In this paper, a multi-level classification model is proposed to analyse biomedical data, with the attributes flexibly abstracted and aggregated at the user's discretion. The novel method contributes to the biomedical research community in three ways: (1) increasing prediction accuracy; (2) mitigating privacy concerns; and (3) enabling biomedical experts to further analyse the relations between the attributes of the data. The prototype of the model is tested in several experiments with classical biomedical datasets obtained from the UCI repository. A visualization tool is also implemented that shows both the significance of the attributes and their predictive power. The experimental results indicate that, by applying appropriate aggregation and abstraction techniques, decision trees can be made more compact and more accurate.
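The abstract does not spell out the authors' algorithm, so the following is only a hypothetical Python sketch of the general idea, not the paper's method. It assumes that abstraction means mapping a numeric attribute onto coarser labels through fixed cut points, that aggregation means averaging related attributes into one derived attribute, and that splits are scored with a C4.5-style gain ratio. The function names, cut points, and toy records are all illustrative assumptions and are not taken from the paper or from the UCI data. The sketch shows why a high-resolution attribute can score worse: it splinters into one branch per patient, whereas an abstracted value is shared by several patients, which also hints at the privacy benefit.

```python
"""Hypothetical sketch: attribute abstraction and aggregation before tree induction.

Assumptions (not from the paper): abstraction uses fixed cut points, aggregation
uses the arithmetic mean, and splits are scored by a C4.5-style gain ratio.
"""
from collections import Counter
from math import log2


def abstract_value(x, cut_points, labels):
    """Map a numeric value to a coarser label; needs len(labels) == len(cut_points) + 1."""
    for cut, label in zip(cut_points, labels):
        if x < cut:
            return label
    return labels[-1]


def aggregate(record, attrs):
    """Collapse several related numeric attributes into one derived value (their mean)."""
    return sum(record[a] for a in attrs) / len(attrs)


def entropy(items):
    """Shannon entropy (in bits) of a non-empty list of discrete items."""
    n = len(items)
    return -sum(c / n * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, classes):
    """Information gain of splitting on `values`, divided by the split information."""
    n = len(classes)
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(v, []).append(c)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(classes) - remainder
    split_info = entropy(values)  # grows with the number of distinct branches
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy records; NOT drawn from the paper or the UCI datasets.
    records = [
        {"age": 23, "sys": 118, "dia": 76, "cls": "healthy"},
        {"age": 31, "sys": 122, "dia": 80, "cls": "healthy"},
        {"age": 38, "sys": 130, "dia": 84, "cls": "healthy"},
        {"age": 47, "sys": 142, "dia": 92, "cls": "at_risk"},
        {"age": 55, "sys": 150, "dia": 95, "cls": "at_risk"},
        {"age": 62, "sys": 160, "dia": 100, "cls": "at_risk"},
    ]
    classes = [r["cls"] for r in records]

    raw_age = [r["age"] for r in records]  # every value distinct: one branch per patient
    coarse_age = [abstract_value(a, [40, 60], ["young", "middle", "senior"]) for a in raw_age]
    pressure = [abstract_value(aggregate(r, ["sys", "dia"]), [110], ["normal", "high"])
                for r in records]

    print(f"raw age:             {gain_ratio(raw_age, classes):.3f}")
    print(f"abstracted age:      {gain_ratio(coarse_age, classes):.3f}")
    print(f"aggregated pressure: {gain_ratio(pressure, classes):.3f}")
```

On these toy records all three attributes separate the classes perfectly, but the coarser attributes receive a higher gain ratio because they split into fewer branches, which is one plausible route to the more compact trees the abstract reports.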
