Simple Rule Extraction for Classification of a Genetic Disease

R. Hewett (USA)


Machine learning, induction, data mining algorithms, biomedical informatics


This paper investigates a simple rule induction approach to the analysis of genome data. In particular, we apply an inductive system, SORCER, to classify clinical phenotypes of a genetic collagenous disorder, osteogenesis imperfecta, using a data set of point mutations in the COLIA1 gene. SORCER uses second-order decision tables as representations in an inductive algorithm to extract rules from a given data set. We describe SORCER's inductive approach and the mechanisms that are important to the analysis. Experimental results show that, averaged over ten runs of 10-fold cross-validation, SORCER obtained an error estimate of 19.1%, compared to 37.3% for the decision tree learner C4.5. The paper describes a particular application of SORCER, but it is also intended to provide a machine learning approach that might be useful for biomedical informatics.
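
The error estimates quoted above come from repeated 10-fold cross-validation. As a minimal sketch of that evaluation protocol only (not of SORCER or C4.5), the Python below averages the misclassification rate over several shuffled 10-fold runs. The function names, the majority-class stand-in learner, and the toy data are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter
from statistics import mean


def majority_learner(train):
    """Hypothetical stand-in learner: always predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def repeated_cv_error(data, learner, k=10, repeats=10, seed=0):
    """Mean misclassification rate over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    run_errors = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)  # a fresh random partition for each run
        folds = [shuffled[i::k] for i in range(k)]
        wrong = 0
        for i, test in enumerate(folds):
            # Train on every fold except the held-out one.
            train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
            predict = learner(train)
            wrong += sum(predict(x) != y for x, y in test)
        run_errors.append(wrong / len(data))  # error rate of this run
    return mean(run_errors)


# Toy, made-up data: (features, label) pairs, not the mutation data set.
toy = [((i,), "a" if i % 3 else "b") for i in range(30)]
err = repeated_cv_error(toy, majority_learner)
```

Any learner that maps a training list to a prediction function could be passed in place of `majority_learner`, so two learners can be compared on identical partitions by reusing the same `seed`.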
