Summarizing Gene-Expression-based Classifiers by Meta-Mining Comprehensible Relational Patterns

F. Železný, O. Štěpánková (Czech Republic), J. Tolar (USA), and N. Lavrač (Slovenia)


Relational Data Mining, Gene Expression Microarrays, Gene Ontology


We propose a methodology for predictive classification from gene expression data, able to combine the robustness of high-dimensional statistical classification methods with the comprehensibility and interpretability of simple logic-based models. We first construct a robust classifier combining contributions of a large number of gene expression values, and then (meta)-mine the classifier for compact summarizations of subgroups among genes associated with a given class therein. The subgroups are described by means of relational logic features extracted from publicly available gene ontology information. The curse of dimensionality pertaining to the gene expression based classification problem due to the large number of attributes (genes) is turned into an advantage in the secondary, meta-mining task as here the original attributes become learning examples. We cross-validate the proposed method on two classification problems: (i) acute lymphoblastic leukemia (ALL) vs. acute myeloid leukemia (AML), (ii) seven subclasses of ALL.
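The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the expression values, gene names, and annotation terms are invented, the per-gene difference of class means is only a stand-in for the robust classifier, and flat annotation sets replace the relational logic features derived from gene ontology. The sketch shows only how genes (attributes in stage 1) become learning examples in stage 2.

```python
from statistics import mean

# Toy expression matrix: each entry is (per-gene values, class label).
# All values and names are invented for illustration.
GENES = ["g1", "g2", "g3", "g4"]
SAMPLES = [
    ([5.0, 4.0, 1.0, 1.2], "ALL"),
    ([6.0, 5.0, 1.1, 0.9], "ALL"),
    ([1.0, 1.5, 1.0, 1.1], "AML"),
    ([0.5, 1.0, 0.9, 1.0], "AML"),
]

# Hypothetical annotations standing in for gene-ontology-derived features.
ANNOTATIONS = {
    "g1": {"term_A", "term_B"},
    "g2": {"term_A"},
    "g3": {"term_B"},
    "g4": {"term_C"},
}


def gene_scores(samples, target):
    """Stage 1 stand-in: score each gene by its difference of class means."""
    scores = []
    for j in range(len(GENES)):
        pos = [x[j] for x, y in samples if y == target]
        neg = [x[j] for x, y in samples if y != target]
        scores.append(mean(pos) - mean(neg))
    return scores


def summarize(scores, threshold=1.0):
    """Stage 2: treat genes as examples and rank terms describing them.

    A gene counts as associated with the target class when its score
    exceeds the (assumed) threshold. Terms are ranked by the fraction of
    the genes they cover that are associated, then by the number of hits.
    """
    associated = {g for g, s in zip(GENES, scores) if s > threshold}
    terms = set().union(*ANNOTATIONS.values())
    ranked = []
    for term in sorted(terms):
        covered = {g for g in GENES if term in ANNOTATIONS[g]}
        hits = len(covered & associated)
        ranked.append((hits / len(covered), hits, term))
    ranked.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return associated, ranked


scores = gene_scores(SAMPLES, "ALL")
associated, ranked = summarize(scores)
for precision, hits, term in ranked:
    print(f"{term}: precision={precision:.2f}, hits={hits}")
```

In this toy setting the number of examples available to the second stage equals the number of genes, which is why a large attribute count helps rather than hurts the meta-mining task.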
