Multi-Class Classification using Covariance among Binary Classifiers and its Application to the Analysis of Tumor Microarrays

Li-San Wang and Yuk Yee Leung

Keywords

Machine learning, multi-class classification, error-correcting-output-coding

Abstract

Many microarray datasets include samples from more than two types or conditions and call for models capable of multi-class classification. Dietterich and Bakiri introduced a generic meta-classifier approach called Error-Correcting-Output-Coding (ECOC). ECOC first generates an ideal code for each class from the binary-class sub-problems. It then generates an “output code”, the vector of binary-classifier outcomes for the sample to be classified. A “decoding” step finds the class whose ideal code is most similar to the output code, as measured by Hamming distance. Because the binary classification sub-problems overlap substantially, the coordinates of the output code are correlated, but neither ECOC nor its later revisions exploit this correlation. The new MCAB (Multi-class classification using Covariance Among Binary classifiers) algorithm uses the covariance matrix estimated from the training dataset in the decoding step, which captures the correlations among the classes. We compared MCAB with the best variant of ECOC (Escalera et al.) available in the latest (2010) ECOC library, using external 10-fold cross-validation on three published multi-class benchmark cancer gene expression microarray datasets. We found that MCAB generally outperforms ECOC in overall accuracy and in class-specific precision and recall, regardless of which binary classifier was used. MCAB with support vector machine with recursive feature elimination (SVM-RFE) as the binary-class classifier had the best performance. These results suggest that MCAB is robust and accurate for classifying multi-class microarray datasets and can be readily applied to other types of data.
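
The difference between the two decoding steps described above can be illustrated with a minimal sketch. The abstract does not give MCAB's exact decoding formula, so the covariance-aware decoder below is an assumption: it uses a Mahalanobis-style distance built from the covariance of binary-classifier outputs on training data. The one-vs-all code matrix, the function names, and the `ridge` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical one-vs-all code matrix for 3 classes:
# row k is the ideal code for class k across 3 binary sub-problems.
CODE_MATRIX = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])


def hamming_decode(output_code, code_matrix=CODE_MATRIX):
    """ECOC-style decoding: threshold the outputs, then pick the class
    whose ideal code has the smallest Hamming distance."""
    hard = (np.asarray(output_code, dtype=float) >= 0.5).astype(float)
    distances = np.abs(code_matrix - hard).sum(axis=1)
    return int(np.argmin(distances))


def covariance_decode(output_code, train_outputs, code_matrix=CODE_MATRIX,
                      ridge=1e-3):
    """Covariance-aware decoding (an assumed Mahalanobis-style variant,
    not necessarily the formula used by MCAB).

    train_outputs: array of shape (n_samples, n_classifiers) holding the
    binary classifiers' outputs on the training set.
    """
    cov = np.cov(np.asarray(train_outputs, dtype=float), rowvar=False)
    # A small ridge term keeps the covariance matrix invertible.
    cov = cov + ridge * np.eye(cov.shape[0])
    precision = np.linalg.inv(cov)
    diffs = code_matrix - np.asarray(output_code, dtype=float)
    # Quadratic form diff^T * precision * diff for each class row.
    distances = np.einsum("ki,ij,kj->k", diffs, precision, diffs)
    return int(np.argmin(distances))
```

Hamming decoding treats every coordinate of the output code as independent and equally informative, whereas the covariance-weighted distance discounts disagreement along directions in which the binary classifiers tend to vary together.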
