Taking Advantage of Unlabeled Data with the Ordered Classification Algorithm

T. Solorio and O. Fuentes (Mexico)

Keywords

classification, unlabeled data, ensembles

Abstract

We introduce a new method for improving the poor performance of classifiers caused by a small training set. The Ordered Classification algorithm presented here incrementally enlarges the training set by adding unlabeled examples. The algorithm selects these unlabeled examples according to the confidence level of the predictions made by an ensemble of classifiers. This confidence measure, inspired by the Query By Committee approach within the Active Learning setting, ensures that the algorithm incorporates the examples that are more likely to have been assigned the correct class label by the ensemble. Experimental results show that the algorithm effectively takes advantage of the unlabeled data, yielding an error reduction of up to 78%. Given that the lack of a sufficiently large training set is a very common scenario in classification problems, this algorithm provides a practical solution.
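
The loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the base learners, how the ensemble is built, how confidence is computed, or how many examples are added per round, so everything below is an assumption made for illustration: a 1-nearest-neighbour base learner, an ensemble whose members each leave out a different slice of the training set, confidence measured as the fraction of members that agree with the majority vote, and one example added per round. All function and parameter names (nearest_label, ensemble_vote, ordered_classification, n_members, batch) are hypothetical.

```python
from collections import Counter


def nearest_label(train, x):
    """1-nearest-neighbour prediction; train is a list of (features, label)."""
    nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return nearest[1]


def ensemble_vote(train, x, n_members):
    """Return (majority label, fraction of members that agree) for x.

    Assumption: member m is trained on every example whose index is not
    congruent to m modulo n_members, so members see different subsets.
    """
    votes = Counter()
    for m in range(n_members):
        subset = [ex for i, ex in enumerate(train) if i % n_members != m]
        votes[nearest_label(subset, x)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_members


def ordered_classification(labeled, unlabeled, n_members=3, batch=1):
    """Grow the training set with the most confidently labeled examples first."""
    if n_members < 2 or len(labeled) < 2:
        raise ValueError("need at least 2 ensemble members and 2 labeled examples")
    train = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Score every remaining unlabeled example with the current ensemble.
        scored = [(ensemble_vote(train, x, n_members), x) for x in pool]
        # Order by confidence, highest first (stable sort keeps pool order on ties).
        scored.sort(key=lambda item: item[0][1], reverse=True)
        # Move the most confident examples, with their predicted labels,
        # into the training set; the rest are rescored in the next round.
        for (label, _confidence), x in scored[:batch]:
            train.append((x, label))
        pool = [x for _, x in scored[batch:]]
    return train


labeled = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((0.0, 1.0), "a"),
           ((10.0, 10.0), "b"), ((9.0, 10.0), "b"), ((10.0, 9.0), "b")]
unlabeled = [(0.5, 0.2), (9.5, 9.8), (1.0, 1.0), (8.0, 9.0), (3.0, 3.0)]
enlarged = ordered_classification(labeled, unlabeled)
```

Because the ensemble is rebuilt from the enlarged training set in every round, examples labeled early influence the predictions for the examples that remain, which is why the order of incorporation matters in this sketch.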
