Supervised Learning with Unsupervised Output Separation

N. Japkowicz (Canada)


Machine Learning, Decision Trees, Combination of Classifiers, Clustering.


In supervised learning, the output labels are imposed by the knowledge engineer who prepared the data. Although knowing the labels of a data set is useful, a learning algorithm can have difficulty modeling a class accurately when data points drawn from very different distributions are grouped under that same class. In such cases, it should help to separate the main classes into a number of more homogeneous subclasses. This paper assumes that this problem is common and describes a simple combination method that attempts to address it. The approach is then tested on five domains taken from the UCI Repository. The results show that the approach has a positive effect in three of the five domains, breaks even in one, and degrades the previously established performance in one.
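The idea of separating a class into more homogeneous subclasses can be illustrated with a small sketch. The abstract does not spell out the method, so everything below is an assumption made for illustration: a plain k-means routine splits each class, a nearest-centroid rule stands in for the decision trees named in the keywords, and the combination step is left out. The names `kmeans`, `fit_subclass_model`, and `predict` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic, evenly spaced initialization."""
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids


def fit_subclass_model(X, y, k):
    """Split every class into k subclasses; remember each subclass's parent label."""
    model = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        for centroid in kmeans(members, min(k, len(members))):
            model.append((centroid, label))
    return model


def predict(model, x):
    """Find the nearest subclass centroid and report its parent (main) class."""
    _, label = min(model, key=lambda entry: sq_dist(x, entry[0]))
    return label


# Toy data (assumed): class "pos" is made of two distant groups, with "neg" between them.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5), (5, 6), (6, 5)]
y = ["pos"] * 6 + ["neg"] * 3

model = fit_subclass_model(X, y, k=2)
predictions = [predict(model, q) for q in [(1, 1), (9, 9), (5, 5)]]
```

In this toy set the overall mean of "pos" coincides with the mean of "neg", so a single prototype per class cannot tell them apart, while the subclass prototypes can. The number of subclasses `k` is a free parameter here; how the paper chooses or combines such settings is not stated in the abstract.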
