Experiments with Hierarchical Text Classification

M. Granitzer and P. Auer (Austria)

Keywords

Machine Learning, Supervised Learning, Hierarchical Text Classification, Boosting, Ranking Performance

Abstract

This paper applies Boosting to hierarchical text classification, where the hierarchical structure is given as a directed acyclic graph, and compares the results to Support Vector Machines. Hierarchical classification is performed top-down: in each node a flat classifier decides whether a document should be propagated further. BoosTexter, CentroidBooster and Support Vector Machines are used as flat classifiers, where CentroidBooster is an AdaBoost.MH-based alternative similar to BoosTexter. Experiments on the Reuters Corpus Volume 1 and the OHSUMED data set show that the F1-measure increases if the hierarchical structure of a data set is taken into account. Regarding time complexity, we show that, depending on the structure of a hierarchy, learning and classification time can be reduced. Besides these hard classification approaches, we also investigate the ranking performance of hierarchical classifiers. Ranking, which can be achieved by providing a meaningful score for each classification decision, is important in most practical settings. We investigate an approach that uses a sigmoid function to calculate a meaningful score, where parameter estimation is based on error bounds from computational learning theory.
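
The top-down scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-based node classifiers stand in for BoosTexter, CentroidBooster or Support Vector Machines, the names `classify_top_down` and `sigmoid_score` are hypothetical, and the sigmoid parameters `a` and `b` are fixed placeholders rather than values estimated from error bounds as in the paper.

```python
import math
from typing import Callable, Dict, List

# A node classifier maps a document to a real-valued margin (positive = accept).
NodeClassifier = Callable[[str], float]


def sigmoid_score(margin: float, a: float = 1.0, b: float = 0.0) -> float:
    """Map a raw margin to a score in (0, 1); a and b are placeholder values."""
    return 1.0 / (1.0 + math.exp(-(a * margin + b)))


def classify_top_down(
    doc: str,
    root: str,
    children: Dict[str, List[str]],
    classifiers: Dict[str, NodeClassifier],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Propagate doc from the root; descend only below accepting nodes."""
    scores: Dict[str, float] = {}
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        if node in scores:
            # In a DAG a node may be reachable via several parents; evaluate once.
            continue
        scores[node] = sigmoid_score(classifiers[node](doc))
        if scores[node] >= threshold:
            stack.extend(children.get(node, []))
    # Report only the accepted nodes together with their scores.
    return {n: s for n, s in scores.items() if s >= threshold}


def keyword(word: str) -> NodeClassifier:
    """Toy stand-in for a trained flat classifier (assumption for illustration)."""
    return lambda doc: 1.0 if word in doc.lower() else -1.0


# Toy DAG: "sponsorship" has two parents, "sports" and "finance".
children = {
    "root": ["sports", "finance"],
    "sports": ["football", "sponsorship"],
    "finance": ["markets", "sponsorship"],
}
classifiers = {
    "sports": keyword("club"),
    "finance": keyword("bank"),
    "football": keyword("football"),
    "sponsorship": keyword("sponsorship"),
    "markets": keyword("stock"),
}

result = classify_top_down(
    "Football club signs sponsorship deal", "root", children, classifiers
)
# Rank the accepted categories by score, highest first.
ranking = sorted(result.items(), key=lambda item: item[1], reverse=True)
```

Because a rejected node blocks its whole subtree, the classifiers below it are never evaluated for that document, which is one way the hierarchy's structure can reduce classification time.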

From Proceeding (481) Artificial Intelligence and Soft Computing - 2005