Experiments with Hierarchical Text Classification

M. Granitzer and P. Auer (Austria)

Keywords

Machine Learning, Supervised Learning, Hierarchical Text Classification, Boosting, Ranking Performance

Abstract

This paper applies Boosting to hierarchical text classification where the hierarchical structure is given as a directed acyclic graph, and compares the results to Support Vector Machines. Hierarchical classification is performed top-down: in each node a flat classifier decides whether a document should be propagated further. BoosTexter, CentroidBooster and Support Vector Machines are used as flat classifiers, where CentroidBooster is an AdaBoost.MH-based alternative similar to BoosTexter. Experiments on the Reuters Corpus Volume 1 and the OHSUMED data set show that the F1-measure increases if the hierarchical structure of a data set is taken into account. Regarding time complexity, we show that, depending on the structure of a hierarchy, learning and classification time can be reduced. Besides these hard classification approaches, we also investigate the ranking performance of hierarchical classifiers. Ranking, which can be achieved by providing a meaningful score for each classification decision, is important in most practical settings. We investigate an approach that uses a sigmoid function to calculate a meaningful score, where parameter estimation is based on error bounds from computational learning theory.
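The top-down scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the toy hierarchy, the keyword-based stand-ins for trained flat classifiers, the function names, and the fixed sigmoid slope. In particular, the paper's parameter estimation from error bounds is not reproduced here.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy hierarchy given as a DAG: node -> children.
# Node "c" has two parents ("a" and "b").
CHILDREN: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": [],
}


def keyword_scorer(word: str) -> Callable[[str], float]:
    """Stand-in for a trained flat classifier (assumption, not BoosTexter or an SVM).

    Returns a margin of +1.0 if the word occurs in the document, else -1.0.
    """
    return lambda doc: 1.0 if word in doc.split() else -1.0


# One hypothetical flat classifier per non-root node.
SCORERS: Dict[str, Callable[[str], float]] = {
    "a": keyword_scorer("market"),
    "b": keyword_scorer("health"),
    "c": keyword_scorer("drug"),
}


def sigmoid(margin: float, slope: float = 1.0) -> float:
    """Map a raw margin to a score in (0, 1); the slope is a fixed placeholder."""
    return 1.0 / (1.0 + math.exp(-slope * margin))


def classify_top_down(doc: str, threshold: float = 0.5) -> Dict[str, float]:
    """Propagate a document from the root; descend only below accepted nodes."""
    scores: Dict[str, float] = {}
    frontier = ["root"]
    while frontier:
        node = frontier.pop()
        for child in CHILDREN[node]:
            if child in scores:
                continue  # already accepted via another parent
            score = sigmoid(SCORERS[child](doc))
            if score > threshold:
                scores[child] = score
                frontier.append(child)
    return scores


if __name__ == "__main__":
    accepted = classify_top_down("health drug trial")
    # Rank the accepted categories by score, highest first.
    print(sorted(accepted, key=accepted.get, reverse=True))
```

In this sketch a document that is rejected at both "a" and "b" never reaches the classifier for "c", which is the kind of pruning that can reduce classification time, depending on the shape of the hierarchy.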
