A Simple Method for Labeling Hierarchical Document Clusters

M.F. Moura and S.O. Rezende (Brazil)

Keywords

cluster labeling, hierarchical clustering, document clustering, topic taxonomy, attribute selection, text mining

Abstract

One of the problems that automatic models face in generating topic taxonomies is creating the list of most significant words that discriminates each document group. This paper presents a proposal for doing so by labeling hierarchical document clusters. The goal of the method is to aid the construction of topic taxonomies. It aims to be language independent and to produce discriminative labels for each group, without repeating terms along the branches of the hierarchy. Moreover, the method is cluster independent; that is, it can be applied to any hierarchical clustering result or to any hierarchy, even one that was produced manually. To reach these goals, the method is based on a formal definition of a topic taxonomy, which is also presented in the paper. The method was implemented and evaluated against three other methods from the literature by comparing their recall in a particular information retrieval process. The proposed method outperformed the other three; additionally, it has linear complexity.
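
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the one constraint it states: labels should not repeat terms along a branch of the hierarchy. It is not the authors' method. The `Node` class, the `label_hierarchy` function, the parameter `k`, the ranking by document frequency, and the sample documents are all assumptions introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; documents are lists of terms."""
    docs: list = field(default_factory=list)      # documents held directly by this node
    children: list = field(default_factory=list)  # child clusters
    label: list = field(default_factory=list)     # filled in by label_hierarchy


def collect_docs(node):
    """Return every document under a node (its own plus its descendants')."""
    docs = list(node.docs)
    for child in node.children:
        docs.extend(collect_docs(child))
    return docs


def label_hierarchy(node, k=2, inherited=frozenset()):
    """Label nodes top-down, skipping any term already used by an ancestor."""
    # Document frequency of each term within this cluster (assumed ranking criterion).
    df = Counter(term for doc in collect_docs(node) for term in set(doc))
    # Rank by descending frequency, ties broken alphabetically.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    # Keep the k best terms that no ancestor has already taken.
    node.label = [t for t in ranked if t not in inherited][:k]
    for child in node.children:
        label_hierarchy(child, k, inherited | set(node.label))


# Illustrative data only: two leaf clusters under one root.
leaf_a = Node(docs=[["text", "mining", "cluster"],
                    ["text", "mining", "cluster", "label"]])
leaf_b = Node(docs=[["text", "mining", "taxonomy"],
                    ["text", "taxonomy", "topic"]])
root = Node(children=[leaf_a, leaf_b])

label_hierarchy(root, k=2)
print(root.label, leaf_a.label, leaf_b.label)
```

Because the root takes the terms shared by both groups, each leaf is left with the terms that distinguish it from its sibling. This sketch recounts terms at every level for clarity and makes no attempt to reproduce the linear complexity reported for the proposed method.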
