A Simple Method for Labeling Hierarchical Document Clusters

M.F. Moura and S.O. Rezende (Brazil)


cluster labeling, hierarchical clustering, document clustering, topic taxonomy, attribute selection, text mining


One of the problems automatic models face in generating topic taxonomies is creating the list of the most significant words that discriminates each document group. This paper presents a proposal for doing so by labeling hierarchical document clusters. The goal of the method is to aid the construction of topic taxonomies. It aims to be language independent and to produce discriminative labels for each group, without repeating terms along the branches of the hierarchy. Moreover, the method is cluster independent; that is, it can be applied to the result of any hierarchical clustering or to any hierarchy, even a manually produced one. To reach these goals, the method is based on a formal definition of a topic taxonomy, also presented in the paper. The method was implemented and evaluated against three other methods from the literature by comparing their recall in a particular information retrieval process. The proposed method outperformed the other three; additionally, it has linear complexity.
