Generating Hypergraph of Term Associations for Automatic Document Concept Clustering

I-J. Chiang (Taiwan), T.Y. Lin (USA), and J.Y.-J. Hsu (Taiwan)

Keywords

Document Clustering, Hypergraph, Association Rules, Concept, Connected Components, Decomposition

Abstract

This paper presents a novel approach to document clustering using hypergraph decomposition. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally define a hypergraph, which can then be decomposed into connected components at various levels. Each connected component represents a primitive concept in the collection. The documents can then be clustered based on the primitive concepts. Experiments with three different data sets drawn from web pages and the medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and hierarchical agglomerative clustering (HAC). The results indicate that hypergraphs are a perfect model for capturing association rules in text and are very useful for automatic document clustering.
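
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_support` and `max_size` parameters, the toy documents, and the overlap-based assignment rule are all assumptions made for illustration. The sketch also decomposes the hypergraph at a single level only, whereas the abstract describes decomposition at various levels.

```python
from collections import defaultdict
from itertools import combinations


def frequent_term_sets(docs, min_support=2, max_size=3):
    """Return term sets (hyperedges) co-occurring in at least min_support documents.

    Assumption: "frequently co-occurring" is modeled as a simple document-count
    threshold over term sets of size 2..max_size.
    """
    counts = defaultdict(int)
    for terms in docs:
        unique = sorted(set(terms))
        for size in range(2, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return [edge for edge, n in counts.items() if n >= min_support]


def connected_components(hyperedges):
    """Merge hyperedges that share a term, using union-find over the terms."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in hyperedges:
        terms = list(edge)
        root = find(terms[0])
        for term in terms[1:]:
            parent[find(term)] = root

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    # Sort for a deterministic component order.
    return sorted(groups.values(), key=lambda s: sorted(s))


def cluster_documents(docs, components):
    """Label each document with the component it overlaps most; -1 if none.

    Assumption: this overlap rule is a stand-in for the paper's own
    concept-based assignment, which the abstract does not specify.
    """
    labels = []
    for terms in docs:
        overlaps = [len(set(terms) & comp) for comp in components]
        best = max(range(len(components)), key=overlaps.__getitem__, default=-1)
        labels.append(best if best >= 0 and overlaps[best] > 0 else -1)
    return labels


# Toy collection (hypothetical data): two web-mining and two medical documents.
docs = [
    ["hypergraph", "cluster", "term"],
    ["hypergraph", "cluster", "document"],
    ["gene", "protein", "disease"],
    ["gene", "protein", "therapy"],
]
edges = frequent_term_sets(docs, min_support=2)
components = connected_components(edges)
labels = cluster_documents(docs, components)
print(components)
print(labels)
```

In this toy run, each connected component plays the role of a primitive concept, and documents that share a concept receive the same label. Raising `min_support` prunes weaker associations and tends to split the hypergraph into more, smaller components.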
