Extracting Common Concepts from WordNet to Classify Documents

Y. Ino, T. Matsui, and H. Ohwada (Japan)

Keywords

common concepts, medium-frequency words, WordNet, SVM, Reuters-21578, macro-averaged F1.

Abstract

This paper explores a method for extracting common concepts from WordNet to classify documents. We focus on adding medium-frequency words to high-frequency words, because some medium-frequency words are paraphrases of high-frequency words. The proposed method extracts generic concepts common to high-frequency and medium-frequency words, and chooses feature words from the extracted common concepts and the high-frequency words. The effects of this method are examined in several experiments using Support Vector Machines (SVMs) and the Reuters-21578 standard test collection. The proposed method is especially effective in raising the macro-averaged F1 value, which increased to 59.0% from 54.7%.
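The idea of a common concept can be illustrated with a minimal sketch. It is not the authors' implementation: the hypernym table `HYPERNYM` is a hypothetical toy stand-in for WordNet, the frequency thresholds `high_min` and `medium_min` are assumed values, and the rule of taking the nearest shared hypernym is an assumption made for illustration only.

```python
from collections import Counter

# Hypothetical toy hypernym table standing in for WordNet (child -> parent).
HYPERNYM = {
    "car": "motor_vehicle",
    "automobile": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "vehicle": "entity",
    "wheat": "grain",
    "corn": "grain",
    "grain": "food",
    "food": "entity",
}


def ancestors(word):
    """Return the hypernym chain of `word`, nearest concept first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def split_by_frequency(tokens, high_min, medium_min):
    """Split the vocabulary into high- and medium-frequency word sets.

    The thresholds are assumptions; the abstract does not state them.
    """
    counts = Counter(tokens)
    high = {w for w, c in counts.items() if c >= high_min}
    medium = {w for w, c in counts.items() if medium_min <= c < high_min}
    return high, medium


def common_concepts(high, medium):
    """Map each medium-frequency word to the nearest concept it shares
    with at least one high-frequency word (assumed selection rule)."""
    result = {}
    for m in sorted(medium):
        for concept in ancestors(m):
            if any(concept in ancestors(h) for h in high):
                result[m] = concept
                break
    return result


# Toy corpus: "automobile" is a medium-frequency paraphrase of "car".
tokens = ["car"] * 5 + ["wheat"] * 4 + ["automobile"] * 2 + ["corn"] * 2 + ["truck"]
high, medium = split_by_frequency(tokens, high_min=4, medium_min=2)
concepts = common_concepts(high, medium)

# Feature words: the high-frequency words plus the extracted common concepts.
features = sorted(high | set(concepts.values()))
print(features)
```

In this toy example "automobile" and "car" meet at the concept "motor_vehicle", so occurrences of either word can contribute to the same feature. A real hierarchy would probably also need a limit on how generic a shared concept may be, since very general concepts are shared by almost every word.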
