Selecting Features by Term Importance for Text Categorization

S.-S. Kang (Korea)

Keywords

Text categorization, term weighting, feature selection, document representation, term importance

Abstract

A great deal of work has been done over the years to improve the performance of text categorization systems. Statistical and probabilistic models, such as Naive Bayes and support vector machines (SVM), and feature selection techniques have been explored and have produced good results. However, performance depends crucially on the choice of an effective term weighting scheme. Unfortunately, many categorization methods lack effectiveness, and more refined category representation methods are required. We apply a new term weighting technique to the representation of input documents and to category learning; that is, the words of a text are ranked according to how well they discriminate the documents of a collection from each other. Experimental results demonstrate the effectiveness of the term weighting method. We found that our term weighting scheme achieved a significant improvement over the baseline system, which uses the tf-idf weighting scheme.
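For orientation, the tf-idf baseline mentioned above can be sketched as follows. This is a minimal illustration of the standard tf-idf formula (raw term frequency times log(N/df)), not the term weighting method proposed in the paper, whose details the abstract does not give. The function name `tf_idf` and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: weight = raw term frequency * log(N / df),
    one common tf-idf variant among several.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy collection (invented for illustration only).
docs = [
    ["the", "stock", "market", "stock", "price"],
    ["the", "football", "match", "price"],
    ["the", "stock", "football", "market"],
]
weights = tf_idf(docs)

# Rank the terms of the first document by weight, highest first.
ranked = sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)
```

A term that occurs in every document, such as "the" here, receives weight zero because it cannot help tell documents apart, while a term concentrated in few documents is weighted higher. This is the same intuition of ranking words by discriminating power that the abstract describes.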
