Design and Implementation of Document Classification using Keyword Frequency and TFIDF

J.-H. Kim, S.C. Hwang, S. Park, and K.-T. Kim (Korea)

Keywords

Data Mining, Keyword Frequency, TFIDF, Conceptual Knowledge

Abstract

This study introduces an algorithm that classifies documents using a keyword extractor. The system consists of a web document collector, an indexer, and a document classifier. Classification requires conceptual knowledge of each category into which documents are to be classified. The web document collector gathers web documents from the web directories of Internet portal sites, extracts their title, hyperlink, and text data, and saves these in files. The indexer then constructs the conceptual knowledge by combining the keyword term-frequency method with the TFIDF algorithm. Finally, the document classifier applies the classification algorithm, together with the conceptual knowledge, to the documents to be classified.
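The abstract does not state how keyword term frequency and TFIDF are combined, nor which similarity measure the classifier uses. The Python sketch below is therefore only one plausible reading and should not be taken as the authors' method. It builds a keyword-weight profile per category (a stand-in for the "conceptual knowledge") as a weighted mix of normalized term frequency and TF-IDF, then assigns a new document to the category with the highest cosine similarity. The names `tokenize`, `build_concepts`, `cosine`, and `classify`, the mixing weight `alpha`, the cutoff `top_k`, and the toy training data are all hypothetical. The web document collector is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic runs, no stemming or stop-word list.
    return re.findall(r"[a-z]+", text.lower())


def build_concepts(training, alpha=0.5, top_k=20):
    """Return {category: {keyword: weight}} as a stand-in for "conceptual knowledge".

    training maps a category name to a list of document strings.
    alpha (hypothetical) mixes normalized keyword frequency with TF-IDF.
    """
    docs = [tokenize(d) for texts in training.values() for d in texts]
    n_docs = len(docs)
    # Document frequency: number of training documents containing each term
    df = Counter(t for toks in docs for t in set(toks))
    concepts = {}
    for cat, texts in training.items():
        # Keyword frequency summed over all documents of this category
        tf = Counter(t for d in texts for t in tokenize(d))
        max_tf = max(tf.values())
        scores = {}
        for term, freq in tf.items():
            norm_tf = freq / max_tf
            idf = math.log(n_docs / df[term])  # 0 for terms in every document
            scores[term] = alpha * norm_tf + (1 - alpha) * norm_tf * idf
        # Keep the top_k keywords (ties broken alphabetically)
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        concepts[cat] = dict(ranked[:top_k])
    return concepts


def cosine(a, b):
    # Cosine similarity between two sparse {term: weight} vectors
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text, concepts):
    # Score the document's term-frequency vector against each category profile
    doc_tf = Counter(tokenize(text))
    return max(concepts, key=lambda cat: cosine(doc_tf, concepts[cat]))


training = {
    "sports": ["the team won the soccer match",
               "a late goal decided the soccer game"],
    "finance": ["the stock market fell sharply",
                "investors sold stock as interest rates rose"],
}
concepts = build_concepts(training)
print(classify("the soccer team scored a goal", concepts))  # sports
```

Cosine similarity is used here because it normalizes for document length, so a long page is not favored simply for containing more keywords. Setting `alpha=1.0` reduces the profile to pure keyword frequency, and `alpha=0.0` reduces it to pure TF-IDF.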
