Web Page Classification using Rules and Heuristics

L. Karanikola, A. Katsaris, and I. Karali (Greece)

Keywords

Web page classification, heuristics, derivation, rules

Abstract

Today, the World Wide Web (WWW) gives us access to a vast amount of information. However, web pages are organized in a way that makes searching for and retrieving specific information very difficult. To tackle this problem, there is a need for a system that categorizes the content of a web page. Traditionally, web page classification was carried out manually by a group of experts, but the enormous growth of the WWW makes this approach impractical. A system that automatically categorizes web content seems to be the only way to tackle the classification problem. In this paper, we study an approach to the problem of web page classification. The approach applies a set of derivation rules to a subset of the content of the initial web page. It was implemented in the AutoCat system.
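
To make the idea of applying rules to a subset of a page's content concrete, the following is a minimal illustrative sketch in Python. It is not the AutoCat implementation. The choice of subset (title, top-level headings, and meta keywords/description), the keyword-threshold form of the rules, and all names (SubsetExtractor, RULES, classify) are assumptions made for illustration; the abstract does not specify the actual derivation rules.

```python
import re
from html.parser import HTMLParser


class SubsetExtractor(HTMLParser):
    """Collects words from an assumed subset of a page: <title>, <h1>-<h3>,
    and the content of meta keywords/description tags."""

    KEPT_TAGS = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0  # greater than 0 while inside a kept tag

    def handle_starttag(self, tag, attrs):
        if tag in self.KEPT_TAGS:
            self._depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() in ("keywords", "description"):
                self._add(attr.get("content") or "")

    def handle_endtag(self, tag):
        if tag in self.KEPT_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self._add(data)

    def _add(self, text):
        self.words.extend(re.findall(r"[a-z]+", text.lower()))


# Hypothetical rules: (category, indicative keywords, minimum distinct hits).
# A stand-in for the paper's derivation rules, which are not given here.
RULES = [
    ("sports", {"football", "league", "match", "score"}, 2),
    ("finance", {"stock", "market", "bank", "investment"}, 2),
]


def classify(html):
    """Return every category whose rule fires on the extracted subset."""
    parser = SubsetExtractor()
    parser.feed(html)
    parser.close()
    found = set(parser.words)
    return [cat for cat, kws, min_hits in RULES if len(found & kws) >= min_hits]
```

In this sketch, body paragraphs are ignored entirely, so a page is categorized only by what appears in its title, headings, and meta tags. Working on such a subset keeps the rules cheap to evaluate, at the cost of missing evidence that appears only in the body text.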
