A Hybrid Approach to Statistical and Semantical Analysis of Web Documents

T. Gottron and R. Schneider (Germany)


Terminological ontology, template detection, IR


This paper describes a new approach to improving the analysis and categorization of web documents, using statistical methods for template-based clustering together with semantical analysis based on terminological ontologies. A domain-specific environment serves as a proof of concept. To demonstrate the broad practical benefit of our approach, we outline a combined mathematical and semantical framework for information retrieval on Internet resources.
