Finding Similar Web Sites by using Link Information and User's Access History

S. Kurihara, T. Hirotsu, T. Takada, O. Akashi, and T. Sugawara (Japan)

Keywords

Internet, Mirror site, Access history, Link information

Abstract

We are studying techniques that allow even ordinary end users to make efficient use of the Internet. We previously proposed an algorithm that uses link information to determine the degree of similarity between web sites, in order to find sites that are mirrors of each other as well as sites that are not mirrors but have similar content and can substitute for each other. In verifying the basic effectiveness of that algorithm, we found that, when searching for sites similar to a given site (site-A), there were, besides sites with almost 100% similarity to site-A, also sites that were fully adequate as substitutes for site-A even though their degree of similarity was 50% or less. For practical use of the algorithm, it is therefore essential to judge automatically whether web sites inferred to have some kind of similarity are actually mirror sites or similar sites that can be used as substitutes. To solve this problem, in this paper we propose an automatic judgment methodology and evaluate its basic effectiveness. Focusing on its operation, we also propose a methodology for effectively finding candidate similar sites by using a user’s Internet access history.
