Focusing on Sites in the Web

Y. Asano, H. Imai, M. Toyoda, and M. Kitsuregawa (Japan)

Keywords

Web Technologies, Information Retrieval, Web-links, Web Site, Degree Distribution

Abstract

In recent years, several information retrieval methods that exploit information about Web-links, such as HITS and Trawling, have been developed. To analyze Web-links for information retrieval by dividing them into links inside each Web site (local-links) and links between Web sites (global-links), a proper model of the Web site is required, since "Web site" is a phrase used ambiguously in daily life. Existing research uses the Web server as the model of a Web site. This works relatively well when a Web site corresponds to a single server, as with public Web sites, but poorly when multiple Web sites share one server, as with private Web sites on rental Web servers. In this paper, we propose a new model of the Web site, the directory-based site, to handle typical private sites, together with a method that identifies such sites using information about URLs and Web-links. Using the jp-domain URL data constructed by Toyoda and Kitsuregawa, the method distinguishes approximately 2/3 of more than 110 thousand servers. Interestingly, we find that global-links and local-links differ in their degree distributions, which are an important part of information retrieval from the Web.
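The local-link/global-link split depends entirely on which site model decides whether two URLs belong to the same site. The sketch below contrasts the server-based model with a deliberately simplified, hypothetical directory-based rule. That rule treats a first path segment of the form `~name` as a separate site on a shared server. This is an illustrative assumption only, not the identification method of the paper, which also uses Web-link information. The helper names `server_site`, `directory_site`, `split_links`, and `in_degree_distribution` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def server_site(url):
    """Server-based model: a site is identified by its host name alone."""
    return urlsplit(url).netloc.lower()


def directory_site(url):
    """Hypothetical directory-based rule (an assumption, not the paper's method):
    if the first path segment looks like a user directory ("~name"), the site is
    host + that directory; otherwise fall back to the server-based model."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0].startswith("~"):
        return f"{host}/{segments[0]}"
    return host


def split_links(links, site_of):
    """Partition (source, target) URL pairs into local-links and global-links
    according to the given site model."""
    local, global_ = [], []
    for src, dst in links:
        if site_of(src) == site_of(dst):
            local.append((src, dst))
        else:
            global_.append((src, dst))
    return local, global_


def in_degree_distribution(links):
    """Map each in-degree k to the number of target pages with in-degree k.
    Pages that never appear as a target (in-degree 0) are not counted."""
    indeg = Counter(dst for _, dst in links)
    return Counter(indeg.values())


# Hypothetical example data: two private sites on one rental server.
links = [
    ("http://example.jp/~alice/index.html", "http://example.jp/~alice/diary.html"),
    ("http://example.jp/~alice/index.html", "http://example.jp/~bob/"),
    ("http://example.jp/~bob/", "http://www.example.ac.jp/"),
]

local_s, global_s = split_links(links, server_site)
local_d, global_d = split_links(links, directory_site)
print(len(local_s), len(global_s))  # server-based: 2 local, 1 global
print(len(local_d), len(global_d))  # directory-based: 1 local, 2 global
print(in_degree_distribution(global_d))
```

Under the server-based model, the link from `~alice` to `~bob` counts as local because both pages share the host `example.jp`. The directory-based rule reclassifies it as global. This shift is the kind that can change the degree distributions measured separately for local-links and global-links.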
