Integrating Web Information with XML Concrete Views

T.-T. Dang-Ngoc, H. Kou, and G. Gardarin (France)


Web, Semantic Integration, XML, Mediator


To cope with the difficulties of Web information search, many technologies related to Web search engines have been proposed and successfully applied. Rather than proposing yet another general-purpose Web search engine, this paper couples text mining and XML view caching techniques within a Web mediation architecture and presents a prototype framework for topic-centric Web information search. Given a topic domain, domain-specific information is extracted from the Web documents belonging to that domain; text-mining techniques are then applied to discover the semantics contained in this information and to map it into a domain-specific common concept model defined using Semantic Web languages. Finally, an XML-based mediator allows users to query the integrated Web information using XQuery. Once Web information is represented in the concept model with an explicit semantic hierarchy that programs can interpret, user queries against specific fragments of Web documents can be carried out. An important part of our work aims at integrating XML view and cache techniques to manage Web information. Checksums are used to monitor updates of Web pages. A prototype, centered on popular French sites in the finance domain, is under construction.
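
As an illustration of the checksum-based update monitoring mentioned above, the following is a minimal sketch rather than the prototype's actual implementation. It assumes that a cached XML view is rebuilt only when the checksum of the fetched page changes. The names `checksum`, `ViewCache`, `refresh`, and `build_view` are hypothetical, and SHA-256 is an assumed choice of checksum; the abstract does not specify one.

```python
import hashlib


def checksum(page_bytes: bytes) -> str:
    """Return a hex digest identifying the page content (SHA-256 is an assumption)."""
    return hashlib.sha256(page_bytes).hexdigest()


class ViewCache:
    """Hypothetical cache mapping a URL to (checksum, cached XML view)."""

    def __init__(self):
        self._entries = {}

    def refresh(self, url, page_bytes, build_view):
        """Return (view, rebuilt).

        The view is rebuilt with build_view only if the page content
        differs from the version seen at the previous refresh.
        """
        digest = checksum(page_bytes)
        entry = self._entries.get(url)
        if entry is not None and entry[0] == digest:
            # Page unchanged: serve the cached XML view.
            return entry[1], False
        # Page is new or has changed: rebuild and store the view.
        view = build_view(page_bytes)
        self._entries[url] = (digest, view)
        return view, True
```

In such a scheme, the comparatively expensive extraction and text-mining steps would run only for pages whose content has actually changed since the last visit.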

