Thorough Indexing of Images on the World Wide Web

N.C. Rowe (USA)


Intelligent information systems, AI in multimedia systems, images, captions, World Wide Web, indexing


The diversity of the World Wide Web requires intelligent automated tools to find useful information. We describe a Web "crawler" and caption filter, MARIE-4, that searches the Web to find text likely to be image captions, along with the associated image objects. Rather than examining only a few features, as existing systems do, it uses a broad set of criteria, including some novel ones, to yield higher recall than competing systems, which generally focus on high precision. We tested these criteria in careful experiments that extracted 8140 caption candidates for 4585 representative images, and quantified for the first time the relative value of several kinds of clues for captions. The crawler is self-improving in that it obtains from experience further statistics that serve as positive and negative clues. We index the results found by the crawler and provide a user interface. We have built demonstration implementations of a Web search engine for all 667,573 publicly accessible U.S. Navy Web images and all 301,178 U.S. Army Web images.

