Head-Snapshot: A Fast High-Frequency-Word Retrieval Algorithm

Y. Wang and N. Cercone (Canada)


Document, Document collection, High-frequency-word, Patricia, Cache Patricia


The high-frequency words that occur in a document collection carry much of the semantic information of that collection. We propose a data structure (cache Patricia) and an algorithm (the Head-Snapshot Algorithm) that together retrieve these high-frequency words quickly. Cache Patricia improves on Patricia by introducing a cache mechanism into it. The Head-Snapshot Algorithm operates on a cache Patricia, and its running time depends only on the maximum number of high-frequency words the user wants to obtain, not on the total number of words stored. We also report experimental results showing that the Head-Snapshot Algorithm is much faster than the Traverse Retrieval Algorithm.
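
The abstract does not describe how cache Patricia is built, so the following is only a rough sketch of the complexity claim, not the authors' data structure. It assumes a plain dictionary as a stand-in for the Patricia tree and a small frequency-ordered list as the cache; the names CachedWordIndex, add, and head_snapshot are hypothetical. Reading the k most frequent words touches only the head of the cache, so its cost depends on k and not on the vocabulary size, whereas a traversal-based retrieval would visit every stored word.

```python
class CachedWordIndex:
    """Illustrative stand-in: a dict of counts plus a frequency-ordered cache.

    Assumption: this is NOT the paper's cache Patricia; the dict replaces
    the Patricia tree and the list replaces the cache mechanism.
    """

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of cached words
        self.counts = {}          # word -> occurrence count
        self.cache = []           # cached words, most frequent first
        self.pos = {}             # cached word -> index in self.cache

    def _bubble_up(self, i):
        # Move the word at index i toward the head while it outranks
        # its predecessor, keeping the cache sorted by count.
        cache, counts, pos = self.cache, self.counts, self.pos
        while i > 0 and counts[cache[i]] > counts[cache[i - 1]]:
            cache[i], cache[i - 1] = cache[i - 1], cache[i]
            pos[cache[i]] = i
            pos[cache[i - 1]] = i - 1
            i -= 1

    def add(self, word):
        # Record one occurrence of word and repair the cache order.
        self.counts[word] = self.counts.get(word, 0) + 1
        if word in self.pos:
            self._bubble_up(self.pos[word])
        elif len(self.cache) < self.capacity:
            self.cache.append(word)
            self.pos[word] = len(self.cache) - 1
            self._bubble_up(self.pos[word])
        elif self.cache and self.counts[word] > self.counts[self.cache[-1]]:
            # The new word now outranks the least frequent cached word.
            del self.pos[self.cache[-1]]
            self.cache[-1] = word
            self.pos[word] = len(self.cache) - 1
            self._bubble_up(self.pos[word])

    def head_snapshot(self, k):
        # Read only the first k cache entries: cost depends on k alone.
        return [(w, self.counts[w]) for w in self.cache[:k]]


index = CachedWordIndex(capacity=3)
for w in "the cat and the dog and the bird".split():
    index.add(w)
print(index.head_snapshot(2))  # [('the', 3), ('and', 2)]
```

In this sketch the cache capacity bounds the largest k that can be answered from the cache alone; how the actual algorithm handles that bound is not stated in the abstract.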

