Quantizing Time Series for Efficient Subsequence Matching

I.F. Vega-López (Mexico) and B. Moon (USA)

Keywords

Time Series, Subsequence Match, Similarity Search.

Abstract

Indexing time series data is an interesting problem that has attracted much interest in the research community for the last decade. Traditional indexing methods organize the data space using different metrics. However, searching high-dimensional spaces using a hierarchical index is not always efficient because a large portion of the index might need to be accessed during search. We have revisited this problem of matching subsequences in light of new techno logical advances. In particular, we have paid close attention to the increasing ratio of CPU to disk performance. We rec ognize this problem is heavily bound by IO operations and address this issue in a twofold manner. First, we propose the use of quantization to generate small and homogeneous representations of time series. Quantization provides tight upperand lower-bounds on the measure of similarity to a query sequence. This allows us to drastically reduce the number of false alarms during search. Second, we organize the quantized representation of data in a linear array that can be efficiently read from disk. By reducing the number of false alarms and by sequentially reading the index, we are able to significantly reduce the IO cost of query pro cessing. In consequence, we improve the overall search performance by up to a factor of 3 with respect to state of the art techniques for subsequence matching. 503-800

Important Links:



Go Back