A Histogram Utilizing the Cluster Information

J. Roh, H.K. Park, K.W. Min, and M.H. Kim (Korea)


Multi-Dimensional Histogram, Selectivity Estimation, Cluster, Skew


Histograms are summary structures of large datasets, which are mainly used for selectivity estimation during query optimization. Selectivity estimation is the fast approximation of query result size. In this paper, we focus on multi-dimensional histograms, especially bi dimensional histograms. At the time of selectivity estimation, buckets partially overlapping with a query return approximated results assuming that all objects within them are uniformly distributed. Since, however, the objects within the region of a query are not likely to be uniformly distributed, skews (or clusters) in buckets commonly degrades the accuracy of a histogram. Our aim is to utilize clusters in buckets to enhance the accuracy of selectivity estimation. We propose a new method that associates cluster information with a bucket. We present new schemes which define clusters formally and algorithms which find such clusters efficiently as well. We show through experiments that our proposed method provides better performance than other existing well known methods.

Important Links:

Go Back