K. Kerdprasop, N. Kerdprasop (Thailand), and J. Sun (USA)
Clustering, Density biased sampling, Reservoir sampling
A reservoir-sampling algorithm is a simple randomized algorithm for drawing a sample of size n without replacement from a population of size N, where N ≥ n. We adopt the algorithm for its efficient memory usage and extend it to handle the clustering of large data sets whose clusters vary in size. Our proposed algorithm performs density biased sampling using only a single scan of the data, so good efficiency can be expected. Moreover, our experimental results show that it is effective for the subsequent clustering phase. Noise tolerance is an additional characteristic of our method.
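For readers unfamiliar with the base technique, the following is a minimal sketch of the classic uniform reservoir sampler (often called Algorithm R), not the density biased extension proposed in the paper. The function name `reservoir_sample` and its parameters are illustrative assumptions; the sketch only shows how a sample of size n can be drawn without replacement in one pass while holding just n items in memory.

```python
import random


def reservoir_sample(stream, n, rng=None):
    """Draw up to n items uniformly at random, without replacement,
    from an iterable of unknown length using a single pass.

    Illustrative sketch of classic Algorithm R (hypothetical helper,
    not the authors' density biased method).
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with
            # probability n / (i + 1) and evicts a random resident.
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < n:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 10 values from a stream of 1000 integers.
    print(reservoir_sample(iter(range(1000)), 10, random.Random(0)))
```

Memory use stays proportional to n regardless of N, which is the property the abstract relies on. A density biased variant would presumably replace the uniform acceptance probability with one that depends on local data density, but the details of that step are not given here.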