Performance Evaluation and Characterization of Scalable Data Mining Algorithms

Y. Liu, J. Pisharath, W.-K. Liao, G. Memik, A. Choudhary, and P. Dubey (USA)


Keywords: Performance evaluation, data mining, benchmark, parallel computing


Data mining has become an essential tool in diverse fields. Growing data sizes and algorithmic complexity demand ever more computational power from processor chips. In this paper, we characterize a set of representative data mining programs in detail from both the hardware and software perspectives. We first design MineBench, a benchmarking suite of representative data mining applications from multiple categories: two classification, two association rule mining, and four clustering applications. We then evaluate the MineBench applications on an 8-way Shared Memory Parallel (SMP) machine and analyze their important performance characteristics. During the evaluation, we vary the input datasets and the number of processors to measure the scalability of the applications in the suite. We report results on scalability, I/O complexity, the fraction of time spent in OS mode, and communication/synchronization overheads. This information can help designers of future systems and programmers of new data mining algorithms achieve better system and algorithmic performance.
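The scalability measurement described above, where the number of processors is varied, is commonly summarized as speedup and parallel efficiency relative to a single-processor run. The following is a minimal sketch under that assumption, not the authors' measurement code. The function name `scalability_table` is hypothetical, and the timings are placeholders chosen for illustration, not results from the paper.

```python
def scalability_table(times_by_procs):
    """Return {p: (speedup, efficiency)} relative to the one-processor run.

    times_by_procs maps a processor count to a wall-clock time in seconds.
    Speedup is T(1) / T(p); efficiency is speedup / p.
    """
    if 1 not in times_by_procs:
        raise ValueError("a single-processor baseline time is required")
    baseline = times_by_procs[1]
    table = {}
    for procs, elapsed in sorted(times_by_procs.items()):
        speedup = baseline / elapsed
        table[procs] = (speedup, speedup / procs)
    return table


# Placeholder timings (seconds) for one application on 1, 2, 4, and 8
# processors. These are illustrative values, NOT measurements from the paper.
placeholder_times = {1: 80.0, 2: 40.0, 4: 25.0, 8: 20.0}

for procs, (speedup, efficiency) in scalability_table(placeholder_times).items():
    print(f"{procs} processors: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency well below 1 at higher processor counts would point toward overheads of the kind the abstract lists, such as I/O, time spent in OS mode, or communication and synchronization.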
