Using PC Clusters in Association Rule Mining

F. Kovács and S. Juhász (Hungary)

Keywords

data mining, association rule, distributed data mining

Abstract

Association rule mining is one of the most important problems in data mining. It demands very large computational and I/O capacity. For that reason several parallel mining algorithms have been proposed that can exploit the performance of cluster systems, but these algorithms were developed and optimised on supercomputer platforms. The capacity of current PCs makes it possible to build cluster systems more cheaply, but using them raises issues for the optimisation of distributed mining algorithms, especially the cost of node-to-node communication and data distribution. Current association rule mining algorithms do not take the cost of the initial data distribution into consideration, and therefore perform no data processing during the data distribution phase. Most distributed association rule mining algorithms are based on the Apriori algorithm, and therefore inherit its drawbacks. This paper introduces a new distributed association rule mining algorithm based on a modified version of the Apriori algorithm that addresses the bottlenecks of Apriori. A further advantage of the new distributed algorithm is that it can perform significant data processing during the data distribution phase.
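
For orientation, the following is a minimal single-machine sketch of the classic Apriori algorithm that the abstract refers to. It is not the modified or distributed algorithm introduced in the paper, whose details the abstract does not give. The function name `apriori`, its parameters, and the absolute support count threshold are illustrative assumptions. The sketch makes visible the two costs usually cited as Apriori bottlenecks: candidate generation at each level and one full pass over the transactions per level.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic level-wise Apriori (illustrative sketch, not the paper's variant).

    transactions: iterable of item collections.
    min_support:  minimum absolute number of supporting transactions.
    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items in one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)

        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # k-itemsets whose every (k-1)-subset is itself frequent.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Support counting: one full pass over all transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result
```

Calling `apriori(baskets, 2)` on a small list of item sets returns every itemset contained in at least two baskets. In a distributed setting the per-level pass over the data is the step that would be split across nodes, which is where the communication and data distribution costs discussed in the abstract arise.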
