A Comparative Study Among Algorithms for Frequent Pattern Generation in Data Mining

M.R. Islam, S.M. Khan, S.S.K. Robin, and M. Asad-uz-zaman (Bangladesh)

Keywords

Data mining, Association rules, Frequent Pattern, Apriori algorithm, FP-tree algorithm, PD algorithm.

Abstract

Data mining refers to extracting or "mining" knowledge from large amounts of data. It also encompasses knowledge presentation, where visualization and knowledge representation techniques are used to present the mined knowledge to the user. Efficient algorithms for mining frequent patterns are crucial to many data mining tasks. Since the Apriori algorithm was proposed in 1994, several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, which makes them inadequate for deriving association rules. The pattern decomposition (PD) algorithm can significantly reduce the size of the dataset on each pass, making it more efficient to mine all frequent patterns in a large dataset. This algorithm avoids the costly process of candidate set generation and, because support is evaluated on the reduced datasets, saves a great amount of counting time. In this paper, existing frequent pattern generation algorithms are explored and compared. The comparison shows that the PD algorithm outperforms Apriori by one order of magnitude and is faster than FP-tree. Further, PD is more scalable than both Apriori and FP-tree.
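
To make the candidate set generation-and-test approach mentioned above concrete, the following is a minimal Python sketch of an Apriori-style miner. It is an illustration only, not the implementation evaluated in the paper: the function name `apriori`, the use of an absolute support count for `min_support`, and the toy transactions are all assumptions introduced here. It does not show the PD or FP-tree algorithms.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Hypothetical helper for illustration; min_support is assumed to be
    an absolute count of transactions, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Generate: join frequent (k-1)-itemsets into k-item candidates,
        # keeping only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Test: scan the full, unreduced dataset to count each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Toy data, assumed for illustration only.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(baskets, min_support=3)
```

Each iteration of the loop rescans every transaction to count the candidates. This repeated full scan, together with the candidate generation step, is the cost that the abstract says the PD algorithm avoids by shrinking the dataset on each pass.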
