M. Oguchi and M. Kitsuregawa (Japan)
Cluster Computing, Data Mining, Storage Area Network, Runtime Data Declustering
Personal computer/workstation (PC/WS) clusters have come to be studied intensively in the field of parallel and distributed computing. They are expected to play an important role as large-scale computer systems in the next generation, such as large server sites and/or high performance parallel computers, because of their good scalability and cost-performance ratio. From the viewpoint of applications, data-intensive applications, including data mining and ad-hoc query processing in databases, are considered very important for high performance computing, in addition to conventional scientific calculation. Thus, investigating the feasibility of such applications on a PC cluster is meaningful. In this paper, a PC cluster connected with a Storage Area Network (SAN) is built and evaluated. In a SAN cluster, each node can access all shared disks directly without using the LAN; thus, SAN clusters achieve much better performance than LAN clusters for disk access operations. However, if many nodes access the same shared disk simultaneously, application performance degrades due to an I/O bottleneck. A runtime data declustering method, in which data is declustered to several other disks dynamically during the execution of the application, is proposed to resolve this problem. Parallel data mining is implemented and evaluated on the SAN-connected PC cluster. This application requires iterative scans of a shared disk, which degrade execution performance severely due to the I/O bottleneck. The runtime data declustering method is applied, and characteristics of the system such as I/O and network operations are evaluated in detail. According to the experimental results, the proposed method prevents the performance degradation caused by the shared disk bottleneck in SAN clusters.
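The idea behind runtime data declustering can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the placement policy, so the round-robin assignment, the function names `decluster` and `disk_load`, and the disk labels below are all assumptions made for illustration. The sketch only shows how spreading blocks over several disks lowers the number of blocks any single disk must serve per scan.

```python
from collections import Counter

def decluster(block_ids, target_disks):
    """Assign each block to a target disk in round-robin order (assumed policy)."""
    return {b: target_disks[i % len(target_disks)] for i, b in enumerate(block_ids)}

def disk_load(placement):
    """Count how many blocks each disk must serve in one full scan."""
    return Counter(placement.values())

blocks = list(range(8))

# First scan: every block still lives on the single shared disk.
initial = {b: "shared0" for b in blocks}

# Later scans: blocks have been copied to other disks while the application ran.
spread = decluster(blocks, ["disk1", "disk2", "disk3", "disk4"])

print(max(disk_load(initial).values()))  # 8 blocks served by one disk
print(max(disk_load(spread).values()))   # 2 blocks per disk
```

In this toy setting the busiest disk serves 8 blocks per scan before declustering and 2 afterwards, which is the kind of load spreading the iterative scans described above would benefit from.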