Second Level Parallelism using SIMD Accelerators on Heterogeneous MapReduce Clusters

Masoud Ebrahimi and Farshad Khunjush


MapReduce, Heterogeneous Computing, Distributed Computing


The MapReduce programming model introduced by Google is one of the most successful efforts to cope with the growing demand for processing large amounts of data on large-scale clusters. Although the MapReduce programming paradigm has never been easier to use or more scalable, distributed platforms have changed drastically in recent years. Today, most data centers and clusters are equipped with new processing elements such as multi-core CPUs, FPGAs, and, in particular, SIMD accelerators. Unfortunately, current MapReduce frameworks are incapable of harnessing the computational power of these nodes. In this paper, we propose a new design philosophy for implementing MapReduce frameworks that accommodates the multi-level parallelism found in modern data centers. We designed a novel architecture to leverage all types of SIMD architectures in distributed platforms. Experiments and evaluations show that our implementation not only complies with the characteristics of MapReduce applications but also outperforms Hadoop in terms of speedup and throughput.
