Comparison of Table Join Execution Time for Parallel DBMS and MapReduce

Aleksey V. Burdakov; Uriy A. Grigorev; Andrey D. Ploutenko

doi:10.2316/P.2014.811-006

Keywords

DBMS, SQL, MapReduce technology, Table join request, Query execution time estimate, Execution time comparison

Abstract

An analysis of existing research indicates that parallel DBMSs are preferred for executing queries on structured data, while MapReduce (MR) is perceived as supplementary to DBMS technology. We attempt to characterize the behavior of a parallel row-storage DBMS and the MR system Hadoop on the Join task as a function of parameters that, in other authors’ experiments, either were not varied or took values different from ours. This article presents detailed process models for table joins in the parallel row-storage DBMS and the MR system, as well as the results of detailed computational experiments performed on these models. The models were set up for different scalability schemes for MR (number of nodes) and DBMS (data volume per node) and for fragmentation of the joined tables by primary key. The following parameters were varied: selectivity of the queried data, number of sorted result records, and cardinality of the grouping attribute. The modeling results show that, as the stored data volume grows, the parallel DBMS begins to lose to the MR system at certain thresholds.
