Aleksey V. Burdakov, Uriy A. Grigorev, and Andrey D. Ploutenko
DBMS, SQL, MapReduce technology, Table join request, Query execution time estimate, Execution time comparison
Analysis of existing research indicates that parallel DBMSs are preferred for executing queries over structured data, while MapReduce (MR) is perceived as supplementary to DBMS technology. We attempt to characterize the behavior of a parallel row-store DBMS and the Hadoop MR system on the example of the Join task, varying parameters that in other authors' experiments are either held fixed or take values different from ours. This article presents detailed process models for table joins in the parallel row-store DBMS and the MR system, as well as the results of computational experiments performed on these models. The models were set up for different scalability schemes for MR (number of nodes) and DBMS (data volume per node), and for fragmentation of the joined tables by the primary key. The following parameters were varied: selectivity of the queried data, number of sorted result records, and cardinality of the grouping attribute. The modeling results showed that, as the stored data volume increases, the parallel DBMS starts losing to the MR system at certain thresholds.