Load Balancing Parallel Loops on Message-passing Systems

R.L. Cariño and I. Banicescu (USA)


Keywords: dynamic load balancing, distributed algorithm, heterogeneous computing


In large scientific applications with parallel loops executing on message-passing systems, the strategy of partitioning data among processors and having each processor perform the computations on its own data generally results in load imbalance, due to nonuniform data distribution, algorithmic variance, processor heterogeneity, or unpredictable phenomena such as network latency and operating system interference. Runtime data redistribution is therefore necessary to balance processor loads. Recently, loop scheduling algorithms based on a probabilistic analysis of parallel loop iterates with variable running times have been developed. Assuming a centralized workqueue approach, these algorithms execute the iterates in variable-size chunks, where the sizes are adaptively determined at runtime such that the chunks complete before the optimal time with high probability. This study experiments with these loop scheduling algorithms on a computationally intensive parallel application that profiles a quadrature routine. The experimental results strongly suggest the benefits of combining a distributed workqueue approach with these loop scheduling algorithms for load balancing parallel loops on message-passing systems: the aggregate processor time of a straightforward parallelization is reduced by as much as 75% when this approach is utilized. This result highlights the potential for successfully integrating this approach into other applications, especially those with parallel loops whose irregular behavior cannot be predicted before runtime.
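To illustrate the kind of scheduling the abstract describes, the sketch below computes decreasing, adaptively sized chunks in the style of the Factoring scheme, one of the probabilistically derived loop scheduling methods in this line of work: each scheduling batch splits half of the remaining iterations evenly among the processors, so early chunks are large (low scheduling overhead) and later chunks are small (good load balance near the end). The function name and structure are illustrative, not taken from the paper.

```python
def factoring_chunks(total_iters, num_procs):
    """Return the sequence of chunk sizes a Factoring-style scheduler
    would hand out from a workqueue of `total_iters` loop iterates.

    In each batch, half of the remaining iterations are divided evenly
    among `num_procs` processors; chunk sizes therefore decrease
    geometrically across batches.
    """
    chunks = []
    remaining = total_iters
    while remaining > 0:
        # Chunk size for this batch: ceil(remaining / (2 * P)), at least 1.
        chunk = max(1, -(-remaining // (2 * num_procs)))
        # One chunk per processor in the batch (fewer if iterates run out).
        for _ in range(num_procs):
            if remaining == 0:
                break
            size = min(chunk, remaining)
            chunks.append(size)
            remaining -= size
    return chunks


# Example: 100 iterates on 4 processors yields batches of sizes
# 13, 6, 3, 2, 1 per chunk, covering all 100 iterates.
print(factoring_chunks(100, 4))
```

In a message-passing implementation, a (centralized or distributed) workqueue would hand these chunks to processors on demand, so a processor that drew a slow chunk simply requests fewer iterates later, which is the load-balancing effect the paper measures.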