Checkpointing Fortran/MPI Programs for Computational Grids

C. Mishra (Canada), M.R. Mittal (USA), and S.K. Aggarwal (India)

Keywords: Grid Computing, Fault Tolerance, Checkpointing, MPI, Compilers

Abstract: In recent years, Grid Computing has emerged as a new and powerful paradigm in the field of high performance computing. However, the running time of a majority of important computational science applications exceeds the mean time to failure of these grids. It is therefore important to provide some measure of fault tolerance for compute-intensive applications, so that they can truly harness the power of computational grids. In this paper we describe the design of Merlin, a tool for instrumenting Fortran/MPI programs to make them fault tolerant. Merlin provides application-level checkpointing in these programs, which is independent of the MPI library being used on heterogeneous nodes.
