Distributed MPI Deadlock Detection in Distributed Memory Systems

W. Haque and B. Ollenberger (Canada)

Keywords

Parallel Algorithms and Architectures, Parallel Programming, Distributed Memory, MPI, Deadlock Detection

Abstract

The Message Passing Interface (MPI) is a widely used application programming interface for developing portable parallel programs. It is easy, however, to write MPI programs that are prone to deadlock. It is therefore desirable to detect such deadlocks in running programs, and to do so in a distributed manner, without assuming that shared memory is available for communication. A distributed deadlock detector has been developed that finds deadlocks with very low overhead and minimal additional communication among nodes. The detector uses the MPI profiling layer, so it can be added to a program at link time without changing or recompiling the user's code. The detector has also been tested on widely varying MPI implementations, demonstrating its portability.
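
To make the notion of deadlock concrete, the following minimal sketch checks a wait-for relation among MPI ranks for a cycle. It is an illustration under stated assumptions, not the detector described in this paper: the function name find_deadlock_cycle and the dictionary waits_for are hypothetical, each blocked rank is assumed to wait on exactly one peer (as in a blocking point-to-point receive), and the whole relation is held in one place, whereas a distributed detector would have to gather or forward this information among nodes.

```python
from typing import Dict, List, Optional


def find_deadlock_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the ranks forming a wait-for cycle, or None if there is none.

    waits_for maps a blocked rank to the single rank it is waiting on
    (hypothetical representation; assumes one outstanding blocking call
    per rank). Ranks absent from the mapping are not blocked.
    """
    cleared = set()  # ranks already shown not to lead into a cycle
    for start in waits_for:
        path: List[int] = []          # ranks visited on the current walk
        position: Dict[int, int] = {}  # rank -> index in path
        rank = start
        # Follow the chain of blocked ranks until it ends or repeats.
        while rank in waits_for and rank not in cleared:
            if rank in position:
                # The walk returned to a rank on the current path: a cycle.
                return path[position[rank]:]
            position[rank] = len(path)
            path.append(rank)
            rank = waits_for[rank]
        # The walk ended at an unblocked or already cleared rank.
        cleared.update(path)
    return None
```

For example, if rank 0 and rank 1 each post a blocking receive from the other before sending, the relation {0: 1, 1: 0} contains a cycle and the two ranks are deadlocked; the relation {0: 1, 1: 2} does not, because rank 2 is not blocked and can still make progress.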
