An Efficient Failure Detector for Sparsely Connected Networks

M. Hutle (Austria)

Keywords

Fault Tolerant Distributed Systems, Unreliable Failure De tectors, Partitionable Networks, Sparse Networks

Abstract

We present an implementation of an eventually perfect fail ure detector for sparsely connected, partitionable networks, where each process has only a bounded number of neigh bors. Processes and links may fail by crashing. Regard ing synchrony, our algorithm only needs to know an upper bound on the jitter of the communication between direct neighbors. No a-priori knowledge about the number of pro cesses in the system is required. The algorithm uses heartbeats to determine whether a process is in the same partition. By reducing the fre quency of forwards by distance, information about nearer processes is more accurate than about farther ones, and the message size becomes constant. Since this property can be guaranteed independently of the number of processes in the system, our failure detector is very efficient in terms of communication complexity.

Important Links:



Go Back