S. Wang and X. Yun (PRC)
distributed systems; fault tolerance; failure detector; pull; push
Detecting process failures is widely recognized as a crucial problem for fault-tolerant distributed systems, and failure detectors are used in a wide variety of settings, such as network communication protocols, computer cluster management, and group membership protocols. Unfortunately, it is difficult to implement a reliable failure detector in an asynchronous distributed system. This paper proposes a new failure detector that combines the PULL and PUSH approaches in order to tolerate message losses. Theoretical analysis proves the new algorithm feasible, and experiments show that it efficiently reduces wrong suspicions caused by message losses while increasing detection time only slightly.
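The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a PUSH heartbeat could be combined with a PULL query, not the authors' method. It assumes the monitor first waits for heartbeats and, when one is missed, sends an explicit query before suspecting the process. The class HybridDetector, its states, and the parameters push_timeout and pull_timeout are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    TRUSTED = "trusted"
    PROBING = "probing"      # heartbeat missed, pull query outstanding
    SUSPECTED = "suspected"


@dataclass
class HybridDetector:
    """Hypothetical push+pull detector for a single monitored process."""
    push_timeout: float      # assumed: silence tolerated before probing
    pull_timeout: float      # assumed: wait for a reply to the pull query
    last_heard: float = 0.0
    probe_sent_at: float = 0.0
    state: State = State.TRUSTED

    def on_message(self, now: float) -> None:
        """A heartbeat (push) or a query reply (pull) both prove liveness."""
        self.last_heard = now
        self.state = State.TRUSTED

    def tick(self, now: float) -> bool:
        """Advance the detector; return True if a pull query should be sent."""
        if self.state is State.TRUSTED and now - self.last_heard > self.push_timeout:
            # A missed heartbeat alone does not trigger suspicion: probe first.
            self.state = State.PROBING
            self.probe_sent_at = now
            return True
        if self.state is State.PROBING and now - self.probe_sent_at > self.pull_timeout:
            # Neither a heartbeat nor a query reply arrived in time.
            self.state = State.SUSPECTED
        return False


# Scenario 1: one heartbeat is lost, but the pull reply arrives in time.
d = HybridDetector(push_timeout=1.0, pull_timeout=0.5)
d.on_message(0.0)
sent_probe = d.tick(1.2)          # heartbeat overdue, so a query goes out
d.on_message(1.4)                 # reply received, no wrong suspicion
state_after_loss = d.state

# Scenario 2: the process really crashes after t = 1.4.
d.tick(2.5)                       # heartbeat overdue again, query goes out
d.tick(3.1)                       # no reply within pull_timeout
state_after_crash = d.state
```

In this sketch a single lost heartbeat costs one extra query instead of a wrong suspicion, while a real crash is detected at most pull_timeout later than a pure heartbeat scheme would detect it, which is the kind of trade-off the abstract describes.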