Byzantine Fault Tolerant Execution of Long-Running Distributed Applications

S.L. Pallemulle, I. Wehrman, and K.J. Goldman (USA)

Keywords

Fault tolerant systems, Byzantine agreement, autonomous distributed applications

Abstract

Long-running distributed applications that automate criti cal decision processes require Byzantine fault tolerance to ensure progress in spite of arbitrary failures. Existing repli cation protocols for data servers guarantee that externally requested operations execute correctly even if a bounded number of replicas fail arbitrarily. However, since these protocols only support passive state machine replication, they are insuf´Čücient to support continued correct execution of autonomous long-running distributed applications. Building on the Castro and Liskov Byzantine Fault Tolerance protocol for replicated state machines (CLBFT), we present a practical algorithm for Byzantine fault tolerant execution of autonomous distributed applications. The algorithm supports replicated clients that actively ex ecute application logic by issuing operation requests on replicated data servers as well as other replicated clients. Our work facilitates dynamic upgrades to replica groups, supports both synchronous and asynchronous operation re quests, and provides fault isolation between replica groups with respect to both correctness and performance. The algorithm scales well to large replica groups with only twice the latency and message complexity as compared to CLBFT, which supports only unreplicated clients.

Important Links:



Go Back