Partial and Total Recovery in ATLAS

M. Fairén and A. Vinacua (Spain)

Keywords

Distributed applications, Software development tools, Fault-tolerance, Process state recovery.

Abstract

Complex applications may benefit from spreading out their computational needs over the nodes of a network. However, in doing so, they become more prone to failure because of communications disruption or single-node failures. ATLAS, a framework supporting the development of such distributed applications with minimal programming effort, provides simple transaction-style mechanisms to recover from such failures, or even a total crash. This article is an overview of the design criteria followed and the mechanisms implemented in ATLAS to do so.
