SLA Provisioning by Utilizing Profit-Oriented Fault Tolerance

A. Keller, K. Voss, D. Battré, M. Hovestadt, and O. Kao (Germany)


Service Level Agreements, Risk Management, Resource Management, Grid Economy, Fault Tolerance


Service Level Agreements (SLAs) are mandatory for the commercial success of the Grid. However, resource outages are common events and threaten successful SLA provisioning. To prevent SLA violations for jobs affected by a resource outage, fault tolerance mechanisms are essential. However, if several resources crash at the same time, there might not be enough alternative resources to restart all affected jobs. In this case, profit-oriented resource and fault tolerance management is of crucial importance for commercial Grid providers. This paper describes how a deadline-aware scheduling policy combined with application-transparent checkpointing and job migration over the Grid can increase the profit of a resource provider as well as the user's confidence in the provider.

