On the Choice of Checkpoint Interval Based on Time Series Analysis

J. Hong, S. Kim, and Y. Cho (Korea)


Checkpoint and Recovery, Checkpoint Interval, Expected Execution Time


Checkpointing is a common mechanism to reduce the execution time of a process in the presence of failures. A checkpointing mechanism periodically saves the process state to stable storage so that, after a failure, the process can be restored to the state of its most recent checkpoint. Checkpointing too infrequently means that a large amount of work must be re-executed after a failure. On the other hand, checkpointing too frequently increases the checkpoint overhead and lengthens the total execution time of the process. Checkpoint overhead is closely related to the memory usage of the process and to the checkpoint interval. In this paper, we derive equations for the expected total execution time of a process with and without checkpointing, and we present an efficient checkpoint algorithm that uses adaptive time series analysis to adjust the checkpoint interval dynamically. The proposed algorithm uses the memory usage history to predict the future checkpoint cost and to determine the checkpoint interval dynamically.
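
The general idea of predicting checkpoint cost from memory usage history and adapting the interval can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, whose equations are not reproduced in this abstract. As stand-ins, the sketch assumes simple exponential smoothing for the time series forecast and Young's well-known first-order approximation, interval = sqrt(2 * cost * MTBF), for the interval. All function names, the smoothing factor `alpha`, the storage bandwidth, and the mean time between failures (MTBF) are hypothetical choices made for illustration only.

```python
import math


def predict_next(history, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    Assumption: the paper's adaptive time series model is not specified
    in the abstract; exponential smoothing is used here as a stand-in.
    """
    if not history:
        raise ValueError("history must be non-empty")
    forecast = history[0]
    for value in history[1:]:
        # Blend the newest observation with the running forecast.
        forecast = alpha * value + (1.0 - alpha) * forecast
    return forecast


def checkpoint_cost(memory_bytes, write_bandwidth):
    """Estimated seconds to save the process state.

    Assumption: cost is proportional to memory usage, with a fixed
    (hypothetical) stable-storage bandwidth in bytes per second.
    """
    return memory_bytes / write_bandwidth


def young_interval(cost_seconds, mtbf_seconds):
    """Young's first-order approximation of the checkpoint interval.

    Assumption: swapped in for the paper's own expected-execution-time
    equations, which are not available in this abstract.
    """
    return math.sqrt(2.0 * cost_seconds * mtbf_seconds)


def next_interval(memory_history, write_bandwidth, mtbf_seconds, alpha=0.5):
    """Predict the next checkpoint cost and derive the next interval."""
    predicted_memory = predict_next(memory_history, alpha)
    cost = checkpoint_cost(predicted_memory, write_bandwidth)
    return young_interval(cost, mtbf_seconds)


if __name__ == "__main__":
    # Hypothetical memory usage samples (bytes) observed at past checkpoints.
    usage = [100e6, 200e6]
    interval = next_interval(usage, write_bandwidth=75e6, mtbf_seconds=10_000)
    print(f"next checkpoint interval: {interval:.1f} s")
```

In this sketch a growing memory footprint raises the predicted checkpoint cost, which in turn lengthens the interval, so that costly checkpoints are taken less often. A shrinking footprint has the opposite effect.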
