Constructing a Flexible Internet-Scale Time-Sharing System using Deterministic Checkpointing

A.L. Beberg and V.S. Pande (USA)


Keywords: Parallel and Distributed Algorithms, Cloud Computing, Computational Models, Task Scheduling.


Distributed systems and clusters have both become mainstream, and are now used throughout academia and industry. Interconnects have advanced on many fronts; home Internet connections now offer speeds and latencies that were available only inside a data center several years ago. Despite this, the scheduling and use of these systems remain firmly rooted in the era of batch processing. Batch processing provides a basic scheduler and a simple programming model for many types of computation, but it lacks the flexibility and efficient resource utilization of even the most rudimentary time-sharing system. With a large number of use cases now constrained by these limitations, time-sharing concepts must once again come to the rescue. Based on experience from the, Folding@home, and Storage@home systems, we demonstrate that the Internet has advanced enough to meet the higher requirements of time-sharing. This paper explores those requirements and the potential benefits of moving beyond batch processing to time-sharing for Internet-scale computations, and lays out a method for the deterministic checkpointing required to implement such a system.
