The Self Distributing Virtual Machine (SDVM) – Making Computer Clusters Heal Themselves

J. Haase, F. Eschmann, and K. Waldschmidt (Germany)

Keywords

Parallel Computing Systems, Cluster Computing, Parallel Processing, Fault Tolerance

Abstract

With the rapidly growing capability of computer architectures, their complexity grows as well. More and more parallelism is necessary to provide the needed computing power. Moreover, systems must adapt to changing environments and cope with the breakdown of components. One approach is to incorporate organic features into computer systems. Organic computers [14] are characterized by self-x properties such as self-configuring, self-optimizing, self-healing, and self-protecting. To make a cluster computer behave "organically", it should possess (among other important aspects) some kind of self-healing feature that detects and deactivates defective components. If a node fails, all data stored in its local memory is lost. Because data in computer clusters is often stored in a decentralized way, deactivation alone will usually not suffice. Concepts have to be developed to store data redundantly and to recover the data in case of a failure. This paper presents the concept and features of the implemented prototype of the Self Distributing Virtual Machine (SDVM). Self-healing is discussed as one aspect of the SDVM's functionality.
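The abstract does not say how the SDVM stores data redundantly or recovers it. As a rough illustration of the general idea only, the Python sketch below keeps every data item on `k` distinct nodes and, when a node fails and is deactivated, re-creates the copies it held from surviving replicas. The class `ReplicatedStore`, the replication factor `k`, and the random placement policy are hypothetical assumptions for illustration, not the SDVM's actual design.

```python
import random


class ReplicatedStore:
    """Hypothetical toy model: each item is kept on k distinct live nodes.

    Not the SDVM's mechanism; only illustrates redundant storage plus recovery.
    """

    def __init__(self, nodes, k=2, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        # node -> {key: value}; models the local memory of each cluster node
        self.memory = {n: {} for n in nodes}
        self.alive = set(nodes)

    def holders(self, key):
        """Return the live nodes that currently hold a copy of `key`."""
        return [n for n in sorted(self.alive) if key in self.memory[n]]

    def put(self, key, value):
        """Store `value` on k randomly chosen live nodes (assumed placement policy)."""
        if len(self.alive) < self.k:
            raise RuntimeError("not enough live nodes for the requested redundancy")
        for n in self.rng.sample(sorted(self.alive), self.k):
            self.memory[n][key] = value

    def get(self, key):
        """Read `key` from any live replica."""
        for n in self.holders(key):
            return self.memory[n][key]
        raise KeyError(key)

    def fail(self, node):
        """Self-healing step: deactivate `node`, discard its memory, re-replicate lost copies."""
        self.alive.discard(node)
        lost = self.memory.pop(node)
        for key, value in lost.items():
            survivors = self.holders(key)
            if not survivors:
                # every replica was on failed nodes: the item cannot be recovered
                raise RuntimeError(f"data item {key!r} lost: all replicas failed")
            candidates = sorted(self.alive - set(survivors))
            needed = self.k - len(survivors)
            for n in self.rng.sample(candidates, min(needed, len(candidates))):
                self.memory[n][key] = value


if __name__ == "__main__":
    store = ReplicatedStore(["n1", "n2", "n3", "n4"], k=2)
    store.put("x", 42)
    store.fail(store.holders("x")[0])  # one holder of "x" breaks down
    print(store.get("x"))  # 42
    print(len(store.holders("x")))  # 2, redundancy restored
```

The design choice here is the simplest one that shows why deactivation alone is not enough: with `k=1`, the failure of the single holding node makes the item unrecoverable, whereas with `k>=2` the surviving copy both serves reads and seeds a new replica.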
