A Fault-Tolerant Communication Scheme for Regular Cluster Networks

K. Day, B. Arafeh, and A. Touzene (Oman)

Keywords

Cluster Systems, Network Management, Interconnection Networks, Fault-Tolerant Routing

Abstract

Large cluster systems with thousands of nodes have become a cost-effective alternative to traditional supercomputers. In these systems, cluster nodes are interconnected using high-degree switches, and regular direct network topologies, including tori (k-ary n-cubes) and meshes, are among the commonly adopted choices for interconnecting these switches. We propose a general fault-tolerant routing scheme applicable to regular direct interconnection networks that satisfy certain connectivity conditions. The scheme is based on the availability of efficiently identifiable disjoint routes between network nodes. It is first presented in general terms for any interconnection topology satisfying these conditions, and is then illustrated on two example topologies, namely the binary hypercube and the k-ary n-cube.
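To make the idea of disjoint routes concrete for the binary hypercube, the following Python sketch uses a standard construction, which is not necessarily the one used in this paper. Between two nodes at Hamming distance k in the n-cube, k shortest paths correct the differing bits in cyclically rotated orders. The remaining n - k detour paths, each of length k + 2, first flip one matching bit, then correct all differing bits, then flip that bit back. These n paths are internally node-disjoint, so the destination stays reachable as long as at most n - 1 intermediate nodes are faulty. The function names `hypercube_disjoint_paths` and `select_route` are hypothetical. The fault model is also an assumption: only intermediate nodes can fail, never the source or the destination.

```python
from typing import List, Optional, Set


def hypercube_disjoint_paths(s: int, d: int, n: int) -> List[List[int]]:
    """Return n internally node-disjoint s->d paths in the binary n-cube.

    Nodes are integers in [0, 2**n); neighbours differ in exactly one bit.
    Shortest paths come first, followed by the longer detour paths.
    """
    if s == d:
        raise ValueError("source and destination must differ")
    diff = [i for i in range(n) if (s ^ d) >> i & 1]       # bits to correct
    same = [i for i in range(n) if not (s ^ d) >> i & 1]   # bits already equal
    k = len(diff)
    paths: List[List[int]] = []

    # k shortest paths: path i corrects the differing bits in the cyclic
    # order diff[i], diff[i+1], ..., so intermediate nodes never coincide.
    for i in range(k):
        node, path = s, [s]
        for j in range(k):
            node ^= 1 << diff[(i + j) % k]
            path.append(node)
        paths.append(path)

    # n - k detour paths: flip a matching bit e, correct every differing bit,
    # then flip e back. All intermediate nodes carry the "wrong" bit e, so they
    # are disjoint from the shortest paths and from the other detours.
    for e in same:
        node = s ^ (1 << e)
        path = [s, node]
        for b in diff:
            node ^= 1 << b
            path.append(node)
        path.append(node ^ (1 << e))  # equals d
        paths.append(path)
    return paths


def select_route(paths: List[List[int]], faulty: Set[int]) -> Optional[List[int]]:
    """Return the first path whose intermediate nodes are all fault-free."""
    for p in paths:
        if not any(v in faulty for v in p[1:-1]):
            return p
    return None  # every disjoint route is blocked


if __name__ == "__main__":
    routes = hypercube_disjoint_paths(0b000, 0b011, 3)
    print(routes)
    print(select_route(routes, faulty={1}))
```

For example, from node 0b000 to node 0b011 in the 3-cube the sketch yields [0, 1, 3], [0, 2, 3] and [0, 4, 5, 7, 3]. If node 1 is faulty, `select_route` falls back to [0, 2, 3]. Trying shortest routes first keeps paths minimal when no faults are present, and the longer detours are used only when needed.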
