Probabilistic Analysis on Mesh Network Fault Tolerance

J. Chen and T. Wang (USA)


interconnection network, mesh, torus, network connectivity, fault tolerance, multicomputer system


Mesh network is one of the most important interconnection network topologies for large multicomputer systems. Mesh networks perform poorly in tolerating faults in the view of worst-case analysis: two faulty nodes will be sufficient to disconnect a mesh network. In this paper, we study fault tolerance of mesh networks under a more realistic model in which each network node has an independent failure probability. We first show that if the node failure prob ability is fixed, then the connectivity probability of mesh networks can be arbitrarily small when the network size is sufficiently large. Thus, it is practically important, in par ticular for multicomputer system manufacturers, to deter mine the upper bound for node failure probability when the probability of network connectivity and the network size are given. We propose the concept of k-submesh connec tivity and develop new techniques based on this concept to derive formal proofs for the lower bounds on the connec tivity probability for mesh networks. Our study shows that mesh networks of practical size can tolerate a large number of faulty nodes thus are reliable enough for multicomputer systems. For example, we formally prove that as long as the node failure probability is bounded by 0.1%, the mesh networks of up to a quarter million nodes remain connected with a probability 99%. Our results also hold without any changes for the torus networks

Important Links:

Go Back