H. Chen, J. Decker, and N. Bierbaum (USA)
Keywords: InfiniBand, 10GbE, TOE, Parallel I/O, Cluster, HPC
Large clustered computers provide low-cost compute cycles and have therefore promoted the development of sophisticated parallel-programming algorithms based on the Message Passing Interface. Storage platforms, however, have failed to keep pace with these advances. This paper compares standard 4X InfiniBand (IB) with 10 Gigabit Ethernet (10GbE) for use as a common infrastructure for both storage and message passing. Because IB natively accelerates protocol processing in hardware, the Ethernet hardware in this study provided comparable acceleration using TCP Offload Engines (TOEs). We evaluated I/O performance with the IOzone benchmark on the iSCSI-based TerraGRID parallel filesystem. Our evaluations show that 10GbE, with or without protocol offload, offered better throughput and lower latency than IB to socket-based applications. Although protocol offload in both 10GbE and IB yielded significant improvements in I/O performance, a large share of CPU cycles is still consumed handling the associated data copies and interrupts. Emerging RDMA technologies hold promise for removing this remaining CPU overhead, and we plan to continue our study by investigating applications of RDMA in parallel I/O.
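For readers unfamiliar with the kind of measurement IOzone performs, the following is a minimal sketch of a sequential write/read throughput test in the spirit of IOzone's write and read phases. The file path, file size, and record size are illustrative assumptions and do not reflect the paper's actual benchmark configuration; a real run would also control for client-side caching.

```python
# Minimal sketch of a sequential write/read throughput test, loosely modeled on
# IOzone's write and read tests. All parameters below are illustrative
# assumptions, not the configuration used in the paper.
import os
import time

PATH = "/mnt/terragrid/iozone_sketch.dat"  # assumed mount point, purely illustrative
FILE_SIZE = 1 << 30      # 1 GiB test file
RECORD_SIZE = 64 * 1024  # 64 KiB records


def write_test() -> float:
    """Sequentially write FILE_SIZE bytes and return throughput in MiB/s."""
    buf = b"\0" * RECORD_SIZE
    start = time.perf_counter()
    with open(PATH, "wb") as f:
        for _ in range(FILE_SIZE // RECORD_SIZE):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = time.perf_counter() - start
    return FILE_SIZE / elapsed / (1 << 20)


def read_test() -> float:
    """Sequentially read the file back and return throughput in MiB/s.

    Note: without dropping the page cache this may measure cached reads.
    """
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        while f.read(RECORD_SIZE):
            pass
    elapsed = time.perf_counter() - start
    return FILE_SIZE / elapsed / (1 << 20)


if __name__ == "__main__":
    print(f"write: {write_test():8.1f} MiB/s")
    print(f"read:  {read_test():8.1f} MiB/s")
```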