Memory Latency Reduction with Fine-grain Migrating Threads in NUMA Shared-memory Multiprocessors

M. Dorojevets and D. Strukov (USA)


Multithreading, Latency Tolerance, Thread Migration


To fully realize the potential performance benefits of large-scale NUMA shared-memory multiprocessors, efficient techniques for reducing or tolerating long memory access latencies in such systems must be developed. This paper discusses the concept of, and the software and hardware support for, memory latency reduction through fine-grain non-transparent thread migration, referred to as mobile multithreading, in the proposed scalable NUMA shared-memory architecture. Performance evaluation results for the conjugate gradient NAS benchmark demonstrate that the proposed fine-grain thread migration, combined with data prefetching, can effectively reduce memory latency and switch traffic in NUMA shared-memory multiprocessors with a large memory access non-uniformity ratio.

