Hardware Support for Concurrent Execution of Loops Containing Loop-carried Data Dependences

C.D. Lima, K. Sano, and T. Nakamura (Japan)


Parallel Architecture, Loop-Level Parallelism, Thread Level Parallelism


This paper discusses the exploitation of loop-level parallelism in single-chip microprocessors containing multiple processing elements. We propose an on-chip communication mechanism that uses distributed shift-register files to allow the concurrent execution of loops containing loop-carried data dependences. In our experiments, we use the Shift Architecture [1] as the baseline architecture on which to implement our proposal and evaluate its performance. Our initial simulation results show that our approach obtains a considerable amount of loop overlap compared with conventional approaches that use shared memory for on-chip inter-iteration communication.
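
As an illustration only (not taken from the paper), the following minimal Python sketch shows what a loop-carried data dependence looks like and why iterations of such a loop can still overlap. The function names and the arithmetic are hypothetical. The sketch does not model shift-register files or the Shift Architecture; it only separates the per-iteration work that is independent from the short chain of values that must be passed from one iteration to the next.

```python
def sequential(a):
    """Reference loop: each iteration reads the value written by the previous one."""
    out = []
    carried = 0
    for x in a:
        local = x * x + 1          # independent of every other iteration
        carried = carried + local  # loop-carried dependence on `carried`
        out.append(carried)
    return out


def overlapped(a):
    """Same result, with the independent work separated from the carried chain."""
    # This part has no inter-iteration dependence, so in principle it could
    # run concurrently on different processing elements (assumption: here it
    # is simply computed up front in software).
    locals_ = [x * x + 1 for x in a]

    # Only this short chain needs a value forwarded from iteration i-1 to i.
    out = []
    carried = 0
    for local in locals_:
        carried += local
        out.append(carried)
    return out


if __name__ == "__main__":
    data = [1, 2, 3]
    print(sequential(data))
    print(overlapped(data))
```

The smaller the carried chain is relative to the independent work, the more of each iteration can overlap with its neighbours, provided the forwarded value reaches the next iteration quickly.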

