Optimising Data Movement Rates for Parallel Processing Applications on Graphics Processors

O. Harrison and J. Waldron (Ireland)


Parallel Processing, High Performance Computing, Graphics Processors, Transfer Rates.


Graphics processing units (GPUs) are playing an increasingly important role in non-graphical applications that are highly parallelisable. With the latest graphics cards boasting a theoretical 165 GFlops and 54 GB/s of memory bandwidth spread across 48 ALUs, it is easy to see why. The GPU architecture is particularly suited to the parallel stream processing paradigm of low levels of data dependency, a high data-to-instruction ratio and predictable memory access patterns. One largely ignored, yet key, bottleneck for this type of processing on GPUs is the performance of both download and readback transfers to and from the graphics card. Existing tools provide great developer assistance in many areas of GPU application development, but very limited assistance in gaining the best bi-directional data transfer performance. In this paper, we discuss these limitations and present new investigative tools which allow developers of general-purpose GPU applications to explore the complex array of configuration states which affect both download and readback performance.
