A GPGPU PROGRAMMING FRAMEWORK BASED ON A SHARED-MEMORY MODEL

Kazuhiko Ohno, Dai Michiura, Masaki Matsumoto, Takahiro Sasaki, and Toshio Kondo

References

[1] J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, and T.J. Purcell, A survey of general-purpose computation on graphics hardware, Computer Graphics Forum, 26 (1), 2007, 80–113.
[2] GPGPU.org. http://www.gpgpu.org/.
[3] CUDA Zone. http://developer.nvidia.com/category/zone/cuda-zone.
[4] OpenCL. http://www.khronos.org/opencl/.
[5] NVIDIA Corporation, NVIDIA’s Next Generation CUDA Compute Architecture: Fermi, 1.1 edition, 2009.
[6] NVIDIA Corporation, NVIDIA CUDA C Programming Guide, 4.2 edition, April 2012.
[7] NVIDIA Corporation, CUDA C Best Practices Guide, 4.1 edition, January 2012.
[8] M. Baskaran, J. Ramanujam, and P. Sadayappan, Automatic C-to-CUDA code generation for affine programs, in Compiler Construction, volume 6011 of Lecture Notes in Computer Science (Berlin/Heidelberg: Springer, 2010), 244–263.
[9] S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU: a compiler framework for automatic translation and optimization, SIGPLAN Notices, 44, 2009, 101–110.
[10] N. Sundaram, A. Raghunathan, and S.T. Chakradhar, A framework for efficient and scalable execution of domain-specific templates on GPUs, International Parallel and Distributed Processing Symposium, 2009, 1–12.
[11] Y. Yang, P. Xiang, J. Kong, and H. Zhou, A GPGPU compiler for memory optimization and parallelism management, SIGPLAN Notices, 45, 2010, 86–97.
[12] S. Ueng, M. Lathara, S.S. Baghsorkhi, and W.W. Hwu, CUDA-Lite: reducing GPU programming complexity, in Languages and Compilers for Parallel Computing (2008), 1–15.
[13] NVIDIA Corporation, Thrust Quick Start Guide, March 2012.
[14] J. Protić, M. Tomašević, and V. Milutinović, Distributed shared memory: concepts and systems, IEEE Parallel and Distributed Technology, 4 (2), 1996, 63–79.
[15] S. Raina, Virtual shared memory: a survey of techniques and systems, Technical report, University of Bristol, 1992.
[16] I. Gelado et al., CUBA: an architecture for efficient CPU/co-processor data communication, Proc. 22nd Annual Int. Conf. on Supercomputing, ICS ’08, 2008, 299–308.
[17] B. Dreier, M. Zahn, and T. Ungerer, The Rthreads distributed shared memory system, Proc. 3rd Int. Conf. on Massively Parallel Computing Systems, 1998.
[18] I. Gray and N.C. Audsley, Exposing non-standard architectures to embedded software using compile-time virtualization, Proc. 2009 Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems, CASES ’09, 2009, 147–156.
[19] Q. Hou, K. Zhou, and B. Guo, BSGP: bulk-synchronous GPU programming, in ACM SIGGRAPH 2008 Papers, SIGGRAPH ’08, 2008, 19:1–19:12.
[20] Himeno benchmark. http://accc.riken.jp/HPC_e/himenobmt_e.html.
[21] NVIDIA Corporation, NVIDIA’s Next Generation CUDA Compute Architecture: Kepler GK110, 1.0 edition, 2012.
