GPU computing and programming
by Felipe A. Cruz
Nagasaki Advanced Computing Center
A major new trend in computing has already started. Until recently, all computing systems on the TOP500 supercomputer list [1] were based on multi-core CPUs. However, systems based on graphics processing units (GPUs) have now started to compete for the top ten places of the TOP500 list. Furthermore, a GPU-based system has proven to be an excellent low-cost computing platform, winning the ACM Gordon Bell Prize in the price/performance category in 2009 [2].
In the last few years, GPUs have evolved from graphics-only processors into a general-purpose parallel computing architecture. Today, GPUs in a PC or a computing cluster can be used for high-performance scientific computing applications. GPUs are gaining importance for three reasons: they are fast, they are cheap, and they are power-efficient [3]. Of course, GPU systems also have limitations that we need to be aware of.
This course covers the concepts of GPU computing and GPU programming. It is organized into three one-hour sessions and one hands-on laboratory. The topics of the course are:
- Session one: GPU computing. Overview of the GPU architecture and programming models.
- Session two: Basics of GPU programming. How to write simple GPU computing programs.
- Session three: Programming GPUs to achieve high performance. Performance considerations, measurements, and optimizations.
- Laboratory: practical experience.
The first two sessions cover the basics of GPU computing and programming and require little prior GPU computing experience. The third session is more advanced, and programming experience is recommended. The hands-on laboratory complements the course sessions with practical experience: participants will learn how to build GPU programs, starting from a simple application and gradually increasing its complexity.
[1] TOP500 Supercomputer Sites. http://www.top500.org
[2] T. Hamada, T. Narumi, R. Yokota, K. Yasuoka, K. Nitadori, and M. Taiji. 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence. In SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pages 1–12, New York, NY, USA, 2009. ACM.
[3] These three features are measured in floating-point operations per second (FLOPS), performance per dollar, and performance per watt, respectively.