Post-PASI training: Week 1
Syllabus
Class 1 (August 8th):
- Understand the need for multi-core in applications
- Manycore architecture:
- GPU vs CPU chip design
- Data parallelism
- Concepts behind a CUDA-friendly algorithm
- Basic CUDA:
- C-like language
- Threads and thread hierarchy
- Launching a CUDA kernel
Lab 1 (August 9th):
- Become familiar with CUDA and the nvcc compiler
- Device query
- Launch a simple vector add
- Implement a matrix-matrix multiplication
- References:
- Kirk, D. and Hwu, W. Programming Massively Parallel Processors. (Ch. 1, Ch. 2, Ch. 3)
- CUDA C Programming Guide. Version 4. (Ch. 1, Ch. 2)
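As a starting point for Lab 1, the device query and vector add could be sketched as follows. This is a minimal sketch rather than the official lab solution: the kernel name `vecAdd`, the array size, and the block size of 256 threads are assumptions chosen for illustration. It shows the global thread index computed from the thread hierarchy, the `<<<blocks, threads>>>` launch syntax, and a host-side check of the result.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the partial last block
        c[i] = a[i] + b[i];
}

int main()
{
    // Device query: report a few properties of device 0.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }
    printf("device: %s, %d multiprocessors\n", prop.name, prop.multiProcessorCount);

    const int n = 1 << 20;                 // illustrative problem size (assumption)
    const size_t bytes = n * sizeof(float);

    // Host arrays with known contents so the result is easy to verify.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device arrays and host-to-device copies.
    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n elements.
    const int threads = 256;                          // block size (assumption)
    const int blocks = (n + threads - 1) / threads;   // round up
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    if (cudaGetLastError() != cudaSuccess) {
        printf("kernel launch failed\n");
        return 1;
    }

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // Every element should equal 1 + 2.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (fabsf(hC[i] - 3.0f) > 1e-6f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return mismatches != 0;
}
```

Assuming the file is saved as `vecadd.cu`, it can be built with `nvcc vecadd.cu -o vecadd`.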
Class 2 (August 10th):
- Programming model: mapping the discretized model to the GPU threads
- Warps and warp scheduling
- Multilevel memory hierarchy
- Registers, shared, global, constant, and texture memories
- Sizes and latency
- Fundamentals of the finite difference method
Lab 2 (August 11th):
- Implement 2D explicit heat transfer with global memory
- Implement 2D explicit heat transfer with texture memory
- References:
- Kirk, D. and Hwu, W. Programming Massively Parallel Processors. (Ch. 4)
- Micikevicius, P. 3D Finite Difference Computations on GPUs using CUDA.
- Sanders, J. and Kandrot, E. CUDA by Example. (Ch. 7)
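For the global-memory version of the Lab 2 exercise, one explicit time step could look like the sketch below. It assumes a forward-time, centered-space update on a uniform square grid, u_new = u + r (u_left + u_right + u_up + u_down - 4u) with r = alpha dt / dx^2, fixed (Dirichlet) boundary values, and r <= 0.25 for stability. The kernel name `heatStep`, the grid size, the value of `r`, and the step count are illustrative assumptions, not the lab's prescribed setup.

```cuda
#include <cstdio>
#include <vector>
#include <utility>
#include <cuda_runtime.h>

// One explicit time step; one thread per grid point, all reads from global memory.
__global__ void heatStep(const float *in, float *out, int nx, int ny, float r)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i >= nx || j >= ny) return;

    int idx = j * nx + i;
    if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1) {
        out[idx] = in[idx];          // boundary values stay fixed
        return;
    }
    out[idx] = in[idx] + r * (in[idx - 1] + in[idx + 1] +
                              in[idx - nx] + in[idx + nx] - 4.0f * in[idx]);
}

int main()
{
    const int nx = 64, ny = 64;      // grid size (assumption)
    const float r = 0.2f;            // alpha*dt/dx^2, kept below 0.25 (assumption)
    const int steps = 500;           // number of time steps (assumption)
    const size_t bytes = nx * ny * sizeof(float);

    // Initial condition: cold plate with a hot top edge.
    std::vector<float> h(nx * ny, 0.0f);
    for (int i = 0; i < nx; ++i) h[i] = 1.0f;

    float *dIn, *dOut;
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    for (int s = 0; s < steps; ++s) {
        heatStep<<<grid, block>>>(dIn, dOut, nx, ny, r);
        std::swap(dIn, dOut);        // the newest field is always in dIn
    }
    if (cudaGetLastError() != cudaSuccess) {
        printf("kernel launch failed\n");
        return 1;
    }
    cudaMemcpy(h.data(), dIn, bytes, cudaMemcpyDeviceToHost);

    // Sanity checks: values stay within the boundary range, and heat has
    // reached the row next to the hot edge.
    bool ok = h[1 * nx + nx / 2] > 0.0f;
    for (int k = 0; k < nx * ny; ++k)
        if (h[k] < -1e-5f || h[k] > 1.0f + 1e-5f) ok = false;
    printf("center temperature: %f, checks %s\n",
           h[(ny / 2) * nx + nx / 2], ok ? "passed" : "failed");

    cudaFree(dIn); cudaFree(dOut);
    return ok ? 0 : 1;
}
```

Swapping the two device pointers after each step avoids copying the field back and forth. The texture-memory variant of the exercise would change how `in` is read, not the update formula.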
Class 3 (August 12th):
- Shared memory in detail
- Tiling
- Bank conflicts
- Race conditions and atomic operations
- References:
- Kirk, D. and Hwu, W. Programming Massively Parallel Processors. (Ch. 4, Ch. 5)
- Micikevicius, P. 3D Finite Difference Computations on GPUs using CUDA.
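To make the tiling topic concrete, the sketch below applies it to the matrix-matrix multiplication from Lab 1. Each thread block stages one tile of A and one tile of B in shared memory, synchronizes, and then reuses the staged values, which reduces global memory traffic. The kernel name `matMulTiled`, the tile width of 16, and the test matrices are assumptions made for illustration; the matrix size is deliberately not a multiple of the tile width so that the bounds checks are exercised.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16   // tile width (assumption)

// C = A * B for square n x n row-major matrices, one output element per thread.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Stage one tile of each input; pad with zeros outside the matrix.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();             // tile fully loaded before anyone reads it

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();             // everyone done before the tile is overwritten
    }
    if (row < n && col < n)
        C[row * n + col] = acc;
}

int main()
{
    const int n = 100;               // not a multiple of TILE on purpose
    const size_t bytes = n * n * sizeof(float);

    std::vector<float> A(n * n), B(n * n), C(n * n), ref(n * n, 0.0f);
    for (int i = 0; i < n * n; ++i) {
        A[i] = (i % 7) * 0.5f;
        B[i] = (i % 5) * 0.25f;
    }

    // CPU reference result.
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            for (int k = 0; k < n; ++k)
                ref[r * n + c] += A[r * n + k] * B[k * n + c];

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, n);
    if (cudaGetLastError() != cudaSuccess) {
        printf("kernel launch failed\n");
        return 1;
    }
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);

    // Compare against the CPU reference.
    int mismatches = 0;
    for (int i = 0; i < n * n; ++i)
        if (fabsf(C[i] - ref[i]) > 1e-3f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return mismatches != 0;
}
```

The two `__syncthreads()` calls are what prevent a race condition here: without the first, a thread could read a tile entry another thread has not yet written; without the second, a fast thread could overwrite a tile that slower threads are still reading.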