Post-PASI training: Week 1

Syllabus

Class 1 (August 8th):

Lecture 1 – slides

Why applications need multi-core processing
Manycore architecture:
- GPU vs CPU chip design
- Data parallelism
- Concepts behind a CUDA-friendly algorithm
Basic CUDA:
- C-like language
- Threads and thread hierarchy
- Launching a CUDA kernel
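
The kernel-launch pattern from this lecture can be sketched as a vector addition. This is a minimal illustration, not the course's lab code. The names `vecAdd`, `blocksFor`, and `vecAddOnDevice`, and the block size of 256 threads, are assumptions chosen for the example. Each thread derives a global index from `blockIdx`, `blockDim`, and `threadIdx`, which is the grid/block/thread hierarchy in its simplest one-dimensional form.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks so that blocks * threadsPerBlock >= n.
__host__ __device__ inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// One thread per element: the global index combines the block's position
// in the grid with the thread's position inside its block.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        c[i] = a[i] + b[i];
    }
}

// Host side: allocate device memory, copy inputs, launch, copy the result back.
// Returns the first CUDA error encountered (cudaSuccess if none).
cudaError_t vecAddOnDevice(const float* hA, const float* hB, float* hC, int n) {
    assert(n > 0);
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    const int threadsPerBlock = 256;  // assumption: a typical block size
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // <<<blocks, threads per block>>> sets up the grid/block hierarchy.
        vecAdd<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(dA, dB, dC, n);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    // This cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);  // cudaFree(nullptr) is a no-op, so partial failures are safe
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

For example, with n = 1000 and 256 threads per block, `blocksFor` yields 4 blocks (1024 threads), and the `i < n` guard leaves the last 24 threads idle.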

Lab 1 (August 9th):

Lab 1 – slides

matrixMultiplication.cu

Get familiar with CUDA and the nvcc compiler
Run a device query
Launch a simple vector-addition kernel
Implement a matrix-matrix multiplication
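
A naive kernel for the matrix-matrix multiplication exercise can be sketched as follows, assuming square n-by-n matrices stored in row-major order and a 16 x 16 thread block. `matMulNaive`, `matMulCPU`, and `matMulOnDevice` are hypothetical names; this is not the contents of `matrixMultiplication.cu`. Each thread computes one element of C by reading one row of A and one column of B directly from global memory, and a CPU reference is included to check the results.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// CPU reference: C = A * B for row-major n-by-n matrices.
void matMulCPU(const float* A, const float* B, float* C, int n) {
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[row * n + k] * B[k * n + col];
            C[row * n + col] = sum;
        }
    }
}

// One thread per output element; a 2D grid maps naturally onto the matrix.
__global__ void matMulNaive(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)  // every operand is read from global memory
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Host wrapper: allocate, copy in, launch a 2D grid, copy out.
cudaError_t matMulOnDevice(const float* hA, const float* hB, float* hC, int n) {
    assert(n > 0);
    const size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);  // assumption: 256 threads arranged as 16 x 16
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        matMulNaive<<<grid, block>>>(dA, dB, dC, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Comparing `matMulOnDevice` against `matMulCPU` on small inputs is a quick correctness check. For large n, expect small floating-point differences, since nvcc may contract multiply-add pairs into FMA instructions on the device.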

References:
- Kirk, D. and Hwu, W. Programming Massively Parallel Processors. (Ch. 1, Ch. 2, Ch. 3)
- NVIDIA. CUDA C Programming Guide, Version 4. (Ch. 1, Ch. 2)

Class 2 (August 10th):

Lecture 2 – slides

Programming model: mapping the discretized model onto GPU threads
Warps and warp scheduling
Multilevel memory hierarchy
- Registers and shared, global, constant, and texture memory
- Sizes and latencies
Fundamentals of the finite difference method
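
For reference, the standard explicit forward-time, centered-space (FTCS) discretization of the 2D heat equation is shown below, assuming a constant thermal diffusivity alpha on a uniform grid; the lecture's own notation may differ. Each interior point (i, j) at time level n+1 depends only on its neighbors at time level n, so all points can be updated independently, which is what makes one GPU thread per grid point a natural mapping.

```latex
\frac{\partial T}{\partial t} = \alpha \left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} \right)

T_{i,j}^{n+1} = T_{i,j}^{n}
  + r_x \left( T_{i+1,j}^{n} - 2T_{i,j}^{n} + T_{i-1,j}^{n} \right)
  + r_y \left( T_{i,j+1}^{n} - 2T_{i,j}^{n} + T_{i,j-1}^{n} \right),
\qquad r_x = \frac{\alpha \, \Delta t}{\Delta x^2}, \quad r_y = \frac{\alpha \, \Delta t}{\Delta y^2}

r_x + r_y \le \tfrac{1}{2} \qquad \text{(stability condition for the explicit scheme)}
```

With Delta x = Delta y, the stability condition reduces to alpha * Delta t / Delta x^2 <= 1/4, which limits how large the time step can be.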

Lab 2 (August 11th):

Lab 2 – slides

FD_2D_global.cu

FD_2D_texture_pad.cu

Implement 2D explicit heat transfer using global memory
Implement 2D explicit heat transfer using texture memory
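
The global-memory version of this lab can be sketched as a kernel that applies one explicit FTCS step, with one thread per grid point and two buffers swapped between steps (ping-pong), so no thread reads a value that another thread has already overwritten in the same step. This is a sketch under stated assumptions, not `FD_2D_global.cu`: the names, the row-major layout with i along x, the 16 x 16 block, and the fixed-temperature (Dirichlet) boundary are choices made for illustration, with rx = alpha*dt/dx^2 and ry = alpha*dt/dy^2. The texture-memory version is not shown; note that the texture references used in older CUDA material have been removed from recent toolkits in favor of texture objects (`cudaTextureObject_t`).

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Row-major index: i runs along x (columns), j along y (rows).
__host__ __device__ inline int idx(int i, int j, int nx) { return j * nx + i; }

// Explicit FTCS update of one interior point.
__host__ __device__ inline float ftcsPoint(const float* T, int i, int j, int nx,
                                           float rx, float ry) {
    float c = T[idx(i, j, nx)];
    return c + rx * (T[idx(i + 1, j, nx)] - 2.0f * c + T[idx(i - 1, j, nx)])
             + ry * (T[idx(i, j + 1, nx)] - 2.0f * c + T[idx(i, j - 1, nx)]);
}

// One time step: each thread reads Told from global memory and writes Tnew.
__global__ void heatStepGlobal(const float* Told, float* Tnew, int nx, int ny,
                               float rx, float ry) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= nx || j >= ny) return;
    bool boundary = (i == 0 || j == 0 || i == nx - 1 || j == ny - 1);
    Tnew[idx(i, j, nx)] = boundary ? Told[idx(i, j, nx)]  // fixed boundary values
                                   : ftcsPoint(Told, i, j, nx, rx, ry);
}

// CPU reference for one step, used to validate the kernel.
void heatStepCPU(const float* Told, float* Tnew, int nx, int ny, float rx, float ry) {
    for (int j = 0; j < ny; ++j) {
        for (int i = 0; i < nx; ++i) {
            bool boundary = (i == 0 || j == 0 || i == nx - 1 || j == ny - 1);
            Tnew[idx(i, j, nx)] = boundary ? Told[idx(i, j, nx)]
                                           : ftcsPoint(Told, i, j, nx, rx, ry);
        }
    }
}

// Runs nSteps on the device; hT holds the initial field and receives the result.
cudaError_t heatOnDevice(float* hT, int nx, int ny, float rx, float ry, int nSteps) {
    assert(nx >= 3 && ny >= 3 && nSteps >= 0);
    const size_t bytes = static_cast<size_t>(nx) * ny * sizeof(float);
    float *dOld = nullptr, *dNew = nullptr;

    cudaError_t err = cudaMalloc(&dOld, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dNew, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dOld, hT, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumption: 16 x 16 threads per block
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    for (int s = 0; s < nSteps && err == cudaSuccess; ++s) {
        // Launches on the default stream run in order, so each step sees the previous one.
        heatStepGlobal<<<grid, block>>>(dOld, dNew, nx, ny, rx, ry);
        err = cudaGetLastError();
        float* tmp = dOld; dOld = dNew; dNew = tmp;  // ping-pong: new becomes old
    }
    // After the last swap, dOld holds the most recent time level.
    if (err == cudaSuccess) err = cudaMemcpy(hT, dOld, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dOld);
    cudaFree(dNew);
    return err;
}
```

For example, `heatOnDevice(T, nx, ny, 0.2f, 0.2f, 1000)` stays within the explicit scheme's stability limit because rx + ry = 0.4 <= 0.5.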

References:
- Kirk, D. and Hwu, W. Programming Massively Parallel Processors. (Ch. 4)
- Micikevicius, P. 3D Finite Difference Computations on GPUs using CUDA.
- Sanders, J. and Kandrot, E. CUDA by Example. (Ch. 7)

Class 3 (August 12th):

Lecture 3 – slides

Shared memory in detail
Tiling
Bank conflicts
Race conditions and atomic operations
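
Tiling with shared memory can be sketched with the classic tiled matrix multiplication, assuming square row-major matrices and a compile-time tile width of 16; `TILE`, `matMulTiled`, and `matMulTiledOnDevice` are hypothetical names. Each block stages one tile of A and one tile of B in `__shared__` arrays, which cuts global-memory loads by roughly a factor of the tile width. The two `__syncthreads()` barriers are what prevent race conditions on the shared tiles: the first ensures a tile is fully loaded before any thread reads it, and the second ensures every thread has finished reading before the next iteration overwrites it.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // assumption: tile width equals block width

__global__ void matMulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];  // tile of A, shared by the whole block
    __shared__ float Bs[TILE][TILE];  // tile of B, shared by the whole block

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE + ty;
    int col = blockIdx.x * TILE + tx;
    float sum = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        // Each thread loads one element of each tile; out-of-range slots get 0
        // so matrices whose size is not a multiple of TILE still work.
        int aCol = t * TILE + tx;
        int bRow = t * TILE + ty;
        As[ty][tx] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[ty][tx] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // wait until both tiles are fully loaded

        for (int k = 0; k < TILE; ++k)
            sum += As[ty][k] * Bs[k][tx];
        __syncthreads();  // wait until all threads are done with these tiles
    }
    if (row < n && col < n)
        C[row * n + col] = sum;
}

// Host wrapper: allocate, copy in, launch with TILE x TILE blocks, copy out.
cudaError_t matMulTiledOnDevice(const float* hA, const float* hB, float* hC, int n) {
    assert(n > 0);
    const size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);  // the kernel assumes blockDim == (TILE, TILE)
        dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
        matMulTiled<<<grid, block>>>(dA, dB, dC, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Because every thread in a block must reach each `__syncthreads()`, the bounds checks are applied to the loads and the final store rather than by returning early from the kernel.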

References:
- Kirk, D. and Hwu, W. Programming Massively Parallel Processors. (Ch. 4, Ch. 5)
- Micikevicius, P. 3D Finite Difference Computations on GPUs using CUDA.