Post-PASI training: Week 2

Syllabus

August 15th:

  • Holiday

Lab 3 (August 16th):

Lab 3 – Slides

FD_2D_shared.cu

FD_2D_shared_ghost.cu

  • Using shared memory as cache
    • Implement 2D explicit heat transfer with shared memory
  • Comparison of the implementations: timings vs. programming effort
  • References:
    • Kirk, D. and Hwu, W. Programming Massively Parallel Processors (Ch. 4, Ch. 5)

Class 4 (August 17th):

Lecture 4 – Slides

  • Control flow
    • Warp divergence
  • Memory coalescing
  • Latency hiding
  • Occupancy
  • Measuring effective performance

Lab 4 (August 18th):

Lab 4 – Slides

lab4_files

AAt_tiled.cu

Class 5 (August 19th):

Lecture 5 – Slides

  • Further optimization techniques:
    • Data prefetching
    • Instruction optimization
    • Loop unrolling
  • Thread and block heuristics
  • Example: optimizing a parallel reduction