Iterative methods for sparse linear systems on the GPU
by Dr Nathan Bell
NVIDIA Research
This course will introduce students to iterative methods for solving sparse linear systems and to their efficient implementation on the GPU. Students will gain intuition for the performance characteristics of sparse matrix operations and learn how to choose among sparse matrix representations. We will show how generic parallel algorithms, or “parallel primitives”, can be used to manipulate sparse, unstructured data sets and to build complex algorithms (a small example appears after the schedule below). A final lecture will summarize potential optimization techniques and methods for preconditioning iterative solvers to accelerate convergence. A tentative course schedule follows.
- Lesson 1: Sparse matrix representations and iterative solvers
- Lesson 2: Sparse matrix-vector multiplication (SpMV)
- Laboratory 1: Implementing an iterative solver on the GPU
- Lesson 3: Leveraging parallel primitives with Thrust
- Laboratory 2: Putting it all together
- Lesson 4 (30 mins): Future directions: optimization techniques and preconditioning
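To make the idea of parallel primitives concrete, here is a minimal sketch (an illustration, not part of the course materials) that uses thrust::reduce_by_key to sum the stored entries of each row of a small matrix in COO (coordinate) format; the matrix values are a made-up toy example.

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

int main(void)
{
    // COO row indices and values of a toy sparse matrix, sorted by row
    int   rows_h[] = {0, 0, 1, 2, 2, 2};
    float vals_h[] = {1, 2, 3, 4, 5, 6};

    thrust::device_vector<int>   rows(rows_h, rows_h + 6);
    thrust::device_vector<float> vals(vals_h, vals_h + 6);

    thrust::device_vector<int>   out_rows(3);
    thrust::device_vector<float> row_sums(3);

    // reduce_by_key sums consecutive values that share a row index,
    // computing all per-row sums in a single parallel pass
    thrust::reduce_by_key(rows.begin(), rows.end(), vals.begin(),
                          out_rows.begin(), row_sums.begin());

    // row_sums now holds {3, 3, 15}
    return 0;
}
```

This sort-plus-segmented-reduction pattern is the kind of building block covered in Lesson 3.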
Laboratory exercises will use the Thrust and CUSP libraries for CUDA.
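As a rough preview of the laboratory exercises, the sketch below builds a standard 5-point Poisson matrix with CUSP's matrix gallery and solves it on the GPU with the conjugate gradient method. It assumes CUSP's csr_matrix, gallery::poisson5pt, krylov::cg, and default_monitor interfaces; the grid size and stopping criteria are arbitrary illustrative choices, not the actual lab assignment.

```cpp
#include <cusp/array1d.h>
#include <cusp/csr_matrix.h>
#include <cusp/gallery/poisson.h>
#include <cusp/krylov/cg.h>
#include <cusp/monitor.h>

int main(void)
{
    // assemble a 2D Poisson problem (5-point stencil) on a 256x256 grid
    cusp::csr_matrix<int, float, cusp::device_memory> A;
    cusp::gallery::poisson5pt(A, 256, 256);

    // initial guess x = 0, right-hand side b = 1
    cusp::array1d<float, cusp::device_memory> x(A.num_rows, 0.0f);
    cusp::array1d<float, cusp::device_memory> b(A.num_rows, 1.0f);

    // stop after 100 iterations or a relative residual below 1e-6
    cusp::default_monitor<float> monitor(b, 100, 1e-6f);

    // solve A x = b on the GPU with the conjugate gradient method
    cusp::krylov::cg(A, x, b, monitor);

    return 0;
}
```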
Student Requirements
The interested student should have:
- Understanding of standard (dense) linear algebra concepts
- Ability to write simple CUDA kernels (e.g. SAXPY; see the sketch following this list)
- Familiarity with C++ (beneficial, but not necessary)
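For calibration, a “simple CUDA kernel” means something on the order of the following SAXPY sketch (y ← a*x + y, one thread per element); the launch shown in the trailing comment assumes device pointers d_x and d_y have already been allocated and filled.

```cuda
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    // one thread per vector element
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// typical launch: enough 256-thread blocks to cover all n elements
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```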
Supplementary Materials
- Nathan Bell and Michael Garland, “Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors,” Proc. Supercomputing ’09, November 2009