12 Steps to a Fast Multipole Method on GPUs
by Dr Rio Yokota
Boston University

Why would I want to learn FMM?
The combination of algorithmic acceleration and hardware acceleration can have a tremendous impact. The FMM is a fast algorithm for performing dense matrix-vector multiplications (N-body interactions) in O(N) time, and it runs very fast on GPUs. Its combination of a high degree of parallelism and O(N) complexity makes it an attractive solver for the petascale and exascale era. It has a wide range of applications, e.g. quantum mechanics, molecular dynamics, electrostatics, acoustics, structural mechanics, fluid mechanics, and astrophysics.
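Concretely, the matrix-vector product in question is a pairwise sum over N sources. As an illustrative example (using the Laplace kernel 1/r; the kernel depends on the application), the potential at each point is

  f_i = \sum_{j=1}^{N} \frac{q_j}{|\mathbf{x}_i - \mathbf{x}_j|}, \qquad i = 1, \dots, N

Evaluated directly this costs O(N^2) operations; the FMM approximates the contribution of distant groups of sources with multipole expansions, reducing the cost to O(N) at a controllable accuracy.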
What are the prerequisites?
- Basic understanding of linear algebra and calculus
- Some experience with C or C++
The aim of this course
This course will provide a hands-on tutorial with simple exercises that lead to a full FMM code running on GPUs. Each step is carefully planned so that there are no sudden jumps in difficulty. During the tutorial each attendee will have remote access to the GPU cluster in Nagasaki, Japan. Teaching Assistants will be walking around to assist you, so that no one is left behind.
The 12 steps
- step 1 : direct N-body (sketched below)
- step 2 : multipole expansion
- step 3 : tree structure
- step 4 : link lists, interaction lists
- step 5 : treecode
- step 6 : M2L translation
- step 7 : single level FMM
- step 8 : M2M, L2L translation
- step 9 : multi-level FMM
- step 10 : direct N-body on GPUs
- step 11 : P2P on GPUs
- step 12 : M2L on GPUs
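Step 1 establishes the O(N^2) direct evaluation that serves as the baseline for speed and accuracy in all later steps. Below is a minimal sketch of such a direct N-body evaluation with the Laplace kernel; the struct and function names are illustrative only and are not taken from the course code.

// Step 1 sketch: direct (all-pairs) N-body evaluation, O(N^2).
// Computes p_i = sum_j q_j / |x_i - x_j| for every body.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Body {
  double x, y, z;  // position
  double q;        // charge (or mass)
  double p;        // potential (output)
};

void direct(std::vector<Body>& bodies) {
  for (std::size_t i = 0; i < bodies.size(); ++i) {
    double p = 0;
    for (std::size_t j = 0; j < bodies.size(); ++j) {
      double dx = bodies[i].x - bodies[j].x;
      double dy = bodies[i].y - bodies[j].y;
      double dz = bodies[i].z - bodies[j].z;
      double r2 = dx * dx + dy * dy + dz * dz;
      if (r2 > 0) p += bodies[j].q / std::sqrt(r2);  // skip self-interaction
    }
    bodies[i].p = p;
  }
}

int main() {
  std::vector<Body> bodies(100);
  for (std::size_t i = 0; i < bodies.size(); ++i) {
    bodies[i].x = std::rand() / (double)RAND_MAX;
    bodies[i].y = std::rand() / (double)RAND_MAX;
    bodies[i].z = std::rand() / (double)RAND_MAX;
    bodies[i].q = 1.0 / bodies.size();
  }
  direct(bodies);
  std::printf("p[0] = %f\n", bodies[0].p);
  return 0;
}

The later steps reuse this kernel: the same pairwise loop becomes the P2P operation of the treecode and FMM (steps 5-9) and is ported to the GPU in steps 10-11, while the far field is handled by the multipole and local expansions.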
Supplementary material
- http://www.ics.uci.edu/~ihler/papers/ihler_area.pdf
- http://math.nyu.edu/faculty/greengar/shortcourse_fmm.pdf
- http://www.umiacs.umd.edu/~ramani/cmsc878R/index.html
