Recent publications by the research group
- “Fast N-body simulations on GPUs”, Rio Yokota, L A Barba. Submitted 2011.
This paper presents a hybrid treecode-FMM algorithm with auto-tuning capability.
- “Petascale turbulence simulation using a highly parallel fast multipole method”, Rio Yokota, L A Barba, Tetsu Narumi, Kenji Yasuoka. Submitted 2011. Preprint: arXiv:1106.5273
This paper applies a periodic FMM algorithm to the simulation of homogeneous turbulent flow in a cube, demonstrating computations that scale across many GPUs. The calculations were carried out on the TSUBAME 2.0 system at the Tokyo Institute of Technology, thanks to guest access provided by the Grand Challenge Program of TSUBAME. A total of 2048 GPUs, corresponding to half of the complete system, were used to achieve a sustained performance of 0.5 petaflop/s.
- “A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems”, Rio Yokota, L A Barba. Accepted in Int. J. High-perf. Comput. 2011. Preprint: arXiv:1106.2176
This paper describes the scaling of the FMM on a large CPU system with O(100k) cores. It uses several advanced techniques, including a hierarchical communication method for efficient intra-node and inter-node all-to-all exchanges, and single-node tuning via OpenMP and SIMD vectorization.