Javascript must be enabled to continue!
A CUDA fast multipole method with highly efficient M2L far field evaluation
View through CrossRef
Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is an ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naïve pairwise summation has [Formula: see text] computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to [Formula: see text] for any specified precision. Here, we present a CUDA-accelerated, C++ FMM implementation for multi particle systems with [Formula: see text] potential that are found, e.g. in biomolecular simulations. The algorithm involves several operators to exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance. We propose, implement and benchmark three different M2L parallelization approaches. Approach (1) utilizes Unified Memory to minimize programming and porting efforts. It achieves decent speedups for only little implementation work. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications. The result is a compute-bound implementation, i.e. performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.
Title: A CUDA fast multipole method with highly efficient M2L far field evaluation
Description:
Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications.
Its direct solution is an ubiquitous showcase example for the compute power of graphics processing units (GPUs).
However, the naïve pairwise summation has [Formula: see text] computational complexity.
The fast multipole method (FMM) can reduce runtime and complexity to [Formula: see text] for any specified precision.
Here, we present a CUDA-accelerated, C++ FMM implementation for multi particle systems with [Formula: see text] potential that are found, e.
g.
in biomolecular simulations.
The algorithm involves several operators to exchange information in an octree data structure.
We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance.
We propose, implement and benchmark three different M2L parallelization approaches.
Approach (1) utilizes Unified Memory to minimize programming and porting efforts.
It achieves decent speedups for only little implementation work.
Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies.
The presorted list-based approach (3) fits periodic boundary conditions particularly well.
It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications.
The result is a compute-bound implementation, i.
e.
performance is limited by arithmetic operations rather than by memory accesses.
The complete CUDA parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.
Related Results
Parallel Strategy of FMBEM for 3D Elastostatics and its GPU Implementation Using CUDA
Parallel Strategy of FMBEM for 3D Elastostatics and its GPU Implementation Using CUDA
Finite Element Method (FEM1) is pervasively used in most of 3D product design analysis, in which Computer Aided Design (CAD) models need to be converted in to mesh models first and...
Linear ion traps in mass spectrometry
Linear ion traps in mass spectrometry
Abstract
I.
Introduction
000
II.
Linear Multipoles
000
A. Multipole Fields
000
1. Multipole Potentials
000
2. Ion Motion in 2D Multipole Fields
000...
Fast Fourier Transforms in Electromagnetics
Fast Fourier Transforms in Electromagnetics
This Chapter review the fast Fourier transform (FFT) technique and its application to computational electromagnetics, especially to the fast solver algorithms including the Conjuga...
Harnessing CUDA Dynamic Parallelism for the Solution of Sparse Linear Systems
Harnessing CUDA Dynamic Parallelism for the Solution of Sparse Linear Systems
We leverage CUDA dynamic parallelism to reduce execution time while significantly reducing energy consumption of the Conjugate Gradient (CG) method for the iterative solution of sp...
CUDA Implementation of Histogram Stretching Function for Improving X-ray Image
CUDA Implementation of Histogram Stretching Function for Improving X-ray Image
This paper presents a method to improve the contrast of digital X-ray image using CUDA program on a GPU. The histogram is commonly used to get the statistical distribution of the c...
An error-controlled fast multipole method
An error-controlled fast multipole method
We present a two-stage error estimation scheme for the fast multipole method (FMM). This scheme can be applied to any particle system. It incorporates homogeneous as well as inhomo...
Corrected Article: “An error-controlled fast multipole method” [J. Chem. Phys. 131, 244102 (2009)]
Corrected Article: “An error-controlled fast multipole method” [J. Chem. Phys. 131, 244102 (2009)]
We present a two-stage error estimation scheme for the fast multipole method (FMM). This scheme can be applied to any particle system. It incorporates homogeneous as well as inhomo...
"Best Tradition": CREATE, JCSEE and the Program Evaluation Standards
"Best Tradition": CREATE, JCSEE and the Program Evaluation Standards
Background: Evaluation “is a task in the best tradition of the most abstract theoretical science as well as the most practical applied science” (Scriven, 1968, p .9). The Program E...

