Javascript must be enabled to continue!

A CUDA fast multipole method with highly efficient M2L far field evaluation

Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is an ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naïve pairwise summation has [Formula: see text] computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to [Formula: see text] for any specified precision. Here, we present a CUDA-accelerated, C++ FMM implementation for multi particle systems with [Formula: see text] potential that are found, e.g. in biomolecular simulations. The algorithm involves several operators to exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance. We propose, implement and benchmark three different M2L parallelization approaches. Approach (1) utilizes Unified Memory to minimize programming and porting efforts. It achieves decent speedups for only little implementation work. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications. The result is a compute-bound implementation, i.e. performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.

SAGE Publications

Bartosz Kohnke Carsten Kutzner Andreas Beckmann Gert Lube Ivo Kabadshow Holger Dachsel Helmut Grubmüller

The International Journal of High Performance Computing Applications

2020

Title: A CUDA fast multipole method with highly efficient M2L far field evaluation

Description:

Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications.

Its direct solution is an ubiquitous showcase example for the compute power of graphics processing units (GPUs).

However, the naïve pairwise summation has [Formula: see text] computational complexity.

The fast multipole method (FMM) can reduce runtime and complexity to [Formula: see text] for any specified precision.

Here, we present a CUDA-accelerated, C++ FMM implementation for multi particle systems with [Formula: see text] potential that are found, e.

in biomolecular simulations.

The algorithm involves several operators to exchange information in an octree data structure.

We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance.

We propose, implement and benchmark three different M2L parallelization approaches.

Approach (1) utilizes Unified Memory to minimize programming and porting efforts.

It achieves decent speedups for only little implementation work.

Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies.

The presorted list-based approach (3) fits periodic boundary conditions particularly well.

It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications.

The result is a compute-bound implementation, i.

performance is limited by arithmetic operations rather than by memory accesses.

The complete CUDA parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.

Back

Finite Element Method (FEM1) is pervasively used in most of 3D product design analysis, in which Computer Aided Design (CAD) models need to be converted in to mesh models first and...

Linear ion traps in mass spectrometry

Abstract I. Introduction 000 II. Linear Multipoles 000 A. Multipole Fields 000 1. Multipole Potentials 000 2. Ion Motion in 2D Multipole Fields 000...

Fast Fourier Transforms in Electromagnetics

This Chapter review the fast Fourier transform (FFT) technique and its application to computational electromagnetics, especially to the fast solver algorithms including the Conjuga...

Harnessing CUDA Dynamic Parallelism for the Solution of Sparse Linear Systems

We leverage CUDA dynamic parallelism to reduce execution time while significantly reducing energy consumption of the Conjugate Gradient (CG) method for the iterative solution of sp...

CUDA Implementation of Histogram Stretching Function for Improving X-ray Image

This paper presents a method to improve the contrast of digital X-ray image using CUDA program on a GPU. The histogram is commonly used to get the statistical distribution of the c...

An error-controlled fast multipole method

We present a two-stage error estimation scheme for the fast multipole method (FMM). This scheme can be applied to any particle system. It incorporates homogeneous as well as inhomo...

Corrected Article: “An error-controlled fast multipole method” [J. Chem. Phys. 131, 244102 (2009)]

We present a two-stage error estimation scheme for the fast multipole method (FMM). This scheme can be applied to any particle system. It incorporates homogeneous as well as inhomo...

"Best Tradition": CREATE, JCSEE and the Program Evaluation Standards

Background: Evaluation “is a task in the best tradition of the most abstract theoretical science as well as the most practical applied science” (Scriven, 1968, p .9). The Program E...

Email:
Password:

Email:

A CUDA fast multipole method with highly efficient M2L far field evaluation

Related Results