Harnessing CUDA Dynamic Parallelism for the Solution of Sparse Linear Systems

We leverage CUDA dynamic parallelism to reduce the execution time and significantly lower the energy consumption of the Conjugate Gradient (CG) method for the iterative solution of sparse linear systems on graphics processing units (GPUs). Our new implementation of this solver is launched from the CPU as a single “parent” CUDA kernel, which in turn invokes other “child” CUDA kernels. The CPU can then either continue with other work while the solver runs asynchronously on the GPU, or block until the solver completes. Our experiments on a server equipped with an Intel Core i7-3770K CPU and an NVIDIA “Kepler” K20c GPU illustrate the benefits of the new CG solver.
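For readers unfamiliar with the method, the following is a minimal CPU-side sketch of CG on a matrix in compressed sparse row (CSR) form. It is not the authors' GPU implementation: the function names (`spmv`, `dot`, `cg`), the CSR layout, the tolerance, and the small test matrix are all assumptions chosen for illustration. The outer loop in `cg` plays the role the abstract assigns to the "parent" kernel, while the sparse matrix-vector product, dot products, and vector updates are the kind of operations that would plausibly run as "child" kernels.

```python
# CPU-side reference sketch of Conjugate Gradient on a CSR matrix.
# Assumption: names, CSR layout, and test data are illustrative only;
# none of them come from the paper.

def spmv(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))

def cg(row_ptr, col_idx, vals, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for it in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm below tol
            return x, it
        ap = spmv(row_ptr, col_idx, vals, p)
        alpha = rs_old / dot(p, ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # makes the next direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

# Hypothetical 3x3 SPD test system: A = [[4,1,0],[1,3,1],[0,1,2]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]
x, iters = cg(row_ptr, col_idx, vals, b)
```

Each iteration depends on scalar results (`alpha`, `beta`) from the previous step. In a conventional GPU implementation those dependencies typically require the CPU to launch each kernel in turn, which is the host involvement that a single parent kernel launching child kernels is meant to avoid.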
