Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement

BLAS is a fundamental building block of advanced linear algebra libraries and many modern scientific computing applications. GPUs are known for their strong arithmetic capability and are highly suited to BLAS operations. However, porting code to GPUs often requires significant effort, especially for large, complex, or legacy codes, even when the application is BLAS-heavy. Various tools exist to offload BLAS to the GPU automatically, but they are often impractical because of the high cost of mandatory data transfers. Unified memory architectures in recent GPU designs, such as the NVIDIA Grace-Hopper, allow cache-coherent access to all types of memory from both CPU and GPU, potentially eliminating the bottlenecks of conventional architectures. This opens the way for new application development and porting strategies. In this paper, building on my preliminary work[1] demonstrating that performant automatic *gemm offload is possible, I extend the framework to all level-3 BLAS operations and present SCILIB-Accel[2], a novel tool for automatic BLAS offload. SCILIB-Accel leverages the cache-coherent NVLink C2C interconnect in Grace-Hopper and introduces a Device First-Use data movement policy. This policy, inspired by the OpenMP First-Touch approach in multi-socket CPU programming, minimizes CPU-GPU data transfers for typical scientific computing codes. In addition, the tool uses dynamic binary instrumentation to intercept BLAS symbols directly in a CPU binary, so no code modification or recompilation is required. SCILIB-Accel has been evaluated with multiple quantum physics codes on up to a few hundred GPU nodes, yielding promising speedups. Notably, the LSMS method in the MuST suite ran 3x faster on Grace-Hopper than on Grace-Grace. SCILIB-Accel is the first tool to deliver practical, high-performance automatic BLAS offload for scientific applications.
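
To illustrate why a first-use style policy can reduce transfer volume compared with copying operands on every offloaded call, the following toy model counts the bytes moved under each approach. It is only a sketch under stated assumptions and is not SCILIB-Accel's implementation: the names `Matrix`, `TransferLog`, `gemm_copy_every_call`, `gemm_device_first_use`, and `run` are hypothetical, and the model assumes that an operand is migrated to GPU memory the first time a GPU BLAS call uses it and stays there afterwards, with later CPU accesses served over the coherent interconnect.

```python
from dataclasses import dataclass


@dataclass
class Matrix:
    """Hypothetical stand-in for a BLAS operand; tracks where its pages live."""
    name: str
    nbytes: int
    location: str = "cpu"


@dataclass
class TransferLog:
    """Accumulates the number of bytes moved between CPU and GPU memory."""
    bytes_moved: int = 0


def gemm_copy_every_call(a, b, c, log):
    """Conventional offload model: copy A, B, C in and C back out on each call."""
    log.bytes_moved += a.nbytes + b.nbytes + c.nbytes  # host to device
    log.bytes_moved += c.nbytes                        # device to host


def gemm_device_first_use(a, b, c, log):
    """Toy first-use model: migrate an operand once, when the GPU first uses it."""
    for m in (a, b, c):
        if m.location != "gpu":
            log.bytes_moved += m.nbytes
            m.location = "gpu"
    # Assumption: the result is not copied back; the CPU reads it remotely.


def run(policy, calls=10, n=1000):
    """Call the same n-by-n double-precision gemm repeatedly under one policy."""
    size = n * n * 8  # bytes in one n-by-n matrix of 8-byte doubles
    a, b, c = (Matrix(name, size) for name in "ABC")
    log = TransferLog()
    for _ in range(calls):
        policy(a, b, c, log)
    return log.bytes_moved


if __name__ == "__main__":
    print("copy every call:", run(gemm_copy_every_call))
    print("device first use:", run(gemm_device_first_use))
```

In this model the first-use policy moves each matrix once regardless of how many times it is reused, which is the access pattern the abstract describes as typical for scientific computing codes. The model ignores the cost of remote CPU accesses after migration, so it says nothing about workloads where the CPU touches the data heavily between BLAS calls.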
Qeios Ltd