Quantitative Performance Analysis of BLAS Libraries on GPU Architectures

Basic Linear Algebra Subprograms (BLAS) are a set of linear algebra routines commonly used in machine learning and scientific computing applications. BLAS libraries with optimized implementations of these routines offer high performance by exploiting the parallel execution units of the target computing system. With their very large number of cores, graphics processing units (GPUs) exhibit high performance for computationally heavy workloads. Recent BLAS libraries use the parallel cores of GPU architectures efficiently by exploiting inherent data parallelism. In this study, we analyze GPU-targeted functions from two BLAS libraries, cuBLAS and MAGMA, and evaluate their performance on a single-GPU NVIDIA architecture, taking architectural features and limitations into account. We collect architectural performance metrics and explore resource utilization characteristics. Our work aims to help researchers and programmers understand the performance behavior and GPU resource utilization of the BLAS routines implemented by these libraries.

Deu Muhendislik Fakultesi Fen ve Muhendislik

Işıl ÖZ

2024

BLAS is a fundamental building block of advanced linear algebra libraries and many modern scientific computing applications. GPU is known for its strong arithmetic computing capabi...

Vina-GPU 2.1: towards further optimizing docking speed and precision of AutoDock Vina and its derivatives

Abstract. AutoDock Vina and its derivatives have established themselves as a prevailing pipeline for virtual screening in contemporary drug discovery. Our Vina-GPU method leverages t...

SYCL-BLAS: Combining Expression Trees and Kernel Fusion on Heterogeneous Systems

The support for heterogeneous platforms requires multiple specialised devices to collaborate to execute an application. The SYCL standard, published by Khronos, provides a C++ abstract...

Unlocking the Power of Parallel Computing: GPU technologies for Ocean Forecasting

Abstract. Operational ocean forecasting systems are complex engines that must execute ocean models with high performance to provide timely products and datasets. Significant comput...

Vina-GPU 2.0: further accelerating AutoDock Vina and its derivatives with GPUs

Modern drug discovery typically faces large virtual screens from huge compound databases where multiple docking tools are involved for meeting various real scenes or improving the ...

GPU-I-TASSER: a GPU accelerated I-TASSER protein structure prediction tool

Abstract. Motivation: Accurate and efficient predictions of protein structures play an important role in understanding their funct...

Accelerated hydrologic modeling: ParFlow GPU implementation

ParFlow is known as a numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. The original codebase pro...

Enabling Real-Time High-Resolution Flood Forecasting for the Entire State of Berlin Through RIM2D’s Multi-GPU Processing

Abstract. Urban areas are increasingly experiencing more frequent and intense pluvial flooding due to the combined effects of climate change and rapid urbanization—a trend expected...

