The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions

Modern graphics processing units (GPUs) have been at the leading edge of increasing chip-level parallelism over the last ten years, and the CUDA programming model has recently allowed us to exploit their power across many computational domains. Among these, dense linear algebra algorithms emerge as a natural fit for CUDA and the GPU because they are usually inherently parallel and can naturally be expressed as a blocked computation. In this paper, we extensively analyze the GPU programming and performance of one of the fundamental building blocks in numerical linear algebra algorithms: the matrix-matrix multiply. Different programming approaches and optimization techniques have already been published in the literature; we review and analyze them in order to pursue further optimizations and to unveil the potential of some hardware resources when programming the GPU under CUDA. Experimental results are shown on a GeForce 8800 GTX and a Tesla C870 GPU with a performance peak of 43 GFLOPS.
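To illustrate what "expressed as a blocked computation" means, the following is a minimal CPU-side sketch in Python, not the CUDA kernels studied in the paper. The function name `blocked_matmul` and the `tile` parameter are assumptions introduced here for illustration. Each tile of the output matrix is computed independently of the others, which is the property that would let a GPU assign one tile to one thread block.

```python
def blocked_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the output tile by tile.

    Illustrative sketch only: `tile` is a hypothetical parameter standing in
    for the block size a GPU implementation would choose.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer two loops select one tile of C. Tiles do not depend on each
    # other, so they could be computed in parallel.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # Walk along the shared dimension in tile-sized steps, so only
            # one tile of A and one tile of B is needed at a time.
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def naive_matmul(a, b):
    """Reference triple-loop product used to check the blocked version."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

The blocked and naive versions perform the same arithmetic and differ only in the order in which they visit the operands. That reordering is what lets a tile be reused from fast memory before moving on.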

IOS Press

José María Cecilia, José Manuel García, Manuel Ujaldón

Advances in Parallel Computing

2025
