Research on the Application and Performance Optimization of GPU Parallel Computing in Concrete Temperature Control Simulation
As engineering technology develops, projects demand greater accuracy and larger scale from simulation calculations, and the computational efficiency of traditional serial programs cannot meet these requirements. Reducing the run time of the temperature control simulation program is therefore of real engineering significance: it enables real-time simulation of the temperature and stress fields, which in turn supports more reasonable temperature control and crack prevention measures. To address this, GPU parallel computing is introduced into the temperature control simulation program for mass concrete, and the resulting program is optimized. An improved indicator formula for analyzing GPU parallel algorithms is proposed that accounts for GPU clock rate, number of cores, parallel overhead, and the parallel region. It makes up for the shortcoming of traditional formulas, which consider only time. According to this formula, when there are enough threads, the parallel effect is limited by the size of the parallel domain, and when the parallel domain is large enough, the efficiency is limited by the parallel overhead and the clock rate. This paper studies the optimal kernel execution configuration. Shared memory is used to improve memory access efficiency by 155%. After bank conflicts were resolved, a speedup of 437.5x was achieved in the matrix transpose subroutine of the solver. CUDA streams are used to run data access and logical operations asynchronously on the GPU, which overlaps part of the data access time. On top of GPU parallelism, asynchronous parallelism can double the computing efficiency. Compared with the serial program, the GPU asynchronous parallel program achieves a speedup of 61.42x for inner product matrix multiplication.
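The bank-conflict issue mentioned for the matrix transpose can be illustrated with a small host-side model. The sketch below is not the paper's code: it assumes shared memory organized as 32 banks of 4-byte words, a 32 x 32 tile of 4-byte elements, and row-major indexing as in CUDA C (in CUDA Fortran, which is column-major, the roles of rows and columns swap). The function name `max_conflict_degree` and the padding fix are illustrative choices; padding the tile by one column is a common remedy, and the abstract does not say which remedy the authors used.

```python
NUM_BANKS = 32  # assumed: 32 shared-memory banks, one 4-byte word wide each
TILE = 32       # assumed tile edge, equal to the warp size


def max_conflict_degree(row_pitch):
    """Worst-case number of threads in one warp that hit the same bank
    when thread t reads tile[t][col], i.e. a column read of the tile.

    row_pitch is the number of 4-byte words per tile row in shared memory.
    A result of 1 means conflict-free; 32 means fully serialized access.
    """
    worst = 0
    for col in range(TILE):
        hits = [0] * NUM_BANKS
        for t in range(TILE):
            word = t * row_pitch + col      # linear word index of tile[t][col]
            hits[word % NUM_BANKS] += 1     # bank that serves this word
        worst = max(worst, max(hits))
    return worst


unpadded = max_conflict_degree(TILE)      # tile laid out as [32][32]
padded = max_conflict_degree(TILE + 1)    # tile laid out as [32][33]
print(unpadded, padded)  # prints "32 1"
```

With a pitch of 32 words, every element of a column falls in the same bank, so the warp's 32 reads are serialized. With a pitch of 33, consecutive rows shift by one bank and the column read becomes conflict-free.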
This study further proposes a theoretical formula for the data access overlap rate to guide the choice of the number of CUDA streams and achieve optimal computing conditions. The GPU parallel program, compiled and optimized on the CUDA Fortran platform, effectively improves the computational efficiency of the concrete temperature control simulation program and better serves engineering computation.
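The abstract does not reproduce the overlap-rate formula, so the following is only a generic two-stage pipeline model of why the number of streams has an optimum, not the authors' formula. It assumes the work is split into equal chunks, one per stream, that each chunk's transfer can overlap the previous chunk's kernel, and that every stream adds a fixed overhead. The names `pipelined_time`, `hidden_copy_fraction`, `best_stream_count`, and the parameter `t_overhead` are hypothetical.

```python
def pipelined_time(t_copy, t_kernel, n_streams, t_overhead=0.0):
    """Estimated wall time for a copy stage followed by a kernel stage,
    split into n_streams equal chunks and run as a two-stage pipeline.

    t_overhead is an assumed fixed cost paid once per stream.
    """
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    c = t_copy / n_streams      # transfer time of one chunk
    k = t_kernel / n_streams    # kernel time of one chunk
    # First chunk passes through both stages; each later chunk adds the
    # duration of the slower stage.
    return c + k + (n_streams - 1) * max(c, k) + n_streams * t_overhead


def hidden_copy_fraction(t_copy, t_kernel, n_streams):
    """Share of the transfer time hidden behind kernels (overhead ignored)."""
    saved = (t_copy + t_kernel) - pipelined_time(t_copy, t_kernel, n_streams)
    return saved / t_copy


def best_stream_count(t_copy, t_kernel, t_overhead, max_streams=32):
    """Stream count in 1..max_streams that minimizes the modeled time."""
    return min(
        range(1, max_streams + 1),
        key=lambda n: pipelined_time(t_copy, t_kernel, n, t_overhead),
    )


# Illustrative numbers only (arbitrary time units), not measurements.
for n in (1, 2, 4, 8, 16):
    print(n, round(pipelined_time(8.0, 8.0, n, t_overhead=0.25), 3))
print("best:", best_stream_count(8.0, 8.0, 0.25))
```

In this model, more streams hide more of the transfer time, but the per-stream overhead grows linearly, so the total time has a minimum at a moderate stream count. The overlap gain is also bounded by a factor of two, reached only when transfer and kernel times are equal and overhead is negligible.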
Related Results
Study on the effect of seawater on making and curing of unreinforced concrete applications
Concrete, an essential component of worldwide infrastructure, depends significantly on fresh water for its manufacturing, contributing to freshwater scarcity in many regions. As co...
Parallel Monte Carlo Tree Search on GPU
Monte Carlo Tree Search (MCTS) is a method for making optimal decisions in artificial intelligence (AI) problems, typically move planning in combinatorial games. It combines the ge...
Heat transfer in supercritical fluids: computational approaches & studies
This thesis delves into investigating the complexities of heat transfer in supercritical fluids through the application of advanced theoretical and computational methodol...
Parallel metaheuristics on GPU
Real-world optimization problems are often complex and NP-hard. Their modeling is constantly evo...
Vina-GPU 2.1: towards further optimizing docking speed and precision of AutoDock Vina and its derivatives
AutoDock Vina and its derivatives have established themselves as a prevailing pipeline for virtual screening in contemporary drug discov...
Nature Inspired Parallel Computing
Parallel computing is increasingly important for science and engineering, but it is not used as widely as serial computing. People are used to serial computing and feel parallel c...
Parallel garment drape simulation of triangular mesh using GPU programming
Purpose: The purpose of this paper is to determine the possibility of implementing the parallel processing feature of the graphics processing unit (GPU) in garment drape simulation. Design/meth...

