
Performance evaluation of NEMO4.2 with Paraver

The latest release of the NEMO ocean model, v4.2, includes many modifications that significantly affect model performance. The goal of this work is to assess the performance gains in NEMO resulting from the optimizations carried out over the last four years within the IMMERSE and IS-ENES3 projects. The computational analysis was conducted using Extrae and Paraver, the performance tools developed at the Barcelona Supercomputing Center (BSC).

Extrae produces a trace rich in information about the model's use of computational resources, including measurements related to the memory subsystem, instruction cycles, vectorization level, and communications among parallel processes, among many others. Paraver provides a visual inspection of the trace and insight into the computational features of the NEMO model, which makes it easy to define a detailed quantitative evaluation of performance issues.

The performance analysis of NEMO is based on a set of metrics, each related to a different aspect of the computational resources. The main aspects analyzed are the execution time, the communication time, the number of instructions per cycle, and the cache hit rate. In addition, we combined these metrics to evaluate the parallel scalability and the global efficiency of the model as the number of cores increases.

Our investigation focused on evaluating the impact of the latest HPC changes, namely: the use of the neighborhood collective communication pattern, available in MPI-3, for the halo exchange; the use of the loop fusion technique to improve data locality; the extended halo; and the MPI+OpenMP version of NEMO obtained by means of PSyclone, a DSL compiler developed at STFC.

The analysis was carried out on the MareNostrum4 supercomputer at BSC with the NEMO source code available at commit 1d9676ff (a.k.a. the 68-summer-body-2022 branch), using the Bench Test configured for an ORCA12-like resolution.
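The metrics named above (instructions per cycle, cache hit rate, communication time, and parallel scalability) can be written down compactly. The following is a minimal sketch using textbook definitions, which are not necessarily the exact formulas used in the study. The `RunCounters` container and every field name in it are hypothetical; it is not an Extrae or Paraver data format.

```python
from dataclasses import dataclass


@dataclass
class RunCounters:
    """Aggregated counters for one run (hypothetical container for illustration)."""
    cores: int
    elapsed_s: float       # wall-clock execution time, seconds
    comm_s: float          # time spent in communication, seconds
    instructions: float    # completed instructions
    cycles: float          # CPU cycles
    cache_accesses: float  # cache accesses
    cache_misses: float    # cache misses


def ipc(run: RunCounters) -> float:
    """Instructions per cycle."""
    return run.instructions / run.cycles


def cache_hit_rate(run: RunCounters) -> float:
    """Fraction of cache accesses that hit."""
    return 1.0 - run.cache_misses / run.cache_accesses


def comm_fraction(run: RunCounters) -> float:
    """Share of the execution time spent in communication."""
    return run.comm_s / run.elapsed_s


def speedup(base: RunCounters, run: RunCounters) -> float:
    """Speedup of `run` relative to the baseline run `base`."""
    return base.elapsed_s / run.elapsed_s


def parallel_efficiency(base: RunCounters, run: RunCounters) -> float:
    """Speedup divided by the increase in core count (1.0 means ideal scaling)."""
    return speedup(base, run) * base.cores / run.cores
```

Given counters for a baseline run and a run on more cores, `parallel_efficiency(base, run)` drops below 1.0 whenever the execution time shrinks more slowly than the core count grows, while `ipc` and `cache_hit_rate` help attribute that loss to computation or to the memory subsystem.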
The evaluation of the MPI+OpenMP version was carried out using NEMO 4.0 in the ORCA025 configuration, kindly provided by STFC as output of the PSyclone DSL compiler.

The extended halo with 2 points provides a significant performance improvement of 13%, due to a reduction in the number of exchanged messages. The use of MPI-3 communications does not bring many benefits: the lower number of MPI point-to-point exchanges is offset by the larger message size of the MPI-3 neighborhood collective communications. The use of loop fusion does not bring many benefits either: few routines use loop fusion, and the small improvement registered in cache misses is offset by the increase in the number of instructions caused by fusing the loops. The analysis of the traces of the hybrid MPI/OpenMP NEMO version processed by PSyclone does not show many benefits as the number of OpenMP threads increases, because part of the code is not parallelized.

Finally, one of the most important HPC developments, tiling, has not been analyzed yet, since the latest modifications were merged only recently and the resulting code is still under revision.
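The observation that adding OpenMP threads brings little benefit when part of the code is not parallelized is the behavior described by Amdahl's law. The sketch below is a generic illustration of that law, not the analysis performed in the study; the parallel fraction of 0.5 is a hypothetical value chosen for illustration and was not measured on NEMO.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the runtime scales with threads."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical case: half of the runtime is in threaded regions.
    for threads in (1, 2, 4, 8):
        print(threads, round(amdahl_speedup(0.5, threads), 2))
```

With half of the runtime left serial, the speedup can never reach 2 no matter how many threads are used, which is consistent with the flat scaling reported for the hybrid version.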

