
Towards a Performance Engineering Workflow for OpenMP 4.0

Parallel programming and performance optimization of parallel programs are not simple tasks. Various HPC and OpenMP courses, as well as literature, serve as an introduction to this topic. Assuming the role of HPC beginners, we evaluate how far the knowledge acquired from introductory courses and literature can drive performance optimization of a conjugate gradient kernel. We concentrate on OpenMP programming for a large NUMA machine and investigate the new target construct in OpenMP 4.0 to offload to a Xeon Phi coprocessor. We evaluate the final results with a performance model. From these experiences, we propose a performance engineering workflow for general use.

IOS Press

Dirk Schmidl, Christian Iwainsky, Christian Terboven, Christian H. Bischof, Matthias S. Müller

Advances in Parallel Computing

2025

Related Results

High-level compiler analysis for OpenMP

Nowadays, applications from dissimilar domains, such as high-performance computing and high-integrity systems, require levels of performance that can only be achieved by means of s...

Towards a safe and efficient OpenMP

The growing complexity of contemporary multi-core and heterogeneous architectures necessitates parallel programming models capable of efficiently leveraging the available...

Automatic Parallelization for Heterogeneous Embedded Systems

Automatic parallelization for heterogeneous embedded systems. The use of heterogeneous architectures, combining multicore processors with accelera...

Scheduler guided OpenMP execution in cloud VMs

Scheduler-guided OpenMP execution in cloud virtual machines. OpenMP is a widely used framework for parallelizing applications, allowing ...

Efficient Parallel Linked List Processing

OpenMP is a very popular and successful parallel programming API, but efficient parallel traversal of a list (of possibly unknown size) of items linked by pointers is a challenging...

A Version Control Approach for Workflow-Alteration in Collaborative Design

The workflow management system should be flexible enough to manage workflow changes caused by new partnerships, new technologies, and new strategies in collaborative design. Theref...

Performance evaluation of NEMO4.2 with Paraver

The last release of the NEMO v4.2 ocean model includes many modifications that have a significant impact on the model performance. The goal of the work is to assess NEMO performanc...

Formalizing Bottlenecks in Task-Based OpenMP Applications

Task support was introduced into OpenMP to address irregular parallelism in shared memory architectures. Creating tasks that are extremely fine granular in applications, however, i...
