Towards a Performance Engineering Workflow for OpenMP 4.0
Parallel programming and the performance optimization of parallel programs are not simple tasks. Various HPC and OpenMP courses, as well as the literature, serve as an introduction to this topic. Assuming the role of HPC beginners, we evaluate how far the knowledge acquired from introductory courses and literature can drive the performance optimization of a conjugate gradient kernel. We concentrate on OpenMP programming for a large NUMA machine and investigate the new target construct in OpenMP 4.0 for offloading to a Xeon Phi coprocessor. We evaluate the final results with a performance model. From these experiences, we propose a performance engineering workflow for general use.
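To make the kind of kernel discussed in the abstract concrete, the following is a minimal sketch of a conjugate gradient solver with OpenMP worksharing loops. It is not the authors' code. It assumes a matrix-free 1D Poisson operator tridiag(-1, 2, -1) as a stand-in for the real matrix, and the function names (`alloc_first_touch`, `apply_poisson`, `dot`, `cg_solve`) are hypothetical. Vectors are initialized in a parallel loop with the same static schedule as the compute loops, which is the usual first-touch approach to page placement on NUMA machines (assuming threads are pinned). Offloading with the OpenMP 4.0 target construct is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a vector and initialize it in parallel. With a first-touch
   page placement policy, each page tends to land on the NUMA node of
   the thread that first writes it. */
double *alloc_first_touch(int n, double value) {
    double *v = malloc((size_t)n * sizeof *v);
    if (!v) return NULL;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) v[i] = value;
    return v;
}

/* y = A*x for the assumed operator A = tridiag(-1, 2, -1). */
void apply_poisson(int n, const double *x, double *y) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double left  = (i > 0)     ? x[i - 1] : 0.0;
        double right = (i < n - 1) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

/* Dot product with an OpenMP reduction. */
double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Unpreconditioned CG. x holds the initial guess on entry and the
   approximate solution on return. Stops when the recursively updated
   residual norm drops to tol or after max_iter iterations.
   Returns the iteration count, or -1 if allocation fails. */
int cg_solve(int n, const double *b, double *x, double tol, int max_iter) {
    assert(n > 0 && max_iter > 0);
    double *r  = alloc_first_touch(n, 0.0);
    double *p  = alloc_first_touch(n, 0.0);
    double *ap = alloc_first_touch(n, 0.0);
    if (!r || !p || !ap) { free(r); free(p); free(ap); return -1; }

    apply_poisson(n, x, ap);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) { r[i] = b[i] - ap[i]; p[i] = r[i]; }

    double rr = dot(n, r, r);
    int it = 0;
    while (it < max_iter && rr > tol * tol) {
        apply_poisson(n, p, ap);
        double alpha = rr / dot(n, p, ap);
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        it++;
    }
    free(r); free(p); free(ap);
    return it;
}
```

A caller would allocate `b` and `x` with `alloc_first_touch`, call `cg_solve(n, b, x, tol, max_iter)`, and free both vectors afterwards. Without OpenMP enabled at compile time the pragmas are ignored and the code runs serially.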
Related Results
High-level compiler analysis for OpenMP
Nowadays, applications from dissimilar domains, such as high-performance computing and high-integrity systems, require levels of performance that can only be achieved by means of s...
Towards a safe and efficient OpenMP
The growing complexity of contemporary multi-core and heterogeneous architectures necessitates parallel programming models capable of efficiently leveraging the available...
Automatic Parallelization for Heterogeneous Embedded Systems
The use of heterogeneous architectures, combining multicore processors with accelerat...
Scheduler guided OpenMP execution in cloud VMs
OpenMP is a widely used framework for parallelizing applications, enabling ...
Efficient Parallel Linked List Processing
OpenMP is a very popular and successful parallel programming API, but efficient parallel traversal of a list (of possibly unknown size) of items linked by pointers is a challenging...
A Version Control Approach for Workflow-Alteration in Collaborative Design
The workflow management system should be flexible enough to manage workflow changes caused by new partnerships, new technologies, and new strategies in collaborative design. Theref...
Performance evaluation of NEMO4.2 with Paraver
The last release of the NEMO v4.2 ocean model includes many modifications that have a significant impact on the model performance. The goal of the work is to assess NEMO performanc...
Formalizing Bottlenecks in Task-Based OpenMP Applications
Task support was introduced into OpenMP to address irregular parallelism in shared memory architectures. Creating tasks that are extremely fine granular in applications, however, i...

