LU Factorisation on Xeon and Xeon Phi Processors
This paper outlines the parallelisation and vectorisation methods we have used to port an LU decomposition library to the Xeon Phi co-processor. We ported an LU factorisation algorithm, which uses Gaussian elimination to perform the decomposition, using Intel LEO (Language Extensions for Offload) directives, OpenMP 4.0 directives, Intel's Cilk array notation, and vectorisation directives. We compare the performance achieved with these different methods, investigate how much data transfer contributes to the overall time to solution, and analyse the impact of these optimisation and parallelisation techniques on code running on the host processors as well. The results show that performance on the Xeon Phi can be improved by optimising the memory operations, and that Cilk array notation can benefit this benchmark on standard processors but does not have the same impact on the Xeon Phi co-processor. We have also demonstrated cases where the Xeon Phi runs our implementations faster than a node of an HPC system does, and that our implementations are not as efficient as the LU factorisation implemented in the Intel MKL library.
Related Results
A Misidentified Manuscript by the Saintly Copyist Theophilos († 1548)
HPC-BLAST: Distributed BLAST for Modern HPC Clusters.
The near-exponential growth in sequence data available to bioinformaticists, and the emergence of new fields of biological research, continue to fuel an incessant need for increa...
Abstract 1627: PHI-501, a novel and potent pan-RAF inhibitor in metastatic melanoma
Abstract
Background: PHI-501 has been developed as a novel inhibitor of NRAS mutated acute myeloid leukemia. Big data and artificial intelligence (AI)-based drug dis...
North Syrian Mortaria and Other Late Roman Personal and Utility Objects Bearing Inscriptions of Good Luck
A 2‐year prospective evaluation of the Prostate Health Index in guiding biopsy decisions in a large cohort
Objectives
To prospectively evaluate how the Prostate Health Index (PHI) impacts on clinical decision in a real‐life setting for men with a prostate‐specific an...
Comparison of Strategies for OpenMP Parallelisation of a Vectorised Riemann Solver for the Intel Xeon Phi KNL Microprocessor
Riemann solvers are widely used in numerical methods for solving gas dynamics problems. During the computations, one has to solve the Riemann problem of the decay of a...
Intracellular pH regulation in rat Schwann cells
Abstract
We examined H+ and HCO3− transport mechanisms that are involved in the regulation of intracellular pH of Schwann cells. Primary cultures of Schwann cells were prepared from...
Innovations in Multicore Network Processor Design for Enhanced Performance
The rapid expansion of network traffic, driven by the proliferation of internet-connected devices and the growing demand for high-speed data transmission, has intensified the need ...

