
LU Factorisation on Xeon and Xeon Phi Processors

This paper outlines the parallelisation and vectorisation methods we used to port an LU decomposition library to the Xeon Phi co-processor. We ported an LU factorisation algorithm, which uses Gaussian elimination to perform the decomposition, using Intel LEO directives, OpenMP 4.0 directives, Intel's Cilk array notation, and vectorisation directives. We compare the performance achieved with these different methods, investigate the effect of data transfer cost on the overall time to solution, and analyse the impact of these optimisation and parallelisation techniques on code running on the host processors as well. The results show that performance on the Xeon Phi can be improved by optimising the memory operations, and that Cilk array notation can benefit this benchmark on standard processors but does not have the same impact on the Xeon Phi co-processor. We also demonstrate cases where the Xeon Phi computes our implementations faster than a node of an HPC system can, and show that our implementations are not as efficient as the LU factorisation implemented in the MKL library.
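To make the kind of kernel discussed above concrete, the following is a minimal sketch of an in-place LU factorisation by Gaussian elimination in C, annotated with OpenMP directives. It is not the authors' library code: the function name `lu_factorise`, the row-major layout, the absence of pivoting, and the placement of the `parallel for` and `simd` directives are all assumptions made for illustration. Offload to the co-processor and Cilk array notation are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the paper.
 * In-place LU factorisation (Doolittle form, no pivoting) of an n x n
 * row-major matrix. On return, the strict lower triangle holds the
 * multipliers of L (unit diagonal implied) and the upper triangle holds U.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int lu_factorise(double *a, size_t n)
{
    assert(a != NULL);

    for (size_t k = 0; k < n; ++k) {
        double pivot = a[k * n + k];
        if (pivot == 0.0)
            return -1; /* this sketch does not pivot */

        /* For a fixed k, the rows below the pivot row are updated
         * independently, so they can be shared across threads. */
        #pragma omp parallel for
        for (size_t i = k + 1; i < n; ++i) {
            double m = a[i * n + k] / pivot;
            a[i * n + k] = m; /* store the multiplier, i.e. L[i][k] */

            /* Contiguous row update: a candidate for vectorisation. */
            #pragma omp simd
            for (size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= m * a[k * n + j];
        }
    }
    return 0;
}
```

If the code is compiled without OpenMP support, the directives are ignored and the routine runs serially. Calling `lu_factorise(a, 2)` on the matrix with rows (4, 3) and (6, 3) leaves `a` holding 4, 3, 1.5, -1.5: the multiplier 1.5 in the lower triangle and U in the upper triangle.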

IOS Press

Adrian Jackson, Mateusz Iwo Dubaniowski

Advances in Parallel Computing

2025



