Speculative hardware/software co-designed floating-point multiply-add fusion

A Fused Multiply-Add (FMA) instruction is currently available in many general-purpose processors. It improves performance by reducing the latency of dependent operations, and it improves precision by computing the result as an indivisible operation with no intermediate rounding. However, because the arithmetic behavior of a single-rounding FMA differs from that of an independent FP multiply followed by an FP add, some algorithms need significant revalidation and rewriting before they work as expected when compiled to use FMA, a cost that developers may not be willing to pay. As a result, many legacy applications cannot use FMA instructions. In this paper we propose a novel HW/SW collaborative technique that executes workloads efficiently with increased FMA utilization by adding the option to obtain the same numerical result as a separate FP multiply and FP add pair. In particular, we extend the host ISA of a HW/SW co-designed processor with a new Combined Multiply-Add (CMA) instruction that performs an FMA operation with an intermediate rounding. A transparent dynamic translation software layer uses this instruction in a speculative instruction-fusion optimization that transforms FP multiply and FP add sequences into CMA instructions. The FMA unit is slightly modified to support both single-rounding and double-rounding fused instructions without increasing their latency, and to provide a conservative fall-back path in case of misspeculation. Evaluation on a cycle-accurate timing simulator shows that CMA improves SPECfp performance by 6.3% and reduces executed instructions by 4.7%.
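
The difference between single rounding (FMA) and an intermediate rounding (the behavior CMA is described as preserving) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware or translator: the function names `fused_multiply_add` and `combined_multiply_add` are hypothetical, the exact result is emulated with rational arithmetic, and only finite IEEE 754 double inputs are handled (NaN, infinities, and signed zeros are ignored).

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of a single-rounding FMA (hypothetical helper).

    The product and sum are computed exactly as rationals, and the
    conversion back to float rounds once, to nearest-even.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def combined_multiply_add(a: float, b: float, c: float) -> float:
    """Model of multiply-add with an intermediate rounding (hypothetical helper).

    The product is rounded to a double first, then the sum is rounded,
    which matches a separate FP multiply followed by an FP add.
    """
    product = a * b      # first rounding
    return product + c   # second rounding


# Inputs chosen so that the exact product, 1 - 2**-54, is not a double
# and rounds up to 1.0 when stored.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

fused = fused_multiply_add(a, b, c)        # keeps the low-order term: -2**-54
combined = combined_multiply_add(a, b, c)  # low-order term is lost: 0.0

print("fused   :", fused)
print("combined:", combined)
```

With these inputs the two models disagree, which is the kind of numerical divergence that can force legacy code to be revalidated before it is compiled for FMA.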
