
Speculative hardware/software co-designed floating-point multiply-add fusion

A Fused Multiply-Add (FMA) instruction is available in many general-purpose processors. It improves performance by reducing the latency of dependent operations, and improves precision by computing the result as a single indivisible operation with no intermediate rounding. However, because the arithmetic behavior of a single-rounding FMA differs from that of an independent FP multiply followed by an FP add, some algorithms require significant revalidation and rewriting to work as expected when compiled to use FMA, a cost that developers may not be willing to pay. As a result, many legacy applications cannot use FMA instructions. In this paper we propose a novel HW/SW collaborative technique that efficiently executes workloads with increased FMA utilization by adding the option to obtain the same numerical result as a separate FP multiply and FP add pair. In particular, we extended the host ISA of a HW/SW co-designed processor with a new Combined Multiply-Add (CMA) instruction that performs an FMA operation with an intermediate rounding. A transparent dynamic translation software layer uses this instruction in a speculative instruction-fusion optimization that transforms FP multiply and FP add sequences into CMA instructions. The FMA unit has been slightly modified to support both single-rounding and double-rounding fused instructions without increasing their latency, and to provide a conservative fall-back path in case of misspeculation. Evaluation on a cycle-accurate timing simulator showed that CMA improved SPECfp performance by 6.3% and reduced executed instructions by 4.7%.
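The numerical difference between single-rounding and double-rounding multiply-add can be illustrated with a small software emulation. This is only a sketch of the arithmetic semantics, not the hardware design described in the paper. The function names `fma_single_rounding` and `cma_double_rounding` are hypothetical, the emulation covers finite double-precision inputs only (signed zeros, infinities and NaNs are not handled), and it assumes the default round-to-nearest-even mode.

```python
from fractions import Fraction


def fma_single_rounding(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: exact a*b + c, rounded once.

    Fraction arithmetic is exact, and converting a Fraction to float
    rounds to the nearest double, so only one rounding takes place.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def cma_double_rounding(a: float, b: float, c: float) -> float:
    """Emulate a multiply-add with an intermediate rounding.

    The product is rounded to double first, then the sum is rounded
    again, which matches a separate FP multiply followed by an FP add.
    """
    product = a * b      # first rounding
    return product + c   # second rounding


# Inputs chosen so that the exact product 1 - 2**-54 is not a double:
# it lies halfway between 1 - 2**-53 and 1.0 and rounds to 1.0.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

fused = fma_single_rounding(a, b, c)     # keeps the -2**-54 term
combined = cma_double_rounding(a, b, c)  # product rounds to 1.0, sum is 0.0
separate = a * b + c                     # plain multiply then add

print("single rounding:", fused)
print("double rounding:", combined)
print("separate mul+add:", separate)
```

For these inputs the double-rounding result agrees with the separate multiply and add, while the single-rounding result differs. That agreement is the property that lets multiply and add pairs be fused without changing what a legacy program computes.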

