
Optimizing Soft Vector Processing in FPGA-Based Embedded Systems

Soft vector processors can augment and extend the capabilities of FPGA-based embedded systems-on-chip such as the Xilinx Zynq. However, configuring and optimizing the soft processor for best performance is hard. We must consider architectural parameters such as precision, vector lane count, vector length, chunk size, and DMA scheduling to ensure efficient execution of code on the soft vector processing platform. To simplify the design process, we develop a compiler framework and an autotuning runtime that together split the optimization into a combination of static and dynamic passes that map data-parallel computations to the soft processor. We compare implementations running on the scalar ARM processor, the embedded NEON hard vector engine, and low-level streaming Verilog designs against the VectorBlox MXP soft vector processor. Across a range of data-parallel benchmarks, we show that the MXP soft vector processor can outperform the other organizations by up to 4× while saving ≈10% dynamic power. Our compilation and runtime framework can also outperform the GCC NEON vectorizer under certain conditions, by explicitly generating NEON intrinsics and performance-tuning the autogenerated data-parallel code. When constrained by I/O bandwidth, soft vector processors are even competitive with spatial Verilog implementations of the computation.

Association for Computing Machinery (ACM)

Nachiket Kapre

ACM Transactions on Reconfigurable Technology and Systems

2016
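The abstract names several interacting knobs (vector lane count, chunk size, DMA scheduling) and an autotuner that searches them. The following is a minimal sketch, assuming a hypothetical analytical cost model with made-up platform constants; it is not the paper's model and not the VectorBlox MXP API. It exhaustively searches lane count, scratchpad chunk size, and double-buffered versus serialized DMA for a streaming kernel, discarding configurations that do not fit in the scratchpad. The names `Config`, `kernel_cycles`, `autotune`, and every constant are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    lanes: int           # vector lanes (hypothetical; fixed when the bitstream is built)
    chunk: int           # elements per scratchpad chunk (tunable at runtime)
    double_buffer: bool  # overlap DMA of the next chunk with compute on the current one


# Hypothetical platform constants, chosen for illustration only.
DMA_BYTES_PER_CYCLE = 8    # sustained DMA bandwidth
DMA_SETUP_CYCLES = 50      # fixed cost per DMA transfer
VEC_ISSUE_CYCLES = 10      # fixed cost per vector instruction
SCRATCHPAD_BYTES = 64 * 1024


def kernel_cycles(n, bytes_per_elem, ops_per_elem, cfg):
    """Estimated cycles for a streaming kernel under a simple two-stage cost model.

    Every chunk (including a partial last chunk) is costed as a full chunk,
    and bytes_per_elem is the total DMA traffic per element.
    """
    n_chunks = math.ceil(n / cfg.chunk)
    dma = DMA_SETUP_CYCLES + cfg.chunk * bytes_per_elem / DMA_BYTES_PER_CYCLE
    compute = ops_per_elem * (VEC_ISSUE_CYCLES + cfg.chunk / cfg.lanes)
    if cfg.double_buffer:
        # Two-stage pipeline: the first transfer and the last compute are
        # exposed; in between, the slower stage sets the pace.
        return dma + (n_chunks - 1) * max(dma, compute) + compute
    # Serialized: each chunk is transferred, then computed.
    return n_chunks * (dma + compute)


def autotune(n, bytes_per_elem, ops_per_elem, lane_options, chunk_options):
    """Exhaustive search; returns (best Config, estimated cycles) or None."""
    best = None
    for lanes, chunk, db in product(lane_options, chunk_options, (False, True)):
        buffers = 2 if db else 1
        if buffers * chunk * bytes_per_elem > SCRATCHPAD_BYTES:
            continue  # configuration does not fit in the scratchpad
        cfg = Config(lanes, chunk, db)
        cycles = kernel_cycles(n, bytes_per_elem, ops_per_elem, cfg)
        if best is None or cycles < best[1]:
            best = (cfg, cycles)
    return best


if __name__ == "__main__":
    cfg, cycles = autotune(1 << 20, 4, 2, [1, 2, 4, 8, 16], [256, 1024, 4096, 8192])
    print(cfg, cycles)
```

With these made-up constants, the search picks 16 lanes with double-buffered 8192-element chunks, but a 4-lane configuration is within about 1% because the DMA stage dominates. This illustrates, under the stated assumptions only, why extra compute lanes buy little once a kernel is limited by I/O bandwidth. Treating lane count as a build-time choice and chunk size as a runtime choice loosely mirrors the static/dynamic split described in the abstract, although this sketch searches both together for simplicity.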

