Optimizing Soft Vector Processing in FPGA-Based Embedded Systems

Soft vector processors can augment and extend the capability of FPGA-based embedded systems-on-chip such as the Xilinx Zynq. However, configuring and optimizing the soft processor for best performance is hard. We must consider architectural parameters such as precision, vector lane count, vector length, chunk size, and DMA scheduling to ensure efficient execution of code on the soft vector processing platform. To simplify the design process, we develop a compiler framework and an autotuning runtime that split the optimization into a combination of static and dynamic passes that map data-parallel computations to the soft processor. We compare and contrast implementations running on the scalar ARM processor, the embedded NEON hard vector engine, and low-level streaming Verilog designs with the VectorBlox MXP soft vector processor. Across a range of data-parallel benchmarks, we show that the MXP soft vector processor can outperform the other organizations by up to 4× while saving ≈10% dynamic power. Our compilation and runtime framework can also outperform the gcc NEON vectorizer under certain conditions by explicitly generating NEON intrinsics and performance-tuning the autogenerated data-parallel code. When constrained by IO bandwidth, soft vector processors are even competitive with spatial Verilog implementations of the computation.
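The abstract describes splitting the optimization into static and dynamic passes over parameters such as lane count and chunk size. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' framework and does not use any VectorBlox MXP API. The names `Config`, `static_candidates`, `modeled_cycles`, `autotune`, and `run_chunked`, the scratchpad size, the four-buffer working set (double-buffered input and output), and the cost model (with double-buffered DMA, each chunk costs the slower of its transfer and its compute plus a fixed setup overhead) are all hypothetical. A real runtime would time each candidate on the hardware rather than call a model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Config:
    lanes: int       # number of soft vector lanes (hypothetical parameter)
    chunk_size: int  # elements per DMA transfer (hypothetical parameter)


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def static_candidates(n: int, lane_options: Sequence[int],
                      scratchpad_words: int, buffers: int = 4) -> list[Config]:
    """Static pass: enumerate chunk sizes lanes * 2**k (up to n) whose
    working set of `buffers` chunk-sized buffers fits the assumed scratchpad."""
    configs = []
    for lanes in lane_options:
        chunk = lanes
        while chunk <= n:
            if buffers * chunk <= scratchpad_words:
                configs.append(Config(lanes, chunk))
            chunk *= 2
    return configs


def modeled_cycles(cfg: Config, n: int, dma_setup: int = 100,
                   words_per_cycle: int = 8) -> int:
    """Hypothetical cost model: with double buffering, the transfer of the
    next chunk overlaps compute on the current one, so each chunk costs the
    slower of the two plus a fixed DMA setup overhead."""
    transfer = ceil_div(cfg.chunk_size, words_per_cycle)
    compute = ceil_div(cfg.chunk_size, cfg.lanes)  # one element per lane per cycle
    return ceil_div(n, cfg.chunk_size) * (dma_setup + max(transfer, compute))


def autotune(candidates: Sequence[Config],
             measure: Callable[[Config], int]) -> Config:
    """Dynamic pass: measure every candidate and keep the fastest; ties go to
    fewer lanes (assumed cheaper in FPGA area)."""
    return min(candidates, key=lambda c: (measure(c), c.lanes))


def run_chunked(xs: Sequence[float], cfg: Config,
                op: Callable[[float], float]) -> list[float]:
    """Strip-mined execution: each slice stands in for one DMA transfer into
    the scratchpad, followed by an elementwise operation over that slice."""
    out: list[float] = []
    for start in range(0, len(xs), cfg.chunk_size):
        chunk = xs[start:start + cfg.chunk_size]
        out.extend(op(v) for v in chunk)
    return out


# Usage: tune a 4096-element kernel for a hypothetical 4096-word scratchpad.
n = 4096
cands = static_candidates(n, lane_options=(4, 8, 16), scratchpad_words=4096)
best = autotune(cands, lambda c: modeled_cycles(c, n))
print(best)  # Config(lanes=8, chunk_size=1024)
```

Under this toy model, the winner uses the largest chunk that fits, which amortizes the DMA setup cost, and the smallest lane count whose compute rate matches the assumed IO bandwidth; 16 lanes tie with 8 because the transfer, not the arithmetic, is the bottleneck. This loosely mirrors the abstract's remark about IO-bound workloads, but the numbers come only from the assumed model.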

Related Results

Method of QoS evaluation of FPGA as a service
The subject of study in this article is the evaluation of the performance issues of cloud services implemented using FPGA technology. The goal is to improve the performance of clou...
Analysis of the Use of PLD Technologies within IoT
The subject of study in this article and work is the modern technologies of programmable logic devices (PLD) classified as FPGA, and the peculiarities of its application in Interne...
Methods of Deployment and Evaluation of FPGA as a Service Under Conditions of Changing Requirements and Environments
Applying Field Programmable Gate Array (FPGA) technology in cloud infrastructure and heterogeneous computations is of great interest today. FPGA as a Service assumes that the progr...
Comparison of HDL and HLL Development Approaches on FPGA for Image Processing Applications
Since their invention in the mid-1990s, FPGAs have stood out for their great computing power, low energy consumption, and high flexibility when reconfiguring their internal architecture...
Soft Power with Chinese Specifics: Concept and Approaches
The purpose of the study. Joseph Nye’s theory of soft power has enriched the idea of the country’s comprehensive power and attracted great attention from the Chinese theoretical co...
Investigating Energy Consumption of an SRAM-based FPGA for Duty-Cycle Applications
In order to conserve energy, battery powered embedded systems are typically designed with very low-power modules that offer limited computational power and communication bandwidth ...
Performance Analysis of FPGA Architectures based Embedded Control Applications
The performances of System on Chip (SoC) and the Field Programmable Gate Array (FPGA) particularly, are increasing continually. Due to the growing complexity of modern embedded con...
PROPERTIES OF SOFT MODULES
A nonempty set is called a module over a ring with unity if the set is a commutative group that is closed under scalar multiplication satisfying...