SOC Reconfigurable Architecture for Software-Trained Neural Networks on FPGA
Neural networks are widely used in both software and hardware applications. Hardware applications call for a compact, accelerated, and
configurable architecture that can be embedded easily in a device and can execute the required neural network with high performance.
Such a configurable architecture lets the user implement neural networks with different structures and modify or replace them as needed.

In this dissertation, three architectures, each consisting of three layers, are designed using a system-on-chip (SoC) approach and
implemented on a Field Programmable Gate Array (FPGA) to realize and accelerate three types of neural networks: Fully Connected
Neural Networks (FCNN), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). The first layer of each
architecture is a Python software layer containing a function that serves as the architecture’s user interface. The function accepts
the description of the neural network structure and its training parameters as inputs and generates three binary files as outputs;
these files hold the network description, the weights, and the biases in a specific format. The second layer is an embedded software
layer running on the on-chip ARM microcontroller. It reads the binary files generated by the Python function and transfers the
parameters and configuration of each neural network layer to the third layer, the hardware layer. The embedded layer also monitors
one or more status registers in the hardware layer to determine when to send the parameters and configuration of the next network
layer. The third layer is a hardware Intellectual Property (IP) block implemented on the FPGA fabric; it is configured by the embedded
layer to execute the required neural network layers one after another.
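
The flow described above (a Python function that emits three binary files, and an embedded layer that feeds one network layer at a time to the hardware while polling a status register) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the file names, the little-endian record layout, the activation codes, and the names `export_network`, `SimulatedIP`, and `run_network` are hypothetical, since the abstract does not give the actual binary format or register map. The embedded layer is modelled in Python only to show the handshake; on the device it would run on the ARM core.

```python
import struct
from pathlib import Path

# Hypothetical activation codes; the real encoding is not given in the abstract.
ACTIVATIONS = {"relu": 0, "sigmoid": 1, "tanh": 2, "softmax": 3}


def export_network(layers, out_dir):
    """Write description, weight, and bias files for a fully connected network.

    `layers` is a list of dicts with keys "weights" (one row per neuron),
    "biases", and "activation". Returns a dict of the three file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    desc = bytearray(struct.pack("<I", len(layers)))  # header: layer count
    weights = bytearray()
    biases = bytearray()
    for layer in layers:
        w, b = layer["weights"], layer["biases"]
        n_out, n_in = len(w), len(w[0])
        if len(b) != n_out or any(len(row) != n_in for row in w):
            raise ValueError("weight/bias shapes do not match")
        # One fixed-size record per layer: inputs, neurons, activation code.
        desc += struct.pack("<III", n_in, n_out, ACTIVATIONS[layer["activation"]])
        for row in w:
            weights += struct.pack(f"<{n_in}f", *row)  # 32-bit floats (assumed)
        biases += struct.pack(f"<{n_out}f", *b)
    paths = {}
    for name, blob in (("description", desc), ("weights", weights), ("biases", biases)):
        paths[name] = out_dir / f"{name}.bin"
        paths[name].write_bytes(bytes(blob))
    return paths


def iter_layer_records(description_bytes):
    """Yield (n_in, n_out, activation_code) for each layer in the description."""
    (count,) = struct.unpack_from("<I", description_bytes, 0)
    for i in range(count):
        yield struct.unpack_from("<III", description_bytes, 4 + 12 * i)


class SimulatedIP:
    """Software stand-in for the hardware IP and its status register."""

    def __init__(self, busy_polls=2):
        self.busy_polls = busy_polls
        self._remaining = 0
        self.executed = []

    def configure(self, record):
        self.executed.append(record)
        self._remaining = self.busy_polls  # the layer is now "running"

    def status_done(self):
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def run_network(description_bytes, ip):
    """Send one layer at a time, polling the status register in between."""
    for record in iter_layer_records(description_bytes):
        while not ip.status_done():
            pass
        ip.configure(record)
    while not ip.status_done():  # wait for the last layer to finish
        pass
    return ip.executed
```

In this sketch the description file is a small fixed-size table, so the embedded side can locate the record for any layer without parsing the weight data; whether the dissertation's format works this way is not stated.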

The first architecture implements an FCNN with up to 1024 layers, each with a maximum of 1024 neurons and a choice of four activation
functions (ReLU, Sigmoid, Tanh, and SoftMax). This design also supports Residual Neural Networks (ResNet). The second architecture
implements an RNN with three layer types: Recurrent, Attention, and Fully Connected (FC). A Recurrent layer can be implemented using
either a Long Short-Term Memory (LSTM) model or a Gated Recurrent Unit (GRU), with up to 100 elements in each of the input and hidden
vectors. An Attention layer supports up to 64 input vectors with a maximum vector length of 100 elements. FC layers support an input
vector length of up to 256 and up to 256 neurons per layer, and each FC layer can use either the ReLU or the SoftMax activation
function. Finally, the third architecture implements a complete CNN with three layer types: Convolution, Pooling, and FC. The
convolution layer supports five different filter sizes and different stride and padding values. The CNN hardware IP also supports two
types of pooling (average and maximum) with various pooling window and stride sizes, as well as FC layers with input and output vector
lengths of up to 4096 elements and two activation functions (ReLU and SoftMax).
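
As a rough software reference for the computations listed above, the following sketch evaluates a fully connected network with the four named activation functions and enforces the FCNN limits quoted in the abstract (1024 layers, 1024 neurons per layer). It also includes the standard output-size formula for a convolution or pooling window with a given stride and padding. The function names and the list-based data layout are assumptions for illustration; this is a floating-point model and says nothing about the number format or datapath used in the actual hardware IP.

```python
import math

MAX_LAYERS = 1024   # FCNN limits quoted in the abstract
MAX_NEURONS = 1024


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    out = []
    for x in v:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:  # avoid overflow in exp(-x) for large negative inputs
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def tanh(v):
    return [math.tanh(x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


ACTIVATIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": tanh, "softmax": softmax}


def fc_layer(x, weights, biases, activation):
    """Compute activation(W x + b); `weights` holds one row per neuron."""
    if len(weights) > MAX_NEURONS:
        raise ValueError("too many neurons for the FCNN architecture")
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return ACTIVATIONS[activation](pre)


def run_fcnn(x, layers):
    """Run layers consecutively; each layer is (weights, biases, activation)."""
    if len(layers) > MAX_LAYERS:
        raise ValueError("too many layers for the FCNN architecture")
    for weights, biases, activation in layers:
        x = fc_layer(x, weights, biases, activation)
    return x


def window_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1
```

For example, `window_output_size(32, 3, stride=1, padding=1)` gives 32, so a 3x3 filter with padding 1 preserves a 32x32 feature map, while a pooling window uses the same formula with `padding=0`.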
Related Results
Method of QoS evaluation of FPGA as a service
The subject of study in this article is the evaluation of the performance issues of cloud services implemented using FPGA technology. The goal is to improve the performance of clou...
Analysis of the application of FPGA technologies in IoT
The subject of study in this article and work is the modern technologies of programmable logic devices (PLD) classified as FPGA, and the peculiarities of its application in Interne...
The impact of natural closed depressions on soil organic carbon storage in eroded loess landscapes of East Poland
Soil erosion in loess landscapes results in soil organic carbon (SOC) redistribution and storage in SOC pools. Understanding the SOC dynamics is important because changes i...
The architecture of differences
Following in the footsteps of the protagonists of the Italian architectural debate is a mark of culture and proactivity. The synthesis deriving from the artistic-humanistic factors...
High-Performance FPGA Acceleration for Transformer-Based Models
Foundation neural networks—large-scale, pre-trained models such as transformers—have rapidly emerged as the cornerstone of state-of-the-art artificial intelligence systems across n...
Comparison of HDL and HLL development approaches on FPGA for image processing applications
Since their invention in the mid-1990s, FPGAs have stood out for their high computing power, low energy consumption, and high flexibility in reconfiguring their internal architecture...
Soil carbon sequestration through crops rotation in a Mediterranean Cambisols: measurement and modelling
Soil carbon sequestration (SCS) has been identified by the IPCC as one of the most promising and inexpensive methodologies to reduce atmospheric CO2...
Virtualizable hardware/software design infrastructure for dynamically partially reconfigurable systems
In most existing works, reconfigurable hardware modules are still managed as conventional hardware devices. Further, the software reconfiguration overhead incurred by loading corre...

