SOC Reconfigurable Architecture for Software-Trained Neural Networks on FPGA

Neural networks are used extensively in both software and hardware applications. Hardware applications call for a small, accelerated, and configurable architecture that can be easily embedded in hardware devices and can execute the required neural network with high performance. Such a configurable architecture allows the user to implement neural networks with different structures and to modify them easily as needed. In this dissertation, three architectures, each containing three layers, have been designed using a system-on-chip (SoC) approach and implemented on a Field Programmable Gate Array (FPGA) to realize and accelerate three types of neural networks: Fully Connected Neural Networks (FCNN), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). The first layer of each architecture is a Python software layer containing a function that serves as the architecture’s user interface. The function accepts the description of the neural network structure and its training parameters as inputs and generates three binary files as outputs, holding the network description, the weights, and the biases in a specific format. The second layer is an embedded software layer running on the on-chip ARM microcontroller. It reads the binary files generated by the Python function and transfers the parameters and configuration of each neural network layer to the third layer, the hardware layer. The embedded layer also monitors one or more status registers in the third layer to determine when to send the parameters and configuration of the next layer. The third layer is a hardware Intellectual Property (IP) core implemented on the FPGA fabric; the embedded layer configures it to execute the required neural network layers consecutively.
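As a rough illustration of the Python layer described above, the sketch below writes a network description, weights, and biases to three binary files. The abstract does not specify the file format, so everything here is an assumption made for illustration: the function names `export_network` and `read_description`, the file names, the little-endian `uint32` description header, and the `float32` parameter encoding are all hypothetical and are not the dissertation's actual interface.

```python
import struct
from pathlib import Path


def export_network(layer_sizes, weights, biases, out_dir):
    """Write three binary files: description, weights, and biases.

    layer_sizes: [n_inputs, n_layer1, n_layer2, ...]
    weights:     one matrix per layer, as a list of rows (one row per neuron)
    biases:      one list per layer (one value per neuron)
    The on-disk layout is an ASSUMED format, not the one used in the source.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Description (assumed): entry count, then each size, as little-endian uint32.
    desc = struct.pack(f"<{len(layer_sizes) + 1}I", len(layer_sizes), *layer_sizes)

    # Parameters (assumed): flattened in layer order as little-endian float32.
    flat_w = [w for layer in weights for row in layer for w in row]
    flat_b = [b for layer in biases for b in layer]

    paths = {
        "description": out / "network.bin",
        "weights": out / "weights.bin",
        "bias": out / "bias.bin",
    }
    paths["description"].write_bytes(desc)
    paths["weights"].write_bytes(struct.pack(f"<{len(flat_w)}f", *flat_w))
    paths["bias"].write_bytes(struct.pack(f"<{len(flat_b)}f", *flat_b))
    return paths


def read_description(path):
    """Read back the layer sizes, as an embedded reader might in this assumed format."""
    data = Path(path).read_bytes()
    (count,) = struct.unpack_from("<I", data, 0)
    return list(struct.unpack_from(f"<{count}I", data, 4))
```

For a network with 2 inputs, a hidden layer of 3 neurons, and 1 output neuron, calling `export_network([2, 3, 1], weights, biases, "out")` would produce a 16-byte description file, a 36-byte weights file (9 values), and a 16-byte bias file (4 values) under this assumed layout.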
The first architecture supports FCNNs with up to 1024 layers, each with a maximum of 1024 neurons, and four distinct activation functions (ReLU, Sigmoid, Tanh, and SoftMax). The design also supports the Residual Neural Network (ResNet). The second architecture supports RNNs with three layer types: Recurrent, Attention, and Fully Connected (FC) layers. This architecture allows a Recurrent layer to be implemented on an FPGA using either a Long Short-Term Memory (LSTM) model or a Gated Recurrent Unit (GRU), with up to 100 elements in each of the input and hidden vectors. It also supports an attention layer with up to 64 input vectors and a maximum vector length of 100 items. FC layers can be configured with an input vector length of up to 256 and up to 256 neurons per layer, and each FC layer can use either the ReLU or the SoftMax activation function. Finally, the third architecture supports a complete CNN with three layer types (Convolution, Pooling, and FC). The proposed design supports convolution layers with five different filter sizes and different stride and padding values. The CNN hardware IP also supports two types of pooling (average and maximum) with various pooling window and stride sizes. This hardware architecture also supports FC layers with input and output vector lengths of up to 4096 elements and two distinct activation functions (ReLU and SoftMax).
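The FC-layer limits quoted above lend themselves to a simple pre-flight check before a network is exported. The sketch below is only an illustration of how such a check might look: the table `FC_LIMITS` and the function `check_fc_layer` are hypothetical names, and only the numeric limits and activation lists are taken from the abstract. The abstract states no input-length limit for FC layers in the FCNN architecture, so none is enforced there.

```python
# FC-layer limits per architecture, taken from the abstract.
# None means the abstract does not state a limit.
FC_LIMITS = {
    "fcnn": {"max_inputs": None, "max_neurons": 1024,
             "activations": {"relu", "sigmoid", "tanh", "softmax"}},
    "rnn": {"max_inputs": 256, "max_neurons": 256,
            "activations": {"relu", "softmax"}},
    "cnn": {"max_inputs": 4096, "max_neurons": 4096,
            "activations": {"relu", "softmax"}},
}


def check_fc_layer(arch, n_inputs, n_neurons, activation):
    """Return a list of limit violations for one FC layer (empty if none)."""
    limits = FC_LIMITS[arch]
    problems = []
    if limits["max_inputs"] is not None and n_inputs > limits["max_inputs"]:
        problems.append(f"input length {n_inputs} exceeds {limits['max_inputs']}")
    if n_neurons > limits["max_neurons"]:
        problems.append(f"neuron count {n_neurons} exceeds {limits['max_neurons']}")
    if activation.lower() not in limits["activations"]:
        problems.append(f"activation '{activation}' is not supported")
    return problems
```

For example, `check_fc_layer("rnn", 300, 256, "tanh")` reports two problems (input length and activation), while `check_fc_layer("cnn", 4096, 4096, "ReLU")` reports none.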
Boise State University, Albertsons Library

Related Results

Method of QoS evaluation of FPGA as a service
The subject of study in this article is the evaluation of the performance issues of cloud services implemented using FPGA technology. The goal is to improve the performance of clou...
Аналіз застосування технологій ПЛІС в складі IoT
The subject of study in this article and work is the modern technologies of programmable logic devices (PLD) classified as FPGA, and the peculiarities of its application in Interne...
The impact of natural closed depressions on soil organic carbon storage in eroded loess landscapes of East Poland
Soil erosion in loess landscapes results in soil organic carbon (SOC) redistribution and storage in SOC pools. Understanding the SOC dynamics is important because changes i...
The architecture of differences
Following in the footsteps of the protagonists of the Italian architectural debate is a mark of culture and proactivity. The synthesis deriving from the artistic-humanistic factors...
High-Performance FPGA Acceleration for Transformer-Based Models
Foundation neural networks—large-scale, pre-trained models such as transformers—have rapidly emerged as the cornerstone of state-of-the-art artificial intelligence systems across n...
Comparación de enfoques de desarrollo HDL y HLL en FPGA para aplicaciones de procesamiento de imágenes
Since their invention in the mid-1990s, FPGAs have stood out for their high computing power, low energy consumption, and high flexibility in reconfiguring their internal architecture...
Soil carbon sequestration through crops rotation in a Mediterranean Cambisols: measurement and modelling
Soil carbon sequestration (SCS) has been identified by the IPCC as one of the most promising and cheap methodology to reduce atmospheric CO2...
Virtualizable hardware/software design infrastructure for dynamically partially reconfigurable systems
In most existing works, reconfigurable hardware modules are still managed as conventional hardware devices. Further, the software reconfiguration overhead incurred by loading corre...