SOC Reconfigurable Architecture for Software-Trained Neural Networks on FPGA

Neural networks are extensively used in software and hardware applications. Hardware applications need a small, accelerated, and configurable architecture that can be easily embedded in a device and execute the required neural network with superior performance. Such a configurable architecture lets the user implement neural networks with different structures and modify them as needed. In this dissertation, three architectures, each containing three layers, have been designed using a system-on-chip approach and implemented on a Field Programmable Gate Array (FPGA) to realize and accelerate three types of neural networks: Fully Connected Neural Networks (FCNN), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). The first layer of each architecture is a Python software layer containing a function that serves as the architecture’s user interface. The function accepts the description of the neural network structure and its training parameters as inputs and generates three binary files as outputs. These files hold the network description, weights, and biases in a specific format. The second layer is an embedded software layer running on the on-chip ARM microcontroller. It reads the binary files generated by the Python function and transfers the required parameters and configuration of each neural network layer to the third layer, the hardware layer. The embedded layer also monitors one or more status registers built into the third layer to determine when to send the next layer's parameters and configuration. The third layer is a hardware Intellectual Property (IP) block implemented on the FPGA fabric; it is configured by the embedded layer to execute the required neural network layers one after another.
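As a rough illustration of the Python layer described above, the following sketch writes a network description, weights, and biases to three binary files. The abstract does not specify the file format, so everything here is an assumption made for illustration: the function name `export_network`, the file names, the little-endian `uint32` header fields, the `float32` parameter encoding, and the integer activation codes are all hypothetical.

```python
import struct
from pathlib import Path


def export_network(layers, weights, biases, out_dir):
    """Write description, weight, and bias files (hypothetical layout).

    layers  : list of (num_inputs, num_neurons, activation_code) tuples
    weights : list of flat float lists, one per layer
    biases  : list of flat float lists, one per layer
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Description file: layer count, then three little-endian uint32 per layer.
    desc = struct.pack("<I", len(layers))
    for n_in, n_out, act in layers:
        desc += struct.pack("<3I", n_in, n_out, act)
    (out / "network.bin").write_bytes(desc)

    # Weight and bias files: float32 values, layers concatenated in order.
    for name, groups in (("weights.bin", weights), ("bias.bin", biases)):
        flat = [value for group in groups for value in group]
        (out / name).write_bytes(struct.pack(f"<{len(flat)}f", *flat))

    return [out / n for n in ("network.bin", "weights.bin", "bias.bin")]
```

In a layout like this, the embedded layer could read the description file first to learn how many parameters to stream to the hardware IP for each layer; the actual format used in the dissertation may differ.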
The first architecture supports FCNNs with up to 1024 layers, each with up to 1024 neurons and one of four activation functions (ReLU, Sigmoid, Tanh, and Softmax). The design also supports the Residual Neural Network (ResNet). The second architecture supports RNNs with three layer types: Recurrent, Attention, and Fully Connected (FC). It implements a Recurrent layer on the FPGA using either a Long Short-Term Memory (LSTM) model or a Gated Recurrent Unit (GRU), with up to 100 elements in each of the input and hidden vectors. It also supports an Attention layer with up to 64 input vectors and a maximum vector length of 100 items. Each FC layer can be configured with an input vector length of up to 256 and up to 256 neurons, and can use either the ReLU or the Softmax activation function. Finally, the third architecture supports a complete CNN with three layer types (Convolution, Pooling, and FC). The proposed design implements the convolution layer with five different filter sizes and with different stride and padding values. The CNN hardware IP also supports two types of pooling (average and maximum) with various pooling window and stride sizes. This hardware architecture also supports FC layers with input and output vector lengths of up to 4096 elements and two activation functions (ReLU and Softmax).
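The limits stated for the first architecture lend themselves to a simple pre-flight check before any files are generated. The sketch below is a minimal example under assumptions: the limits (1024 layers, 1024 neurons per layer, four activation functions) come from the abstract, while the function name `check_fcnn`, the constants, and the `(num_neurons, activation)` input format are hypothetical and not taken from the dissertation.

```python
# Limits quoted in the abstract for the FCNN architecture.
MAX_LAYERS = 1024
MAX_NEURONS = 1024
ACTIVATIONS = {"relu", "sigmoid", "tanh", "softmax"}


def check_fcnn(layers):
    """Return a list of problems; an empty list means the network fits.

    layers : list of (num_neurons, activation_name) tuples (hypothetical format)
    """
    problems = []
    if not 1 <= len(layers) <= MAX_LAYERS:
        problems.append(f"layer count {len(layers)} outside 1..{MAX_LAYERS}")
    for index, (neurons, activation) in enumerate(layers):
        if not 1 <= neurons <= MAX_NEURONS:
            problems.append(f"layer {index}: {neurons} neurons outside 1..{MAX_NEURONS}")
        if activation.lower() not in ACTIVATIONS:
            problems.append(f"layer {index}: unsupported activation {activation!r}")
    return problems
```

For example, `check_fcnn([(512, "relu"), (10, "softmax")])` returns an empty list, whereas a layer with 2048 neurons would be reported as exceeding the limit.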

Boise State University, Albertsons Library

Michael Wasef

2023


