Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models
Recurrent neural networks (RNNs) are neural networks designed for time-series applications, and there is growing interest in running them on edge devices. However, RNNs have large memory and computational demands that make them challenging to deploy on such devices. Quantization shrinks the size and the computational needs of these models by reducing the precision of weights and activations. In addition, the delta networks method increases the sparsity of activation vectors by exploiting the temporal relationship between successive input sequences to eliminate repeated computations and memory accesses. In this paper, we study the effect of quantization on LSTM-, GRU-, LiGRU-, and SRU-based RNN models for speech recognition on the TIMIT dataset. We show how to apply post-training quantization to these models with a minimal increase in error by skipping quantization of selected paths. We also show that quantizing activation vectors in RNNs to integer precision leads to considerable sparsity when the delta networks method is applied, and we propose a method for increasing this sparsity while minimizing the error and maximizing the percentage of eliminated computations. The proposed quantization method compresses the four models by more than 85%, with error increases of 0.6, 0, 2.1, and 0.2 percentage points, respectively. Applying the delta networks method to the quantized models eliminates more than 50% of the operations, in most cases with only a minor increase in error. Comparing the four models under quantization and the delta networks method, we find that compressed LSTM-based models are the best choice under low-error-rate constraints, compressed SRU-based models are the smallest and suit cases where higher error rates are acceptable, and compressed LiGRU-based models eliminate the largest number of operations.
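The abstract names two techniques without showing them; the short Python sketch below is an illustrative assumption, not the authors' implementation. It shows symmetric per-tensor int8 post-training quantization of a weight matrix and the delta networks idea of zeroing activation changes that fall below a threshold so the corresponding multiply-accumulates and weight fetches can be skipped. All function names, tensor sizes, and the 0.05 threshold are hypothetical.

    # Minimal sketch (assumed, not the paper's exact procedure): int8 post-training
    # quantization plus delta-networks thresholding of successive activations.
    import numpy as np

    def quantize_int8(w):
        """Symmetric per-tensor post-training quantization to int8."""
        scale = np.max(np.abs(w)) / 127.0            # map the largest magnitude to 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    def delta_step(x_t, x_ref, threshold):
        """Keep only components that changed by more than the threshold since the
        last transmitted value; the rest become zero, so their multiply-accumulates
        (and the matching weight fetches) can be skipped."""
        delta = x_t - x_ref
        mask = np.abs(delta) >= threshold
        sparse_delta = np.where(mask, delta, 0.0)
        new_ref = np.where(mask, x_t, x_ref)         # reference advances only where an update was sent
        return sparse_delta, new_ref, mask

    if __name__ == "__main__":
        rng = np.random.default_rng(0)

        # Post-training quantization of a recurrent weight matrix.
        W = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)
        Wq, s = quantize_int8(W)
        print("max abs quantization error:", np.max(np.abs(W - dequantize(Wq, s))))

        # Delta networks on two successive, temporally similar activation vectors.
        h_ref = rng.normal(size=256).astype(np.float32)
        h_t = h_ref + rng.normal(scale=0.02, size=256).astype(np.float32)
        d, h_ref, mask = delta_step(h_t, h_ref, threshold=0.05)
        print("fraction of operations eliminated:", 1.0 - mask.mean())

Because the recurrent matrix-vector products dominate RNN cost, the fraction of delta components zeroed by the threshold translates directly into the fraction of eliminated operations and skipped memory accesses that the delta networks method provides.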
Related Results
Energy-efficient architectures for recurrent neural networks
Deep Learning algorithms have been remarkably successful in applications such as Automatic Speech Recognition and Machine Translation. Thus, these kinds of applications are ubiquit...
Development of a Recurrent Neural Network Model for Prediction of Dengue Importation
Objective: We aim to develop a prediction model for the number of imported cases of infectious disease by using the recurrent neural network (RNN) with the Elman algorithm, a type o...
Research on Quantization Parameter Decision Scheme for High Efficiency Video Coding
High-Efficiency Video Coding (HEVC) is one of the most widely studied coding standards. It still uses the block-based hybrid coding framework of Advanced Video Coding (AVC), and co...
Preventive Mechanisms Against Cyberbullying in Social Media Environments
Cyberbullying has become more common on social media sites. Since people of all ages use social media frequently, it is important to make these platforms safer from cyberbull...
Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
Background: As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...
Dynamic Quantization of Digital Filter Coefficients
The possibility of quantizing the coefficients of a digital filter in the concept of dynamic mathematical programming, as a dynamic process of step-by-step quantization of co...
Progress of shrink polymer micro- and nanomanufacturing
Traditional lithography plays a significant role in the fabrication of micro- and nanostructures. Nevertheless, the fabrication process still suffers from the limitations o...

