
Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models

Recurrent neural networks (RNNs) are neural networks (NNs) designed for time-series applications. There is growing interest in running RNNs on edge devices to support these applications. However, the large memory and computational demands of RNNs make them challenging to implement on such devices. Quantization shrinks the size and computational needs of these models by reducing the precision of weights and activations. Further, the delta networks method increases the sparsity of activation vectors by relying on the temporal relationship between successive input sequences to eliminate repeated computations and memory accesses. In this paper, we study the effect of quantization on LSTM-, GRU-, LiGRU-, and SRU-based RNN models for speech recognition on the TIMIT dataset. We show how to apply post-training quantization to these models with a minimal increase in the error by skipping the quantization of selected paths. In addition, we show that quantizing the activation vectors of RNNs to integer precision leads to considerable sparsity when the delta networks method is applied. We then propose a method for increasing the sparsity of the activation vectors while minimizing the error and maximizing the percentage of eliminated computations. The proposed quantization method compressed the four models by more than 85%, with error increases of 0.6, 0, 2.1, and 0.2 percentage points, respectively. By applying the delta networks method to the quantized models, more than 50% of the operations can be eliminated, in most cases with only a minor increase in the error. Comparing the four models under quantization and the delta networks method, we found that compressed LSTM-based models are the best solutions under low-error-rate constraints. The compressed SRU-based models are the smallest in size and are suitable when higher error rates are acceptable, and the compressed LiGRU-based models have the highest number of eliminated operations.
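The interaction between integer quantization and the delta networks method described in the abstract can be illustrated with a small, self-contained sketch. It is not the paper's implementation. It assumes symmetric uniform 8-bit quantization, a simple thresholded delta rule, and a toy slowly varying sine input. All names (`quantize`, `delta_step`, `delta_matvec`, `run_delta`), the scale, and the threshold values are hypothetical choices made for illustration only.

```python
import math
import random

BITS = 8
SCALE = 1.0 / 127          # assumed scale for inputs in [-1, 1]
N_IN, N_OUT, STEPS = 16, 4, 50

def quantize(values, scale, bits=BITS):
    """Symmetric uniform quantization of floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def signal(t):
    """Toy slowly varying input vector (an assumption, not TIMIT data)."""
    return [math.sin(0.05 * t + 0.3 * k) for k in range(N_IN)]

def delta_step(x_q, ref, threshold):
    """Thresholded delta: changes within the threshold become exact zeros
    and leave the stored reference value untouched."""
    delta, new_ref = [], []
    for cur, old in zip(x_q, ref):
        d = cur - old
        if abs(d) > threshold:
            delta.append(d)
            new_ref.append(cur)
        else:
            delta.append(0)
            new_ref.append(old)
    return delta, new_ref

def dense_matvec(W, x):
    """Plain matrix-vector product, used as the reference result."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def delta_matvec(W, delta, prev_out):
    """Incremental update out = prev_out + W @ delta that skips every
    column whose delta is zero; returns the result and the MAC count."""
    out = list(prev_out)
    macs = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue  # repeated input: no weight fetch, no multiply
        for i, row in enumerate(W):
            out[i] += row[j] * d
            macs += 1
    return out, macs

random.seed(0)
# Integer weights stand in for an already quantized weight matrix.
W = [[random.randint(-127, 127) for _ in range(N_IN)] for _ in range(N_OUT)]

def run_delta(threshold):
    """Process STEPS inputs; return final output, reference state, MACs."""
    ref, out, total = [0] * N_IN, [0] * N_OUT, 0
    for t in range(STEPS):
        delta, ref = delta_step(quantize(signal(t), SCALE), ref, threshold)
        out, macs = delta_matvec(W, delta, out)
        total += macs
    return out, ref, total

if __name__ == "__main__":
    dense = STEPS * N_IN * N_OUT
    for th in (0, 2):
        _, _, macs = run_delta(th)
        print(f"threshold={th}: {100 * (1 - macs / dense):.1f}% of MACs skipped")
```

With a threshold of 0 the incremental result is exactly the dense product on the quantized input, because all arithmetic is on integers. A positive threshold trades a bounded lag in the stored input for additional skipped operations.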

Related Results

Energy-efficient architectures for recurrent neural networks
Deep Learning algorithms have been remarkably successful in applications such as Automatic Speech Recognition and Machine Translation. Thus, these kinds of applications are ubiquit...
Constrained Quantization for Probability Distributions
In this work, we extend the classical framework of quantization for Borel probability measures defined on normed spaces Rk by introducing and analyzing the notions of the nth const...
Mitigating Quantization Errors Due to Activation Spikes in Gated Linear Unit-Based Large Language Models
Modern large language models (LLMs) achieve state-of-the-art performance through architectural advancements but require high computational costs for inference. Post-training quanti...
An Analytical Solution of Residual Stresses for Shrink-Fit Two-Layer Cylinders After Autofrettage Based on Actual Material Behavior
To enhance the pressure capacity and the life of a pressure vessel, different processes such as shrink-fit and autofrettage are usually employed. For autofrettaged and shrink-fit m...
Research on Quantization Parameter Decision Scheme for High Efficiency Video Coding
High-Efficiency Video Coding (HEVC) is one of the most widely studied coding standards. It still uses the block-based hybrid coding framework of Advanced Video Coding (AVC), and co...
Subjective audiometric measures in individuals with repeated acoustic trauma in the combat zone
Intense sound exposure that exceeds the pain threshold of human auditory sensitivity, known as acoustic trauma, causes significant and extensive changes in the auditory system. Thr...
Preventive Mechanisms Against Cyberbullying in Social Media Environments
Cyberbullying has become more common on social media sites. Since people of all ages use social media frequently, it's really important to make these platforms safer from cyberbull...
