Search engine for discovering works of Art, research articles, and books related to Art and Culture

Quantization Aware Factorization for Deep Neural Network Compression

Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to the memory and power constraints of mobile and embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while preserving the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.
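For intuition, the scheme the abstract describes can be sketched as follows: ADMM splits each CP factor into an unconstrained copy (updated by least squares) and a quantized copy (a projection onto the grid), coupled through a dual variable. The NumPy sketch below is only an illustration of that idea for a 3-way tensor, not the authors' implementation; the function names, the uniform 4-bit grid, and parameters such as rho and the iteration count are assumptions made for the example.

```python
import numpy as np

def quantize(x, grid):
    # Project every element of x onto the nearest point of a 1-D quantization grid.
    idx = np.abs(x[..., None] - grid).argmin(axis=-1)
    return grid[idx]

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product: rows indexed by (i, j) pairs.
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    # Mode-m matricization of a tensor.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def admm_quantized_cp(T, rank, grid, n_iter=50, rho=1.0):
    """Illustrative ADMM scheme (an assumption, not the paper's exact algorithm)
    for CP decomposition of a 3-way tensor T with factors constrained to a
    quantization grid. Each factor A[m] is split into an unconstrained copy A[m]
    and a quantized copy Q[m], tied together by a scaled dual variable U[m]."""
    rng = np.random.default_rng(0)
    A = [rng.standard_normal((d, rank)) for d in T.shape]  # unconstrained factors
    Q = [quantize(a, grid) for a in A]                     # grid-constrained copies
    U = [np.zeros_like(a) for a in A]                      # scaled dual variables

    for _ in range(n_iter):
        for m in range(3):
            # Khatri-Rao product of the two factors not being updated.
            o = [A[i] for i in range(3) if i != m]
            KR = khatri_rao(o[0], o[1])
            # Ridge-regularized least squares: the ADMM penalty pulls A[m]
            # toward its quantized copy Q[m] (shifted by the dual U[m]).
            G = KR.T @ KR + rho * np.eye(rank)
            RHS = unfold(T, m) @ KR + rho * (Q[m] - U[m])
            A[m] = np.linalg.solve(G, RHS.T).T
            Q[m] = quantize(A[m] + U[m], grid)  # projection onto the grid
            U[m] += A[m] - Q[m]                 # dual update
    return Q

# Hypothetical usage: factorize a reshaped convolution kernel with a
# uniform 4-bit grid (both the shape and the grid are assumptions).
W = np.random.default_rng(1).standard_normal((16, 16, 9))
grid = np.linspace(-1.0, 1.0, 16)
factors = admm_quantized_cp(W, rank=8, grid=grid)
```

In the actual method, details such as grid and scale selection, initialization, and stopping criteria matter for the final accuracy; see the paper for the precise formulation.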

Related Results

Factorization structures, cones, and polytopes
Abstract Factorization structures occur in toric differential and discrete geometry and can be viewed in multiple ways, e.g., as objects determining substantial classes of expli...
Differential Diagnosis of Neurogenic Thoracic Outlet Syndrome: A Review
Abstract Thoracic outlet syndrome (TOS) is a complex and often overlooked condition caused by the compression of neurovascular structures as they pass through the thoracic outlet. ...
Factorization Machines with libFM
Factorization approaches provide high accuracy in several important prediction problems, for example, recommender systems. However, applying factorization approaches to a new predi...
Implications of Deep Compression with Complex Neural Networks
Deep learning and neural networks have become increasingly popular in the area of artificial intelligence. These models have the capability to solve complex problems, such as image...
Deep convolutional neural network and IoT technology for healthcare
Background Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find complex patterns in ...
Comparison of PCA and Autoencoder Compression for Telemetry of Logging-While-Drilling NMR Measurements
Compression is an essential aspect of real-time operations as the bandwidth of transmitted information is very limited during logging while drilling. Processing of nuclear magnetic...
Improving the performance of 3D image model compression based on optimized DEFLATE algorithm
Abstract This study focuses on optimizing and designing the Delayed-Fix-Later Awaiting Transmission Encoding (DEFLATE) algorithm to enhance its compression performance and reduce th...
Fuzzy Chaotic Neural Networks
An understanding of the human brain’s local function has improved in recent years. But the cognition of the human brain’s working process as a whole is still obscure. Both fuzzy logic ...
