Search engine for discovering works of Art, research articles, and books related to Art and Culture

Quantization Aware Factorization for Deep Neural Network Compression

Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to the memory and power constraints of mobile and embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while preserving the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.
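For intuition, the scheme the abstract describes can be sketched as follows: ADMM splits each CP factor into an unconstrained copy (updated by least squares) and a quantized copy (a projection onto the grid), coupled through a dual variable. The NumPy sketch below is only an illustration of that idea for a 3-way tensor, not the authors' implementation; the function names, the uniform 4-bit grid, and parameters such as rho and the iteration count are assumptions made for the example.

```python
import numpy as np

def quantize(x, grid):
    # Project every element of x onto the nearest point of a 1-D quantization grid.
    idx = np.abs(x[..., None] - grid).argmin(axis=-1)
    return grid[idx]

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product: rows indexed by (i, j) pairs.
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    # Mode-m matricization of a tensor.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def admm_quantized_cp(T, rank, grid, n_iter=50, rho=1.0):
    """Illustrative ADMM scheme (an assumption, not the paper's exact algorithm)
    for CP decomposition of a 3-way tensor T with factors constrained to a
    quantization grid. Each factor A[m] is split into an unconstrained copy A[m]
    and a quantized copy Q[m], tied together by a scaled dual variable U[m]."""
    rng = np.random.default_rng(0)
    A = [rng.standard_normal((d, rank)) for d in T.shape]  # unconstrained factors
    Q = [quantize(a, grid) for a in A]                     # grid-constrained copies
    U = [np.zeros_like(a) for a in A]                      # scaled dual variables

    for _ in range(n_iter):
        for m in range(3):
            # Khatri-Rao product of the two factors not being updated.
            o = [A[i] for i in range(3) if i != m]
            KR = khatri_rao(o[0], o[1])
            # Ridge-regularized least squares: the ADMM penalty pulls A[m]
            # toward its quantized copy Q[m] (shifted by the dual U[m]).
            G = KR.T @ KR + rho * np.eye(rank)
            RHS = unfold(T, m) @ KR + rho * (Q[m] - U[m])
            A[m] = np.linalg.solve(G, RHS.T).T
            Q[m] = quantize(A[m] + U[m], grid)  # projection onto the grid
            U[m] += A[m] - Q[m]                 # dual update
    return Q

# Hypothetical usage: factorize a reshaped convolution kernel with a
# uniform 4-bit grid (both the shape and the grid are assumptions).
W = np.random.default_rng(1).standard_normal((16, 16, 9))
grid = np.linspace(-1.0, 1.0, 16)
factors = admm_quantized_cp(W, rank=8, grid=grid)
```

In the actual method, details such as grid and scale selection, initialization, and stopping criteria matter for the final accuracy; see the paper for the precise formulation.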

Related Results

Factorization structures, cones, and polytopes
Abstract Factorization structures occur in toric differential and discrete geometry and can be viewed in multiple ways, e.g., as objects determining substantial classes of expli...
Differential Diagnosis of Neurogenic Thoracic Outlet Syndrome: A Review
Abstract Thoracic outlet syndrome (TOS) is a complex and often overlooked condition caused by the compression of neurovascular structures as they pass through the thoracic outlet. ...
Factorization Machines with libFM
Factorization approaches provide high accuracy in several important prediction problems, for example, recommender systems. However, applying factorization approaches to a new predi...
Implications of Deep Compression with Complex Neural Networks
Deep learning and neural networks have become increasingly popular in the area of artificial intelligence. These models have the capability to solve complex problems, such as image...
Deep convolutional neural network and IoT technology for healthcare
Background Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find complex patterns in ...
Comparison of PCA and Autoencoder Compression for Telemetry of Logging-While-Drilling NMR Measurements
Compression is an essential aspect of real-time operations as the bandwidth of transmitted information is very limited during logging while drilling. Processing of nuclear magnetic...
Improving the performance of 3D image model compression based on optimized DEFLATE algorithm
Abstract This study focuses on optimizing and designing the Delayed-Fix-Later Awaiting Transmission Encoding (DEFLATE) algorithm to enhance its compression performance and reduce th...
Fuzzy Chaotic Neural Networks
An understanding of the human brain’s local function has improved in recent years. But the cognition of the human brain’s working process as a whole is still obscure. Both fuzzy logic ...
