
Efficient Patch Pruning for Vision Transformers via Patch Similarity

Vision Transformers (ViTs) have emerged as a powerful alternative to convolutional neural networks (CNNs) for visual recognition tasks due to their ability to model long-range dependencies in images through self-attention. However, the computational complexity and memory consumption of ViTs scale quadratically with the number of input patches, making them inefficient, especially for high-resolution images. In this work, we propose a simple yet effective method for patch pruning based on patch similarity, aimed at improving the efficiency of ViTs without compromising their performance. The core idea is to selectively prune patches that exhibit high similarity, reducing redundant information processing while preserving crucial spatial and contextual information. First, we compute a similarity matrix between patches using a distance measure derived from their feature representations. Based on this similarity measure, we identify clusters of highly similar patches, which are subsequently pruned in a manner that minimizes information loss. We show that pruning patches with high redundancy leads to a more compact representation while maintaining the overall performance of the ViT in various image classification tasks. We further explore the impact of different similarity thresholds and pruning strategies on model accuracy and computational efficiency. Experimental results on standard benchmark datasets such as ImageNet demonstrate that our patch pruning method achieves significant reductions in computation and memory usage, with only a marginal decrease in accuracy. In addition, our approach offers flexibility in balancing the trade-off between speed and accuracy, making it a viable solution for deploying Vision Transformers on resource-constrained devices. The simplicity of the method and its effectiveness make it a promising approach for enhancing the scalability and applicability of ViTs, particularly in real-world scenarios where efficiency is paramount.
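The pipeline the abstract describes — compute a similarity matrix over patch feature vectors, then prune patches that are highly similar to ones already kept — can be sketched as a simple greedy filter. This is an illustrative reconstruction, not the paper's implementation: the cosine-similarity measure, the greedy keep/drop rule, and the threshold value are all assumptions made for the sketch.

```python
import numpy as np

def prune_similar_patches(patch_feats, threshold=0.95):
    """Greedily keep patches; drop any patch whose cosine similarity
    to an already-kept patch exceeds `threshold`.

    patch_feats: (num_patches, dim) array of patch feature vectors.
    Returns the indices of the patches that survive pruning.
    """
    # Normalize rows so dot products equal cosine similarities.
    norms = np.linalg.norm(patch_feats, axis=1, keepdims=True)
    feats = patch_feats / np.clip(norms, 1e-12, None)

    kept = []
    for i, f in enumerate(feats):
        # Keep patch i only if it is not too similar to any kept patch.
        if all(f @ feats[j] < threshold for j in kept):
            kept.append(i)
    return kept

# Toy example: four "patch embeddings", the second nearly identical
# to the first, so it should be pruned as redundant.
rng = np.random.default_rng(0)
a = rng.standard_normal(8)
patches = np.stack([
    a,
    a + 1e-3 * rng.standard_normal(8),  # near-duplicate of patch 0
    rng.standard_normal(8),
    rng.standard_normal(8),
])
kept = prune_similar_patches(patches, threshold=0.95)
```

A real deployment would apply this between transformer blocks and track which spatial positions the surviving patches came from, since the abstract stresses preserving spatial and contextual information; the threshold then controls the speed/accuracy trade-off the authors explore.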

Related Results

Reducing Computational Complexity in Vision Transformers Using Patch Slimming
Vision Transformers (ViTs) have emerged as a dominant class of deep learning models for image recognition tasks, demonstrating superior performance compared to traditional Convolut...
DARB: A Density-Adaptive Regular-Block Pruning for Deep Neural Networks
The rapidly growing parameter volume of deep neural networks (DNNs) hinders the artificial intelligence applications on resource constrained devices, such as mobile and wearable de...
A research on rejuvenation pruning of lavandin (Lavandula x intermedia Emeric ex Loisel.)
Objective: The main purpose of the research was to investigate whether aged plantations of lavandin can be renewed by rejuvenation pruning, without the need for re-planting...
Advancing Transformer Efficiency with Token Pruning
Transformer-based models have revolutionized natural language processing (NLP), achieving state-of-the-art performance across a wide range of tasks. However, their high computation...
Efficient Layer Optimizations for Deep Neural Networks
Deep neural networks (DNNs) have technical issues such as long training time as the network size increases. Parameters require significant memory, which may cause migration issues ...
Refining intra-patch connectivity measures in landscape fragmentation and connectivity indices
Abstract Context. Measuring intra-patch connectivity, i.e. the connectivity within a habitat patch, is important to evaluate landscape fragmentation and connectivity. Howev...
Effect of pruning on the growth and yield of cucumber (Cucumis sativus L.) Mercy Varieties
This study aims to determine the effect of pruning on the growth and yield of cucumber variety Mercy. This research was organized using a Randomized Group Design (RAK) with treatme...
On the Remote Calibration of Instrumentation Transformers: Influence of Temperature
The remote calibration of instrumentation transformers is theoretically possible using synchronous measurements across a transmission line with a known impedance and a local set of...