Efficient Patch Pruning for Vision Transformers via Patch Similarity
Description:
Vision Transformers (ViTs) have emerged as a powerful alternative to convolutional neural networks (CNNs) for visual recognition tasks due to their ability to model long-range dependencies in images through self-attention.
However, the self-attention mechanism at the core of ViTs scales quadratically in computation and memory with the number of input patches, making them inefficient, especially for high-resolution images.
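As a back-of-the-envelope illustration of this quadratic scaling (the multiply count below is a rough sketch, not a full FLOP model of any particular ViT):

```python
# Sketch: why ViT attention cost grows quadratically with patch count.
# An image of side S split into PxP patches yields N = (S // P)**2
# tokens; self-attention forms an N x N score matrix, so one layer
# costs on the order of N^2 * d multiplies.

def attention_cost(image_size: int, patch_size: int, dim: int = 768) -> int:
    """Rough multiply count for one self-attention layer (QK^T plus AV)."""
    n = (image_size // patch_size) ** 2  # number of patch tokens
    return 2 * n * n * dim               # N^2*d for scores + N^2*d for values

base = attention_cost(224, 16)  # 14x14 = 196 tokens
high = attention_cost(448, 16)  # 28x28 = 784 tokens
print(high / base)              # -> 16.0
```

Doubling the image side quadruples the token count and thus multiplies the attention cost by sixteen, which is why pruning even a modest fraction of patches pays off disproportionately.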
In this work, we propose a simple yet effective method for patch pruning based on patch similarity, aimed at improving the efficiency of ViTs without compromising their performance.
The core idea is to selectively prune patches that exhibit high similarity, reducing redundant information processing while preserving crucial spatial and contextual information.
First, we compute a similarity matrix between patches using a distance measure derived from their feature representations.
Based on this similarity measure, we identify clusters of highly similar patches, which are subsequently pruned in a manner that minimizes information loss.
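The two-step procedure just described (build a similarity matrix, then prune clusters of near-duplicate patches) can be sketched as follows. This is an illustrative greedy approximation, not the paper's exact algorithm: cosine similarity, the 0.95 threshold, and the keep-first-representative rule are all assumptions.

```python
import numpy as np

def prune_similar_patches(feats: np.ndarray, threshold: float = 0.95) -> list:
    """Greedy sketch of similarity-based patch pruning.

    feats: (N, d) array of patch feature vectors.
    Returns indices of kept patches: for each group whose pairwise cosine
    similarity exceeds `threshold`, only the first (representative)
    patch is retained.
    """
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T            # (N, N) cosine similarity matrix
    n = feats.shape[0]
    keep, pruned = [], np.zeros(n, dtype=bool)
    for i in range(n):
        if pruned[i]:
            continue
        keep.append(i)                 # i represents its similarity cluster
        pruned |= sim[i] > threshold   # drop everything too close to i
    return keep

# Toy usage: two identical patch features collapse to one representative.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(prune_similar_patches(feats))    # -> [0, 2]
```

Keeping one representative per cluster is what bounds the information loss: every pruned patch is, by construction, close to a patch that survives.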
We show that pruning patches with high redundancy leads to a more compact representation while maintaining the overall performance of the ViT in various image classification tasks.
We further explore the impact of different similarity thresholds and pruning strategies on model accuracy and computational efficiency.
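A threshold sweep of the kind described can be mocked up on toy data; the random features, the greedy pruning rule, and the threshold grid below are illustrative assumptions, with the attention-FLOP factor estimated as the square of the fraction of patches kept:

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(196, 64))  # 196 toy patch features (14x14 grid)
normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
sim = normed @ normed.T             # cosine similarity matrix

def kept_count(threshold: float) -> int:
    """Number of patches surviving greedy similarity pruning."""
    pruned = np.zeros(len(feats), dtype=bool)
    kept = 0
    for i in range(len(feats)):
        if pruned[i]:
            continue
        kept += 1
        pruned |= sim[i] > threshold  # drop patches too similar to i
    return kept

for t in (0.9, 0.5, 0.2):
    n = kept_count(t)
    print(f"threshold={t}: keep {n}/196 patches, "
          f"~{(n / 196) ** 2:.2f}x attention FLOPs")
```

Lower thresholds prune more aggressively, trading accuracy for speed; the quadratic drop in attention FLOPs is what makes even moderate pruning worthwhile.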
Experimental results on standard benchmark datasets such as ImageNet demonstrate that our patch pruning method achieves significant reductions in computation and memory usage, with only a marginal decrease in accuracy.
In addition, our approach offers flexibility in balancing the trade-off between speed and accuracy, making it a viable solution for deploying Vision Transformers on resource-constrained devices.
The simplicity of the method and its effectiveness make it a promising approach for enhancing the scalability and applicability of ViTs, particularly in real-world scenarios where efficiency is paramount.