Token Pruning for Efficient NLP, Vision, and Speech Models
The rapid growth of Transformer-based architectures has led to significant advancements in natural language processing (NLP), computer vision, and speech processing. However, their increasing computational demands pose challenges for real-time inference, edge deployment, and energy efficiency. Token pruning has emerged as a promising solution to mitigate these issues by dynamically reducing sequence lengths during model execution while preserving task performance. This survey provides a comprehensive review of token pruning techniques, categorizing them based on their methodologies, such as static vs. dynamic pruning, early exit strategies, and adaptive token selection. We explore their effectiveness across various domains, including text classification, machine translation, object detection, and speech recognition. Additionally, we discuss the trade-offs between efficiency and accuracy, challenges in generalization, and the integration of token pruning with other model compression techniques. Finally, we outline future research directions, emphasizing self-supervised token selection, multimodal pruning, and hardware-aware optimization. By consolidating recent advancements, this survey aims to serve as a foundational reference for researchers and practitioners seeking to enhance the efficiency of deep learning models through token pruning.
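To make the core mechanism concrete, below is a minimal sketch (in PyTorch) of attention-score-based token pruning between Transformer layers: tokens are ranked by the attention they receive and only the top fraction is kept for subsequent layers. The function prune_tokens, the keep_ratio parameter, and the head/query pooling are illustrative assumptions for this example, not the method of any specific paper covered by the survey.

# Minimal sketch of attention-score-based token pruning.
# Names (prune_tokens, keep_ratio) are hypothetical, for illustration only.
import torch

def prune_tokens(hidden, attn, keep_ratio=0.5):
    """Keep the top-k tokens ranked by the attention they receive.

    hidden: (batch, seq_len, dim)            token representations
    attn:   (batch, heads, seq_len, seq_len) attention weights
    """
    # Score each token by the average attention it receives,
    # pooled over heads (dim=1) and then over query positions.
    scores = attn.mean(dim=1).mean(dim=1)          # (batch, seq_len)
    k = max(1, int(hidden.size(1) * keep_ratio))
    topk = scores.topk(k, dim=-1).indices          # (batch, k)
    topk, _ = topk.sort(dim=-1)                    # preserve token order
    batch_idx = torch.arange(hidden.size(0)).unsqueeze(-1)
    return hidden[batch_idx, topk]                 # (batch, k, dim)

# Example: halve a 128-token sequence between layers.
h = torch.randn(2, 128, 768)
a = torch.softmax(torch.randn(2, 12, 128, 128), dim=-1)
print(prune_tokens(h, a, keep_ratio=0.5).shape)    # torch.Size([2, 64, 768])

Dynamic variants of this idea choose k per input rather than using a fixed ratio, which is one of the trade-offs between efficiency and accuracy discussed in the survey.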
Related Results
Advancing Transformer Efficiency with Token Pruning
Transformer-based models have revolutionized natural language processing (NLP), achieving state-of-the-art performance across a wide range of tasks. However, their high computation...
DARB: A Density-Adaptive Regular-Block Pruning for Deep Neural Networks
The rapidly growing parameter volume of deep neural networks (DNNs) hinders artificial intelligence applications on resource-constrained devices, such as mobile and wearable de...
A research on rejuvenation pruning of lavandin (Lavandula x intermedia Emeric ex Loisel.)
Objective: The main purpose of the research was to investigate whether rejuvenation pruning could renew, without the need for re-planting, the aged plantations of lavandi...
Token-Level Pruning in Attention Models
Transformer-based models have revolutionized natural language processing (NLP), achieving state-of-the-art performance across a wide range of tasks. However, their high computation...
Efficient Layer Optimizations for Deep Neural Networks
Deep neural networks (DNNs) have technical issues such as long training time as the network size increases. Parameters require significant memory, which may cause migration issues ...
Effect of pruning on the growth and yield of cucumber (Cucumis sativus L.) Mercy Varieties
This study aims to determine the effect of pruning on the growth and yield of cucumber variety Mercy. This research was organized using a Randomized Group Design (RAK) with treatme...
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models
High-resolution Vision-Language Models (VLMs) are widely used in multimodal tasks to enhance accuracy by preserving detailed image information. However, these models often generate...
Research and application of high-power pruning robot based on RTK positioning and heavy load mountings
Aiming at the problem that short circuit tripping may be caused by the insufficient safe distance between trees and lower phase conductors in high voltage transmission line corrido...

