Token Pruning for Efficient NLP, Vision, and Speech Models

The rapid growth of Transformer-based architectures has led to significant advancements in natural language processing (NLP), computer vision, and speech processing. However, their increasing computational demands pose challenges for real-time inference, edge deployment, and energy efficiency. Token pruning has emerged as a promising solution to mitigate these issues by dynamically reducing sequence lengths during model execution while preserving task performance. This survey provides a comprehensive review of token pruning techniques, categorizing them based on their methodologies, such as static vs. dynamic pruning, early exit strategies, and adaptive token selection. We explore their effectiveness across various domains, including text classification, machine translation, object detection, and speech recognition. Additionally, we discuss the trade-offs between efficiency and accuracy, challenges in generalization, and the integration of token pruning with other model compression techniques. Finally, we outline future research directions, emphasizing self-supervised token selection, multimodal pruning, and hardware-aware optimization. By consolidating recent advancements, this survey aims to serve as a foundational reference for researchers and practitioners seeking to enhance the efficiency of deep learning models through token pruning.

MDPI AG

Yong Jianhong

2025

Related Results

Transformer-based models have revolutionized natural language processing (NLP), achieving state-of-the-art performance across a wide range of tasks. However, their high computation...

Ground-Level Pruning at Right Time Improves Flower Yield of Old Plantation of Rosa damascena Without Compromising the Quality of Essential Oil

The essential oil of Rosa damascena is extensively used as a key natural ingredient in the perfume and cosmetic industries. However, the productivity and quality of rose oil are a ...

AI and Incidental Findings

Delayed and missed follow-up on incidental findings threatens patient health and is a major financial risk for healthcare systems. The hea...

Accelerating NLP with Token Pruning: A Survey of Methods and Applications

Transformer-based models have revolutionized natural language processing (NLP) by achieving state-of-the-art performance across a wide range of tasks. However, their high computati...

The Role of Token Pruning in Efficient Transformer Architectures

The rapid advancements in deep learning have led to the widespread adoption of Transformer-based models, which now power a variety of natural language processing (NLP) applications...

DARB: A Density-Adaptive Regular-Block Pruning for Deep Neural Networks

The rapidly growing parameter volume of deep neural networks (DNNs) hinders the artificial intelligence applications on resource constrained devices, such as mobile and wearable de...

Effect of Pruning Intensities on the Performance of Fruit Plants under Mid-Hill Condition of Eastern Himalayas: Case Study on Guava

The current study was undertaken to highlight the effect of pruning on improving vigor of old orchards and increasing performance in terms of fruit yield and quality under water and nu...

A research on rejuvenation pruning of lavandin (Lavandula x intermedia Emeric ex Loisel.)

Objective: The main purpose of the research was to investigate whether, without the need for re-planting, rejuvenation pruning can renew the aged plantations of lavandi...
