A Systematic Review and Experimental Evaluation of Classical and Transformer-Based Models for Urdu Abstractive Text Summarization

The rapid growth of digital content in Urdu has created an urgent need for effective automatic text summarization (ATS) systems. While extractive methods have been widely studied, abstractive summarization for Urdu remains largely unexplored due to the language’s complex morphology and rich literary tradition. This paper systematically evaluates four transformer-based language models (BERT-Urdu, BART, mT5, and GPT-2) for Urdu abstractive summarization, comparing their performance against conventional machine learning and deep learning approaches. Using multiple Urdu datasets—including the Urdu Summarization Corpus, Fake News Dataset, and Urdu-Instruct-News—we show that fine-tuned Transformer Language Models (TLMs) consistently outperform traditional methods, with the multilingual mT5 model achieving a 0.42 absolute improvement in F1-score over the best baseline. Our analysis reveals that mT5’s architecture is particularly effective at handling Urdu-specific challenges such as right-to-left script processing, diacritic interpretation, and complex verb–noun compounding. Furthermore, we present empirically validated hyperparameter configurations and training strategies for Urdu ATS, establishing transformer-based approaches as the new state-of-the-art for Urdu summarization. Notably, mT5 outperforms Seq2Seq baselines by up to 20% in ROUGE-L, underscoring the efficacy of Transformer-based models for low-resource languages. This work contributes both a systematic review of prior research and a novel empirical benchmark for advancing Urdu abstractive summarization.
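The ROUGE-L figures quoted above are based on the longest common subsequence (LCS) between a reference summary and a generated one. The following is a minimal sketch of that metric, not the authors' evaluation code: the abstract does not say which ROUGE implementation or tokenizer was used, so the whitespace tokenization and the function names `lcs_length` and `rouge_l` are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Sentence-level ROUGE-L precision, recall, and F1.

    Assumption: tokens are obtained by splitting on whitespace, which is
    only a rough approximation of word segmentation for Urdu text.
    """
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    lcs = lcs_length(ref_tokens, cand_tokens)
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Because the score depends only on token order and identity, the same function applies to right-to-left scripts without change. Scores from this sketch may differ from those of a standard ROUGE package, which typically applies its own tokenization and normalization.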

MDPI AG

Muhammad Azhar, Adeen Amjad, Deshinta Arrova Dewi, Shahreen Kasim

Information

2025
