Photometric Ligature Extraction Technique for Urdu Optical Character Recognition
Urdu Optical Character Recognition (OCR) based on character-level recognition (the analytical approach) is less popular than ligature-level recognition (the holistic approach) because of its added complexity and the overlapping of characters and strokes. This paper presents a holistic Urdu ligature extraction technique. The proposed Photometric Ligature Extraction (PLE) technique is independent of font size and column layout and can handle non-overlapping ligatures as well as all inter- and intra-overlapping ligatures. It uses a customized photometric filter together with X-shearing, padding, and connected component analysis to extract complete ligatures instead of extracting primary and secondary ligatures separately. A total of approximately 267,800 ligatures were extracted from scanned images of printed Urdu Nastaliq text with an accuracy of 99.4%. The proposed framework thus outperforms existing Urdu Nastaliq text extraction and segmentation algorithms. The PLE framework can also be applied to other languages written in the Nastaliq script style, such as Arabic, Persian, Pashto, and Sindhi.
Engineering, Technology & Applied Science Research
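The abstract names X-shearing, padding, and connected component analysis but gives no implementation detail, so the following is only a minimal sketch of how those three steps could fit together on a toy binary image. It is not the paper's PLE algorithm: the photometric filter is omitted entirely, and the function names (`x_shear`, `label_components`, `merge_overlapping_columns`), the shear factor, and the rule of grouping components by overlapping column ranges are all assumptions made for illustration.

```python
from collections import deque


def x_shear(img, k):
    """Shift row y right by round(k * (h - 1 - y)) pixels (assumes k >= 0).

    The output is padded with background columns so no pixel is lost.
    """
    h, w = len(img), len(img[0])
    pad = int(round(k * (h - 1)))
    out = [[0] * (w + pad) for _ in range(h)]
    for y, row in enumerate(img):
        shift = int(round(k * (h - 1 - y)))
        for x, v in enumerate(row):
            out[y][x + shift] = v
    return out


def label_components(img):
    """Return bounding boxes (x_min, y_min, x_max, y_max) of 8-connected blobs."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


def merge_overlapping_columns(boxes):
    """Hypothetical grouping rule: merge boxes whose column ranges overlap."""
    merged = []
    for box in sorted(boxes):  # sorted by x_min
        if merged and box[0] <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (last[0], min(last[1], box[1]),
                          max(last[2], box[2]), max(last[3], box[3]))
        else:
            merged.append(box)
    return merged


# Toy image: two slanted strokes whose column ranges overlap, plus a detached dot.
TOY = [[int(c) for c in row] for row in (
    "1001000",
    "0100100",
    "0010010",
    "0001001",
    "0000000",
    "0000010",
)]

before = merge_overlapping_columns(label_components(TOY))
after = merge_overlapping_columns(label_components(x_shear(TOY, 1.0)))
print(before)  # [(0, 0, 6, 5)]
print(after)   # [(5, 0, 5, 5), (8, 0, 8, 3)]
```

In this toy case, everything collapses into a single group before shearing because the slanted strokes share a column. After shearing, the strokes stand upright, the two groups separate, and the dot is attached to one stroke as a single unit rather than being reported separately. Real scanned Nastaliq text would need a tuned shear factor and a more careful association rule.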
Related Results
Implementasi Pembelajaran IPS Sebagai Penguatan Pendidikan Karakter di Sekolah Dasar
This study aims to analyze the implementation of social studies learning as strengthening character education in elementary schools. The research method used is a qualitative descr...
DIGITAL ORTHOGRAPHY AND LINGUISTICS IDENTITY: THE SOCIOLINGUISTIC IMPLICATIONS OF ERRONEOUS URDU CAPTIONS IN DIGITAL MEDIA
Social media platforms have played a significant role in which Urdu is being recognized more frequently through different means especially through captions and subtitles that help ...
Recognition of Nastaliq Urdu Text using Multi-SVM
Optical Character Recognition has emerged as an attractive research field nowadays. Lot of work has been done in Urdu script based on various approaches and diverse methodologies h...
Services of Radio Pakistan in the Promotion of Urdu Language & Literature
Radio is one of the most amazing and effective inventions of the last century. Radio Pakistan came into being with the independence of Pakistan in 1947. From the very beginning, Ra...
Experimental investigations of the photometric properties of Phobos simulant
Deriving quantitative regolith properties from photometric remote sensing data remains a challenge. Many photometric models are empirical, where the parameters lack direct physical...
Evaluating Classical and Transformer-Based Models for Urdu Abstractive Text Summarization: A Systematic Review
The rapid growth of digital content in Urdu has created an urgent need for effective automatic text summarization (ATS) systems. While extractive methods have been widely studied, ...
A Systematic Review and Experimental Evaluation of Classical and Transformer-Based Models for Urdu Abstractive Text Summarization
The rapid growth of digital content in Urdu has created an urgent need for effective automatic text summarization (ATS) systems. While extractive methods have been widely studied, ...
Urdu-NERD: Urdu named entity recognition with BiGRU-based deep learning architecture
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP), focusing on identifying and extracting entities such as names, locations, organizations, ...

