Optical character recognition based document image quality assessment

Optical Character Recognition (OCR) systems play a crucial role in digitizing documents. However, their performance deteriorates significantly on low-quality images. Even advanced OCR systems struggle if the input is visually or structurally poor. Therefore, achieving high OCR accuracy requires assessing document image quality in terms of how well characters can be recognized, not just visual clarity. In this work, we propose a Document Image Quality Assessment (DIQA) model that predicts OCR accuracy without requiring the actual execution of an OCR engine. To assess document image quality for OCR performance, twelve distinct features are extracted that capture various aspects of sharpness, focus, edge clarity, and structural distortion. Instead of relying on subjective human opinion scores, we generate labels by measuring the actual OCR accuracy using modern engines like PaddleOCR and Keras OCR. These accuracy scores, calculated using the Levenshtein distance, serve as ground truth labels for training. Using the extracted features and corresponding OCR-based labels, we train machine learning models to learn the relationship between image characteristics and OCR performance. The proposed models are evaluated using statistical metrics such as RMSE, PLCC, and SROCC to determine the most effective predictor. Our experiments demonstrate the importance of using OCR scores as labels, and the results show that our approach yields improved performance compared to existing baseline methodologies.
