
Japanese-Mobile-Receipt-OCR-1.3K: A Comprehensive Dataset Analysis and Fine-tuned Vision-Language Model for Structured Receipt Data Extraction

Abstract: We introduce Japanese-Mobile-Receipt-OCR-1.3K, a curated dataset of 1,300 real-world Japanese receipt images captured via mobile phones and annotated with 34,727 text entries. We also present a fine-tuned vision–language model for end-to-end structured receipt extraction. Our dataset analysis quantifies linguistic and layout characteristics that challenge receipt understanding. These include a heavy-tailed token length distribution (mean 9.3 tokens, maximum 255), diverse text complexity across fields, and marked heterogeneity in character composition with substantial proportions of Kanji, Kana, and numerals. We further assess semantic coverage by quantifying numeric, monetary, and temporal expressions, and measuring named entity recognition coverage across common receipt fields. Leveraging these insights, we adapt a 3B-parameter vision–language backbone to produce structured JSON outputs capturing hierarchical field relationships and standardized numeric and currency formats. Extensive experiments show consistent improvements over strong baselines. Our approach achieves notable gains in field naming consistency, hierarchical structure accuracy, and numeric formatting, alongside reductions in word error rate and character error rate. This work delivers a complete pipeline (from dataset curation and statistical analysis to model adaptation and evaluation), establishing a robust benchmark and practical methodologies for Japanese receipt understanding.

Springer Science and Business Media LLC

Sabari Nathan

2025
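
The abstract refers to character composition (Kanji, Kana, numerals) and to character error rate without giving the exact definitions used. The following is a minimal Python sketch of how such quantities are commonly computed, not the authors' implementation. The function names, the Unicode ranges chosen for each script class, and the sample strings are all assumptions made for illustration.

```python
from collections import Counter


def char_class(ch: str) -> str:
    """Assign a character to a coarse script class (assumed ranges)."""
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:       # CJK Unified Ideographs block
        return "kanji"
    if 0x3040 <= code <= 0x309F:       # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF or 0xFF66 <= code <= 0xFF9F:
        return "katakana"              # full-width and half-width Katakana
    if ch.isdigit():                   # ASCII and full-width digits
        return "numeral"
    return "other"


def composition(text: str) -> Counter:
    """Count characters per script class, ignoring whitespace."""
    return Counter(char_class(ch) for ch in text if not ch.isspace())


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical receipt line and OCR output, for illustration only.
sample_counts = composition("合計 ¥1,280 カード")
sample_cer = cer("合計1280", "合計1230")
```

Here `sample_counts` tallies script classes for one made-up receipt line, and `sample_cer` measures a single substituted digit against a six-character reference. Word error rate is omitted because it depends on a tokenizer for Japanese text, which the abstract does not specify.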



