Multi-Layout Invoice Document Dataset (MIDD): A Dataset for Named Entity Recognition
The day-to-day operations of an organization produce a massive volume of unstructured data in the form of invoices, legal contracts, mortgage processing forms, and many other documents. Organizations can use the insights concealed in such unstructured documents for their operational benefit. However, analyzing and extracting insights from so many complex unstructured documents is tedious. Research in this area therefore encourages the development of novel frameworks and tools that automate key information extraction from unstructured documents. A serious obstacle to this goal is the limited availability of standard, high-quality, annotated datasets of unstructured documents. This work eases researchers' task by providing a high-quality, highly diverse, multi-layout, annotated invoice document dataset for key information extraction. Researchers can use the proposed dataset for layout-independent processing of unstructured invoice documents and to develop artificial intelligence (AI)-based tools that identify and extract named entities in invoice documents. Our dataset includes 630 invoice document PDFs with four different layouts collected from diverse suppliers. As far as we know, our invoice dataset is the only openly available dataset comprising high-quality, highly diverse, multi-layout, and annotated invoice documents.

