Algorithm for auto annotation of scanned documents based on subregion tiling and shallow networks
There are millions of scanned documents worldwide in around 4,000
languages. Searching for information in a scanned document requires a
text layer to be available and indexed. Preparation of a text layer
requires recognizing character and subregion patterns and
associating them with a human interpretation. Developing an optical character
recognition (OCR) system for every language is a very difficult
task, if not impossible. There is a strong need for systems that build on
existing OCR technologies by learning from them and unifying
the many disparate systems. In this regard, we propose an
algorithm that leverages the fact that we are dealing with scanned
documents of handwritten text regions from across diverse domains and
language settings. We observe that the text regions have consistent
bounding-box sizes, and any large-font or tiny-font cases can be
handled in preprocessing or postprocessing phases. Image subregions
in scanned text documents are smaller than the subregions
formed by common objects in general-purpose images. We propose and
validate the hypothesis that a much simpler convolutional neural network
(CNN) with very few layers and fewer filters can be used for
detecting individual subregion classes. To detect several
hundred classes, multiple such simple models can be pooled to
operate simultaneously on a document. The advantage of using pools of
subregion-specific models is the ability to handle the incremental
addition of hundreds of newer classes over time without disturbing the
previous models in the continual learning scenario. Such an approach has
a distinct advantage over a single monolithic model, in which
subregion classes share, and interfere through, a bulky common neural
network. We report here an efficient algorithm for building
subregion-specific lightweight CNN models. The training data for the proposed CNN
requires engineering synthetic data points that include both patterns of
interest and non-patterns. We propose and validate the
hypothesis that an image canvas containing an optimal amount of pattern and
non-pattern can be formulated using a mean squared error loss function
to influence the filters learned from the data. The CNN thus trained
can identify the character object in the presence of
several other objects on a generalized test image of a scanned document.
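
As a rough illustration of the synthetic training canvas described above, the following NumPy sketch places a few copies of a pattern on a canvas, fills the background with salt-and-pepper noise as the non-pattern context, and builds a per-pixel target against which a mean squared error loss could be computed. The function names (`make_canvas`, `mse`), the grid placement, the binary target mask, and all parameter values are assumptions made for illustration; the source does not specify how the canvas or the loss target is constructed.

```python
import numpy as np


def make_canvas(pattern, canvas_size=64, n_patterns=3, noise_density=0.05, seed=0):
    """Build one synthetic training canvas (hypothetical helper).

    Returns (canvas, target), where target is 1.0 on pixels covered by a
    placed pattern and 0.0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    h, w = pattern.shape
    canvas = np.zeros((canvas_size, canvas_size), dtype=np.float32)
    target = np.zeros_like(canvas)

    # Non-pattern context: set a random subset of pixels to 0 (pepper) or 1 (salt).
    noise_mask = rng.random(canvas.shape) < noise_density
    canvas[noise_mask] = rng.integers(0, 2, size=int(noise_mask.sum()))

    # Pattern of interest: paste copies into distinct grid cells so they never overlap.
    rows, cols = canvas_size // h, canvas_size // w
    cells = rng.choice(rows * cols, size=n_patterns, replace=False)
    for cell in cells:
        r, c = divmod(int(cell), cols)
        y, x = r * h, c * w
        canvas[y:y + h, x:x + w] = pattern
        target[y:y + h, x:x + w] = 1.0
    return canvas, target


def mse(prediction, target):
    """Mean squared error between a predicted map and the target map."""
    return float(np.mean((prediction - target) ** 2))


# Example: an 8x8 solid block stands in for a character glyph (assumption).
glyph = np.ones((8, 8), dtype=np.float32)
canvas, target = make_canvas(glyph, n_patterns=3, noise_density=0.05)
print(canvas.shape, int(target.sum()), mse(canvas, target))
```

Varying `n_patterns` and `noise_density` changes the balance between pattern and non-pattern on the canvas.
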
In this setting, a key observation is that learning a CNN
filter depends not only on the abundance of patterns of interest but
also on the presence of a non-pattern context. Our experiments have led
to the following key observations: (i) a pattern cannot be over-expressed
in isolation, (ii) a pattern cannot be under-expressed either, (iii) a
non-pattern can be salt-and-pepper noise, and finally (iv) it is
sufficient to provide a non-pattern context to a modest representation
of a pattern to obtain strong individual subregion class models. We
have carried out studies and report mean
average precision scores on various data sets, including (1) MNIST
digits (95.77), (2) EMNIST capital alphabet (81.26), (3) EMNIST small
alphabet (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34),
(6) Devanagari letters (100), (7) Telugu words (93.20), and (8) Devanagari
words (93.20), and also on medical prescriptions, and observed
high mean average precision of over 90%. The
algorithm serves as a kernel in the automatic annotation of digital
documents in diverse scenarios such as annotation of ancient manuscripts
and hand-written health records.
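
The pooling idea in the abstract, with one lightweight model per subregion class and new classes added without touching earlier ones, can be sketched as a simple registry that tiles a page and queries every registered model on each tile. The `DetectorPool` class, its method names, the tile size, and the threshold are hypothetical, and the stand-in scoring functions below are not CNNs; in practice each entry would presumably be a trained shallow network.

```python
import numpy as np


class DetectorPool:
    """Registry of independent per-class detectors (illustrative only)."""

    def __init__(self, tile=28, stride=28):
        self.tile = tile
        self.stride = stride
        self.models = {}  # label -> callable(patch) -> score in [0, 1]

    def add_class(self, label, model):
        # Adding a class only inserts a new entry; existing models are untouched.
        self.models[label] = model

    def annotate(self, page, threshold=0.5):
        """Return (label, x, y, score) for each tile a model scores at or above threshold."""
        hits = []
        height, width = page.shape
        for y in range(0, height - self.tile + 1, self.stride):
            for x in range(0, width - self.tile + 1, self.stride):
                patch = page[y:y + self.tile, x:x + self.tile]
                for label, model in self.models.items():
                    score = float(model(patch))
                    if score >= threshold:
                        hits.append((label, x, y, score))
        return hits


# Stand-in "models": simple brightness scores in place of trained shallow CNNs.
page = np.zeros((56, 56), dtype=np.float32)
page[:28, :28] = 1.0  # one bright tile in the top-left corner

pool = DetectorPool(tile=28, stride=28)
pool.add_class("bright", lambda patch: patch.mean())
first = pool.annotate(page)

pool.add_class("dark", lambda patch: 1.0 - patch.mean())  # class added later
second = pool.annotate(page)
print(first)
print(len(second))
```

Because each class lives in its own entry, registering "dark" leaves the "bright" detector and its earlier annotations unchanged, which is the property the abstract attributes to pooled models in the continual learning scenario.
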