Stochastic Segment Modeling for Offline Handwriting Recognition.
Natarajan, Premkumar.
2012
Abstract: Much of
the world's information in industry/office settings continues to be initially recorded
in the form of handwritten documents. Examples include notes taken at meetings,
lectures, and even medical records. One important drawback inherent to such handwritten
notes is that the information contained in them is opaque to electronic data management
systems. That drawback can be overcome by employing technology that is capable of
automatically generating electronic transcriptions of the handwritten text. Such
technology is referred to in the research literature as offline handwriting recognition
technology. Over the past decade, the hidden Markov model (HMM) has become the paradigm
of choice for the task of offline handwriting recognition. In this dissertation, we
present a new Stochastic Segment Modeling technique for recognition of offline
handwriting. The technique builds upon an existing HMM-based system and incorporates
broader, long-term context into the recognition process. Such long-term context is
typically encoded in the form of structural features extracted from segments of
handwritten text. Traditionally, structural features have been used only in recognition
approaches that rely on accurate segmentation of words into smaller units (sub-words or
characters). However, such segmentation-based approaches do not perform well on
real-world handwritten images, because breaks and merges in glyphs typically create new
connected components that are not observed in the training data. To mitigate the problem
of having to derive accurate segmentation from connected components, we present a novel
framework where an HMM-based recognition system trained on shorter-span
fixed-width-window features is used to generate candidate 2-D character images (the
"Stochastic Segments"). A separate classifier that uses structural features extracted
from the stochastic character segments generates a new set of scores that are
independent of the HMM scores. Finally, the scores from the HMM system and from the
structural feature classifier are combined to generate a final hypothesis that is more
accurate than the output of either the HMM or structural matching alone.
We demonstrate the efficacy of our approach by reporting experimental results on a large
corpus of handwritten Arabic documents.
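The abstract does not say how the HMM and structural-classifier scores are combined. One common way to do it, offered here only as a hedged sketch and not as the dissertation's method, is a weighted log-linear combination that rescores a list of candidate transcriptions. The `Hypothesis` record, the N-best input list, and the interpolation `weight` are all hypothetical. The per-character structural log-scores are assumed to come from some classifier applied to each stochastic character segment.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """One candidate transcription (hypothetical structure for illustration)."""
    text: str
    hmm_log_score: float             # log-likelihood assigned by the HMM decoder (assumed given)
    segment_log_scores: List[float]  # per-segment log-probabilities from a structural classifier (assumed given)


def combined_score(h: Hypothesis, weight: float) -> float:
    """Interpolate HMM and structural log-scores; weight=0 means HMM only."""
    structural = sum(h.segment_log_scores)
    return (1.0 - weight) * h.hmm_log_score + weight * structural


def rescore(nbest: Sequence[Hypothesis], weight: float = 0.5) -> Hypothesis:
    """Return the candidate with the highest combined score."""
    if not nbest:
        raise ValueError("empty N-best list")
    return max(nbest, key=lambda h: combined_score(h, weight))


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    h1 = Hypothesis("abc", hmm_log_score=-10.0, segment_log_scores=[-3.0, -2.5, -4.0])
    h2 = Hypothesis("abd", hmm_log_score=-10.5, segment_log_scores=[-1.0, -1.5, -2.0])
    print(rescore([h1, h2], weight=0.0).text)  # abc (HMM ranking alone)
    print(rescore([h1, h2], weight=0.5).text)  # abd (structural scores tip the balance)
```

With `weight=0.0` the decision reduces to the HMM ranking. Raising the weight lets the structural classifier overturn the HMM's top choice, as happens with the second toy hypothesis. In practice such a weight would be tuned on held-out data.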
Thesis (Ph.D.)--Tufts University, 2012.
Submitted to the Dept. of Electrical Engineering.
Advisor: Joseph Noonan.
Committee: John Makhoul, Brian Tracey, and John Hogan.
Keywords: Electrical engineering; Computer science.
ID: 6969zc33b
Component ID: tufts:20942