Robust language-independent OCR system
29 January 1999
Zhidong A. Lu, Issam Bazzi, Andras Kornai, John Makhoul, Premkumar S. Natarajan, Richard Schwartz
Proceedings Volume 3584, 27th AIPR Workshop: Advances in Computer-Assisted Recognition; (1999) https://doi.org/10.1117/12.339811
Event: The 27th AIPR Workshop: Advances in Computer-Assisted Recognition, 1998, Washington, DC, United States
Abstract
We present a language-independent optical character recognition system that is capable, in principle, of recognizing printed text from most of the world's languages. For each new language or script, the system requires sample training data along with ground truth at the text-line level; there is no need to specify the location of the lines, the words, or the characters. The system uses hidden Markov models to model each character. In addition to language independence, the technology improves performance on degraded data, such as fax, by using unsupervised adaptation techniques. Thus far, we have demonstrated the language independence of this approach for Arabic, English, and Chinese. Recognition results are presented in this paper, including results on faxed data.
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhidong A. Lu, Issam Bazzi, Andras Kornai, John Makhoul, Premkumar S. Natarajan, and Richard Schwartz "Robust language-independent OCR system", Proc. SPIE 3584, 27th AIPR Workshop: Advances in Computer-Assisted Recognition, (29 January 1999); https://doi.org/10.1117/12.339811
KEYWORDS
Optical character recognition
Feature extraction
Speech recognition
Data modeling
Databases
Detection and tracking algorithms
Image quality
