Paper
Document image improvement for OCR as a classification problem
13 January 2003
Kristen M. Summers
Proceedings Volume 5010, Document Recognition and Retrieval X; (2003) https://doi.org/10.1117/12.476023
Event: Electronic Imaging 2003, Santa Clara, CA, United States
Abstract
To support the goal of automatically selecting methods for enhancing an image so as to improve the accuracy of OCR on that image, we treat the decision of whether to apply each of a set of methods as a supervised classification problem for machine learning. We characterize each image by a combination of two sets of measures: one set intended to reflect the degree to which particular types of noise are present in documents in a single font of Roman or similar script, and a more general set based on connected component statistics. We consider several candidate methods of image improvement, each of which constitutes its own 2-class classification problem: whether or not transforming the image with that method improves OCR accuracy. In our experiments, the results varied across the image transformation methods, but the system made the correct choice in 77% of the cases in which the decision changed the OCR score (on a scale of [0, 1]) by at least 0.01, and it made the correct choice 64% of the time overall.
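The per-method decision described in the abstract can be sketched as a small, self-contained Python example. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each (image, method) pair has already been reduced to a numeric feature vector and that OCR scores in [0, 1] are known before and after applying the method. The names `Example`, `predict_1nn`, and `evaluate` are hypothetical; the 1-nearest-neighbour rule is a stand-in learner chosen for brevity, not the classifier used in the paper; and treating "no change" as "do not apply" is likewise an assumption. `evaluate` mirrors the two figures the abstract reports: accuracy over all cases, and accuracy restricted to cases where the decision changes the OCR score by at least 0.01.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Example:
    """One (image, method) pair. Field names are hypothetical."""
    features: tuple[float, ...]  # e.g. noise measures + connected-component statistics
    score_before: float          # OCR score in [0, 1] on the original image
    score_after: float           # OCR score in [0, 1] after applying the method

    @property
    def delta(self) -> float:
        # How much applying the method changes the OCR score.
        return self.score_after - self.score_before

    @property
    def label(self) -> bool:
        # Assumption: class 1 ("apply") only if the transform strictly improves OCR.
        return self.delta > 0


def predict_1nn(train: list[Example], features: tuple[float, ...]) -> bool:
    """Stand-in classifier: copy the label of the nearest training example."""
    nearest = min(train, key=lambda ex: dist(ex.features, features))
    return nearest.label


def evaluate(test: list[Example], predictions: list[bool], min_effect: float = 0.01):
    """Return (overall accuracy, accuracy on cases where |delta| >= min_effect)."""
    correct = [p == ex.label for ex, p in zip(test, predictions)]
    overall = sum(correct) / len(correct)
    # Keep only the cases where choosing wrongly would cost at least min_effect.
    mattered = [c for ex, c in zip(test, correct) if abs(ex.delta) >= min_effect]
    significant = sum(mattered) / len(mattered) if mattered else float("nan")
    return overall, significant


# Usage with made-up numbers (one binary problem per enhancement method).
train = [
    Example((0.1, 0.2), 0.5, 0.75),   # method helped
    Example((0.9, 0.8), 0.75, 0.5),   # method hurt
]
test = [
    Example((0.15, 0.25), 0.5, 0.625),  # helps by 0.125
    Example((0.85, 0.75), 0.5, 0.504),  # helps by only ~0.004
]
preds = [predict_1nn(train, ex.features) for ex in test]
print(evaluate(test, preds))  # (0.5, 1.0)
```

Here the second test case is misclassified, which lowers overall accuracy, but it falls below the 0.01 threshold and so does not count against the restricted accuracy. This is why the abstract's two figures (77% and 64%) can differ.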
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kristen M. Summers "Document image improvement for OCR as a classification problem", Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); https://doi.org/10.1117/12.476023
CITATIONS
Cited by 10 scholarly publications.
KEYWORDS
Optical character recognition

Image classification

Machine learning

Image processing

Image enhancement

Image analysis

Neural networks
