Preliminary evaluation of histogram-based binarization algorithms
30 March 1995
Junichi Kanai, Kevin O. Grover
Proceedings Volume 2422, Document Recognition II; (1995) https://doi.org/10.1117/12.205823
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1995, San Jose, CA, United States
Abstract
To date, most optical character recognition (OCR) systems process binary document images, and the quality of the input image strongly affects their performance. Since binarization is inherently lossy, different algorithms typically produce different binary images from the same grayscale image. The objective of this research is to study the effects of global binarization algorithms on the performance of OCR systems. Several binarization methods were examined: the best fixed threshold value for the data set, the ideal histogram method, and Otsu's algorithm. Four contemporary OCR systems and 50 hard-copy pages containing 91,649 characters were used in the experiments. The pages were digitized at 300 dpi and 8 bits/pixel, and each was binarized at 36 different threshold values (ranging from 59 to 199 in increments of 4). The resulting 1,800 binary images were processed by all four OCR systems. All systems made approximately 40% more errors on images generated by Otsu's method than on those generated by the ideal histogram method. Two of the systems made approximately the same number of errors on images generated by the best fixed threshold value as on those generated by Otsu's method.
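The abstract does not include the authors' implementation, so the following is only a minimal sketch of two of the ingredients it names: the fixed-threshold sweep (59 to 199 in steps of 4) and Otsu's algorithm, written here in its standard textbook form of maximizing between-class variance over the gray-level histogram. The function names, the use of NumPy, and the pixel polarity (0 for dark ink, 1 for light background) are assumptions made for illustration. The ideal histogram method is not defined in the abstract and is not sketched.

```python
import numpy as np

# The 36 global thresholds described in the abstract: 59, 63, ..., 199.
THRESHOLDS = list(range(59, 200, 4))


def otsu_threshold(gray):
    """Return the gray level t that maximizes between-class variance.

    `gray` is assumed to be an 8-bit (uint8) image; pixels <= t form one
    class and pixels > t form the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))     # cumulative first moment
    mu_total = mu[-1]

    denom = omega * (1.0 - omega)
    valid = denom > 1e-12                     # skip thresholds with an empty class
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def binarize(gray, t):
    """Global thresholding (assumed polarity): 0 = dark ink, 1 = light background."""
    return (gray > t).astype(np.uint8)


def sweep(gray):
    """Binarize one page at every fixed threshold, keyed by threshold value."""
    return {t: binarize(gray, t) for t in THRESHOLDS}
```

Applying `sweep` to each of the 50 pages yields 50 × 36 = 1,800 binary images, which matches the count reported in the abstract. Picking the best fixed threshold for the data set would additionally require an OCR error count per image, which is outside the scope of this sketch.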
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Junichi Kanai and Kevin O. Grover "Preliminary evaluation of histogram-based binarization algorithms", Proc. SPIE 2422, Document Recognition II, (30 March 1995); https://doi.org/10.1117/12.205823
KEYWORDS
Optical character recognition
Image processing
Binary data
Scanners
Error analysis
Detection and tracking algorithms
Statistical analysis