1 June 1994 Discrimination of handwritten from machine-printed text
Discriminating handwritten from machine-printed text is important for character recognition applications because most recognition algorithms for handwritten text differ considerably from those for machine-printed text. An efficient segregation of the two streams is therefore necessary prior to recognition in order to minimize system cost and complexity. Several techniques based on character connectivity and heuristics have been proposed, but very few achieve results at the 99% level. The technique described in this paper has been shown to yield performance figures in the high 99% range on tens of thousands of IRS tax forms and postal envelopes. Its main discrimination feature is density: the density of black relative to white pixels for a binary field, and the overall density of pixels for a gray-scale field. First, the given field is boxed very closely and its boundaries are isolated in space. A horizontal histogram is extracted for this field, and the total number of black pixels is computed. The number of black pixels per unit area is generated for binary text, and the sum of all pixel values is generated for gray-level text. When tested on a large number of samples, these densities cluster into distinct normal distributions for handwritten and machine-printed text, respectively. Fuzzy thresholds are set where the two normal curves cross, with a confidence interval of 99%. Samples whose densities fall below the threshold are considered handwritten, and samples whose densities fall above it are considered machine-printed.
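The steps in the abstract (tight boxing, horizontal histogram, black pixels per unit area, threshold at the crossing of two normal curves) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the paper's implementation: the function names are hypothetical, only the binary case is covered, a single hard threshold stands in for the fuzzy thresholds and 99% confidence interval, and the class means and standard deviations would have to be estimated from labeled samples.

```python
import math

def tight_bounding_box(field):
    """Smallest box (top, bottom, left, right) enclosing all black pixels (1s).

    `field` is a list of equal-length rows of 0/1 values (assumed layout).
    Returns None when the field has no black pixels.
    """
    rows = [r for r, row in enumerate(field) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(field[0])) if any(row[c] for row in field)]
    return rows[0], rows[-1], cols[0], cols[-1]

def black_density(field):
    """Black pixels per unit area inside the tight bounding box."""
    box = tight_bounding_box(field)
    if box is None:
        return 0.0
    top, bottom, left, right = box
    # Horizontal histogram: black-pixel count for each row of the boxed field.
    histogram = [sum(field[r][left:right + 1]) for r in range(top, bottom + 1)]
    area = (bottom - top + 1) * (right - left + 1)
    return sum(histogram) / area

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def crossing_threshold(mu1, sigma1, mu2, sigma2):
    """Point between the two means where the two normal pdfs are equal."""
    if math.isclose(sigma1, sigma2):
        if math.isclose(mu1, mu2):
            raise ValueError("distributions coincide; no single crossing")
        return 0.5 * (mu1 + mu2)
    # Equating the log-densities gives a quadratic a*x^2 + b*x + c = 0.
    a = 1.0 / (2.0 * sigma1 ** 2) - 1.0 / (2.0 * sigma2 ** 2)
    b = mu2 / sigma2 ** 2 - mu1 / sigma1 ** 2
    c = (mu1 ** 2 / (2.0 * sigma1 ** 2) - mu2 ** 2 / (2.0 * sigma2 ** 2)
         + math.log(sigma1 / sigma2))
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real crossing point")
    root = math.sqrt(disc)
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    for x in ((-b + root) / (2.0 * a), (-b - root) / (2.0 * a)):
        if lo <= x <= hi:
            return x
    raise ValueError("curves do not cross between the two means")

def classify(density, threshold):
    """Below the threshold: handwritten; otherwise: machine-printed."""
    return "handwritten" if density < threshold else "machine-printed"
```

In use, one would compute `black_density` for each labeled training field, estimate a mean and standard deviation per class, pass those four numbers to `crossing_threshold`, and then call `classify` on the density of each new field. The direction of the comparison follows the abstract, where the lower-density cluster is the handwritten one.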
© (1994) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Steve Chahal, "Discrimination of handwritten from machine-printed text," Proc. SPIE 2238, Hybrid Image and Signal Processing IV (1 June 1994).
