21 December 2000 Text segmentation of machine-printed Gurmukhi script
Proceedings Volume 4307, Document Recognition and Retrieval VIII; (2000)
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States
This paper describes a scheme for text segmentation of machine-printed Gurmukhi script documents. A tremendous amount of research has been done on text segmentation of machine-printed Roman script documents. In contrast, very little research has been reported on text segmentation of Indian language scripts in general and Gurmukhi script in particular. Text segmentation of Gurmukhi script faces major problems, mainly related to the unique characteristics of the script: connectivity of characters on the headline, two or more characters in a word having minimum bounding rectangles that intersect along the horizontal direction, multi-component characters, touching characters that are present even in clean documents, and horizontally overlapping text segments. In our proposed method, we use the horizontal projection profile to successively divide the text area into smaller sub-areas, or horizontal strips, each of which contains (1) a set of text lines, (2) a single text line, or (3) sub-parts of text lines. Using the vertical projection profile, the horizontal strips are then split into smaller units such as words, characters, or sub-characters, depending on the type of strip. Finally, each of these units is segmented into a set of connected components. The classifier is trained to recognize these connected components, which are later merged to form character(s).
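The pipeline outlined in the abstract (horizontal projection profile, then vertical projection profile, then connected components) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`horizontal_profile`, `vertical_profile`, `runs`, `segment`, `connected_components`), the zero-ink threshold, the binary list-of-lists image format, and the use of 8-connectivity are all assumptions made for illustration. The sketch also does not classify strips by type or handle the headline, both of which the paper treats as essential for Gurmukhi.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: 1 = ink pixel, 0 = background


def horizontal_profile(img: Image) -> List[int]:
    """Count ink pixels in every row."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> List[int]:
    """Count ink pixels in every column."""
    return [sum(col) for col in zip(*img)] if img else []


def runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile > threshold."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(img: Image):
    """Cut the page into horizontal strips, then cut each strip by columns."""
    result = []
    for top, bottom in runs(horizontal_profile(img)):
        strip = img[top:bottom]
        result.append(((top, bottom), runs(vertical_profile(strip))))
    return result


def connected_components(img: Image) -> List[List[Tuple[int, int]]]:
    """Group ink pixels into 8-connected components (assumed connectivity)."""
    rows = len(img)
    cols = len(img[0]) if img else 0
    seen, comps = set(), []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or (r, c) in seen:
                continue
            stack, comp = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and img[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            comps.append(sorted(comp))
    return comps


# Toy page: two "lines" of ink separated by one blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
]
strips = segment(page)
```

On real Gurmukhi text, a zero-ink gap in the vertical profile would typically separate words rather than characters, because the headline joins the characters of a word; that is one reason the paper splits strips differently depending on their type.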
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Gurpreet Singh Lehal and Chandan Singh, "Text segmentation of machine-printed Gurmukhi script," Proc. SPIE 4307, Document Recognition and Retrieval VIII (21 December 2000).
