Paper
8 May 2012 Arabic handwritten: pre-processing and segmentation
Author Affiliations +
Abstract
This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely effected by the fact that many words are made up of sub-words, with many sub-words there associated one or more diacritics that are not connected to the sub-word's body; there could be multiple instances of sub-words overlap. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and removes possible overlapping between words and sub-words. We shall also investigate two approaches for pre-processing tasks to estimate sub-words baseline, and to determine parameters that yield appropriate slope correction, slant removal. We shall investigate the use of linear regression on sub-words pixels to determine their central x and y coordinates, as well as their high density part. We also develop a new incremental rotation procedure to be performed on sub-words that determines the best rotation angle needed to realign baselines. We shall demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases and in-house created databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that could benefit from analysis of printed text.
© (2012) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Makki Maliki, Sabah Jassim, Naseer Al-Jawad, and Harin Sellahewa "Arabic handwritten: pre-processing and segmentation", Proc. SPIE 8406, Mobile Multimedia/Image Processing, Security, and Applications 2012, 84060D (8 May 2012); https://doi.org/10.1117/12.917555
Lens.org Logo
CITATIONS
Cited by 6 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Databases

Optical character recognition

Algorithm development

Image segmentation

Detection and tracking algorithms

Principal component analysis

Computer security

RELATED CONTENT


Back to Top