Paper
14 February 2015
A unified approach for development of Urdu Corpus for OCR and demographic purpose
Prakash Choudhary, Neeta Nain, Mushtaq Ahmed
Proceedings Volume 9445, Seventh International Conference on Machine Vision (ICMV 2014); 944526 (2015) https://doi.org/10.1117/12.2180903
Event: Seventh International Conference on Machine Vision (ICMV 2014), 2014, Milan, Italy
Abstract
This paper presents a methodology for developing a corpus of Urdu handwritten text images and for applying corpus linguistics to OCR and information retrieval from handwritten documents. Compared with other scripts, Urdu is somewhat complicated for data entry: entering a single character requires a combination of several keystrokes. A mixed approach is proposed and demonstrated for building an Urdu corpus for OCR and demographic data collection. The demographic part of the database could be used to train a system to extract data automatically, which would simplify the manual data processing currently involved in collecting data from input forms such as passport, ration card, voting card, AADHAR, driving licence, Indian Railway reservation, and census forms. This would increase the participation of the Urdu-language community in understanding and benefiting from Government schemes. To make the database available and applicable across a wide area of corpus linguistics, we propose a methodology for data collection, mark-up, digital transcription, and XML metadata information for benchmarking.
© (2015) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Prakash Choudhary, Neeta Nain, and Mushtaq Ahmed "A unified approach for development of Urdu Corpus for OCR and demographic purpose", Proc. SPIE 9445, Seventh International Conference on Machine Vision (ICMV 2014), 944526 (14 February 2015); https://doi.org/10.1117/12.2180903
KEYWORDS
Databases
Image segmentation
Optical character recognition
Data processing
Structural design
Computing systems
Detection and tracking algorithms
