Analysing a text or part of it is key to handwriting identification. Generally, handwriting is learnt over time and people
develop habits in their style of writing. These habits are embedded in specific parts of handwritten text. In Arabic, each
word consists of one or more sub-words, and the end of each sub-word is considered to be a connecting stroke. The main
hypothesis of this paper is that sub-words are an essential reflection of an Arabic writer's habits and can be exploited for
writer identification. We test this hypothesis with experiments that evaluate writer identification, mainly
using K nearest neighbor on groups of sub-words extracted from longer text. The experimental results show that a
group of sub-words can identify the writer with a success rate between 52.94% and 82.35% when top-1 is
used, rising to 100% when top-5 is used, based on K nearest neighbor. The results also show that the majority of writers
are identified using 7 sub-words with a reliability confidence of about 90% (i.e. 90% of the rejected templates have
significantly larger distances to the tested example than the distance from the correctly identified template). By comparison,
previous work using complete words reports a success rate of at most 90% at top-10.
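The top-1 and top-5 figures quoted above can be read as rank-based success rates. The following is a minimal sketch of how such a rate might be computed, not the authors' implementation: the function names, the toy two-dimensional feature vectors, and the use of Euclidean distance are all assumptions, since the abstract does not state which features or metric were used.

```python
import math

def rank_writers(query, templates):
    """Return writer ids ordered by increasing distance to the query.

    query: feature vector for one group of sub-words (hypothetical features).
    templates: dict mapping a writer id to that writer's enrolled feature vector.
    Euclidean distance is an assumption; the source does not name the metric.
    """
    distances = {writer: math.dist(query, vec) for writer, vec in templates.items()}
    return sorted(distances, key=distances.get)

def top_k_rate(samples, templates, k):
    """Fraction of (query, true_writer) samples whose writer is among the k nearest."""
    hits = sum(true_writer in rank_writers(query, templates)[:k]
               for query, true_writer in samples)
    return hits / len(samples)

# Toy data, invented purely for illustration.
templates = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 3.0]}
samples = [([0.1, 0.0], "A"), ([0.6, 0.0], "A"), ([0.1, 2.0], "C")]

top1 = top_k_rate(samples, templates, 1)  # second sample is closer to B than to A
top2 = top_k_rate(samples, templates, 2)  # every true writer is within the two nearest
```

Widening k can only keep or raise the rate, which is consistent with the reported move from top-1 to top-5.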
This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character
Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by
the fact that many words are made up of sub-words, that many sub-words have one or more associated diacritics that
are not connected to the sub-word's body, and that sub-words can overlap in multiple places. To overcome these
problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the
diacritics with their sub-words, and remove possible overlaps between words and sub-words. We shall also
investigate two approaches to the pre-processing tasks of estimating sub-word baselines and determining parameters that
yield appropriate slope correction and slant removal. We shall investigate the use of linear regression on sub-word pixels
to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental
rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We
shall demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases
and in-house created databases. These algorithms help improve character segmentation accuracy by transforming
handwritten Arabic text into a form that could benefit from analysis of printed text.
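The two pre-processing ideas mentioned above, regression on sub-word pixels and incremental rotation, might be sketched as follows. This is an illustration under stated assumptions rather than the procedure from the paper: the function names, the fixed angle step, the search range, and the criterion for the "best" angle (the rotation that leaves the flattest fitted line) are all hypothetical choices.

```python
import math

def fit_baseline(pixels):
    """Least-squares line y = slope * x + intercept through (x, y) ink pixels.

    Also returns the centroid (mean x, mean y) of the pixels.
    """
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pixels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels)
    if sxx == 0:
        # A perfectly vertical pixel column has no finite slope.
        return math.inf, math.nan, (mean_x, mean_y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, (mean_x, mean_y)

def rotate(pixels, angle_deg, centre):
    """Rotate pixel coordinates by angle_deg about centre."""
    cx, cy = centre
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in pixels]

def best_rotation(pixels, max_angle=45, step=1):
    """Try angles in fixed increments and keep the one with the flattest fitted line.

    The selection criterion is an assumption made for this sketch.
    """
    _, _, centre = fit_baseline(pixels)

    def residual_slope(angle):
        slope, _, _ = fit_baseline(rotate(pixels, angle, centre))
        return abs(slope)

    return min(range(-max_angle, max_angle + 1, step), key=residual_slope)

# Toy sub-word: ink pixels lying on a line tilted by 30 degrees.
tilted = [(t * math.cos(math.radians(30)), t * math.sin(math.radians(30)))
          for t in range(21)]
angle = best_rotation(tilted)  # rotation that levels the toy baseline
```

The sign of the angle depends on the coordinate convention; in image coordinates, where y grows downward, the same arithmetic applies but the visual direction of rotation is reversed.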
Natural languages such as Arabic, Kurdish, Farsi (Persian), and Urdu, along with other similar languages, have many features
that distinguish them from languages written in Latin script. One of the most important of these features is diacritics.
Diacritics are classified as compulsory, like dots, which are used to identify/differentiate letters, or optional, like short
vowels, which are used to emphasise consonants. Most native, well-trained writers omit some or all of
the second class of diacritics, and expert readers can infer their presence from the context of the written text. In this
paper, we investigate the use of diacritic shapes and other characteristics as parameters of feature vectors for Arabic
writer identification/verification. Segmentation techniques are used to extract the diacritics-based feature vectors from
examples of Arabic handwritten text.
The results of an evaluation test carried out on an in-house database of 50 writers will be presented, and
the viability of using diacritics for writer recognition will be demonstrated.
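As a rough illustration of the segmentation step mentioned above, the sketch below separates small components (candidate diacritics) from larger ones (sub-word bodies) and links each diacritic to a body. It is not the authors' technique: the area threshold, the nearest-horizontal-centroid rule, and all function names are assumptions, and the input is taken to be already-extracted connected components given as lists of (x, y) pixels.

```python
def split_components(components, area_threshold):
    """Separate components into sub-word bodies and diacritics by pixel count.

    area_threshold is a hypothetical tuning parameter.
    """
    bodies = [c for c in components if len(c) >= area_threshold]
    diacritics = [c for c in components if len(c) < area_threshold]
    return bodies, diacritics

def centroid(component):
    """Mean (x, y) position of a component's pixels."""
    n = len(component)
    return (sum(x for x, _ in component) / n,
            sum(y for _, y in component) / n)

def link_diacritics(bodies, diacritics):
    """Map each diacritic index to the index of the body with the nearest x-centroid."""
    body_x = [centroid(b)[0] for b in bodies]
    links = {}
    for i, d in enumerate(diacritics):
        dx = centroid(d)[0]
        links[i] = min(range(len(bodies)), key=lambda j: abs(body_x[j] - dx))
    return links

# Toy components, invented for illustration: two bodies, one dot above, one below.
components = [
    [(x, 10) for x in range(0, 10)],    # body of the first sub-word
    [(3, 4), (4, 4)],                   # dot above the first sub-word
    [(x, 10) for x in range(20, 32)],   # body of the second sub-word
    [(25, 15)],                         # dot below the second sub-word
]
bodies, diacritics = split_components(components, area_threshold=5)
links = link_diacritics(bodies, diacritics)
```

A purely horizontal rule like this would likely struggle where sub-words overlap, so a practical system would need a more careful linking criterion.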