Digital cameras are gradually replacing traditional flat-bed scanners as the main means of capturing text information, thanks to their usability, low cost and high resolution, and a large amount of research has been done on camera-based text understanding. Unfortunately, the arbitrary position of the camera lens relative to the text area frequently causes perspective distortion, which most current OCR systems cannot handle, creating a demand for automatic text rectification. Current rectification research has focused mainly on document images; distortion of natural scene text is seldom considered. In this paper, a scheme for automatic text rectification in natural scene images is proposed. It relies on geometric information extracted from the characters themselves as well as their surroundings. In the first step, linear segments are extracted from the region of interest, and J-Linkage based clustering followed by customized refinement is performed to estimate the primary vanishing points (VPs). To achieve a more comprehensive VP estimate, a second stage inspects the internal structure of the characters, analyzing pixels and connected components of text lines. Finally, the VPs are verified and used to perform perspective rectification. Experiments demonstrate an increase in recognition rate and an improvement over related algorithms.
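As a rough sketch of the vanishing-point idea (not the paper's actual J-Linkage procedure), line segments can be lifted to homogeneous line coordinates, intersected pairwise, and the best-supported intersection taken as a VP candidate; the function names and the tolerance below are illustrative assumptions:

```python
import numpy as np

def to_homogeneous_line(p1, p2):
    """Homogeneous coordinates of the line through two image points (cross product)."""
    return np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])

def intersect(l1, l2):
    """Intersection point of two homogeneous lines; None if they are parallel."""
    p = np.cross(l1, l2)
    if abs(p[2]) < 1e-9:
        return None
    return p[:2] / p[2]

def estimate_vp(segments, tol=5.0):
    """Crude VP estimate: intersect all pairs of segment-supporting lines and
    return the intersection supported (within tol pixels) by the most others."""
    lines = [to_homogeneous_line(a, b) for a, b in segments]
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = intersect(lines[i], lines[j])
            if p is not None:
                pts.append(p)
    best, best_support = None, -1
    for p in pts:
        support = sum(np.linalg.norm(p - q) < tol for q in pts)
        if support > best_support:
            best, best_support = p, support
    return best
```

J-Linkage replaces this exhaustive voting with a randomized, preference-based clustering that tolerates multiple VPs and outliers; the sketch above only conveys the consensus principle.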
Proc. SPIE. 9405, Image Processing: Machine Vision Applications VIII
KEYWORDS: Video, Video surveillance, Facial recognition systems, Video compression, Image resolution, Detection and tracking algorithms, Light sources and illumination, Digital cameras, Associative arrays, Intelligence systems
Face images from video sequences captured in unconstrained environments usually contain several kinds of variation, e.g. pose, facial expression, illumination, image resolution and occlusion. Motion blur and compression artifacts also deteriorate recognition performance. Moreover, in many practical systems, such as law enforcement, video surveillance and e-passport identification, only a single still image per person is enrolled as the gallery set. Many existing methods fail under these variations in face appearance and the limited number of gallery samples. In this paper, we propose a novel approach for still-to-video face recognition in unconstrained environments. Assuming that faces from still images and video frames share the same identity space, a regularized least squares regression method is used to tackle the multi-modality problem; regularization terms based on heuristic assumptions are introduced to avoid overfitting. To deal with the single-image-per-person problem, we exploit face variations learned from training sets to synthesize virtual samples for the gallery. We adopt a learning algorithm that combines an affine/convex hull-based approach with these regularizations to match image sets. Experimental results on a real-world dataset of unconstrained video sequences demonstrate that our method clearly outperforms state-of-the-art methods.
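The regularized least squares regression step can be illustrated with a plain ridge-style solver; this is a generic sketch of the closed-form solution, not the paper's full multi-modality formulation with its heuristic regularization terms:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Regularized least squares: minimize ||Xw - y||^2 + lam * ||w||^2.
    Closed-form solution: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

With lam = 0 this reduces to ordinary least squares; a positive lam shrinks the coefficients, which is the basic mechanism the abstract's regularization terms use to avoid overfitting.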
As smartphones and touch screens become more and more popular, on-line signature verification can serve as a means of personal identification for mobile computing. In this paper, a novel Laplacian Spectral Analysis (LSA)
based on-line signature verification method is presented and an integration framework of LSA and Dynamic Time
Warping (DTW) based methods for practical application is proposed. In LSA based method, a Laplacian matrix is
constructed by regarding the on-line signature as a graph. The signature’s writing speed information is utilized in the
Laplacian matrix of the graph. The eigenvalue spectrum of the Laplacian matrix is analyzed and used for signature
verification. The framework to integrate LSA and DTW methods is further proposed. DTW is integrated at two stages.
First, it is used to provide stroke matching results for the LSA method to construct the corresponding graph better.
Second, the on-line signature verification results of DTW are fused with those of the LSA method. Experimental results on a public signature database and on practical signature data collected from mobile phones demonstrate the effectiveness of the proposed framework.
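A minimal sketch of the LSA idea, under the simplifying assumption that the signature is modeled as a chain graph whose edge weights come from local writing speed (the paper's actual graph construction uses DTW stroke matching and is more elaborate):

```python
import numpy as np

def laplacian_spectrum(points, speeds, k=4):
    """Build a chain graph over sampled signature points, weight each edge by
    the local writing speed, and return the smallest k eigenvalues of the
    graph Laplacian L = D - W as a spectral feature vector.
    `points` only fixes the graph size here; a fuller model would also use
    their geometry when choosing edges."""
    n = len(points)
    W = np.zeros((n, n))
    for i in range(n - 1):
        w = 0.5 * (speeds[i] + speeds[i + 1])   # edge weight from writing speed
        W[i, i + 1] = W[i + 1, i] = w
    L = np.diag(W.sum(axis=1)) - W              # graph Laplacian
    return np.sort(np.linalg.eigvalsh(L))[:k]   # eigenvalue spectrum feature
```

Two signatures can then be compared by the distance between their spectra, which is invariant to relabeling of the graph nodes.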
Nowadays, video has gradually become the mainstream dissemination medium, owing to its rich information capacity and intelligibility, and the text that appears in videos often carries significant semantic information, contributing greatly to video content understanding and to content-based video retrieval systems. Text-based video analysis usually consists of text detection, localization, tracking, segmentation and recognition. A large amount of research has been done on video text detection and tracking, but most solutions focus on processing text in static frames, and few make full use of the redundancy between video frames. In this paper, a unified framework for text detection, localization and tracking in video frames is proposed. We select the edge and corner distributions of text blocks as text features, on which localization and tracking are performed. By exploiting the redundancy between frames, location relations and motion characteristics are determined, which effectively reduces false alarms and raises the localization rate. Tracking schemes are proposed for static and rolling texts respectively. Through multi-frame integration, text quality is improved, and so is the OCR rate. Experiments demonstrate a reduction of false alarms and an increase in localization and recognition rates.
This paper investigated the problem of orientation detection for document images with Chinese characters. These images
may be in four orientations: right side up, upside down, and rotated 90° or 270° counterclockwise. First, we presented the
structure of text-recognition-based orientation detection algorithm. Text line verification and orientation judgment
methods were discussed in detail, and afterwards multiple experiments were carried out. Distance-difference based text line
verification and confidence based text line verification were proposed and compared with methods without text line
verification. Then, a picture-based orientation detection framework was adopted for the situation where no text line was
detected. This high-level classification problem was solved by relatively low-level vision features including Color
Moments (CM) and Edge Direction Histogram (EDH), with a distance-based classification scheme. Finally, a confidence-based
classifier combination strategy was employed in order to make full use of the complementarity between different
features and classifiers. Experiments showed that both text line verification methods were able to improve the accuracy
of orientation detection, and picture-based orientation detection had a good performance for no-text image set.
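The Edge Direction Histogram feature can be sketched as follows; the gradient operator, bin count and threshold are illustrative assumptions, not the parameters used in the paper:

```python
import numpy as np

def edge_direction_histogram(img, bins=8, thresh=1e-3):
    """Edge Direction Histogram (EDH): quantize the gradient directions of
    strong-edge pixels into `bins` orientation bins and normalize."""
    gy, gx = np.gradient(img.astype(float))     # simple finite-difference gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist = np.zeros(bins)
    idx = (ang[mag > thresh] / (2 * np.pi) * bins).astype(int) % bins
    for i in idx:
        hist[i] += 1
    s = hist.sum()
    return hist / s if s > 0 else hist
```

Because the histogram describes edge orientation statistics, it rotates predictably with the image, which is what makes it useful for orientation detection.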
Offline Chinese handwritten character string recognition is one of the most important research fields in pattern
recognition. Due to the free writing style, large variability in character shapes and different geometric characteristics,
Chinese handwritten character string recognition is a challenging problem. However, among current
methods, the over-segmentation and merging method, which integrates geometric information, character recognition
information and contextual information, shows promising results. It is found experimentally that a large part of the errors
are segmentation errors, mainly occurring around non-Chinese characters. In a Chinese character string, there are not only
wide characters namely Chinese characters, but also narrow characters like digits and letters of the alphabet. The
segmentation error is mainly caused by uniform geometric model imposed on all segmented candidate characters. To
solve this problem, post processing is employed to improve recognition accuracy of narrow characters. On one hand,
multi-geometric models are established for wide characters and narrow characters respectively. Under multi-geometric
models narrow characters are not prone to be merged. On the other hand, top rank recognition results of candidate paths
are integrated to boost final recognition of narrow characters. The post processing method is investigated on two
datasets comprising in total 1405 handwritten address strings. Wide character recognition accuracy improved slightly,
and narrow character recognition accuracy increased by 10.41% and 10.03% on the two datasets respectively. This indicates that
the post processing method is effective to improve recognition accuracy of narrow characters.
Proc. SPIE. 7879, Imaging and Printing in a Web 2.0 World II
KEYWORDS: Curium, Feature extraction, Distance measurement, Classification systems, Printing, Simulation of CCA and DLA aggregates, Statistical analysis, Intelligence systems, Digital photography, Scanners
Automatic picture orientation recognition is of great significance in many applications such as consumer gallery
management, webpage browsing, content-based searching or web printing. We try to solve this high-level classification
problem using relatively low-level features, including Spatial Color Moment (CM) and Edge Direction Histogram (EDH).
An improved distance-based classification scheme is adopted as our classifier. We propose an input-vector-rotating
strategy, which is computationally more efficient than several conventional schemes, instead of collecting and training
samples for all four classes. We then investigate classifier combination algorithms to make full use of the
complementarity between different features and classifiers. Our classifier combination methods include two levels:
feature-level and measurement-level. We present two classifier combination structures (parallel and cascaded) at
measurement-level with a rejection option. As the precondition of measurement-level methods, the theory of Classifier's
Confidence Analysis (CCA) is introduced with the definition of concepts such as classifier's confidence and generalized
confidence. The classification system finally approaches 90% recognition accuracy on a wide, unconstrained consumer image set.
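One way to realize the input-vector-rotating strategy, assuming EDH-style features in which a 90° image rotation corresponds to a cyclic shift of the orientation bins (the shift convention below is an assumption made for illustration):

```python
import numpy as np

def rotate_edh(feat, quarter_turns, bins_per_turn=8):
    """Rotating the image by 90 deg corresponds to a cyclic shift of the
    edge-direction histogram bins -- no features need to be re-extracted."""
    shift = quarter_turns * (bins_per_turn // 4)
    return np.roll(feat, shift)

def predict_orientation(feat, upright_prototype, bins_per_turn=8):
    """Try all four rotations of the input vector; the one closest to the
    upright-class prototype gives the estimated orientation (quarter turns)."""
    dists = [np.linalg.norm(rotate_edh(feat, q, bins_per_turn) - upright_prototype)
             for q in range(4)]
    return int(np.argmin(dists))
```

The point of the strategy is that only the upright class needs trained samples; the other three orientations are handled by rotating the input feature vector, which is cheaper than training four classifiers.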
Proc. SPIE. 7879, Imaging and Printing in a Web 2.0 World II
KEYWORDS: Image segmentation, Optical character recognition, Image resolution, Detection and tracking algorithms, Image processing algorithms and systems, Video, Binary data, Simulation of CCA and DLA aggregates, Feature extraction, Machine learning
Web images constitute an important part of web documents and have become a powerful medium of expression, especially
images containing text. The text embedded in web images often carries semantic information related to the layout and
content of the pages. Statistics show that there is a significant need to detect and recognize text from web images. In this
paper, we first give a short review of the methods proposed for text detection and recognition in web images; then a
framework to extract text from web images is presented, including stages of text localization and recognition. In the text
localization stage, a localization method is applied to generate text candidates and a two-stage strategy is utilized to select
among them; text regions are then localized using a coarse-to-fine text line extraction algorithm. For text recognition,
two text region binarization methods have been proposed to improve the performance of text recognition in web images.
Experimental results for text localization and recognition prove the effectiveness of these methods. Additionally, a
recognition evaluation for text regions in web images has been conducted as a benchmark.
Proc. SPIE. 7540, Imaging and Printing in a Web 2.0 World; and Multimedia Content Access: Algorithms and Systems IV
KEYWORDS: Detection and tracking algorithms, Performance modeling, Image segmentation, Visual process modeling, Optical character recognition, Data processing, Inspection, Gaussian filters, Sensors, Video
Detection of character regions is meaningful research work, both for highlighting regions of interest and for recognition in
further information processing. Much research has been performed on character localization and extraction, and this
leads to a great need for performance evaluation schemes to inspect detection algorithms. In this paper, two probability
models are established to accomplish evaluation tasks for different applications. For highlighting regions of
interest, a Gaussian probability model, which simulates the property of a low-pass Gaussian filter of human vision
system (HVS), was constructed to allocate different weights to different character parts. It shows the greatest potential
for describing detector performance, especially when the detected result is an incomplete character, a case where other
methods cannot work effectively. For the recognition purpose, we also introduce a weighted probability model to
give an appropriate description of the contribution of detection results to final recognition results. The validity of the
performance evaluation models proposed in this paper is demonstrated by experiments on web images and natural scene
images. These models may also be applicable to evaluating algorithms that locate other objects, such as faces, though
wider experiments are needed to examine this assumption.
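The Gaussian probability model can be sketched as a weight map over each ground-truth character, with detection quality scored as the fraction of Gaussian weight the detector covers; the sigma choice below is an illustrative assumption:

```python
import numpy as np

def gaussian_weight_map(h, w, sigma_scale=0.5):
    """Weight map for one character box: pixels near the character centre
    count more, mimicking the low-pass behaviour of human vision (HVS)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sigma = sigma_scale * max(h, w)
    wmap = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return wmap / wmap.sum()

def weighted_recall(char_mask, det_mask):
    """Fraction of the character's Gaussian weight covered by the detection.
    Unlike plain pixel recall, a detector that misses the character centre
    is penalized more than one that misses the same area at the border."""
    wmap = gaussian_weight_map(*char_mask.shape)
    covered = (wmap * char_mask * det_mask).sum()
    total = (wmap * char_mask).sum()
    return covered / total if total > 0 else 0.0
```

This is what lets the model score incomplete-character detections gracefully: a partial box still earns the weight it covers.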
Proc. SPIE. 7534, Document Recognition and Retrieval XVII
KEYWORDS: Optical character recognition, Statistical modeling, Feature extraction, Rule based systems, Intelligence systems, Image segmentation, Data modeling, Information science, Information technology, Electronics engineering
We have developed an online Pinyin recognition system that combines a hidden Markov model (HMM) method with a
statistical method. Pinyin recognition is useful for those who may forget how to write a certain Chinese character but
know how to pronounce it. The combined HMM and statistical models are used to segment a word and recognize it. We
have achieved a writer-independent accuracy of 91.37% on 17745 unconstrained-style Pinyin syllables.
Eye blink detection is an important problem in computer vision, with applications such as face liveness detection and
driver fatigue analysis. Existing eye blink detection methods can be roughly divided into two
categories: contour-template-based and appearance-based methods. The former can usually extract eye contours
accurately, but different templates must be used for closed and open eyes separately, and these methods are also
sensitive to illumination changes. In appearance-based methods, image patches of open and closed eyes are collected as
positive and negative samples to learn a classifier, but eye contours cannot be accurately extracted. To
overcome drawbacks of the existing methods, this paper proposes an effective eye blink detection method based on an
improved eye contour extraction technique. In our method, the eye contour model is represented by 16 landmarks, so it
can describe both open and closed eyes. Each landmark is accurately located by a fast classifier trained on
the appearance around this landmark. Experiments have been conducted on YALE and another large data set consisting
of frontal face images to extract the eye contour. The experimental results show that the proposed method affords
accurate eye localization and is robust to closed eyes. It also performs well under illumination variation. The average
time cost of our method is about 140 ms on a 2.8 GHz Pentium IV PC with 1 GB RAM, which satisfies the real-time
requirement for face video sequences. This method has also been applied in a face liveness detection system, and the
results are promising.
This paper addresses text line extraction in free-style documents, such as business cards, envelopes and posters. In free-style
documents, global properties such as character size and line direction can hardly be assumed, which exposes a serious
limitation of traditional layout analysis.
'Line' is the most prominent and the highest-level structure in our bottom-up method. First, we apply a novel intensity
function based on gradient information to locate text areas, where gradients within a window have large magnitudes and
various directions, and split such areas into text pieces. We build a probability model of lines consisting of text pieces via
statistics on training data. For an input image, we group text pieces into lines using a simulated annealing algorithm with
a cost function based on the probability model.
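A minimal sketch of such a gradient-based intensity function: each window is scored by gradient magnitude times a circular-dispersion term (angles are doubled so that the opposite gradient directions on the two sides of a stroke count as one edge direction); the window size and the combination rule are illustrative assumptions:

```python
import numpy as np

def text_intensity(img, win=8):
    """Score each non-overlapping window by gradient magnitude combined with
    direction diversity: text windows have strong gradients in many
    directions, while a single straight edge has one dominant direction."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)
    h, w = img.shape
    scores = {}
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            m = mag[y:y + win, x:x + win]
            a = ang[y:y + win, x:x + win][m > 1e-6]
            if a.size == 0:
                scores[(y, x)] = 0.0
                continue
            # circular dispersion of doubled angles: 1 - |mean unit vector|
            dispersion = 1.0 - np.abs(np.exp(1j * 2 * a).mean())
            scores[(y, x)] = m.mean() * dispersion
    return scores
```

Windows with high scores are then candidates for splitting into text pieces, which the grouping stage assembles into lines.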
Proc. SPIE. 7247, Document Recognition and Retrieval XVI
KEYWORDS: Image segmentation, Optical character recognition, Image restoration, Cameras, 3D image restoration, 3D image processing, 3D modeling, Image enhancement, Detection and tracking algorithms, Fuzzy logic
In camera-based optical character recognition (OCR) applications, warping is a primary problem. Warped document
images should be restored before they are recognized by traditional OCR algorithms. This paper presents a novel
restoration approach which first estimates the baseline and vertical direction based on rough line and character
segmentation, then selects several key points and determines their restoration mapping from the estimation step, and
finally performs Thin-Plate Spline (TPS) interpolation on the full page image using these key-point mappings. The
restored document image is expected to have straight baselines and an erect character direction. This method can restore
arbitrary local warping while keeping the restoration result natural and smooth, and consequently improves the
performance of the OCR application. Experiments on several camera captured warped document images show
effectiveness of this approach.
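A self-contained sketch of TPS interpolation from key-point correspondences, using the standard r² log r kernel with an affine part; this is the generic formulation, not the paper's specific key-point selection:

```python
import numpy as np

def tps_fit(src, dst):
    """Fit a 2-D thin-plate-spline mapping f with f(src[i]) = dst[i].
    Solves [K P; P^T 0][w; a] = [dst; 0] with kernel U(r) = r^2 log r."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    d = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2)
    K = np.where(d > 0, d ** 2 * np.log(d + 1e-20), 0.0)
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return src, np.linalg.solve(A, b)

def tps_apply(model, pts):
    """Map arbitrary points through the fitted spline (kernel part + affine part)."""
    src, params = model
    pts = np.asarray(pts, float)
    d = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2)
    U = np.where(d > 0, d ** 2 * np.log(d + 1e-20), 0.0)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ params[:len(src)] + P @ params[len(src):]
```

TPS interpolates the key points exactly and reproduces any affine map (including pure translation) with zero bending, which is why it keeps the restored page natural and smooth between key points.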
Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be
digitized for digital libraries and various applications. Traditional Mongolian script has a unique writing style and
multi-font-type variations, which bring challenges to Mongolian OCR research. Because of characteristics of the traditional
script, for example that one character may be part of another character, we define the character set for recognition
according to the segmented components, and the components are combined into characters by a rule-based post-processing
module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented.
For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and
connected components. As Mongolian has different font-types, which are categorized into two major groups, the
segmentation parameters are adjusted for each group, and a classification method for the two font-type groups is
introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant
character recognition kernels are integrated. Experiments show that the presented methods are effective. The text
recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.
In this paper, we present an algorithm for estimating near-view vehicle speed. Unlike the far-view scenario, a near-view
image provides more specific vehicle information, such as body texture and vehicle identifier, which makes it practical
for individual vehicle speed estimation. The algorithm adopts the idea of the Vanishing Point to calibrate camera
parameters and a Gaussian Mixture Model (GMM) to detect moving vehicles. After calibration, it transforms image coordinates to
real-world coordinates using a simple model - the Pinhole Model and calculates the vehicle speed in real-world
coordinates. Adopting the idea of the Vanishing Point, this algorithm needs only two pre-measured parameters: the
camera height and the distance between the camera and the middle road line; other information, such as camera orientation, focal length and vehicle speed, can be extracted from the video data.
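Under a simplified flat-road, zero-roll pinhole model, the image-to-road mapping and the resulting speed estimate can be sketched as follows; the formula and parameter names are illustrative assumptions rather than the paper's exact calibration:

```python
def ground_distance(v_pixel, v_horizon, focal_px, cam_height):
    """Simplified pinhole ground-plane model: for a camera at height
    cam_height (metres) looking along a flat road, an image row v_pixel
    below the horizon row v_horizon maps to the road distance
        d = cam_height * focal_px / (v_pixel - v_horizon)."""
    return cam_height * focal_px / (v_pixel - v_horizon)

def estimate_speed(v1, v2, dt, v_horizon, focal_px, cam_height):
    """Speed (m/s) from two observations of the same vehicle dt seconds apart."""
    d1 = ground_distance(v1, v_horizon, focal_px, cam_height)
    d2 = ground_distance(v2, v_horizon, focal_px, cam_height)
    return abs(d2 - d1) / dt
```

The horizon row here plays the role of the vanishing point: once it is known, image rows become road distances without any further measurement.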
Post-processing of OCR is a bottleneck of document image processing systems. Proofreading is necessary since the
current recognition rate is not sufficient for publishing. The OCR system provides every recognition result with a
confident or unconfident label; people need only check unconfident characters, while the error rate of confident
characters is low enough for publishing. However, the current algorithm marks too many characters unconfident, so optimization of the OCR
results is required. In this paper we propose an algorithm based on pattern matching to decrease the number of
unconfident results. If an unconfident character matches a confident character well, its label could be changed into a
confident one. Pattern matching makes use of the original character images, so it can reduce problems caused by image
normalization and scanning noise. We introduce WXOR, WAN, and four-corner based pattern matching to improve the
effect of matching, and introduce confidence analysis to reduce the errors of similar characters. Experimental results
show that our algorithm achieves improvements of 54.18% in the first image set that contains 102,417 Chinese
characters, and 49.85% in the second image set that contains 53,778 Chinese characters.
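The core promotion rule can be sketched with plain (unweighted) XOR matching; WXOR additionally weights mismatched pixels by their neighbourhoods, which is omitted here for brevity, and the threshold is an illustrative assumption:

```python
def xor_mismatch(a, b):
    """Fraction of differing pixels between two equal-size binary images
    (plain XOR matching)."""
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / (len(a) * len(a[0]))

def promote_unconfident(unconf_img, confident_imgs, thresh=0.1):
    """Re-label an unconfident OCR result as confident if it matches some
    confident character image closely enough."""
    return any(xor_mismatch(unconf_img, c) <= thresh for c in confident_imgs)
```

Because the comparison runs on the original character images, it sidesteps errors introduced by normalization, which is the property the abstract relies on.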
Document authentication decides whether a given document is from a specific individual or not. In this paper, we propose a new document authentication method in the physical domain (after the document is printed out) based on embedding deformed characters. When an author writes a document to a specific individual or organization, a unique error-correcting code serving as his Personal Identification Number (PIN) is generated, and some characters in the text lines are deformed according to this PIN. In this way, the writer's personal information is embedded in the document. When the document is received, it is first scanned and recognized by an OCR module, and then the deformed characters are detected to recover the PIN, which can be used to decide the originality of the document. Document authentication can thus be viewed as a communication problem in which the identity of a document's writer is "transmitted" over a channel consisting of the writer's PIN, the document, and the encoding rule. Experimental results on deformed character detection are very promising, and the availability and practicability of the proposed method are verified by a practical system.
The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain the recognition results of characters and the confidence of these results. A representative symbol image can be generated for similar character image blocks from the OCR results, the sizes of blocks and the mismatches between blocks. This symbol image replaces all the similar image blocks, and thus a high compression ratio can be achieved. Experimental results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM and S on Latin character images, and 37.9% over lossless SPM and 4.97% over lossy PM and S on Chinese character images. Our algorithm leads to far fewer substitution errors than previous lossy PM and S and thus preserves acceptable decoded image quality.
Style is an important feature of printed and handwritten characters, but it has not been studied as thoroughly as character recognition. In this paper, we try to learn how many typical styles exist in a class of real-world form images. A hierarchical clustering method has been developed and tested. A cross-recognition error rate constraint is proposed to reduce false combinations in the hierarchical clustering process, and a cluster selection method is used to delete redundant or unsuitable clusters. The algorithm requires only a similarity measure between patterns. It is tested on a template-matching based similarity measure and can easily be extended to any other feature and distance measure. A detailed comparison of each step's effect is given in the paper. In total, 16 typical styles are found, and by giving each character in each style a prototype for recognition, a 0.78% error rate is achieved on the test set.
Proc. SPIE. 5010, Document Recognition and Retrieval X
KEYWORDS: Optical character recognition, Image processing, Error analysis, Image segmentation, Image restoration, Intelligence systems, Image classification, Data analysis, Digital libraries, Process control
This paper introduces a newly designed general-purpose Chinese document data capture system, Tsinghua OCR Network Edition (TONE). The system aims to cut the high cost of digitizing masses of Chinese paper documents. Our first step was to divide the whole data-entry process into a few single-purpose procedures; based on these procedures, a production-line-like system configuration was developed. By design, the management cost is reduced directly by substituting automated task scheduling for traditional manual assignment, and indirectly by adopting a well-designed quality control mechanism. Classification distances, character image positions and context grammars are synthesized to reject questionable characters. Experiments showed that when 19.91% of the characters are rejected, the residual error rate can be as low as 0.0097% (below one per ten thousand characters), which finally makes the error-rejecting module applicable. Given the cost distribution in data companies (in particular, manual correction accounts for 70% of the total), the estimated total cost reduction could be over 50%.
Different character recognition problems have their own specific characteristics. State-of-the-art OCR technologies take different recognition approaches, each most effective for a different type of character. How to identify the character type automatically, and then invoke the specific recognition engine, has not received enough attention among researchers. Most of the limited research is based on a whole document image, a block of text or a text line. This paper addresses the problem of character type identification independent of content, including handwritten/printed Chinese character identification and printed Chinese/English character identification, based on only one character. Exploiting some effective features, such as run-length histogram features and stroke density histogram features, we have obtained very promising results: the identification rate is higher than 98% in our experiments.
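A run-length histogram feature of the kind mentioned can be sketched as follows; the bin count, the horizontal-runs-only choice and the normalization are illustrative assumptions:

```python
def run_length_histogram(img, max_len=8):
    """Histogram of horizontal black-pixel run lengths, a simple texture
    feature for telling handwritten from printed (or Chinese from English)
    characters; runs longer than max_len share the last bin."""
    hist = [0] * max_len
    for row in img:
        run = 0
        for p in row + [0]:          # sentinel 0 flushes the final run
            if p:
                run += 1
            elif run:
                hist[min(run, max_len) - 1] += 1
                run = 0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Printed strokes tend to produce a few sharply peaked run lengths, while handwriting spreads mass across bins, which is what makes such a histogram discriminative for type identification.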
Character recognition in low-quality and low-resolution images is still a challenging problem. In this paper, a gray-scale image based character recognition algorithm is proposed, which is especially suited to gray-scale images captured in the real world or to very low-quality character recognition. In our research, we classify the deformations of low-quality and low-resolution character images into two categories: (1) high spatial frequency deformations, derived from the blur distortion of the point spread function (PSF) of scanners or cameras, random noise, or character deformations; (2) low spatial frequency deformations, mainly derived from large-scale background variations. Traditional recognition methods based on binary images cannot give satisfactory results on these images because the deformations produce a great number of touching or broken strokes in the binarization process. In the proposed method, we extract transform features directly on the gray-scale character images, which avoids the shortcomings produced by binarization. Our method differs from existing gray-scale methods in that it avoids the difficult and unstable step of finding character structures in the images. By applying adequate feature selection algorithms, such as linear discriminant analysis (LDA) or principal component analysis (PCA), we can select the low-frequency components that preserve the fundamental shape of characters and discard the high-frequency deformation components. We also develop a gray-level histogram based algorithm using the native integral ratio (NIR) technique to find a threshold that removes the backgrounds of character images while maintaining the details of the character strokes as much as possible. Experiments have shown that this method is especially effective for recognition of low-quality and low-resolution images.
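The PCA-based selection of low-frequency components can be sketched generically; this is standard PCA on feature vectors, not the paper's specific transform-feature pipeline:

```python
import numpy as np

def pca_fit(X, k):
    """PCA on gray-scale feature vectors: keep the k components with the
    largest variance, which for character images tend to capture the
    low-frequency shape and discard high-frequency deformation."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / max(len(X) - 1, 1)      # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]        # top-k by variance
    return mean, vecs[:, order]

def pca_project(model, X):
    """Project feature vectors onto the retained principal components."""
    mean, comps = model
    return (X - mean) @ comps
```

Recognition then runs on the projected vectors, so the high-frequency deformation components never reach the classifier.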