Boosting over decision stumps has proved effective in Natural Language Processing, mainly with symbolic features, and its good properties (fast, with few and non-critical parameters, and not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article, we investigate the use of boosting over small decision trees in image classification, for the discrimination of handwritten and printed text. We then conducted experiments to compare it with the usual SVM-based classification. The results are convincing: performance is very close, but predictions are faster and the classifier behaves far less like a black box. These promising results encourage the use of this classifier in more complex recognition tasks such as multiclass problems.
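As an illustration of boosting over decision stumps, here is a minimal sketch of discrete AdaBoost in plain Python. It is not the implementation evaluated in the article: the two features and their values are hypothetical stand-ins for image descriptors, and the labels +1 (handwritten) and -1 (printed) are an assumed encoding.

```python
import math

def stump_predict(stump, row):
    """A stump tests one feature against one threshold."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity

def train_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_error, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                error = sum(w for row, label, w in zip(X, y, weights)
                            if stump_predict(stump, row) != label)
                if best_error is None or error < best_error:
                    best_error, best_stump = error, stump
    return best_error, best_stump

def adaboost(X, y, rounds=10):
    """Return a list of (alpha, stump) pairs."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, stump = train_stump(X, y, weights)
        # Clamp the error so that alpha stays finite on separable data.
        error = min(max(error, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: two made-up features per text zone.
X = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, -1, -1, -1]  # +1 = handwritten, -1 = printed (assumed encoding)
model = adaboost(X, y, rounds=5)
```

Each stump in the ensemble is a single readable test on one feature, which is one reason such a classifier can behave less like a black box than an SVM.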
The analysis of 2D structured documents often requires localizing data inside a document during the recognition process. In this paper, we present LearnPos, a new generic tool that is independent of any document recognition system. LearnPos models and evaluates positioning from a learning set of documents. LearnPos helps the user define the physical structure of the document, so that the user can concentrate on defining its logical structure. LearnPos provides spatial information for both absolute and relative spatial relations, in interaction with the user. Our method can handle spatial relations composed of distinct zones and provides an appropriate order and point of view to minimize errors. We show that the resulting models can be successfully used for structured document recognition, while reducing the manual exploration of the document data set.
In this paper, we present our new method for the segmentation of handwritten text pages into lines, which was submitted to the ICDAR'2013 handwritten segmentation competition. This method is based on two levels of perception of the image: a rough perception based on a blurred image, and a precise perception based on the presence of connected components. Combining these two levels of perception makes it possible to deal with the difficulties of handwritten text segmentation: curvature, irregular slope and overlapping strokes. The analysis of the blurred image is efficient in images with a high density of text, whereas the connected components make it possible to connect the text lines in pages with low text density. The combination of these two kinds of data is implemented with a grammatical description, which makes it possible to externalize the knowledge linked to the page model. The page model contains an analysis strategy that can be associated with an application goal. Indeed, text line segmentation depends on the kind of data being analysed: homogeneous text pages, separated text blocks or unconstrained text. This method obtained a recognition rate of more than 98% in the ICDAR'2013 competition.
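The idea of combining a rough and a precise level of perception can be sketched as follows. This is a deliberately simplified stand-in, not the grammatical method of the paper: the rough level is assumed to be a blurred row-ink profile, the precise level is assumed to be a list of connected components given by their vertical extents, and all function names and thresholds are hypothetical.

```python
def row_profile(image):
    """Amount of ink per pixel row of a binary image (1 = ink)."""
    return [sum(row) for row in image]

def box_blur(values, radius):
    """Rough perception: smooth the profile with a moving average."""
    blurred = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        blurred.append(sum(values[lo:hi]) / (hi - lo))
    return blurred

def find_bands(profile, threshold):
    """Runs of rows whose blurred ink exceeds the threshold."""
    bands, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands

def assign_components(components, bands):
    """Precise perception: attach each component to the closest band."""
    centers = [(top + bottom) / 2 for top, bottom in bands]
    return [min(range(len(bands)),
                key=lambda k: abs(centers[k] - (top + bottom) / 2))
            for top, bottom in components]

# Toy page: two text lines (rows 1-2 and rows 6-7), 6 pixels wide.
image = [[1] * 6 if r in (1, 2, 6, 7) else [0] * 6 for r in range(9)]
bands = find_bands(box_blur(row_profile(image), radius=1), threshold=2.5)
# Vertical extents of three hypothetical components; the last one sits
# between the lines (for instance a detached stroke).
lines = assign_components([(1, 2), (6, 7), (3, 3)], bands)
```

Here the blurred profile locates the lines, while the components decide which pixels belong to which line, including a stroke that falls outside every band.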
This paper presents a new method to address the problem of segmenting handwritten text into text lines
and words. We propose a method based on the cooperation among points of view, which makes it possible to
localize the text lines in a low-resolution image and then to associate the pixels at a higher level of
resolution. Thanks to the combination of levels of vision, we can detect overlapping characters and re-segment
the connected components during the analysis. We then propose a segmentation of lines into words based on the
cooperation between digital data and symbolic knowledge. The digital data are obtained from distances inside a
Delaunay graph, which gives a precise distance between connected components at the pixel level. We introduce
structural rules in order to take into account some generic knowledge about the organization of a text page.
This cooperation between sources of information gives greater expressive power and ensures the global coherence of the
recognition. We validate this work using the metrics and the database proposed for the segmentation contest of
ICDAR 2009. We show that our method obtains very competitive results compared to the other methods
in the literature. More precisely, we are able to deal with slope and curvature, overlapping text lines and varied
kinds of writing, which are the main difficulties met by the other methods.
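The step from distances between connected components to words can be illustrated with a much simpler substitute. The sketch below does not build a Delaunay graph and applies no structural rules: it assumes each component of one text line is reduced to a hypothetical (left, right) horizontal extent and cuts the line wherever the gap exceeds an assumed threshold.

```python
def split_words(boxes, gap_threshold):
    """Group (left, right) component extents of one line into words.

    Simplification: horizontal gaps replace the Delaunay-graph distances
    described in the abstract.
    """
    if not boxes:
        return []
    boxes = sorted(boxes)
    words, current = [], [boxes[0]]
    reach = boxes[0][1]  # rightmost pixel covered by the current word
    for box in boxes[1:]:
        if box[0] - reach > gap_threshold:
            words.append(current)
            current = [box]
        else:
            current.append(box)
        reach = max(reach, box[1])
    words.append(current)
    return words

# Four hypothetical components: gaps of 1, 11 and 1 pixels.
words = split_words([(0, 4), (5, 9), (20, 25), (26, 30)], gap_threshold=5)
```

A fixed threshold is the weak point of such a purely numeric approach, which is where symbolic knowledge about the page organization can help.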
This paper presents an improvement to a document layout analysis system, offering a possible solution to Sayre's paradox ("a letter must be recognized before it can be segmented; and it must be segmented before it can be recognized"). This improvement, based on stochastic parsing, allows the integration of statistical information, obtained from recognizers, during syntactic layout analysis. We present how this fusion of numeric and symbolic information in a feedback loop can be applied to syntactic methods to simplify document description. To limit combinatorial explosion during the exploration of solutions, we devised an operator that allows optional activation of the stochastic parsing mechanism. Our evaluation on 1250 handwritten business letters shows that this method improves global recognition scores.
GaAs micro-nanodisks (typical disk size 5 μm × 200 nm in our work) are good candidates for boosting optomechanical
phenomena thanks to their ability to confine both optical and mechanical energy in a sub-micron interaction volume. We
present results of optomechanical characterization of GaAs disks by near-field optical coupling from a tapered silica
nano-waveguide. Whispering gallery modes with optical Q factor up to a few 10^5 are observed. Critical coupling, optical
resonance doublet splitting and mode identification are discussed. Finally, we show an optomechanical phenomenon
of optical force attraction of the silica taper to the disk. This phenomenon shows that mechanical and optical degrees of
freedom naturally couple at the micro-nanoscale.
Proc. SPIE. 7527, Human Vision and Electronic Imaging XV
KEYWORDS: Visual process modeling, Visualization, Image segmentation, Retina, Image resolution, Feature extraction, Cobalt, Human vision and color perception, Chemical elements, Document image analysis
This work addresses the problem of document image analysis, and more particularly the topic of document
structure recognition in old, damaged and handwritten documents. The goal of this paper is to show the value
of human perceptive vision for document analysis. We focus on two aspects of the model of perceptive vision:
the perceptive cycle and visual attention. We present the key elements of perceptive vision that can be
used for document analysis.
We introduce perceptive vision into an existing method for document structure recognition, which
allows us both to show how we used the properties of perceptive vision and to compare the results obtained
with and without it. We apply our method to the analysis of several kinds of documents (archive
registers, old newspapers, incoming mail, etc.) and show that perceptive vision significantly improves their
recognition. Moreover, the use of perceptive vision simplifies the description of complex documents. Finally,
the running time is often reduced.
This paper presents a system to extract the logical structure of handwritten mail documents. It consists of two
joint tasks: the segmentation of documents into blocks and the labeling of these blocks. The main
label classes considered are: addressee details, sender details, date, subject, text body, signature. This work has to deal
with the difficulties of unconstrained handwritten documents: variable structure and writing.
We propose a method based on a geometric analysis of the arrangement of elements in the document. We
give a description of the document using a two-dimensional grammatical formalism, which makes it possible to
easily introduce knowledge about mail into a generic parser. Our grammatical parser is LL(k), which means several
combinations are tried before extracting the correct one. The main interest of this approach is that we can deal
with weakly structured documents. Moreover, as the segmentation into blocks often depends on the associated
classes, our method is able to retry a different segmentation until labeling succeeds.
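The retry behaviour described above can be sketched as a small backtracking loop. This is only an illustration under stated assumptions, not the grammatical parser of the paper: blocks are hypothetical (x_center, y_center, width) tuples in page-relative coordinates, and the geometric rules and their thresholds are invented for the example.

```python
# Hypothetical geometric rules: each maps a block to True or False.
RULES = [
    ("sender", lambda b: b[1] < 0.3 and b[0] < 0.5 and b[2] < 0.6),
    ("addressee", lambda b: b[1] < 0.3 and b[0] >= 0.5 and b[2] < 0.6),
    ("body", lambda b: 0.3 <= b[1] < 0.8),
    ("signature", lambda b: b[1] >= 0.8),
]

def label_blocks(blocks, rules):
    """Label every block, or return None as soon as one block fits no rule."""
    labels = []
    for block in blocks:
        label = next((name for name, rule in rules if rule(block)), None)
        if label is None:
            return None
        labels.append(label)
    return labels

def analyse(candidate_segmentations, rules):
    """Try segmentations in order and keep the first one that can be labeled."""
    for blocks in candidate_segmentations:
        labels = label_blocks(blocks, rules)
        if labels is not None:
            return list(zip(blocks, labels))
    return None

# Two hypothetical segmentations of the same letter. In the first, the
# sender and addressee zones are merged into one wide block that fits no rule.
coarse = [(0.5, 0.15, 0.9), (0.5, 0.5, 0.9)]
fine = [(0.2, 0.15, 0.3), (0.8, 0.15, 0.3), (0.5, 0.5, 0.9)]
result = analyse([coarse, fine], RULES)
```

The coarse segmentation is rejected because its wide top block cannot be labeled, so the loop falls back to the finer one, mirroring how segmentation and labeling depend on each other.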
We validated this method in the context of the French national project RIMES, which organized a contest on
a large database of documents. We obtained a recognition rate of 91.7% on 1150 images.