21 March 2013 Graph-based layout analysis for PDF documents
Canhui Xu, Zhi Tang, Xin Tao, Yun Li, Cao Shi
Proceedings Volume 8664, Imaging and Printing in a Web 2.0 World IV; 866407 (2013) https://doi.org/10.1117/12.2005608
Event: IS&T/SPIE Electronic Imaging, 2013, Burlingame, California, United States
Abstract
To increase flexibility and enrich the reading experience of e-books on small portable screens, a graph-based method is proposed for layout analysis of Portable Document Format (PDF) documents. Born-digital documents have inherent advantages, such as representing text and fractional images in explicit form, which can be exploited directly. To integrate traditional image-based document analysis with the metadata provided by a PDF parser, the page primitives, including text, image and path elements, are processed to produce a text layer and a non-text layer for separate analysis. The graph-based method is developed at the superpixel representation level, and page text elements, treated as vertices, are used to construct an undirected graph. The Euclidean distance between adjacent vertices is applied in a top-down manner to cut the tree formed by Kruskal’s algorithm. Edge orientation is then used in a bottom-up manner to extract text lines from each subtree. Non-textual objects, in contrast, are segmented by connected component analysis. For each segmented text and non-text composite, a 13-dimensional feature vector is extracted for labelling. Experimental results on selected pages from PDF books are presented.
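The top-down step described in the abstract (build a tree with Kruskal's algorithm, then cut it by Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each text element is reduced to a 2-D centroid, that the graph is complete (the abstract does not say how adjacency is defined), and that a single fixed threshold `max_edge` decides which tree edges are cut. The names `kruskal_mst`, `cut_tree` and `max_edge` are hypothetical.

```python
import math
from itertools import combinations


def kruskal_mst(points):
    """Return minimum spanning tree edges as (weight, i, j) tuples.

    Assumption: the graph is complete over all points; the paper may
    restrict edges to neighbouring elements instead.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All candidate edges, sorted by Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    mst = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            mst.append((weight, i, j))
    return mst


def cut_tree(points, max_edge):
    """Remove tree edges longer than max_edge and return the resulting
    groups of point indices (one group per subtree).

    Assumption: max_edge is a hand-picked threshold for illustration.
    """
    adjacency = {i: [] for i in range(len(points))}
    for weight, i, j in kruskal_mst(points):
        if weight <= max_edge:
            adjacency[i].append(j)
            adjacency[j].append(i)

    seen, groups = set(), []
    for start in range(len(points)):
        if start in seen:
            continue
        # Depth-first traversal collects one subtree.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups
```

For example, with the made-up centroids `[(0, 0), (10, 0), (20, 0), (0, 100), (10, 100)]` and `max_edge=30`, the three points on the first row and the two points on the second row end up in separate groups. The bottom-up step based on edge orientation would then operate on each group and is not shown here.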
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Canhui Xu, Zhi Tang, Xin Tao, Yun Li, and Cao Shi "Graph-based layout analysis for PDF documents", Proc. SPIE 8664, Imaging and Printing in a Web 2.0 World IV, 866407 (21 March 2013); https://doi.org/10.1117/12.2005608
CITATIONS
Cited by 3 scholarly publications.
KEYWORDS
Image segmentation

Visualization

Composites

Analytical research

Feature extraction

Image analysis

Image processing
