Use of document structure analysis to retrieve information from documents in digital libraries
3 April 1997
Debashish Niyogi, Sargur N. Srihari
Proceedings Volume 3027, Document Recognition IV; (1997) https://doi.org/10.1117/12.270074
Event: Electronic Imaging '97, 1997, San Jose, CA, United States
Abstract
This paper describes an approach to retrieving information from document images stored in a digital library by means of knowledge-based layout analysis and logical structure derivation techniques. Queries on document image content are categorized by the type of information desired, and are parsed to determine the type of document from which information is desired, the syntactic level of that information, and the level of analysis required to extract it. Using these clauses of the query, a set of salient documents is retrieved, layout analysis and logical structure derivation are performed on the retrieved documents, and the documents are then analyzed in detail to extract the relevant logical components. A 'document browser' application, being developed based on this approach, allows a user to specify queries interactively on the documents in the digital library through a graphical user interface, provides feedback about the candidate documents at each stage of the retrieval process, and allows the query to be refined based on the intermediate results of the search. Results of a query are displayed either as an image or as formatted text.
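The three clauses extracted from a query (document type, syntactic level of the information, and level of analysis) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `ParsedQuery` record, the `AnalysisLevel` values, the small vocabularies, and the keyword-matching rules are all hypothetical stand-ins for the knowledge-based parsing that the abstract describes only at a high level.

```python
from dataclasses import dataclass
from enum import IntEnum


class AnalysisLevel(IntEnum):
    """Hypothetical depth of analysis needed to answer a query."""
    LAYOUT = 1    # physical layout blocks only
    LOGICAL = 2   # logical components (title, author, ...)
    CONTENT = 3   # text of a component (for example via OCR)


@dataclass(frozen=True)
class ParsedQuery:
    doc_type: str         # type of document the query targets
    component: str        # syntactic level: which logical component is wanted
    level: AnalysisLevel  # how deep the analysis has to go


# Hypothetical vocabularies; a real system would use a richer knowledge base.
DOC_TYPES = {"article": "journal_article", "letter": "business_letter"}
COMPONENTS = {"title", "author", "abstract", "sender", "date"}


def parse_query(text: str) -> ParsedQuery:
    """Split a free-text query into the three clauses by keyword matching."""
    words = text.lower().replace(",", " ").split()
    # Strip a trailing "s" so that "articles" matches "article".
    doc_type = next(
        (DOC_TYPES[w.rstrip("s")] for w in words if w.rstrip("s") in DOC_TYPES),
        "any",
    )
    component = next((w for w in words if w in COMPONENTS), "page")
    if "text" in words or "read" in words:
        level = AnalysisLevel.CONTENT
    elif component != "page":
        level = AnalysisLevel.LOGICAL
    else:
        level = AnalysisLevel.LAYOUT
    return ParsedQuery(doc_type, component, level)


def select_candidates(query: ParsedQuery, library: dict) -> list:
    """Keep only documents whose (already known) type matches the query."""
    return [
        name for name, doc_type in library.items()
        if query.doc_type in ("any", doc_type)
    ]


if __name__ == "__main__":
    library = {"scan_001.tif": "journal_article", "scan_002.tif": "business_letter"}
    q = parse_query("show the title of all articles")
    print(q)
    print(select_candidates(q, library))
```

In this sketch, candidate selection uses only the document-type clause, so the more expensive layout and logical analysis would run on the reduced candidate set rather than on the whole library, which mirrors the staged retrieval the abstract outlines.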
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Debashish Niyogi and Sargur N. Srihari, "Use of document structure analysis to retrieve information from documents in digital libraries," Proc. SPIE 3027, Document Recognition IV (3 April 1997); https://doi.org/10.1117/12.270074
KEYWORDS
Digital libraries
Analytical research
Databases
Image processing
Control systems
Optical character recognition
Photography