KEYWORDS: Optical character recognition, Optical correlators, Visualization, Image processing, Java, Software development, Image processing software, Databases, Digital signal processing, Data archive systems
Most organizations maintain large archives of paper documents. These archives typically contain valuable information and data, which are often imaged to provide electronic access. However, once a document has been printed or imaged, these organizations have had no efficient method of retrieving the information it contains. The only available methods were to read the documents manually or to convert them to ASCII text using optical character recognition (OCR). For archives containing large numbers of documents, both methods are problematic: manual searches are not feasible, while OCR can be CPU intensive and prone to error. In addition, OCR engines do not exist for many foreign languages.
By contrast, our system takes an innovative, client/server approach to retrieving information from imaged document archives. Since the project began in 1999, we have made significant advances in developing a system that uses optical correlation (OC) technology, implemented in either software or hardware, to directly access the textual and graphic information contained in imaged paper documents, thereby eliminating the OCR process. The system provides a fast, accurate means of accessing this information directly from multilingual documents. It can also rapidly and accurately detect duplicate documents within an archive using optical correlation techniques.
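The sketch below illustrates what it means to search an image directly, without OCR. It is a minimal software example of correlation-based word spotting and makes several assumptions. Pages and query words are treated as binarized 2-D lists, with 1 for ink and 0 for background. Matching uses a brute-force normalized cross-correlation. The names `ncc` and `find_matches`, and the 0.9 threshold, are hypothetical and are not taken from IDOCCS.

```python
"""Hypothetical sketch: spotting a word image in a page image by correlation.

Assumes binarized 2-D lists (1 = ink, 0 = background). This is a plain
digital cross-correlation, not the IDOCCS optical correlator itself.
"""
from math import sqrt


def ncc(page, template, top, left):
    """Normalized cross-correlation of `template` with the page window at (top, left)."""
    h, w = len(template), len(template[0])
    window = [page[top + r][left + c] for r in range(h) for c in range(w)]
    tmpl = [template[r][c] for r in range(h) for c in range(w)]
    n = len(tmpl)
    mw, mt = sum(window) / n, sum(tmpl) / n
    num = sum((a - mw) * (b - mt) for a, b in zip(window, tmpl))
    den = sqrt(sum((a - mw) ** 2 for a in window) * sum((b - mt) ** 2 for b in tmpl))
    return num / den if den else 0.0  # blank windows score 0


def find_matches(page, template, threshold=0.9):
    """Return (row, col, score) for every window whose score meets `threshold`."""
    h, w = len(template), len(template[0])
    hits = []
    for top in range(len(page) - h + 1):
        for left in range(len(page[0]) - w + 1):
            score = ncc(page, template, top, left)
            if score >= threshold:
                hits.append((top, left, round(score, 3)))
    return hits


# Toy example: a 3x3 "word" pasted into a blank 6x8 page at row 2, column 4.
word = [[1, 0, 1],
        [1, 1, 1],
        [1, 0, 1]]
page = [[0] * 8 for _ in range(6)]
for r in range(3):
    for c in range(3):
        page[2 + r][4 + c] = word[r][c]

print(find_matches(page, word))  # [(2, 4, 1.0)]
```

The nested loop makes the cost explicit. An optical correlator computes the correlation over all offsets in parallel using Fourier optics. A practical software version would normally use FFT-based correlation rather than this brute-force scan.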
In this paper, we describe the present system and selected examples of its capabilities. We also present some performance results (accuracy, speed, etc.) against test document sets.
Today, the paper document is fast becoming a thing of the past. With the rapid development of fast, inexpensive computing and storage devices, many government and private organizations are archiving their documents in electronic form (e.g., personnel records, medical records, and patents). Many of these organizations are converting their paper archives to electronic images, which are then stored in a computer database. As a result, there is a need to organize this data efficiently into comprehensive, accessible information resources and to provide rapid access to the information contained within these imaged documents. To meet this need, Litton PRC and Litton Data Systems Division are developing the Imaged Document Optical Correlation and Conversion System (IDOCCS), which is intended as a total solution for managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides the means to search and retrieve information from imaged documents. IDOCCS can rapidly search the imaged document archives for key words or phrases and has the potential to determine which languages a document contains. In addition, IDOCCS can automatically compare an input document with the archived database to determine whether it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited; e.g., imaged documents containing an agency's seal or logo can be singled out. In this paper, we present a description of IDOCCS as well as preliminary performance results and theoretical projections.
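Duplicate detection can be pictured as correlating a whole incoming page against each archived page and flagging strong matches. The following is a minimal sketch under assumptions that are not taken from IDOCCS. Every page is a binarized 2-D list of the same size. Pages are already aligned, so only the zero-shift correlation is computed. A score of at least 0.95 counts as a duplicate. The function names, document IDs, and threshold are all hypothetical.

```python
"""Hypothetical sketch: flagging duplicate pages by whole-page correlation.

Assumes equally sized, pre-aligned, binarized pages; 0.95 is an
illustrative threshold, not an IDOCCS parameter.
"""
from math import sqrt


def page_correlation(a, b):
    """Zero-shift normalized correlation between two equally sized page images."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("pages must have the same number of pixels")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def find_duplicates(new_page, archive, threshold=0.95):
    """Return IDs of archived pages whose correlation with new_page meets threshold."""
    return [doc_id for doc_id, page in archive.items()
            if page_correlation(new_page, page) >= threshold]


archive = {
    "doc-001": [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]],
    "doc-002": [[0, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0]],  # inverse of doc-001
}
incoming = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]  # same content as doc-001

print(find_duplicates(incoming, archive))  # ['doc-001']
```

Real scans would also need to tolerate small shifts, rotation, and noise. One common approach is to take the peak of the correlation over a range of offsets rather than the single zero-shift value used here.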
Current financial, schedule, and risk constraints mandate the reuse of software components when building large-scale simulations. Although integrating simulation components into larger systems is a well-understood process, it is extremely difficult to do while ensuring that the results remain correct. Illgen Simulation Technologies Incorporated and Litton PRC have joined forces to provide tools for integrating simulations with confidence. Illgen Simulation Technologies has developed an extensible, scalable, n-tier, client-server distributed software framework for integrating legacy simulations, models, tools, utilities, and databases. Because it uses the Internet, Java, and the Common Object Request Broker Architecture (CORBA) as its core implementation technologies, the framework provides built-in scalability and extensibility.
Today, the paper document is fast becoming a thing of the past. With the rapid development of fast, inexpensive computing and storage devices, many government and private organizations are archiving their documents in electronic form (e.g., personnel records, medical records, and patents). In addition, many organizations are converting their paper archives to electronic images, which are stored in a computer database. As a result, there is a need to organize this data efficiently into comprehensive, accessible information resources. The Imaged Document Optical Correlation and Conversion System (IDOCCS) provides a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides the capability to search and retrieve document images. IDOCCS can be used to rapidly search the imaged document archives for key words or phrases and can even determine which languages a document contains. In addition, IDOCCS can automatically compare an input document with the archived database to determine whether it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited; e.g., imaged documents containing an agency's seal or logo, or documents with a particular individual's signature block, can be singled out. With this dual text-and-graphics capability, IDOCCS outperforms systems that rely on optical character recognition to index and store only the textual content of documents for later retrieval.
As distributed modeling and simulation has found great success in the marketplace, its underlying technology has begun to reveal significant limitations. For example, a lack of high fidelity limits a simulation's ability to represent more demanding situations, such as the design and test of complex systems, analysis of manufacturing and life-cycle support, high-fidelity training support, and C4I planning. Consequently, simulations must now begin to reproduce, as faithfully as possible, the high-fidelity interactions that are critical to analyzing and formulating successful designs, tactics, and business strategies.