Presentation + Paper
28 October 2022
Dynamic-automatic pipelines for finding topic-specific information clusters using NLP methods in connection with a model-driven approach
Tobias Dorrn, Achim Kuwertz
Abstract
Finding and extracting topic-specific information from free-text sources is an important task for classifying and distinguishing the content of information systems. Such a compression of information, in which non-relevant text parts can be ignored, is also advantageous for the further machine processing and evaluation of topic-specific documents. State-of-the-art approaches typically use well-trained modern Natural Language Processing (NLP) methods to solve such tasks. However, use cases can arise where no suitable training data sets are available to adequately prepare or fine-tune the NLP methods used. In this paper, we detail a model-driven approach that applies an XML data model to an application-specific scenario and combines different NLP methods into a dynamic, automated NLP pipeline. The goal of this pipeline is to automatically extract specific information (related to certain domains or topics) from text documents so that this information can be processed further in a structured way. Specifically, we consider a scenario in which such information has to be aligned to a given information model that defines, e.g., the terms relevant for further processing. The solution approaches described here address a scenario in which information clusters on a specific topic can be obtained from a given data set even without domain-specific model training. They are based on a dynamic (i.e., using different NLP methods and models) and fully automatic (i.e., handling different topics at the same time) pipeline architecture combined with an XML data model. The presented approach details and extends our earlier work and gives new qualitative and first quantitative results.
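The abstract does not specify the pipeline stages or the XML schema, so the following is only a toy sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical XML information model in which each `<topic>` lists its relevant `<term>` elements. Two simple rule-based matchers (`exact_match`, `prefix_match`) stand in for the real NLP methods and models the paper combines. "Dynamic" is modeled as a list of interchangeable matchers, and "fully automatic" as assigning every sentence to all topics in one pass. All function names here are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical XML information model: each <topic> lists the terms relevant to it.
MODEL_XML = """
<model>
  <topic name="infrastructure"><term>road</term><term>bridge</term></topic>
  <topic name="security"><term>attack</term><term>threat</term></topic>
</model>
"""

def load_model(xml_text: str) -> Dict[str, List[str]]:
    """Parse the XML model into {topic name: [lower-cased terms]}."""
    root = ET.fromstring(xml_text.strip())
    return {
        topic.get("name"): [t.text.strip().lower() for t in topic.findall("term")]
        for topic in root.findall("topic")
    }

def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (stand-in for a real NLP sentence segmenter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence: str) -> List[str]:
    """Lower-cased alphabetic tokens (stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())

# A matcher decides whether a token list is relevant to a topic's terms.
# Matchers are interchangeable, which is how this toy pipeline is "dynamic".
Matcher = Callable[[List[str], List[str]], bool]

def exact_match(tokens: List[str], terms: List[str]) -> bool:
    """True if any model term appears verbatim as a token."""
    return any(term in tokens for term in terms)

def prefix_match(tokens: List[str], terms: List[str]) -> bool:
    """Crude inflection handling: True if any token starts with a model term."""
    return any(tok.startswith(term) for tok in tokens for term in terms)

def run_pipeline(text: str, model: Dict[str, List[str]],
                 matchers: List[Matcher]) -> Dict[str, List[str]]:
    """Assign each sentence to every topic for which any matcher fires."""
    clusters: Dict[str, List[str]] = {topic: [] for topic in model}
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for topic, terms in model.items():
            if any(m(tokens, terms) for m in matchers):
                clusters[topic].append(sentence)
    return clusters

if __name__ == "__main__":
    text = ("The roads north were blocked. "
            "Officials reported a threat near the bridge. "
            "Weather was mild.")
    model = load_model(MODEL_XML)
    print(run_pipeline(text, model, [exact_match, prefix_match]))
```

With both matchers, the first sentence joins the "infrastructure" cluster only because `prefix_match` catches "roads". The second sentence lands in both clusters, and the third is dropped as non-relevant. Swapping the matcher list changes the pipeline's behavior without touching the model, loosely mirroring the separation between the XML model and the NLP methods described in the abstract.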
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tobias Dorrn and Achim Kuwertz "Dynamic-automatic pipelines for finding topic-specific information clusters using NLP methods in connection with a model-driven approach", Proc. SPIE 12276, Artificial Intelligence and Machine Learning in Defense Applications IV, 1227602 (28 October 2022); https://doi.org/10.1117/12.2648385
KEYWORDS
Data modeling

Systems modeling

Associative arrays

Data processing

Roads

Computer security

Solid modeling
