21 June 2024 Self-correcting text-aware strategy for text-caption
Junliang Lian, Xiaochun Cao, Pengwen Dai
Proceedings Volume 13167, International Conference on Remote Sensing, Mapping, and Image Processing (RSMIP 2024); 131673L (2024) https://doi.org/10.1117/12.3029640
Event: International Conference on Remote Sensing, Mapping and Image Processing (RSMIP 2024), 2024, Xiamen, China
Abstract
Computer vision still faces a "semantic gap" problem. Existing text-based captioning methods cannot fully exploit the textual information in an image, and the text recognition step may produce incorrect results with no mechanism for correcting them. In this work, we improve on the existing approach by proposing a text-aware recognizer that extracts the text in the input image and generates the corresponding text descriptions and text features. Considering the relationship between text objects and image content, we introduce a caption-rectify module to reduce semantic errors in the description sentences generated by the character recognizer; it makes better use of the text present in the image and models the text recognized in the TextCaps dataset. Specifically, we use a current state-of-the-art text recognizer to detect characters and generate contextual descriptions of images. Moreover, we propose a correction mechanism and demonstrate qualitatively and quantitatively that the correction makes the final caption consistent with the textual information in the image, improving the semantic accuracy of the text description. We validated our approach on the text captioning task, thoroughly analyzed each module, and showed significant improvements over the current advanced models LSTM-R and CNMT.
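To make the idea of correcting a caption against recognized image text concrete, here is a minimal sketch. It is not the authors' caption-rectify module, whose design the abstract does not specify; it is a hypothetical stand-in that uses plain string similarity. The function name `rectify_caption`, the `cutoff` parameter, and the sample inputs are all assumptions introduced for illustration.

```python
from difflib import get_close_matches


def rectify_caption(caption, ocr_tokens, cutoff=0.8):
    """Replace caption words that nearly match a recognized text token.

    Illustrative assumption only: a learned rectification module would
    work on features, not on raw string similarity as done here.
    """
    # Map lowercased OCR tokens back to their original spelling.
    lookup = {tok.lower(): tok for tok in ocr_tokens}
    corrected = []
    for word in caption.split():
        # Find the single closest OCR token above the similarity cutoff.
        match = get_close_matches(word.lower(), list(lookup), n=1, cutoff=cutoff)
        corrected.append(lookup[match[0]] if match else word)
    return " ".join(corrected)


print(rectify_caption("a bottle of cocacolo on a table", ["cocacola"]))
```

In this toy case the misspelled word is close enough to the recognized token to be replaced, while ordinary words fall below the cutoff and are left unchanged. A high cutoff is chosen so that unrelated caption words are unlikely to be overwritten.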
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Junliang Lian, Xiaochun Cao, and Pengwen Dai "Self-correcting text-aware strategy for text-caption", Proc. SPIE 13167, International Conference on Remote Sensing, Mapping, and Image Processing (RSMIP 2024), 131673L (21 June 2024); https://doi.org/10.1117/12.3029640
KEYWORDS
Object detection
Education and training
Image processing
Sensors
Image compression
Performance modeling
Image fusion