Paper
12 October 2022 Improvement of attention modules for image captioning using pixel-wise semantic information
Proceedings Volume 12342, Fourteenth International Conference on Digital Image Processing (ICDIP 2022); 123422Y (2022) https://doi.org/10.1117/12.2644743
Event: Fourteenth International Conference on Digital Image Processing (ICDIP 2022), 2022, Wuhan, China
Abstract
Although an attention mechanism is a reasonable approach to generating image captions, obtaining ideal image regions within the mechanism is difficult in practice because the attention must be calculated between image and text data. To improve the attention modules for image captioning, we propose an algorithm that handles pixel-wise semantic information obtained as the output of semantic segmentation. The proposed method feeds the pixel-wise semantic information into the attention modules for image captioning together with the input text data and image features. We conducted evaluation experiments and confirmed that our method obtained more reasonable weighted image features and better image captions, with a BLEU-4 score of 0.306 compared with 0.243 for the original attention model.
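The abstract does not specify how the semantic information enters the attention module, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes an additive (Bahdanau-style) attention in which each image region contributes a feature vector plus a class histogram pooled from the segmentation map, and the decoder hidden state stands in for the input text data. The function names (`pool_semantics`, `semantic_attention`), the grid pooling, the weight matrices, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def pool_semantics(seg_map, num_classes, grid):
    """Turn an (H, W) map of class ids into per-region class histograms.

    Assumption: regions form a grid x grid partition of the image.
    Returns an array of shape (grid * grid, num_classes); each row sums to 1.
    """
    height, width = seg_map.shape
    hs, ws = height // grid, width // grid
    rows = []
    for r in range(grid):
        for c in range(grid):
            cell = seg_map[r * hs:(r + 1) * hs, c * ws:(c + 1) * ws]
            hist = np.bincount(cell.ravel(), minlength=num_classes)
            rows.append(hist / hist.sum())
    return np.stack(rows)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def semantic_attention(features, semantics, hidden, W_v, W_s, W_h, w):
    """Additive attention with an extra semantic term (hypothetical fusion).

    features:  (R, D) region features
    semantics: (R, C) per-region class histograms
    hidden:    (Hd,)  decoder state summarizing the text so far
    Returns the attention weights (R,) and the weighted image feature (D,).
    """
    scores = np.tanh(features @ W_v + semantics @ W_s + hidden @ W_h) @ w
    alpha = softmax(scores)
    context = alpha @ features
    return alpha, context

# Toy usage with random inputs; every size here is an arbitrary assumption.
rng = np.random.default_rng(0)
seg_map = rng.integers(0, 5, size=(8, 8))        # 5 classes, 8x8 pixels
semantics = pool_semantics(seg_map, num_classes=5, grid=2)
features = rng.normal(size=(4, 16))              # 4 regions, 16-dim features
hidden = rng.normal(size=32)
W_v = rng.normal(size=(16, 8))
W_s = rng.normal(size=(5, 8))
W_h = rng.normal(size=(32, 8))
w = rng.normal(size=8)
alpha, context = semantic_attention(features, semantics, hidden,
                                    W_v, W_s, W_h, w)
```

In this sketch the semantic term only shifts the attention scores, so the weighted image feature remains a convex combination of the original region features. Other fusion points, such as concatenating the histograms to the features before scoring, would be equally consistent with the abstract.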
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhihao Chen, Keisuke Doman, and Yoshito Mekada "Improvement of attention modules for image captioning using pixel-wise semantic information", Proc. SPIE 12342, Fourteenth International Conference on Digital Image Processing (ICDIP 2022), 123422Y (12 October 2022); https://doi.org/10.1117/12.2644743
KEYWORDS
Image segmentation
Visualization
Computer programming
Classification systems
Feature extraction
Image visualization
Performance modeling