Improvement of attention modules for image captioning using pixel-wise semantic information

Zhihao Chen; Keisuke Doman; Yoshito Mekada

doi:10.1117/12.2644743

Published 12 October 2022

Proceedings Volume 12342, Fourteenth International Conference on Digital Image Processing (ICDIP 2022); 123422Y (2022) https://doi.org/10.1117/12.2644743
Event: Fourteenth International Conference on Digital Image Processing (ICDIP 2022), 2022, Wuhan, China

Abstract

Although attention mechanisms are well suited to generating image captions, obtaining ideal image regions within the mechanism is a problem in practice because of the difficulty of computing attention between image and text data. To improve the attention modules for image captioning, we propose an algorithm that handles pixel-wise semantic information obtained as the output of semantic segmentation. The proposed method feeds this pixel-wise semantic information into the attention modules together with the input text data and image features. In evaluation experiments, we confirmed that our method obtains more reasonable weighted image features and better image captions, achieving a BLEU-4 score of 0.306 compared with 0.243 for the original attention model.
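The abstract does not say how the semantic information enters the attention modules, so the following is only a minimal sketch of one plausible reading: an additive attention score per image region that sums projections of the image feature, a per-region class vector pooled from a segmentation map, and the decoder state that summarizes the input text. Every name here (`semantic_attention`, `seg_probs`, `W_f`, `W_s`, `W_h`, `v`) and the random toy data are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Row vector (length n) times matrix (n x m) -> list of length m."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def semantic_attention(features, seg_probs, hidden, W_f, W_s, W_h, v):
    """Additive attention over image regions (hypothetical formulation).

    features:  R regions x D image features
    seg_probs: R regions x C class vector pooled from a segmentation map
    hidden:    decoder state of length H (summary of the text so far)
    Returns (attention weights over regions, weighted image feature).
    """
    h_proj = matvec(hidden, W_h)  # shared by all regions
    scores = []
    for feat, seg in zip(features, seg_probs):
        f_proj = matvec(feat, W_f)
        s_proj = matvec(seg, W_s)  # assumed: semantic term added to the score
        act = [math.tanh(a + b + c) for a, b, c in zip(f_proj, s_proj, h_proj)]
        scores.append(sum(a * w for a, w in zip(act, v)))
    alpha = softmax(scores)
    dim = len(features[0])
    context = [sum(alpha[r] * features[r][d] for r in range(len(features)))
               for d in range(dim)]
    return alpha, context


def demo(seed=0):
    """Run the sketch on random toy inputs (not real image or text data)."""
    rng = random.Random(seed)
    R, D, C, H, A = 6, 4, 3, 5, 4  # regions, feature dim, classes, state, attn dim

    def rand_mat(n, m):
        return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]

    features = rand_mat(R, D)
    seg_probs = []
    for _ in range(R):
        label = rng.randrange(C)  # toy one-hot class per region
        seg_probs.append([1.0 if c == label else 0.0 for c in range(C)])
    hidden = [rng.gauss(0.0, 1.0) for _ in range(H)]
    W_f, W_s, W_h = rand_mat(D, A), rand_mat(C, A), rand_mat(H, A)
    v = [rng.gauss(0.0, 1.0) for _ in range(A)]
    return semantic_attention(features, seg_probs, hidden, W_f, W_s, W_h, v)
```

Calling `demo()` returns one attention weight per region (summing to 1) and the resulting weighted image feature. In this sketch the semantic term only shifts the region scores; other designs, such as concatenating the class vector with the image feature, would fit the abstract's description equally well.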

Citation

Zhihao Chen, Keisuke Doman, and Yoshito Mekada "Improvement of attention modules for image captioning using pixel-wise semantic information", Proc. SPIE 12342, Fourteenth International Conference on Digital Image Processing (ICDIP 2022), 123422Y (12 October 2022); https://doi.org/10.1117/12.2644743

PROCEEDINGS
7 PAGES

KEYWORDS

Image segmentation

Visualization

Computer programming

Classification systems

Feature extraction

Image visualization

Performance modeling