Semantic enhancement methods for image captioning

Luming Cui; Lin Li

doi:10.1117/12.2667270

1 March 2023

Proceedings Volume 12588, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022); 1258807 (2023) https://doi.org/10.1117/12.2667270
Event: International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022), 2022, Chongqing, China

Abstract

Image captioning, a cross-modal task, aims to generate a description for a given image and plays an important role in fields such as image retrieval and computer-assisted instruction. Currently, the main challenge in image captioning is the limited quality of generated descriptions, with problems including insufficient utilization of image feature information and the limited language learning ability of the decoder. In this paper, we address these problems by constructing a semantic enhancement module and a multi-round decoding mechanism to enhance the decoding ability of the model, which uses the Transformer as its primary structure. To validate the efficacy of the model, we conducted extensive experiments on the MSCOCO2014 benchmark and evaluated its performance using five evaluation metrics. The experimental results show that the proposed method improves to varying degrees on all five evaluation metrics.

Citation

Luming Cui and Lin Li "Semantic enhancement methods for image captioning", Proc. SPIE 12588, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022), 1258807 (1 March 2023); https://doi.org/10.1117/12.2667270

PROCEEDINGS
9 PAGES

KEYWORDS

Visualization

Semantics

Information visualization

Image enhancement

Visual process modeling

Performance modeling

Data modeling
