28 March 2019 Long short-term memory network with external memories for image caption generation
Teng Jiang, Chengjun Zhan, Yupu Yang
Funded by: National Natural Science Foundation of China (NSFC)
Abstract
In long short-term memory (LSTM) neural networks, the input gates and output gates control information flowing into and out of memory cells. For sequence-to-sequence learning problems, each element is input into the network only once. If the input gates are closed at a certain step, the information is lost and is not input again. The same problem exists for the output gates. Therefore, the input and output gates do not fully support the roles of gating. An LSTM network with external memories, in which separate memories are installed for the input and output gates, is proposed. Information that is blocked by the input gates is preserved in the input memories, enabling the cells to read these memories when necessary. Similarly, information blocked by the output gates is preserved in the output memories and flows out to hidden units of the network at an appropriate time. In addition, a dynamic attention model is proposed to take into account the attention history. It provides guidance when predicting the attention weights at each step. The proposed model exploits an attention-based encoder–decoder architecture to generate image captions. Experiments were conducted on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO, to demonstrate the effectiveness of the proposed approach. Captions generated by the proposed method are longer and more informative than those obtained with the original LSTM network.
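The abstract does not give the update equations, so the following is only an illustrative sketch of the idea it describes, not the authors' formulation. It assumes that the part of the candidate blocked by the input gate, `(1 - i) * g`, is accumulated in an input memory `m_in`, that the part of the cell output blocked by the output gate is accumulated in an output memory `m_out`, and that two extra read gates `r_in` and `r_out` (hypothetical names) decide when the memories are read back. The weight layout, the function names, and the read-gate mechanism are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Random weights for six stacked blocks (hypothetical layout):
    input, forget, output, read-input-memory and read-output-memory gates,
    followed by the cell candidate."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.1, size=(6 * hidden_size, input_size + hidden_size))
    b = np.zeros(6 * hidden_size)
    return W, b


def em_lstm_step(x, h, c, m_in, m_out, W, b):
    """One step of an LSTM-like cell with assumed external memories."""
    z = W @ np.concatenate([x, h]) + b
    zi, zf, zo, zri, zro, zg = np.split(z, 6)
    i, f, o = sigmoid(zi), sigmoid(zf), sigmoid(zo)
    r_in, r_out = sigmoid(zri), sigmoid(zro)  # assumed read gates
    g = np.tanh(zg)                           # cell candidate

    # The cell receives the gated candidate plus whatever is read from m_in.
    c_new = f * c + i * g + r_in * m_in
    # m_in keeps what was not read and stores what the input gate blocked.
    m_in_new = (1.0 - r_in) * m_in + (1.0 - i) * g

    a = np.tanh(c_new)
    # The hidden state receives the gated output plus what is read from m_out.
    h_new = o * a + r_out * m_out
    # m_out keeps what was not read and stores what the output gate blocked.
    m_out_new = (1.0 - r_out) * m_out + (1.0 - o) * a
    return h_new, c_new, m_in_new, m_out_new


def run_sequence(xs, hidden_size, W, b):
    """Feed each element of xs once, starting from all-zero state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    m_in = np.zeros(hidden_size)
    m_out = np.zeros(hidden_size)
    for x in xs:
        h, c, m_in, m_out = em_lstm_step(x, h, c, m_in, m_out, W, b)
    return h, c, m_in, m_out
```

In this sketch, the candidate computed at a step is split between the cell and the input memory rather than being partly discarded, which mirrors the abstract's point that each sequence element is presented only once. The memories here grow additively and are not normalized; a practical model would likely need some bounding or decay, which the abstract does not specify.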
© 2019 SPIE and IS&T 1017-9909/2019/$25.00
Teng Jiang, Chengjun Zhan, and Yupu Yang "Long short-term memory network with external memories for image caption generation," Journal of Electronic Imaging 28(2), 023022 (28 March 2019). https://doi.org/10.1117/1.JEI.28.2.023022
Received: 20 October 2018; Accepted: 5 March 2019; Published: 28 March 2019
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS
Network architectures

Performance modeling

Data modeling

Neural networks

Computer programming

Image processing

Computer vision technology
