IMG-Net: inner-cross-modal attentional multigranular network for description-based person re-identification

Zijie Wang; Aichun Zhu; Zhe Zheng; Jing Jin; Zhouxin Xue; Gang Hua

doi:10.1117/1.JEI.29.4.043028

Journal of Electronic Imaging, Vol. 29, Issue 4, 043028 (August 2020). https://doi.org/10.1117/1.JEI.29.4.043028

Abstract

Given a natural language description, description-based person re-identification aims to retrieve images of the matching person from a large-scale visual database. Because of the heterogeneity between the two modalities, it is challenging to measure the cross-modal similarity between images and text descriptions. Many existing approaches use a deep-learning model to encode local and global fine-grained features with a strict uniform partition strategy. This breaks part coherence, making it difficult to capture meaningful information within each part and semantic information among body parts. To address this issue, we propose an inner-cross-modal attentional multigranular network (IMG-Net) that incorporates inner-modal self-attention and cross-modal hard-region attention into the fine-grained model to extract multigranular semantic information. Specifically, the inner-modal self-attention module addresses the problem of broken within-part consistency using both spatial-wise and channel-wise information. It is followed by a multigranular feature extraction module, which extracts rich local and global visual and textual features with the help of group normalization (GN). A cross-modal hard-region attention module is then used to obtain the local visual representation and the phrase representation. Furthermore, GN is used instead of batch normalization to avoid inaccurate batch statistics estimation. Comprehensive experiments with ablation analysis demonstrate that IMG-Net achieves state-of-the-art performance on the CUHK-PEDES dataset and significantly outperforms previous methods.
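The abstract's preference for GN over batch normalization can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, pure-Python version that treats each sample as a flat list of channel values (no spatial dimensions, no learned scale or shift), and the function names `group_norm` and `batch_norm` as well as the toy inputs are assumptions made for illustration. It shows that the GN output for a sample does not change when the batch changes, whereas the batch-normalized output does.

```python
import math

def group_norm(batch, num_groups, eps=1e-5):
    """Normalize each sample's channels within groups (no learned scale/shift)."""
    out = []
    for sample in batch:
        channels = len(sample)
        if channels % num_groups != 0:
            raise ValueError("channels must be divisible by num_groups")
        size = channels // num_groups
        normed = []
        for g in range(num_groups):
            group = sample[g * size:(g + 1) * size]
            mean = sum(group) / size
            var = sum((x - mean) ** 2 for x in group) / size
            normed.extend((x - mean) / math.sqrt(var + eps) for x in group)
        out.append(normed)
    return out

def batch_norm(batch, eps=1e-5):
    """Normalize each channel across the batch (training-mode statistics)."""
    n = len(batch)
    channels = len(batch[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        col = [sample[c] for sample in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        for i in range(n):
            out[i][c] = (col[i] - mean) / math.sqrt(var + eps)
    return out

# Hypothetical toy feature vectors with four channels each.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 0.0, -5.0, 7.0]
c = [2.0, 2.5, 9.0, -1.0]

gn_small = group_norm([a, b], num_groups=2)
gn_large = group_norm([a, b, c], num_groups=2)
bn_small = batch_norm([a, b])
bn_large = batch_norm([a, b, c])

# GN statistics come from the sample itself, so sample `a` is unaffected
# by the batch composition; batch normalization statistics are not.
gn_same = gn_small[0] == gn_large[0]
bn_same = bn_small[0] == bn_large[0]
print(gn_same, bn_same)
```

Because the GN statistics are computed per sample, they stay stable even with small batches, which is the usual motivation for substituting GN when batch statistics are hard to estimate reliably.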

Citation

Zijie Wang, Aichun Zhu, Zhe Zheng, Jing Jin, Zhouxin Xue, and Gang Hua "IMG-Net: inner-cross-modal attentional multigranular network for description-based person re-identification," Journal of Electronic Imaging 29(4), 043028 (28 August 2020). https://doi.org/10.1117/1.JEI.29.4.043028

Received: 17 April 2020; Accepted: 6 August 2020; Published: 28 August 2020
