28 August 2020 IMG-Net: inner-cross-modal attentional multigranular network for description-based person re-identification
Zijie Wang, Aichun Zhu, Zhe Zheng, Jing Jin, Zhouxin Xue, Gang Hua
Abstract

Given a natural language description, description-based person re-identification aims to retrieve images of the matched person from a large-scale visual database. Because of the heterogeneity between the two modalities, it is challenging to measure the cross-modal similarity between images and text descriptions. Many existing approaches use a deep-learning model to encode local and global fine-grained features with a strict uniform partition strategy. This breaks part coherence, making it difficult to capture meaningful information within each part and semantic information among body parts. To address this issue, we propose an inner-cross-modal attentional multigranular network (IMG-Net) that incorporates inner-modal self-attention and cross-modal hard-region attention into the fine-grained model to extract multigranular semantic information. Specifically, the inner-modal self-attention module addresses the broken within-part consistency using both spatial-wise and channel-wise information. It is followed by a multigranular feature extraction module, which extracts rich local and global visual and textual features with the help of group normalization (GN). A cross-modal hard-region attention module then obtains the local visual representation and the phrase representation. Furthermore, GN is used instead of batch normalization to avoid inaccurate batch statistics estimation. Comprehensive experiments with ablation analysis demonstrate that IMG-Net achieves state-of-the-art performance on the CUHK-PEDES dataset and significantly outperforms previous methods.
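The abstract names group normalization but does not say how it is configured inside IMG-Net. As a general illustration of the technique only, and not the authors' implementation, the sketch below normalizes one sample by splitting its channels into groups and standardizing each group with its own mean and variance. The function name `group_norm`, the nested-list layout (channels, each a flat list of spatial values), the `eps` value, and the omission of the learnable scale and shift are all assumptions made for brevity.

```python
from statistics import fmean, pvariance


def group_norm(sample, num_groups, eps=1e-5):
    """Group-normalize one sample (hypothetical helper, illustration only).

    sample: list of C channels, each a flat list of spatial values.
    The statistics come from this sample alone, so the result does not
    depend on how many other samples share the batch.
    """
    num_channels = len(sample)
    if num_channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    channels_per_group = num_channels // num_groups

    normalized = []
    for g in range(num_groups):
        start = g * channels_per_group
        group = sample[start:start + channels_per_group]
        # Pool every value in the group to compute one mean and variance.
        values = [v for channel in group for v in channel]
        mean = fmean(values)
        var = pvariance(values, mu=mean)
        scale = (var + eps) ** -0.5
        # Learnable per-channel scale and shift are omitted in this sketch.
        normalized.extend([[(v - mean) * scale for v in channel] for channel in group])
    return normalized


# Four channels with two spatial positions each, split into two groups.
example = group_norm([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]], num_groups=2)
```

Because each call sees a single sample, the normalization behaves the same at any batch size, which is the property that distinguishes it from batch normalization.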

© 2020 SPIE and IS&T 1017-9909/2020/$28.00
Zijie Wang, Aichun Zhu, Zhe Zheng, Jing Jin, Zhouxin Xue, and Gang Hua "IMG-Net: inner-cross-modal attentional multigranular network for description-based person re-identification," Journal of Electronic Imaging 29(4), 043028 (28 August 2020). https://doi.org/10.1117/1.JEI.29.4.043028
Received: 17 April 2020; Accepted: 6 August 2020; Published: 28 August 2020
CITATIONS
Cited by 20 scholarly publications.
KEYWORDS
Visualization
Feature extraction
Information visualization
Performance modeling
Convolution
Databases
Statistical analysis
