Human-object interaction (HOI) detection is the task of inferring all <human, verb, object> triplets in an image, which helps computers build a more comprehensive understanding of the visual scene. Most existing HOI detection methods focus on local instance features and rarely consider information from the background. Our core idea is that the relationships among the human, the object, and the surrounding background contain important cues for HOI detection. Following the short-term memory selection (STMS) mechanism, we regard the interaction relationship as the result of the human and the object stimulating their union area, and we simulate this stimulation process with a recurrent neural network (RNN). The RNN takes the features of the human-object union area together with the human and object features, which serve as its two inputs, and outputs a representation of the interaction relationship. This representation is combined with the visual and spatial features of the instances in a multi-stream network that detects the HOIs in the image. Experiments on V-COCO and HICO-DET show that the proposed model achieves better performance, verifying the effectiveness of our method.
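The stimulation process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the RNN cell, the feature dimensions, or exactly how the union features enter the network. The sketch assumes a vanilla (Elman) RNN cell whose state is initialized with the union-area features and which is then driven by the human and object features in turn; all names (`rnn_step`, `interaction_representation`, `init_params`) and the toy dimension are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(h, x, W_h, W_x, b):
    """One vanilla RNN step: h' = tanh(W_h h + W_x x + b)."""
    hh = matvec(W_h, h)
    xx = matvec(W_x, x)
    return [math.tanh(a + c + d) for a, c, d in zip(hh, xx, b)]


def init_params(dim, seed=0):
    """Random small weights; stand-ins for parameters that would be learned."""
    rng = random.Random(seed)
    W_h = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    W_x = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    b = [0.0] * dim
    return W_h, W_x, b


def interaction_representation(union_feat, human_feat, object_feat, params):
    """Assumption: the union-area features seed the RNN state, and the human
    and object features are the two inputs that 'stimulate' it in sequence."""
    W_h, W_x, b = params
    h = list(union_feat)
    for x in (human_feat, object_feat):
        h = rnn_step(h, x, W_h, W_x, b)
    return h  # representation of the interaction relationship


# Toy usage with random stand-in features of dimension 8.
dim = 8
rng = random.Random(1)
union_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
human_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
object_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]

rel = interaction_representation(union_feat, human_feat, object_feat, init_params(dim))
print(len(rel))
```

In a multi-stream setup such as the one the abstract mentions, a vector like `rel` would be one stream alongside the visual and spatial instance features; how the streams are fused is not stated in the abstract and is left out of this sketch.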