To understand the visual world, a device must recognize not only the instances in a scene but also how they interact. Humans are at the center of such interactions, and detection of human–object interaction (HOI) is a growing research field in computer vision. However, identifying HOIs remains challenging due to the large label space of verbs and their combinations with various object types. We focus on HOIs in images, which are necessary for a deeper understanding of the scene. In addition to two-dimensional (2D) information, such as the appearance of humans and objects and their spatial locations, three-dimensional (3D) cues, especially the configuration of the human body and the object as well as their locations and spatial relationship, can play an important role in learning HOI. Mapping the 2D image to the 3D world adds depth information to the problem. These observations led us to collect 3D information along with the 2D features of the images to obtain more accurate results. We show that 3D attributes, such as the face transformation, the viewing angle, the position of an object, and its location relative to the human face, can improve HOI learning. Experiments on large-scale data show that our method improves interaction detection.
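The fusion of 2D and 3D cues described above can be sketched as a simple feature concatenation followed by a per-verb scoring head. This is a minimal illustrative sketch, not the paper's actual model: the feature dimensions, cue names (`feat_2d`, `feat_3d`), and weights are all assumptions chosen for demonstration.

```python
import numpy as np

# Hypothetical feature dimensions (illustrative assumptions, not from the paper).
D2 = 6   # 2D cues: e.g. human/object detector scores and box layout
D3 = 4   # 3D cues: e.g. face rotation angles, viewing angle, relative depth

def fuse_features(feat_2d: np.ndarray, feat_3d: np.ndarray) -> np.ndarray:
    """Concatenate 2D and 3D cues into one HOI feature vector."""
    return np.concatenate([feat_2d, feat_3d])

def score_interaction(fused: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Linear verb score passed through a sigmoid, standing in for a learned head."""
    return 1.0 / (1.0 + np.exp(-(fused @ weights + bias)))

# Toy example: one human-object pair with made-up feature values.
feat_2d = np.array([0.9, 0.8, 0.1, 0.2, 0.5, 0.4])
feat_3d = np.array([0.3, -0.1, 0.7, 0.25])
fused = fuse_features(feat_2d, feat_3d)

weights = np.zeros(D2 + D3)
weights[:D2] = 0.5    # placeholder weights for the 2D part
weights[D2:] = 1.0    # placeholder weights for the 3D part
prob = score_interaction(fused, weights, bias=-1.0)
```

In a trained system the weights would come from learning on annotated HOI data, and the 3D cues would be estimated from the image (e.g. face pose and object depth) rather than supplied by hand.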
Keywords: Feature extraction, laser range finders, 3D image processing, optical spheres, eye, visualization, computer vision technology