The development of wearable cameras has given rise to a new setting, the egocentric perspective, and with it the computer vision task of detecting hands and distinguishing the left hand from the right. To address this challenge, we use an Attention Network that combines several egocentric hand properties to make the final classification. These hand features are inspired by the egocentric perspective and include the hand location in the image, the hand size, the constraint that at most one instance of each hand class appears in an image, and the probability of each hand appearing in the image. In addition, we use the YOLO object detector and its Tiny version to assess their impact on overall performance and on speed, the latter being essential for wearable devices. Finally, we compare them with current object and hand detection approaches.