Recognizing human actions in video sequences is a key step toward developing and deploying computer vision systems in many areas of life, including video surveillance, monitoring, contactless control interfaces, and video processing as a preliminary stage of further analysis. Additional sources of information, such as depth and thermal sensors, yield more informative features and thus increase the reliability and stability of recognition. In this work we focus on the simultaneous extraction of skeleton information and 3D local binary dense micro-block difference (3D LBDMD) information about an object from visible and thermal images. The proposed algorithm is a four-stage procedure: (a) fusion of information from visible cameras and thermal sensors based on the parameterized logarithmic image processing (PLIP) model, (b) image preprocessing using a 3D Gabor filter, (c) descriptor calculation using 3D LBDMD with skeleton points, and (d) classification. The descriptor is built by capturing 3D sub-volumes located inside a video sequence patch and calculating the difference in intensities between these sub-volumes; to emphasize motion, the frames are convolved with a bank of arbitrarily oriented 3D Gabor filters. The local 3D LBDMD features are then calculated on the pre-processed frames. Experiments show that the proposed method performs well compared with other state-of-the-art methods.
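The sub-volume differencing idea behind the descriptor can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lbdmd_3d`, the random sampling of block pairs, the cubic block size, the use of mean intensity per block, and binarization by sign are all assumptions made for illustration, and the skeleton points, PLIP fusion, and Gabor filtering stages are left out.

```python
import numpy as np

def lbdmd_3d(patch, n_pairs=64, block=2, seed=0):
    """Illustrative binary code from sub-volume intensity differences.

    patch : 3D array (frames, height, width) cut from a video sequence.
    All parameters and the sampling scheme are hypothetical choices.
    """
    patch = np.asarray(patch, dtype=float)
    if patch.ndim != 3:
        raise ValueError("patch must be a 3D array")
    # Largest valid corner index (exclusive) per axis so a block fits.
    limits = np.array(patch.shape) - block + 1
    if np.any(limits < 1):
        raise ValueError("patch is smaller than the micro-block")

    rng = np.random.default_rng(seed)
    # Corners of both blocks of every pair: shape (n_pairs, 2, 3).
    corners = np.stack(
        [rng.integers(0, limits[k], size=(n_pairs, 2)) for k in range(3)],
        axis=-1,
    )

    def block_mean(corner):
        z, y, x = corner
        return patch[z:z + block, y:y + block, x:x + block].mean()

    # Difference of mean intensities between the two blocks of each pair.
    diffs = np.array([block_mean(a) - block_mean(b) for a, b in corners])
    # Assumed binarization: 1 where the first block is brighter.
    return (diffs > 0).astype(np.uint8)
```

Calling `lbdmd_3d(patch)` on a `(frames, height, width)` array returns a vector of `n_pairs` bits. Fixing `seed` keeps the block layout identical across patches, which is what makes the codes of different patches comparable in this sketch.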