Human pose estimation serves as a critical foundation for various downstream tasks such as human-computer interaction, motion analysis, and action recognition. Current human pose estimation networks often demand ever larger parameter counts and computational budgets to push estimation accuracy higher, posing challenges for deployment on low-power edge computing devices. Through experimentation, this study identified a prevalent issue of redundancy in the convolution channels of existing human pose estimation models: the feature maps output by different convolution channels are often strongly similar. In response, this paper introduces a lightweight yet accurate human pose estimation network, designed to run on most edge computing devices while maintaining high estimation accuracy. The proposed model first replaces the ordinary convolutions in the network with a reparametrizable partial convolution, reducing redundancy while enriching the diversity of the extracted features. Furthermore, an effective multiscale cross-attention mechanism is designed to fuse features from different stages of the backbone network; this improves accuracy while avoiding the severe drop in inference speed associated with excessive multiscale fusion. Through these design strategies, the proposed model achieves a balance between accuracy and speed with a smaller computational and parameter footprint. Experimental validation on the COCO and MPII datasets verifies the effectiveness of the proposed method.
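The redundancy-reduction idea above can be illustrated with a minimal NumPy sketch of a partial convolution: only a leading fraction of the input channels is convolved, and the remaining (largely redundant) channels are passed through untouched. The shapes, the `ratio` value, and the helper names are illustrative assumptions, not the paper's implementation, and the reparametrization branch is omitted.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 2-D cross-correlation with 'same' padding.
    x: (Cin, H, W), w: (Cout, Cin, k, k) with odd k."""
    cout, cin, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((cout, H, W))
    for o in range(cout):
        for i in range(cin):
            for dy in range(k):
                for dx in range(k):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + H, dx:dx + W]
    return out

def partial_conv(x, w, ratio=0.25):
    """Convolve only the first `ratio` fraction of channels; the rest,
    assumed redundant with them, are passed through unchanged."""
    c = x.shape[0]
    cp = max(1, int(c * ratio))          # channels actually convolved
    assert w.shape[0] == w.shape[1] == cp
    return np.concatenate([conv2d_same(x[:cp], w), x[cp:]], axis=0)
```

With `ratio=0.25`, the convolution touches a quarter of the channels, so its FLOPs shrink accordingly while the output keeps the full channel count.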
Methods based on two-stream networks currently achieve good recognition performance in action recognition; however, their real-time performance is hindered by the high computational cost of optical flow. Temporal Segment Network (TSN), a successful two-stream variant, achieves high recognition accuracy but cannot run in real time. In this paper, the motion vector TSN (MV-TSN) is proposed, which introduces motion vectors into temporal segment networks and greatly speeds up TSN processing. To address the performance degradation caused by motion vectors lacking fine structural information, we propose a knowledge transfer strategy that initializes the MV-TSN with the fine-grained knowledge learned from optical flow. Experimental results show that the proposed method achieves recognition performance comparable to previous state-of-the-art approaches on UCF-101 and HMDB-51, with a processing speed of 206.2 fps, 13 times that of the original TSN.
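The knowledge-transfer step can be sketched as a cross-modality initialization of the first convolution layer, in the spirit of the cross-modality pre-training used with TSN: average the optical-flow filters over their input channels and replicate the mean to fit the motion-vector channel count. The function name and shapes below are illustrative assumptions; later layers would simply be copied over unchanged.

```python
import numpy as np

def transfer_first_conv(flow_w, mv_channels):
    """Adapt a first-layer conv trained on optical-flow stacks
    (shape: Cout x Cin_flow x k x k) to motion-vector input with a
    different channel count: average over the input-channel axis and
    replicate the mean filter mv_channels times."""
    mean = flow_w.mean(axis=1, keepdims=True)        # (Cout, 1, k, k)
    return np.repeat(mean, mv_channels, axis=1)      # (Cout, mv_channels, k, k)
```

Averaging preserves the filters' learned spatial structure while making them agnostic to the exact number and ordering of input channels.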
KEYWORDS: Databases, Detection and tracking algorithms, Feature extraction, 3D modeling, RGB color model, Video surveillance, Video, Data modeling, Data centers, Spine
Skeleton-based methods have been proposed to detect and recognize meaningful human motion. Most of them involve parameters that must be tuned: to achieve better recognition performance, various evolutionary schemes have been applied to select the optimal parameters in each phase of these recognition methods, and obtaining the optimal values normally requires experimental evaluation of many parameter settings against recognition performance. In this paper, we propose an adaptive skeleton-based human action recognition system that automatically adjusts its parameters according to the input data. We first extract spatiotemporal local features from the position differences of joints, which model actions over time. A two-layer affinity propagation (AP) algorithm is then employed to select crucial postures. Our experimental results demonstrate that the proposed method works well on different datasets.
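The posture-selection step relies on affinity propagation; a compact NumPy sketch of a single AP layer is shown below, using the standard responsibility/availability message-passing updates of Frey and Dueck. The damping factor and iteration count are illustrative assumptions, and where the paper stacks two layers, this sketch runs one layer to pick exemplars from a similarity matrix.

```python
import numpy as np

def affinity_propagation(S, damping=0.7, iters=200):
    """One layer of affinity propagation on similarity matrix S
    (diagonal = preferences); returns the indices of the exemplars."""
    n = S.shape[0]
    R = np.zeros((n, n))   # responsibilities
    A = np.zeros((n, n))   # availabilities
    for _ in range(iters):
        # responsibility update: r(i,k) = s(i,k) - max_{k'!=k}(a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # availability update: a(i,k) = min(0, r(k,k) + sum of positive r(i',k))
        Rp = np.maximum(R, 0)
        Rp[np.arange(n), np.arange(n)] = R.diagonal()
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew.diagonal().copy()
        Anew = np.minimum(Anew, 0)
        Anew[np.arange(n), np.arange(n)] = diag
        A = damping * A + (1 - damping) * Anew
    # a point is an exemplar when its combined self-evidence is positive
    return np.flatnonzero((R + A).diagonal() > 0)
```

Unlike k-means, the number of exemplars is not fixed in advance; it emerges from the similarity preferences on the diagonal, which is what makes AP attractive for data-driven posture selection.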
With the evolution and spread of motion capture systems, human motion data can be obtained and stored in a variety of forms. Because of the huge volume of such data, more effective methods are needed to search for a target motion sequence. In this paper, an efficient system is presented for aligning motions based on the similarity of their data distributions; the system comprises two primary phases: offline motion-database building and online alignment. The resulting similarity matching describes the correspondence between two motion sequences. As the measure of motion similarity, a segmented Dynamic Time Warping (DTW) algorithm is explored to support the alignment matches. Experimental results show that the method aligns motion segments precisely and efficiently, and a user interface was built on top of the proposed system.
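A minimal sketch of the segmented DTW idea follows, assuming 1-D motion signals and fixed-length segments (both simplifications: real motion data is multidimensional and the paper's segmentation scheme is its own). Each query segment is slid over the target and matched to the window with the lowest DTW cost.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def segmented_dtw(query, target, seg_len=20):
    """Slide fixed-length segments of the query over the target and
    return, for each segment, the best-matching start index in the target."""
    matches = []
    for s in range(0, len(query) - seg_len + 1, seg_len):
        seg = query[s:s + seg_len]
        costs = [dtw_distance(seg, target[t:t + seg_len])
                 for t in range(len(target) - seg_len + 1)]
        matches.append(int(np.argmin(costs)))
    return matches
```

Segmenting keeps each DTW computation small (cost grows quadratically with segment length) and yields a per-segment correspondence rather than one global warp.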
KEYWORDS: RGB color model, 3D modeling, Video, Video surveillance, Principal component analysis, Matrices, Motion models, Feature extraction, 3D image processing, Data modeling
In this paper, we propose a new, effective, and robust framework to recognize human actions from depth map sequences. First, a 3D motion trail model (3DMTM) is extracted to represent the temporal motion information. Two effective heterogeneous features are then proposed to describe actions more comprehensively based on the 3DMTM. By computing Multilayer Histograms of Oriented Gradient (MHOG) on the 3DMTM, the 3DMTM-MHOG descriptor is obtained to capture local details of different actions; combining Gist with the 3DMTM yields 3DMTM-Gist, which models the holistic structure of actions. Feature-level fusion is used to merge the two descriptors into the final feature, and support vector machine (SVM) classification is applied for multi-class action recognition. Experimental results on a public depth action dataset (MSR Action3D) show that our method is superior to the state-of-the-art methods.
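The multilayer HOG descriptor can be sketched as gradient-orientation histograms computed at several cell sizes and concatenated, so both coarse structure and local detail are represented. The cell sizes, bin count, and normalization below are illustrative assumptions, and a plain 2-D image stands in for the 3DMTM maps.

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Simplified HOG: gradient magnitude-weighted orientation histograms
    over non-overlapping square cells, L2-normalized."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    f = np.concatenate(feats).astype(float)
    return f / (np.linalg.norm(f) + 1e-12)

def multilayer_hog(img, cells=(16, 8, 4)):
    """MHOG-style descriptor: concatenate HOG at several cell sizes so
    coarse layout and fine detail are both captured."""
    return np.concatenate([hog_cells(img, cell=c) for c in cells])
```

In the paper's setting this descriptor would be computed on each 3DMTM projection map before the feature-level fusion with Gist.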
In this paper, we propose a novel feature extraction method for face recognition based on the two-dimensional fractional Fourier transform (2D-FrFT). First, we extract the phase information of the facial image in 2D-FrFT, called the generalized phase spectrum (GPS). Then, we present an improved two-dimensional separability judgment (I2DSJ) to select appropriate order parameters for the discrete fractional Fourier transform. Finally, multiple orders' generalized phase spectrum bands (MGPSB) fusion is proposed: to make full use of the discriminative information from different orders, the approach merges the generalized phase spectra of 2D-FrFT at different orders. The proposed method needs no subspace construction through feature-extraction methods and has a lower computational cost. Experimental results on public face databases demonstrate that our method outperforms the representative methods.
In recent years, the fractional Fourier transform (FRFT), which captures both spatial and frequency information, has attracted considerable attention. Image registration (IR), an important preprocessing procedure, is well suited to implementation in the FRFT domain. A novel method based on the properties of the FRFT and the conventional phase correlation technique is proposed in this paper. This method not only obtains more accurate results than previous FRFT-based methods but also avoids iterative operation, which greatly reduces computational complexity. Simulation results demonstrate the superiority of the proposed method over existing FRFT-based methods.
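The conventional phase correlation component is easy to sketch with NumPy: for two images related by a translation, the normalized cross-power spectrum is a pure phase ramp whose inverse FFT is an impulse at the shift. This sketch recovers only the classic integer-shift, ordinary-FFT case; the paper's FRFT-domain extension is not reproduced here.

```python
import numpy as np

def phase_correlation(img_a, img_b):
    """Estimate the integer translation between two equal-shape images
    via the normalized cross-power spectrum (classic phase correlation)."""
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.real(np.fft.ifft2(cross))     # impulse at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the image back to negative values
    shifts = [p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape)]
    return tuple(shifts)
```

Because only the phase is kept, the estimate is robust to global illumination changes, and the whole procedure is non-iterative, matching the abstract's emphasis on avoiding iterative operation.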