With the rise of smart devices, there is a growing need for lightweight human pose estimation (HPE) methods. Although existing 2D HPE techniques achieve impressive performance on public datasets, they still suffer from high model complexity and latency in practical applications. To address this challenge, this paper proposes a novel approach to lightweight 2D HPE. By using shuffle blocks in place of the traditional ResNet, we significantly reduce model size and computational requirements. Moreover, our method employs the SimCC algorithm to recast pose estimation as a coordinate classification task. By discretizing the continuous coordinate values into multiple sub-pixel intervals, we effectively reduce the quantization error encountered in traditional heatmap-based methods. To further improve precision, we incorporate a self-attention mechanism into the network. This mechanism refines the joint point representations, and a gated linear unit in the feedforward network of the Transformer layer improves robustness. We evaluate our method on the MPII and COCO datasets in terms of model parameters, computational complexity, and accuracy, and we perform ablation experiments on the MPII dataset to analyze the contribution of each component. The experimental results demonstrate that our lightweight model achieves pose estimation performance comparable to that of other lightweight models while requiring only 60% of their computational complexity.
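The coordinate classification idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function names, the splitting factor `k = 2`, and the heatmap stride of 4 are assumptions chosen only to show why sub-pixel bins can shrink the quantization error relative to a downsampled heatmap grid.

```python
def encode_coordinate(x, k):
    """Map a continuous pixel coordinate to a class index with 1/k-pixel bins."""
    return round(x * k)


def decode_coordinate(index, k):
    """Map a class index back to a pixel coordinate."""
    return index / k


def decode_from_scores(scores, k):
    """Pick the highest-scoring bin of a 1D score vector and convert it to pixels."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return decode_coordinate(best_index, k)


def heatmap_roundtrip(x, stride):
    """Heatmap-style quantization: snap x to the nearest cell of a grid
    downsampled by `stride`, then map back to input resolution."""
    return round(x / stride) * stride


# Illustrative values (assumptions, not taken from the paper).
x = 101.3   # ground-truth x-coordinate in pixels
k = 2       # splitting factor: each pixel is divided into k bins
stride = 4  # downsampling factor of a hypothetical heatmap

simcc_error = abs(decode_coordinate(encode_coordinate(x, k), k) - x)
heatmap_error = abs(heatmap_roundtrip(x, stride) - x)

# A toy score vector for a 128-pixel-wide input, peaked at the encoded bin.
scores = [0.0] * (128 * k)
scores[encode_coordinate(x, k)] = 1.0
predicted_x = decode_from_scores(scores, k)

print(f"sub-pixel classification error: {simcc_error:.2f} px")
print(f"heatmap grid error:             {heatmap_error:.2f} px")
```

In this toy setting the worst-case rounding error of the classification scheme is 1/(2k) pixels, whereas the stride-4 grid can be off by up to 2 pixels. The same encoding would be applied independently to the x- and y-coordinates of each joint.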