Xuanyu Liu, Shuliang Zhang, Junjie Hu, Peiyu Mao
Journal of Electronic Imaging, Vol. 33, Issue 04, 043012, (July 2024) https://doi.org/10.1117/1.JEI.33.4.043012
TOPICS: Facial recognition systems, Convolution, Detection and tracking algorithms, Deformation, Target detection, Education and training, Feature extraction, Data modeling, Performance modeling, Neural networks
Deep-learning methods for detecting multiple faces in unconstrained environments suffer from insufficient accuracy and low efficiency, and their performance on blurred, occluded, and very small faces is even less satisfactory; detecting such faces remains a hard problem in face detection. Balancing detection accuracy against real-time efficiency is also difficult. This study therefore builds on the RetinaFace algorithm: to detect blurred, occluded, and very small faces among multiple faces in unconstrained environments more effectively, we introduce deformable convolution, feature pyramid networks (FPN), and the coordinate attention (CA) mechanism. Deformable convolution adapts dynamically to the shape and deformation of the object being recognized, so it is no longer limited to a fixed-size square receptive field, which improves the image feature extraction capability of the convolutional layer. FPN enriches the semantic information of the lower-layer features at a small additional computational cost and makes the detection algorithm more robust to targets of different sizes. CA is a lightweight and efficient attention module that can be easily integrated into mobile networks to improve accuracy with little additional computational overhead. The improved algorithm, ResRetinaFace, raises recognition accuracy without a large increase in computational overhead. It adapts to the varied postures and deformations of faces in complex scenes and provides more effective features for face detection, so the network attends better to the detection target and its representational ability is enhanced.
Meanwhile, the improved algorithm combines the feature pyramid with the context module, which improves detection of blurred, occluded, and very small faces. Experiments on the WIDER FACE dataset with a ResNet50 backbone show that the improved method outperforms the method before enhancement, reaching accuracy rates of 94.83%, 93.28%, and 84.99% on the easy, medium, and hard subsets, respectively. At 7.704 frames per second, it meets the precision and real-time requirements of face detection tasks. Validation on WIDER FACE further confirms that ResRetinaFace detects faces reliably while maintaining high detection efficiency.
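
The FPN idea described in the abstract, enriching lower-layer features with semantic information from higher layers, can be illustrated with a top-down merge. The following is a minimal sketch, not the paper's implementation: it assumes single-channel feature maps whose resolution halves at each level, uses nearest-neighbour upsampling, and omits the learned 1x1 lateral and 3x3 smoothing convolutions that FPN implementations normally apply. The names `upsample_nearest_2x`, `add_maps`, and `fpn_top_down` are hypothetical and chosen only for this example.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def upsample_nearest_2x(fm: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(laterals: List[Grid]) -> List[Grid]:
    """Merge maps ordered fine to coarse (assumed: each level halves the size).

    The coarsest map is kept unchanged; every finer map is summed with the
    upsampled merged map from the level above it.
    """
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        merged.insert(0, add_maps(lateral, upsample_nearest_2x(merged[0])))
    return merged


# Toy input (made-up values): a 4x4 fine map and a 2x2 coarse map.
fine = [[1.0] * 4 for _ in range(4)]
coarse = [[10.0, 20.0], [30.0, 40.0]]
pyramid = fpn_top_down([fine, coarse])
```

In this toy run, `pyramid[0]` is the fine map with the coarse map's values spread over 2x2 blocks and added in, which is how small-object levels inherit context from coarser levels at the cost of only an upsample and an addition per level.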