In recent years, age and gender estimation from face images has found increasingly wide and deep application. Existing estimation pipelines usually rely on classical machine learning methods such as SVM and AdaBoost. However, such methods perform well only on images captured under strict conditions or against simple backgrounds; age and gender estimation in open environments still faces enormous challenges. In this paper, we introduce a method based on a double-channel convolutional neural network (CNN) for accurate age and gender estimation in complex scenarios. First, face regions, whether containing a single face or multiple faces, are detected. Second, faces are aligned based on facial landmark detection. Finally, a double-channel CNN structure combined with XGBoost is trained for age and gender estimation. Experiments show that the proposed double-channel CNN method achieves higher accuracy at comparable time cost than a single-channel CNN method and is robust to face images captured in the wild.
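The double-channel design described above can be sketched in a minimal, hypothetical form: two CNN branches each emit a feature vector for an aligned face, the vectors are concatenated, and the fused features would then be handed to an XGBoost classifier. The branch functions below are simple numpy stand-ins, not the paper's actual networks, and all names are assumptions for illustration.

```python
import numpy as np

def channel_a_features(face_img):
    # Placeholder for the first CNN branch's embedding:
    # per-channel mean over the aligned face, shape (C,).
    return face_img.mean(axis=(0, 1))

def channel_b_features(face_img):
    # Placeholder for the second CNN branch's embedding:
    # per-channel standard deviation, shape (C,).
    return face_img.std(axis=(0, 1))

def fuse_features(face_img):
    # Concatenate both channels' outputs into one feature vector,
    # which an XGBoost model (not shown here) would consume for the
    # age and gender labels.
    return np.concatenate([channel_a_features(face_img),
                           channel_b_features(face_img)])

# A hypothetical aligned 64x64 RGB face.
rng = np.random.default_rng(0)
face = rng.random((64, 64, 3))
feat = fuse_features(face)
print(feat.shape)  # (6,): 3 features from each of the two channels
```

In practice each branch would be a trained CNN and the fused vector would go to `xgboost.XGBClassifier` (or similar); the sketch only fixes the data flow of the two-channel fusion step.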
Recently, object detection has been widely used in power systems to assist fault diagnosis of transmission lines. However, it still faces great challenges due to multi-size targets existing in a single inspection image. Current state-of-the-art object detection pipelines, such as Faster R-CNN, perform well on large objects but usually fail to detect small objects due to their low resolution and poor feature representation. Many existing object detectors for this problem exploit feature pyramids, multi-scale image inputs, etc., which can attain high accuracy but are computation- and memory-intensive. In this paper, we propose an improved cascaded Faster R-CNN framework that reduces the computational cost while maintaining high detection accuracy to cope with multi-size object detection in high-resolution inspection images, where the first-stage Faster R-CNN detects large objects while the second stage detects small objects relative to the large ones. We further merge the first and second stages into a single network by sharing convolutional features: exploiting the semantic context between multi-size targets, the first stage tells the second where to look. For the "tell" step, we simply map the bounding-box coordinates of large objects detected in the first stage onto the VGG16 feature maps, crop the corresponding regions, and feed them to the second stage. Experiments on the test datasets demonstrate that our method achieves a higher detection mAP of 87.6% at 5 FPS on an NVIDIA Titan X compared with the single-stage Faster R-CNN.
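The "tell" step above amounts to projecting an image-space bounding box onto the shared convolutional feature map and cropping it. A minimal numpy sketch follows, assuming the feature map is VGG16's conv5 output with a cumulative stride of 16; the function names and the exact stride are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Assumed cumulative stride of VGG16 up to its conv5_3 feature map.
VGG16_STRIDE = 16

def map_box_to_feature_map(box, stride=VGG16_STRIDE):
    """Project an image-space box (x1, y1, x2, y2) onto feature-map
    coordinates, rounding outward so the crop fully covers the box."""
    x1, y1, x2, y2 = box
    fx1 = int(np.floor(x1 / stride))
    fy1 = int(np.floor(y1 / stride))
    fx2 = int(np.ceil(x2 / stride))
    fy2 = int(np.ceil(y2 / stride))
    return fx1, fy1, fx2, fy2

def crop_feature_map(feature_map, box, stride=VGG16_STRIDE):
    """Crop the feature-map region lying under an image-space box.

    feature_map: array of shape (C, H, W), e.g. a conv5 output.
    Returns the cropped (C, h, w) region fed to the second stage.
    """
    fx1, fy1, fx2, fy2 = map_box_to_feature_map(box, stride)
    return feature_map[:, fy1:fy2, fx1:fx2]

# Example: a 512-channel conv5 map for an 800x608 input (50x38 spatial),
# cropped under a large-object box detected by the first stage.
fmap = np.zeros((512, 38, 50))
crop = crop_feature_map(fmap, (160, 96, 480, 320))
print(crop.shape)  # (512, 14, 20): channels kept, spatial region cropped
```

Cropping shared feature maps instead of re-running the backbone on image crops is what lets the two stages share convolutional computation.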