Understanding human facial expressions is a key step toward natural human-computer interaction. A facial expression, however, combines an expressive component, called facial behavior, with a person's neutral component. The most commonly used taxonomy for describing facial behaviors is the Facial Action Coding System (FACS), which segments the visible effects of facial muscle activation into 30+ action units (AUs). We therefore introduce a method that recognizes AUs by extracting the information of the expressive component through a de-expression learning procedure, called De-expression Residue Learning (DeRL). First, we train a conditional Generative Adversarial Network (cGAN) to filter out the expressive information and generate the corresponding neutral face image. Then, we use the intermediate layers of the generative network, which contain the action unit information, to recognize AUs. Our work alleviates the problems of AU recognition based on pixel-level differences, which are unreliable because of variation between images (e.g., rotation, translation, and changes in lighting conditions), and on feature-level differences, which are also unstable because the expression information may vary with the identity information. In our experiments, we use data augmentation to avoid overfitting and train a deep network to recognize AUs on the CK+ dataset. The results show that our method compares favorably with several other popular approaches.
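The data flow described above can be illustrated with a minimal NumPy sketch. It is not the architecture or training procedure used in this work: the generator is a toy encoder-decoder with random (untrained) weights, and every name and size here (`generate_neutral`, `residue_features`, `predict_aus`, 32x32 inputs, 12 AUs) is an assumption made for illustration. It only shows how intermediate activations recorded while generating a neutral face could be concatenated and passed to a multi-label AU classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a flattened 32x32 grayscale face and 12 AU labels.
IMG_DIM, HIDDEN, BOTTLENECK, NUM_AUS = 32 * 32, 64, 16, 12

# Stand-in for a trained generator: random weights, NOT learned ones.
W_enc1 = rng.normal(0.0, 0.05, (IMG_DIM, HIDDEN))
W_enc2 = rng.normal(0.0, 0.05, (HIDDEN, BOTTLENECK))
W_dec1 = rng.normal(0.0, 0.05, (BOTTLENECK, HIDDEN))
W_dec2 = rng.normal(0.0, 0.05, (HIDDEN, IMG_DIM))


def generate_neutral(face):
    """Map expressive faces to 'neutral' faces and keep intermediate activations."""
    h1 = np.tanh(face @ W_enc1)      # encoder layer 1
    h2 = np.tanh(h1 @ W_enc2)        # bottleneck
    h3 = np.tanh(h2 @ W_dec1)        # decoder layer 1
    neutral = np.tanh(h3 @ W_dec2)   # generated neutral face
    return neutral, [h1, h2, h3]


def residue_features(face):
    """Concatenate the intermediate activations into one feature vector per face."""
    _, hidden = generate_neutral(face)
    return np.concatenate(hidden, axis=-1)


# Stand-in AU classifier: one sigmoid output per AU (multi-label), untrained.
W_cls = rng.normal(0.0, 0.05, (HIDDEN + BOTTLENECK + HIDDEN, NUM_AUS))


def predict_aus(face):
    """Return one probability per AU for each input face."""
    logits = residue_features(face) @ W_cls
    return 1.0 / (1.0 + np.exp(-logits))


# A batch of 4 random "faces" with pixel values in [-1, 1].
batch = rng.uniform(-1.0, 1.0, (4, IMG_DIM))
neutral, hidden = generate_neutral(batch)
probs = predict_aus(batch)
```

Because the classifier in this sketch reads only the generator's internal activations, it never compares the input and the generated neutral face pixel by pixel, which is the property the abstract relies on to avoid sensitivity to rotation, translation, and lighting changes.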