Ground-truth annotation of the occurrence and intensity of FACS Action Unit (AU) activation is a labor-intensive task. Efforts towards a common platform for AU evaluation have been addressed by the FG 2015 Facial Expression Recognition and Analysis challenge (FERA 2015), in which participants are invited to estimate AU occurrence and intensity on a common benchmark dataset. Conventional approaches to automating this task train multiclass classifiers or regression models. In this paper, we propose a novel application of a deep convolutional neural network (CNN) to recognize AUs as part of the FERA 2015 challenge. The 7-layer network is composed of 3 convolutional layers and a max-pooling layer; the final fully connected layers provide the classification output. For the selected tasks of the challenge, we trained two networks, one per dataset: one focuses on AU occurrence only, while the other addresses both AU occurrence and intensity. The occurrence and intensity of AU activation are estimated from specific neuron activations of the output layer. In this way, a single network architecture can simultaneously be trained to produce binary and continuous classification outputs.
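To make the described architecture concrete, the following is a minimal sketch in PyTorch of a 7-layer network of this shape: 3 convolutional layers, a single max-pooling layer, fully connected layers, and an output split into binary occurrence and continuous intensity heads. All filter counts, kernel sizes, the input resolution, and the class/loss names (`AUNet`, `num_aus`) are assumptions for illustration; the paper does not specify them here, and the original may have used a different framework.

```python
# Hypothetical sketch of the described 7-layer CNN; all sizes are assumed.
import torch
import torch.nn as nn

class AUNet(nn.Module):
    """3 conv layers + 1 max-pooling layer + fully connected layers,
    with a joint binary-occurrence / continuous-intensity output."""
    def __init__(self, num_aus=11, input_size=96):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # the single max-pooling layer
        )
        flat = 64 * (input_size // 2) ** 2
        self.classifier = nn.Sequential(
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Output layer: per-AU neurons for binary occurrence (sigmoid)
        # and per-AU neurons for continuous intensity (linear).
        self.occurrence = nn.Linear(128, num_aus)
        self.intensity = nn.Linear(128, num_aus)

    def forward(self, x):
        h = self.features(x)
        h = self.classifier(h.flatten(1))
        return torch.sigmoid(self.occurrence(h)), self.intensity(h)

# Joint training on both outputs: binary cross-entropy for occurrence,
# mean squared error for intensity (loss choices are also assumptions).
model = AUNet()
occ, inten = model(torch.randn(8, 1, 96, 96))  # a batch of face crops
loss = nn.functional.binary_cross_entropy(occ, torch.rand(8, 11)) \
     + nn.functional.mse_loss(inten, torch.rand(8, 11) * 5)
```

Keeping both output heads on one shared trunk is what allows a single architecture to be trained for binary and continuous targets simultaneously, as the abstract describes.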