Towards Reading Beyond Faces for Sparsity-aware 3D/4D Affect Recognition
In this paper, we present a sparsity-aware deep network for automatic 3D/4D facial expression recognition (FER). We first propose a novel augmentation method to combat the limited availability of 3D/4D face meshes for deep learning: the input data are projected into RGB and depth-map images, and randomized channel concatenation is then performed iteratively. Using the given 3D landmarks, we also introduce an effective way to capture facial muscle movements from three orthogonal planes (TOP), yielding TOP-landmarks over multi-views. Importantly, we then present a sparsity-aware deep network that computes sparse representations of convolutional features over multi-views, which both improves recognition accuracy and reduces computational cost. For training, the TOP-landmarks and sparse representations are used to train a long short-term memory (LSTM) network on 4D data and a pre-trained network on 3D data. Refined predictions are obtained by fusing the learned features across multi-views. Extensive experiments on the Bosphorus, BU-3DFE, BU-4DFE and BP4D-Spontaneous datasets show that our method outperforms state-of-the-art methods, reaching a promising accuracy of 99.69% on BU-4DFE for 4D FER.
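As a concrete illustration of the augmentation step, the following Python sketch gives one plausible reading of the randomized channel concatenation, assuming the mesh has already been projected to an RGB image and a depth map; the function name and sampling scheme are our own illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def randomized_channel_concat(rgb, depth, n_augmented=8, rng=None):
    """Illustrative sketch (not the authors' exact procedure): build new
    3-channel training images by randomly drawing channels from the pool
    {R, G, B, depth} obtained by projecting a 3D face mesh.

    rgb:   (H, W, 3) float array, RGB projection of the mesh
    depth: (H, W)    float array, depth-map projection of the mesh
    """
    rng = rng or np.random.default_rng()
    pool = [rgb[..., 0], rgb[..., 1], rgb[..., 2], depth]
    augmented = []
    for _ in range(n_augmented):
        # Sample 3 channel indices with replacement and stack them
        # into a new pseudo-RGB training image.
        idx = rng.choice(len(pool), size=3, replace=True)
        augmented.append(np.stack([pool[i] for i in idx], axis=-1))
    return augmented
```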
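The sparse-representation step can likewise be sketched with off-the-shelf dictionary learning; the snippet below is a minimal stand-in assuming lasso-based sparse coding over pooled convolutional features, whereas the paper's sparsity-aware network computes these representations within the deep model itself.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def sparse_codes(conv_features, n_atoms=256, alpha=1.0):
    """Minimal stand-in for the sparsity-aware step: learn a dictionary
    over pooled convolutional features (one row per view/sample) and
    return their sparse codes. Parameter names are illustrative.

    conv_features: (n_samples, n_dims) array of flattened CNN features
    """
    learner = MiniBatchDictionaryLearning(
        n_components=n_atoms,
        alpha=alpha,                       # sparsity penalty while learning
        transform_algorithm="lasso_lars",  # sparse coding at transform time
        transform_alpha=alpha,
        random_state=0,
    )
    codes = learner.fit(conv_features).transform(conv_features)
    return codes, learner.components_     # (n_samples, n_atoms), dictionary
```

The resulting sparse codes, one vector per view, would then feed the LSTM (4D data) or pre-trained network (3D data) described above.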