Self-Supervised Learning via Multi-view Facial Rendezvous for 3D/4D Affect Recognition
In this paper, we present Multi-view Facial Rendezvous (MiFaR): a novel multi-view self-supervised learning model for 3D/4D facial affect recognition. Our architecture learns collaboratively across multiple views: for each view, a dedicated encoder computes embeddings, and the model learns to correlate two distorted versions of the input batch. We further propose a novel loss function that not only leverages the correlation among the underlying facial patterns across views but also remains robust and consistent across different batch sizes. Finally, our model is equipped with distributed training to ensure better learning along with computational efficiency. We conduct extensive experiments and report ablations to validate the competence of our model on widely used 3D/4D FER datasets.
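To make the batch-size-consistent correlation objective concrete, the sketch below illustrates one plausible form of such a loss in the style of cross-correlation-based self-supervised objectives (e.g. Barlow Twins). The abstract does not specify the exact formulation, so the function name, the redundancy weight `lam`, and the normalization choices here are assumptions, not the paper's actual loss; the sketch only shows why dividing the cross-correlation matrix by the batch size keeps the loss scale consistent as the batch grows.

```python
import numpy as np

def cross_view_correlation_loss(z_a, z_b, lam=5e-3):
    """Illustrative (hypothetical) cross-correlation loss between the
    embeddings of two distorted views of the same input batch.

    z_a, z_b: arrays of shape (batch, dim) from two view-specific encoders.
    lam:      weight on the redundancy-reduction (off-diagonal) term.
    """
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-8)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-8)
    n, _ = z_a.shape

    # Cross-correlation matrix; dividing by the batch size n keeps the
    # entries (and hence the loss) on the same scale for any batch size.
    c = (z_a.T @ z_b) / n

    # Pull matching dimensions toward correlation 1 (invariance term),
    # push non-matching dimensions toward 0 (redundancy-reduction term).
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

In this form, two perfectly correlated views yield a near-zero invariance term, while embeddings from unrelated batches are penalized heavily, which matches the abstract's goal of correlating two distorted versions of the same input.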