Deep & Deformable
Deep Convolutional Neural Networks (DCNNs) are currently the method of choice for tasks such as object and part detection. Before the advent of DCNNs, the method of choice for part detection in a supervised setting (i.e., when part annotations are available) was strongly supervised Deformable Part-based Models (DPMs) built on Histogram of Oriented Gradients (HoG) features. Recently, efforts have been made to combine powerful DCNN features with DPMs, which provide an explicit way to model the relations between parts. Nevertheless, none of the proposed methodologies unifies DCNNs with strongly supervised DPMs. In this paper, we propose, to the best of our knowledge, the first methodology that jointly trains a strongly supervised DPM and, at the same time, learns the optimal DCNN features. The proposed methodology not only exploits the relationships between parts but also contains an inherent mechanism for mining hard negatives. We demonstrate the power of the proposed approach on facial landmark localisation “in-the-wild”, where we achieve state-of-the-art results on standard benchmarks such as 300W and 300VW.