Feature fusion with deep supervision for remote-sensing image scene classification
Convolutional neural networks (CNNs) have shown an intrinsic ability to automatically extract high-level representations for image classification, but their deployment in the remote-sensing domain is hindered by the relative scarcity of training data. Moreover, traditional fusion methods rely either on low-level features or on score-based fusion. To address these issues, we employ a deep supervision (DS) strategy to enhance the generalization performance of the intermediate layers of the AlexNet model for remote-sensing image scene classification. The proposed DS strategy not only mitigates overfitting but also makes the intermediate features more transparent. Second, canonical correlation analysis (CCA) is adopted as a feature fusion strategy to further refine the features and increase their discriminative power. The fused AlexNet features produced by the proposed framework are considerably more discriminative than the original features. Extensive experiments on two challenging datasets, 1) the UC Merced dataset and 2) the WHU-RS dataset, demonstrate that the two proposed approaches both enhance the performance of the original AlexNet architecture and outperform several state-of-the-art methods.
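To make the CCA fusion step concrete, the following is a minimal NumPy sketch of canonical-correlation-based fusion of two feature sets (e.g. activations from two network layers). The shapes, the regularization constant, and the summation-style fusion rule are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def inv_sqrtm(S, eps=1e-8):
    """Inverse matrix square root of a symmetric PSD matrix."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, eps, None)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca_fuse(X, Y, d, reg=1e-4):
    """Project two feature matrices onto their top-d canonical
    directions and fuse them by summation: Xc @ Wx + Yc @ Wy."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Regularized covariance / cross-covariance estimates.
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    Sxx_i, Syy_i = inv_sqrtm(Sxx), inv_sqrtm(Syy)
    # SVD of the whitened cross-covariance yields canonical directions.
    U, _, Vt = np.linalg.svd(Sxx_i @ Sxy @ Syy_i)
    Wx = Sxx_i @ U[:, :d]
    Wy = Syy_i @ Vt.T[:, :d]
    return Xc @ Wx + Yc @ Wy

# Toy stand-ins for deep features from two layers (hypothetical sizes).
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 10))  # shared latent signal
X = Z @ rng.standard_normal((10, 64)) + 0.1 * rng.standard_normal((200, 64))
Y = Z @ rng.standard_normal((10, 32)) + 0.1 * rng.standard_normal((200, 32))
fused = cca_fuse(X, Y, d=10)
print(fused.shape)  # (200, 10)
```

In this sketch the two feature sets are fused by summing their canonical projections; concatenating `Xc @ Wx` and `Yc @ Wy` along the feature axis is an equally common variant, at the cost of doubling the fused dimensionality.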