skip to main content

Title: Clustering Augmented Self-Supervised Learning: An Application to Land Cover Mapping
Collecting large annotated datasets in Remote Sensing is often expensive and thus can become a major obstacle for training advanced machine learning models. Common techniques of addressing this issue, based on the underlying idea of pre-training the Deep Neural Networks (DNN) on freely available large datasets, cannot be used for Remote Sensing due to the unavailability of such large-scale labeled datasets and the heterogeneity of data sources caused by the varying spatial and spectral resolution of different sensors. Self-supervised learning is an alternative approach that learns feature representation from unlabeled images without using any human annotations. In this paper, we introduce a new method for land cover mapping by using a clustering-based pretext task for self-supervised learning. We demonstrate the effectiveness of the method on two societally relevant applications from the aspect of segmentation performance, discriminative feature representation learning, and the underlying cluster structure. We also show the effectiveness of the active sampling using the clusters obtained from our method in improving the mapping accuracy given a limited budget of annotating.
Authors:
; ; ; ;
Award ID(s):
1838159
Publication Date:
NSF-PAR ID:
10294945
Journal Name:
DEEPSPATIA 2021: 2nd ACM SIGKDD Workshop on Deep Learning for Spatiotemporal Data, Applications, and Systems
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models heavily relies on the availability of large-scale high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such large-scale high-quality training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping the nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data such as spatial heterogeneity still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels with imperfect alignment with earth imagery pixels (e.g., through interpreting coarse imagery by non-expert volunteers). However, directly training a deep neural network on imperfect labels with geometric annotation errors could significantly impact model performance. Existing research that overcomes imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vectormore »representation. To fill the gap, this article proposes a weakly supervised learning framework to simultaneously update deep learning model parameters and infer hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially reserve geometric properties (e.g., spatial contiguity within line segments). Evaluations on real-world datasets in the National Hydrography Dataset (NHD) refinement application illustrate that the proposed framework outperforms baseline methods in classification accuracy.« less
  2. Flooding is one of the leading threats of natural disasters to human life and property, especially in densely populated urban areas. Rapid and precise extraction of the flooded areas is key to supporting emergency-response planning and providing damage assessment in both spatial and temporal measurements. Unmanned Aerial Vehicles (UAV) technology has recently been recognized as an efficient photogrammetry data acquisition platform to quickly deliver high-resolution imagery because of its cost-effectiveness, ability to fly at lower altitudes, and ability to enter a hazardous area. Different image classification methods including SVM (Support Vector Machine) have been used for flood extent mapping. In recent years, there has been a significant improvement in remote sensing image classification using Convolutional Neural Networks (CNNs). CNNs have demonstrated excellent performance on various tasks including image classification, feature extraction, and segmentation. CNNs can learn features automatically from large datasets through the organization of multi-layers of neurons and have the ability to implement nonlinear decision functions. This study investigates the potential of CNN approaches to extract flooded areas from UAV imagery. A VGG-based fully convolutional network (FCN-16s) was used in this research. The model was fine-tuned and a k-fold cross-validation was applied to estimate the performance of the modelmore »on the new UAV imagery dataset. This approach allowed FCN-16s to be trained on the datasets that contained only one hundred training samples, and resulted in a highly accurate classification. Confusion matrix was calculated to estimate the accuracy of the proposed method. The image segmentation results obtained from FCN-16s were compared from the results obtained from FCN-8s, FCN-32s and SVMs. Experimental results showed that the FCNs could extract flooded areas precisely from UAV images compared to the traditional classifiers such as SVMs. The classification accuracy achieved by FCN-16s, FCN-8s, FCN-32s, and SVM for the water class was 97.52%, 97.8%, 94.20% and 89%, respectively.« less
  3. Surface meltwater generated on ice shelves fringing the Antarctic Ice Sheet can drive ice-shelf collapse, leading to ice sheet mass loss and contributing to global sea level rise. A quantitative assessment of supraglacial lake evolution is required to understand the influence of Antarctic surface meltwater on ice-sheet and ice-shelf stability. Cloud computing platforms have made the required remote sensing analysis computationally trivial, yet a careful evaluation of image processing techniques for pan-Antarctic lake mapping has yet to be performed. This work paves the way for automating lake identification at a continental scale throughout the satellite observational record via a thorough methodological analysis. We deploy a suite of different trained supervised classifiers to map and quantify supraglacial lake areas from multispectral Landsat-8 scenes, using training data generated via manual interpretation of the results from k-means clustering. Best results are obtained using training datasets that comprise spectrally diverse unsupervised clusters from multiple regions and that include rock and cloud shadow classes. We successfully apply our trained supervised classifiers across two ice shelves with different supraglacial lake characteristics above a threshold sun elevation of 20°, achieving classification accuracies of over 90% when compared to manually generated validation datasets. The application of our trainedmore »classifiers produces a seasonal pattern of lake evolution. Cloud shadowed areas hinder large-scale application of our classifiers, as in previous work. Our results show that caution is required before deploying ‘off the shelf’ algorithms for lake mapping in Antarctica, and suggest that careful scrutiny of training data and desired output classes is essential for accurate results. Our supervised classification technique provides an alternative and independent method of lake identification to inform the development of a continent-wide supraglacial lake mapping product.« less
  4. Manually annotating complex scene point cloud datasets is both costly and error-prone. To reduce the reliance on labeled data, we propose a snapshot-based self-supervised method to enable direct feature learning on the unlabeled point cloud of a complex 3D scene. A snapshot is defined as a collection of points sampled from the point cloud scene. It could be a real view of a local 3D scan directly captured from the real scene, or a virtual view of such from a large 3D point cloud dataset. First the snapshots go through a self-supervised pipeline including both part contrasting and snapshot clustering for feature learning. Then a weakly-supervised approach is implemented by training a standard SVM classifier on the learned features with a small fraction of labeled data. We evaluate the weakly-supervised approach for point cloud classification by using varying numbers of labeled data and study the minimal numbers of labeled data for a successful classification. Experiments are conducted on three public point cloud datasets, and the results have shown that our method is capable of learning effective features from the complex scene data without any labels.
  5. Abstract—The emergence of remote sensing technologies cou- pled with local monitoring workstations enables us the un- precedented ability to monitor the environment in large scale. Information mining from multi-channel geo-spatiotemporal data however poses great challenges to many computational sustainability applications. Most existing approaches adopt various dimensionality reduction techniques without fully taking advantage of the spatiotemporal nature of the data. In addition, the lack of labeled training data raises another challenge for modeling such data. In this work, we propose a novel semi-supervised attention-based deep representation model that learns context-aware spatiotemporal representations for prediction tasks. A combination of convolutional neural networks with a hybrid attention mechanism is adopted to extract spatial and temporal variations in the geo-spatiotemporal data. Recognizing the importance of capturing more complete temporal dependencies, we propose the hybrid attention mechanism which integrates a learnable global query into the classic self-attention mechanism. To overcome the data scarcity issue, sampled spatial and temporal context that naturally reside in the largely-available unlabeled geo-spatiotemporal data are exploited to aid meaningful representation learning. We conduct experiments on a large-scale real-world crop yield prediction task. The results show that our methods significantly outperforms existing state-of-the-art yield prediction methods, especially under the stress of trainingmore »data scarcity.« less