
Title: Semi-supervised contrastive learning for remote sensing: identifying ancient urbanization in the south-central Andes
Archaeology has long faced fundamental issues of sampling and scalar representation. Traditionally, local-to-regional-scale views of settlement patterns have been produced through systematic pedestrian surveys. More recently, systematic manual survey of satellite and aerial imagery has enabled continuous distributional views of archaeological phenomena at interregional scales. However, such ‘brute force’ manual imagery survey is both time- and labour-intensive, as well as prone to inter-observer differences in sensitivity and specificity. Self-supervised learning methods (e.g. contrastive learning) offer a scalable scheme for locating archaeological features using unlabelled satellite and historical aerial images. However, archaeological features typically occupy only a very small proportion of the landscape, and modern contrastive learning approaches tend to perform poorly on such highly imbalanced datasets. In this work, we propose a framework to address this long-tail problem. Unlike existing contrastive learning approaches, which typically treat labelled and unlabelled data separately, our method reformulates the learning paradigm in a semi-supervised setting in order to fully exploit the scarce annotated data (<7% in our setting). Specifically, the highly unbalanced nature of the data is used as prior knowledge to form pseudo negative pairs by ranking the similarities between unannotated image patches and annotated anchor images. Using 95,358 unlabelled images and 5,830 labelled images, we address the problem of detecting ancient buildings in a long-tailed satellite image dataset. Our semi-supervised contrastive learning model achieves a promising testing balanced accuracy of 79.0%, a 3.8% improvement over other state-of-the-art approaches.
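As a rough illustration of the similarity-ranking idea in the abstract (not the authors' implementation): if embedding vectors are available for the unannotated patches and the annotated anchor images, and true positives are rare by prior knowledge, then the unlabelled patches least similar to any anchor can plausibly be treated as pseudo negatives. A minimal NumPy sketch, with all function and variable names hypothetical:

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between two sets of row vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mine_pseudo_negatives(unlabeled_emb, anchor_emb, k):
    """Rank unlabelled patches by their maximum similarity to any labelled
    anchor; the k least-similar patches are taken as pseudo negatives,
    relying on the prior that true positives are rare in the landscape."""
    sims = cosine_sim(unlabeled_emb, anchor_emb)   # shape (N_unlabeled, N_anchors)
    max_sim = sims.max(axis=1)                     # closest anchor per patch
    return np.argsort(max_sim)[:k]                 # indices of pseudo negatives
```

The ranking direction (least similar = pseudo negative) is an assumption made for illustration; the paper's exact pairing rule is not reproduced here.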
Publisher / Repository:
Taylor and Francis
Journal Name:
International Journal of Remote Sensing
Page Range / eLocation ID:
1922 to 1938
Subject(s) / Keyword(s):
Archaeology, Remote Sensing, Semi-supervised Contrastive Learning
Sponsoring Org:
National Science Foundation
More Like this
  1. Macromolecular structure classification from cryo-electron tomography (cryo-ET) data is important for understanding macromolecular dynamics. It has a wide range of applications and is essential in enhancing our knowledge of the sub-cellular environment. However, a major limitation has been insufficient labelled cryo-ET data. In this work, we use Contrastive Self-supervised Learning (CSSL) to improve on previous approaches for macromolecular structure classification from cryo-ET data with limited labels. We first pretrain an encoder with unlabelled data using CSSL and then fine-tune the pretrained weights on the downstream classification task. To this end, we design a cryo-ET domain-specific data-augmentation pipeline. The benefit of augmenting cryo-ET datasets is most prominent when the original dataset is limited in size. Overall, extensive experiments performed on real and simulated cryo-ET data in the semi-supervised learning setting demonstrate the effectiveness of our approach in macromolecular labelling and classification.
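The pretrain-then-fine-tune recipe above rests on a contrastive objective over augmented pairs; the abstract does not state which loss is used, but a common choice is NT-Xent (InfoNCE). A generic NumPy sketch of that loss, offered as an illustration rather than the paper's code:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.
    z1[i] and z2[i] are embeddings of two augmented views of sample i;
    each view's positive is its counterpart, all other views are negatives."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    n = len(z1)
    sim = z @ z.T / tau                                # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each row's positive
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

In practice the loss would be applied to encoder outputs inside a training loop; aligned views should yield a lower loss than mismatched ones.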
  2. Abstract

    Information about the spatial distribution of species lies at the heart of many important questions in ecology. Logistical limitations and collection biases, however, limit the availability of such data at ecologically relevant scales. Remotely sensed information can alleviate some of these concerns, but presents challenges associated with accurate species identification and limited availability of field data for validation, especially in high diversity ecosystems such as tropical forests.

    Recent advances in machine learning offer a promising and cost‐efficient approach for gathering a large amount of species distribution data from aerial photographs. Here, we propose a novel machine learning framework, artificial perceptual learning (APL), to tackle the problem of weakly supervised pixel‐level mapping of tree species in forests. Challenges arise from limited availability of ground labels for tree species, lack of precise segmentation of tree canopies and misalignment between visible canopies in the aerial images and stem locations associated with ground labels. The proposed APL framework addresses these challenges by constructing a workflow using state‐of‐the‐art machine learning algorithms.

    We develop and illustrate the proposed framework by implementing a fine‐grain mapping of three species, the palm Prestoea acuminata and the tree species Cecropia schreberiana and Manilkara bidentata, over a 5,000‐ha area of El Yunque National Forest in Puerto Rico. These large‐scale maps are based on unlabelled high‐resolution aerial images of unsegmented tree canopies. Misaligned ground‐based labels, available for <1% of these images, serve as the only weak supervision. APL performance is evaluated using ground‐based labels and high‐quality human segmentation using Amazon Mechanical Turk, and compared to a basic workflow that relies solely on labelled images.

    Receiver operating characteristic (ROC) curves and Intersection over Union (IoU) metrics demonstrate that APL substantially outperforms the basic workflow and attains human‐level cognitive economy, with 50‐fold time savings. For the palm and C. schreberiana, the APL framework has high pixelwise accuracy and IoU with reference to human segmentations. For M. bidentata, APL predictions are congruent with ground‐based labels. Our approach shows great potential for leveraging existing data from global forest plot networks coupled with aerial imagery to map tree species at ecologically meaningful spatial scales.

  3. Abstract

    Classifying images using supervised machine learning (ML) relies on labeled training data—classes or text descriptions, for example, associated with each image. Data‐driven models are only as good as the data used for training, and this points to the importance of high‐quality labeled data for developing a ML model that has predictive skill. Labeling data is typically a time‐consuming, manual process. Here, we investigate the process of labeling data, with a specific focus on coastal aerial imagery captured in the wake of hurricanes that affected the Atlantic and Gulf Coasts of the United States. The imagery data set is a rich observational record of storm impacts and coastal change, but the imagery requires labeling to render that information accessible. We created an online interface that served labelers a stream of images and a fixed set of questions. A total of 1,600 images were labeled by at least two or as many as seven coastal scientists. We used the resulting data set to investigate interrater agreement: the extent to which labelers labeled each image similarly. Interrater agreement scores, assessed with percent agreement and Krippendorff's alpha, are higher when the questions posed to labelers are relatively simple, when the labelers are provided with a user manual, and when images are smaller. Experiments in interrater agreement point toward the benefit of multiple labelers for understanding the uncertainty in labeling data for machine learning research.
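The two agreement measures named above, percent agreement and Krippendorff's alpha, can be computed directly from paired labels. A minimal sketch for the special case of two raters, nominal labels, and no missing data (the general alpha formulation also handles more raters, missing values, and other metrics), with all function names hypothetical:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Fraction of items on which two raters gave the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def krippendorff_alpha_nominal(r1, r2):
    """Krippendorff's alpha via the coincidence matrix: two raters,
    nominal data, no missing values. alpha = 1 - D_observed / D_expected."""
    pairs = Counter()
    for a, b in zip(r1, r2):          # each unit contributes both ordered pairs
        pairs[(a, b)] += 1
        pairs[(b, a)] += 1
    n = sum(pairs.values())           # total pairable values (2 per unit)
    marg = Counter()                  # marginal value counts
    for (a, _b), c in pairs.items():
        marg[a] += c
    d_obs = sum(c for (a, b), c in pairs.items() if a != b) / n
    d_exp = sum(marg[a] * marg[b] for a in marg for b in marg if a != b) / (n * (n - 1))
    return 1 - d_obs / d_exp
```

Perfect agreement yields alpha = 1, chance-level agreement yields alpha near 0.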

  4. The microtopography associated with ice-wedge polygons governs many aspects of Arctic ecosystem, permafrost, and hydrologic dynamics from local to regional scales owing to the linkages between microtopography and the flow and storage of water, vegetation succession, and permafrost dynamics. Wide-spread ice-wedge degradation is transforming low-centered polygons into high-centered polygons at an alarming rate. Accurate data on spatial distribution of ice-wedge polygons at a pan-Arctic scale are not yet available, despite the availability of sub-meter-scale remote sensing imagery. This is because the necessary spatial detail quickly produces data volumes that hamper both manual and semi-automated mapping approaches across large geographical extents. Accordingly, transforming big imagery into ‘science-ready’ insightful analytics demands novel image-to-assessment pipelines that are fueled by advanced machine learning techniques and high-performance computational resources. In this exploratory study, we tasked a deep-learning driven object instance segmentation method (i.e., the Mask R-CNN) with delineating and classifying ice-wedge polygons in very high spatial resolution aerial orthoimagery. We conducted a systematic experiment to gauge the performances and interoperability of the Mask R-CNN across spatial resolutions (0.15 m to 1 m) and image scene contents (a total of 134 km2) near Nuiqsut, Northern Alaska. The trained Mask R-CNN reported mean average precisions of 0.70 and 0.60 at thresholds of 0.50 and 0.75, respectively. Manual validations showed that approximately 95% of individual ice-wedge polygons were correctly delineated and classified, with an overall classification accuracy of 79%. Our findings show that the Mask R-CNN is a robust method to automatically identify ice-wedge polygons from fine-resolution optical imagery. 
Overall, this automated imagery-enabled intense mapping approach can provide a foundational framework that may propel future pan-Arctic studies of permafrost thaw, tundra landscape evolution, and the role of high latitudes in the global climate system. 
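Instance-segmentation metrics like those reported above, mean average precision at IoU thresholds of 0.50 and 0.75, reduce to mask IoU computations. A minimal sketch of the IoU test that decides whether a predicted polygon mask counts as a true positive (a generic illustration, not the study's evaluation code):

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over Union of two boolean masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def is_true_positive(pred_mask, gt_mask, thresh=0.50):
    """A prediction matches a ground-truth polygon when IoU meets the threshold."""
    return mask_iou(pred_mask, gt_mask) >= thresh
```

Average precision at a threshold is then obtained by matching predictions to ground truth with this test and accumulating precision over the ranked detections.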
  5. Extracting roads in aerial images has numerous applications in artificial intelligence and multimedia computing, including traffic pattern analysis and parking space planning. Learning deep neural networks, though very successful, demands vast amounts of high-quality annotations, whose acquisition is time-consuming and expensive. To address this challenge, we propose a semi-supervised approach for image-based road extraction in which only a small set of labeled images is available for training. We design a pixel-wise contrastive loss that self-supervises the network training so as to utilize the large corpus of unlabeled images. The key idea is to identify pairs of overlapping image regions (positive) or non-overlapping image regions (negative) and encourage the network to produce similar outputs for positive pairs and dissimilar outputs for negative pairs. We also develop a negative sampling strategy to filter false negative samples during the process. An iterative procedure is introduced to apply the network over raw images to generate pseudo-labels, filter and select high-quality labels with the proposed contrastive loss, and re-train the network with the enlarged training dataset. We repeat these iterative steps until convergence. We validate the effectiveness of the proposed methods by performing extensive experiments on the public SpaceNet3 and DeepGlobe Road datasets. Results show that our proposed method achieves state-of-the-art results on public image segmentation benchmarks and significantly outperforms other semi-supervised methods.
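The positive/negative pairing described above can be illustrated with a margin-based contrastive loss. The paper's loss is pixel-wise over feature maps; this simplified sketch operates on single region embeddings, with a hypothetical margin parameter:

```python
import numpy as np

def region_contrastive_loss(f1, f2, overlap, margin=1.0):
    """Margin-based contrastive loss on two region embeddings: pull features
    of overlapping regions together, push non-overlapping regions at least
    `margin` apart (a region-level sketch of the pixel-wise idea)."""
    d = np.linalg.norm(f1 - f2)
    if overlap:                          # positive pair: penalize any distance
        return d ** 2
    return max(0.0, margin - d) ** 2     # negative pair: hinge on the margin
```

Identical embeddings incur zero loss as a positive pair but the full margin penalty as a negative pair, which is exactly the push-pull behaviour the loss is meant to encode.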
