skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Learning to Interpret Satellite Images using Wikipedia
Despite recent progress in computer vision, fine-grained interpretation of satellite images remains challenging because of a lack of labeled training data. To overcome this limitation, we construct a novel dataset called WikiSatNet by pairing geo-referenced Wikipedia articles with satellite imagery of their corresponding locations. We then propose two strategies to learn representations of satellite images by predicting properties of the corresponding articles from the images. Leveraging this new multi-modal dataset, we can drastically reduce the quantity of human-annotated labels and time required for downstream tasks. On the recently released fMoW dataset, our pre-training strategies can boost the performance of a model pre-trained on ImageNet by up to 4.5% in F1 score.  more » « less
Award ID(s):
1651565
PAR ID:
10136095
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Page Range / eLocation ID:
3620 to 3626
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The rapid intensification (RI) of tropical cyclones (TC), defined here as an intensity increase of ≥ 30 kt in 24 hours, is a difficult but important forecasting problem. Operational RI forecasts have considerably improved since the late 2000s, largely thanks to better statistical models, including machine learning (ML). Most ML applications use scalars from the Statistical Hurricane Intensity Prediction Scheme (SHIPS) development dataset as predictors, describing the TC history, near-TC environment, and satellite presentation of the TC. More recent ML applications use convolutional neural networks (CNN), which can ingest full satellite images (or time series of images) and freely “decide” which spatiotemporal features are important for RI. However, two questions remain unanswered: (1) Does image convolution significantly improve RI skill? (2) What strategies do CNNs use for RI prediction – and can we gain new insights from these strategies? We use an ablation experiment to answer the first question and explainable artificial intelligence (XAI) to answer the second. Convolution leads to only a small performance gain, likely because, as revealed by XAI, the CNN’s main strategy uses image features already well described in scalar predictors used by pre-existing RI models. This work makes three additional contributions to the literature: (1) NNs with SHIPS data outperform pre-existing models in some aspects; (2) NNs provide well calibrated uncertainty quantification (UQ), while pre-existing models have no UQ; (3) the NN without SHIPS data performs surprisingly well and is fairly independent of pre-existing models, suggesting its potential value in an operational ensemble. 
    more » « less
  2. Global warming is one of the world’s most pressing issues. The study of its effects on the polar ice caps and other arctic environments, however, can be hindered by the often dangerous and difficult to navigate terrain found there. Multi-terrain autonomous vehicles can assist researchers by providing a mobile platform on which to collect data in these harsh environments while avoiding any risk to human life and speeding up the research process. The mechanical design and ultimate efficacy of these autonomous robotic vehicles depends largely on the specific missions they are deployed for, but terrain conditions can vary wildly geographically as well as seasonally, making mission planning for these unmanned vehicles more difficult. This paper proposes the use of various UNet-based neural network architectures to generate digital elevation maps from satellite images, and explores and compares their efficacy on a single set of training and validation datasets generated from satellite imagery. These digital elevation maps generated by the model could be used by researchers not only to track the change in arctic topography over time, but to quickly provide autonomous exploratory research rovers with the topographical information necessary to decide on optimal paths during the mission. This paper analyzes different model architectures and training schemes: a traditional UNet, a traditional UNet with data augmentation, a UNet with a single active skip-layer vision transformer (ViT), and a UNet with multiple active skip-layer ViT. Each model was trained on a dataset of satellite images and corresponding digital elevation maps of Ellesmere Island, Canada. Utilizing ViTs did not demonstrate a significant improvement in UNet performance, though this could change with longer training. This paper proposes opportunities to improve performance for these neural networks, as well as next steps for further research, including improving the diversity of images in the dataset, generating a testing dataset from a completely different geographic location, and allowing the models more time to train. 
    more » « less
  3. Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain. While large-scale pre-trained models are useful for image classification across domains, it remains unclear if they can be applied in a zero-shot manner to more complex tasks like ReC. We present ReCLIP, a simple but strong zero-shot baseline that repurposes CLIP, a state-of-the-art large-scale model, for ReC. Motivated by the close connection between ReC and CLIP’s contrastive pre-training objective, the first component of ReCLIP is a region-scoring method that isolates object proposals via cropping and blurring, and passes them to CLIP. However, through controlled experiments on a synthetic dataset, we find that CLIP is largely incapable of performing spatial reasoning off-the-shelf. We reduce the gap between zero-shot baselines from prior work and supervised models by as much as 29% on RefCOCOg, and on RefGTA (video game imagery), ReCLIP’s relative improvement over supervised ReC models trained on real images is 8%. 
    more » « less
  4. We present a new annotated microscopic cellular image dataset to improve the effectiveness of machine learning methods for cellular image analysis. Cell counting is an important step in cell analysis. Typically, domain experts manually count cells in a microscopic image. Automated cell counting can potentially eliminate this tedious, time-consuming process. However, a good, labeled dataset is required for training an accurate machine learning model. Our dataset includes microscopic images of cells, and for each image, the cell count and the location of individual cells. The data were collected as part of an ongoing study investigating the potential of electrical stimulation to modulate stem cell differentiation and possible applications for neural repair. Compared to existing publicly available datasets, our dataset has more images of cells stained with more variety of antibodies (protein components of immune responses against invaders) typically used for cell analysis. The experimental results on this dataset indicate that none of the five existing models under this study are able to achieve sufficiently accurate count to replace the manual methods. The dataset is available at https://figshare.com/articles/dataset/Dataset/21970604. 
    more » « less
  5. Abstract Insect pests cause significant damage to food production, so early detection and efficient mitigation strategies are crucial. There is a continual shift toward machine learning (ML)‐based approaches for automating agricultural pest detection. Although supervised learning has achieved remarkable progress in this regard, it is impeded by the need for significant expert involvement in labeling the data used for model training. This makes real‐world applications tedious and oftentimes infeasible. Recently, self‐supervised learning (SSL) approaches have provided a viable alternative to training ML models with minimal annotations. Here, we present an SSL approach to classify 22 insect pests. The framework was assessed on raw and segmented field‐captured images using three different SSL methods, Nearest Neighbor Contrastive Learning of Visual Representations (NNCLR), Bootstrap Your Own Latent, and Barlow Twins. SSL pre‐training was done on ResNet‐18 and ResNet‐50 models using all three SSL methods on the original RGB images and foreground segmented images. The performance of SSL pre‐training methods was evaluated using linear probing of SSL representations and end‐to‐end fine‐tuning approaches. The SSL‐pre‐trained convolutional neural network models were able to perform annotation‐efficient classification. NNCLR was the best performing SSL method for both linear and full model fine‐tuning. With just 5% annotated images, transfer learning with ImageNet initialization obtained 74% accuracy, whereas NNCLR achieved an improved classification accuracy of 79% for end‐to‐end fine‐tuning. Models created using SSL pre‐training consistently performed better, especially under very low annotation, and were robust to object class imbalances. These approaches help overcome annotation bottlenecks and are resource efficient. 
    more » « less