Abstract Pre-training is a powerful paradigm in machine learning for passing information across models. For example, suppose one has a modest-sized dataset of images of cats and dogs and plans to fit a deep neural network to classify them. With pre-training, we start with a neural network trained on a large corpus of images of not just cats and dogs but hundreds of classes. We fix all network weights except the top layer(s) and fine-tune on our dataset. This often results in dramatically better performance than training solely on our dataset. Here, we ask: ‘Can pre-training help the lasso?’. We propose a framework where the lasso is fit on a large dataset and then fine-tuned on a smaller dataset. The latter can be a subset of the original, or have a different but related outcome. This framework has a wide variety of applications, including stratified and multi-response models. In the stratified model setting, lasso pre-training first estimates coefficients common to all groups, then estimates group-specific coefficients during fine-tuning. Under appropriate assumptions, support recovery of the common coefficients is superior to the usual lasso trained on individual groups. This separate identification of common and individual coefficients also aids scientific understanding.
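The stratified-model idea above can be caricatured as a two-stage fit: a lasso on the pooled data estimates the common coefficients, then a per-group lasso on the residuals picks up group-specific deviations. A minimal sketch on synthetic data, assuming scikit-learn's `Lasso`; the variable names, penalty levels, and residual-refit procedure are illustrative, not the paper's exact algorithm:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_per_group, p = 200, 20

# Common coefficients shared by both groups, plus a group-specific offset.
beta_common = np.zeros(p)
beta_common[:3] = [2.0, -1.5, 1.0]
beta_group = {0: np.zeros(p), 1: np.zeros(p)}
beta_group[1][5] = 1.5  # feature 5 matters only in group 1

X, y, g = [], [], []
for grp in (0, 1):
    Xg = rng.normal(size=(n_per_group, p))
    yg = Xg @ (beta_common + beta_group[grp]) + 0.1 * rng.normal(size=n_per_group)
    X.append(Xg); y.append(yg); g.append(np.full(n_per_group, grp))
X, y, g = np.vstack(X), np.concatenate(y), np.concatenate(g)

# Stage 1 ("pre-training"): one lasso on the pooled data estimates the
# coefficients common to all groups.
common = Lasso(alpha=0.05).fit(X, y)

# Stage 2 ("fine-tuning"): within each group, a lasso on the residuals
# estimates the group-specific deviations from the common fit.
specific = {}
for grp in (0, 1):
    mask = g == grp
    resid = y[mask] - common.predict(X[mask])
    specific[grp] = Lasso(alpha=0.05).fit(X[mask], resid)

print(np.nonzero(common.coef_)[0])       # support of the common fit
print(np.nonzero(specific[1].coef_)[0])  # group-1-specific support
```

With enough pooled data, the stage-1 support should contain the truly common features, while the stage-2 fit isolates the feature active only in group 1.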
Learning to Interpret Satellite Images using Wikipedia
Despite recent progress in computer vision, fine-grained interpretation of satellite images remains challenging because of a lack of labeled training data. To overcome this limitation, we construct a novel dataset called WikiSatNet by pairing geo-referenced Wikipedia articles with satellite imagery of their corresponding locations. We then propose two strategies to learn representations of satellite images by predicting properties of the corresponding articles from the images. Leveraging this new multi-modal dataset, we can drastically reduce the quantity of human-annotated labels and the time required for downstream tasks. On the recently released fMoW dataset, our pre-training strategies can boost the performance of a model pre-trained on ImageNet by up to 4.5% in F1 score.
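The pre-training objective here can be caricatured as regressing a representation of the paired article from image features. A toy sketch with synthetic stand-ins (the real pipeline uses a learned CNN image encoder and article embeddings; everything below, including the linear model and least-squares fit, is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_img, d_txt = 500, 64, 32

# Synthetic stand-ins for CNN image features and text embeddings of the
# paired Wikipedia articles (hypothetical; not WikiSatNet data).
W_true = rng.normal(size=(d_img, d_txt))
img_feats = rng.normal(size=(n, d_img))
txt_embeds = img_feats @ W_true + 0.1 * rng.normal(size=(n, d_txt))

# "Pre-training" objective: predict the article embedding from the image
# features; here solved in closed form with least squares.
W, *_ = np.linalg.lstsq(img_feats, txt_embeds, rcond=None)

# Measure alignment between predicted and true article embeddings.
pred = img_feats @ W
cos = np.sum(pred * txt_embeds, axis=1) / (
    np.linalg.norm(pred, axis=1) * np.linalg.norm(txt_embeds, axis=1))
print(round(float(cos.mean()), 3))
```

The point of the sketch is only the shape of the objective: image features on one side, article-derived targets on the other, with no human labels needed.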
- Award ID(s): 1651565
- PAR ID: 10136095
- Date Published:
- Journal Name: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
- Page Range / eLocation ID: 3620 to 3626
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract The rapid intensification (RI) of tropical cyclones (TCs), defined here as an intensity increase of ≥ 30 kt in 24 hours, is a difficult but important forecasting problem. Operational RI forecasts have considerably improved since the late 2000s, largely thanks to better statistical models, including machine learning (ML). Most ML applications use scalars from the Statistical Hurricane Intensity Prediction Scheme (SHIPS) development dataset as predictors, describing the TC history, near-TC environment, and satellite presentation of the TC. More recent ML applications use convolutional neural networks (CNN), which can ingest full satellite images (or time series of images) and freely “decide” which spatiotemporal features are important for RI. However, two questions remain unanswered: (1) Does image convolution significantly improve RI skill? (2) What strategies do CNNs use for RI prediction – and can we gain new insights from these strategies? We use an ablation experiment to answer the first question and explainable artificial intelligence (XAI) to answer the second. Convolution leads to only a small performance gain, likely because, as revealed by XAI, the CNN’s main strategy uses image features already well described in scalar predictors used by pre-existing RI models. This work makes three additional contributions to the literature: (1) NNs with SHIPS data outperform pre-existing models in some aspects; (2) NNs provide well calibrated uncertainty quantification (UQ), while pre-existing models have no UQ; (3) the NN without SHIPS data performs surprisingly well and is fairly independent of pre-existing models, suggesting its potential value in an operational ensemble.
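The RI definition stated above (intensity increase of ≥ 30 kt in 24 hours) translates directly into a labeling rule over a best-track intensity time series. A minimal sketch, assuming 6-hourly intensities; the function name and storm values are hypothetical:

```python
import numpy as np

def rapid_intensification_labels(intensity_kt, hours_per_step=6,
                                 window_h=24, threshold_kt=30):
    """Label each time step 1 if intensity rises by >= threshold_kt over
    the following window_h hours, else 0 (the RI definition above)."""
    steps = window_h // hours_per_step
    v = np.asarray(intensity_kt, dtype=float)
    labels = np.zeros(len(v) - steps, dtype=int)
    for t in range(len(labels)):
        labels[t] = int(v[t + steps] - v[t] >= threshold_kt)
    return labels

# 6-hourly intensities (kt) for a hypothetical storm:
vmax = [35, 40, 50, 65, 70, 72, 75]
print(rapid_intensification_labels(vmax).tolist())  # → [1, 1, 0]
```

The first two windows gain 35 kt and 32 kt (RI), while the third gains only 25 kt; these binary labels are what the statistical and CNN models are trained to predict.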
Global warming is one of the world’s most pressing issues. The study of its effects on the polar ice caps and other arctic environments, however, can be hindered by the often dangerous and difficult-to-navigate terrain found there. Multi-terrain autonomous vehicles can assist researchers by providing a mobile platform on which to collect data in these harsh environments while avoiding any risk to human life and speeding up the research process. The mechanical design and ultimate efficacy of these autonomous robotic vehicles depends largely on the specific missions they are deployed for, but terrain conditions can vary wildly geographically as well as seasonally, making mission planning for these unmanned vehicles more difficult. This paper proposes the use of various UNet-based neural network architectures to generate digital elevation maps from satellite images, and explores and compares their efficacy on a single set of training and validation datasets generated from satellite imagery. These digital elevation maps generated by the model could be used by researchers not only to track the change in arctic topography over time, but to quickly provide autonomous exploratory research rovers with the topographical information necessary to decide on optimal paths during the mission. This paper analyzes different model architectures and training schemes: a traditional UNet, a traditional UNet with data augmentation, a UNet with a single active skip-layer vision transformer (ViT), and a UNet with multiple active skip-layer ViTs. Each model was trained on a dataset of satellite images and corresponding digital elevation maps of Ellesmere Island, Canada. Utilizing ViTs did not yield a significant improvement in UNet performance, though this could change with longer training.
This paper proposes opportunities to improve performance for these neural networks, as well as next steps for further research, including improving the diversity of images in the dataset, generating a testing dataset from a completely different geographic location, and allowing the models more time to train.
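One subtlety in the "UNet with data augmentation" scheme mentioned above is that geometric augmentations must be applied identically to the satellite image and its digital elevation map, or the pixel-to-elevation pairing breaks. A minimal sketch of such paired augmentation, assuming NumPy arrays; the function and the specific flip/rotation choices are illustrative, not the paper's pipeline:

```python
import numpy as np

def augment_pair(image, dem, rng):
    """Apply the same random flip/rotation to a satellite image (H x W x C)
    and its digital elevation map (H x W), preserving their alignment."""
    k = rng.integers(0, 4)  # number of 90-degree rotations
    image, dem = np.rot90(image, k, axes=(0, 1)), np.rot90(dem, k)
    if rng.random() < 0.5:  # random horizontal flip, applied to both
        image, dem = image[:, ::-1], dem[:, ::-1]
    return np.ascontiguousarray(image), np.ascontiguousarray(dem)

rng = np.random.default_rng(0)
dem = np.arange(16).reshape(4, 4).astype(float)   # toy elevation map
img = np.stack([dem, dem, dem], axis=-1)           # toy 3-channel image
aug_img, aug_dem = augment_pair(img, dem, rng)
print(aug_img.shape, aug_dem.shape)  # → (4, 4, 3) (4, 4)
```

Because the toy image's channels were built from the DEM itself, checking that `aug_img[..., 0]` still matches `aug_dem` confirms the two received the same transform.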
Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain. While large-scale pre-trained models are useful for image classification across domains, it remains unclear if they can be applied in a zero-shot manner to more complex tasks like ReC. We present ReCLIP, a simple but strong zero-shot baseline that repurposes CLIP, a state-of-the-art large-scale model, for ReC. Motivated by the close connection between ReC and CLIP’s contrastive pre-training objective, the first component of ReCLIP is a region-scoring method that isolates object proposals via cropping and blurring, and passes them to CLIP. However, through controlled experiments on a synthetic dataset, we find that CLIP is largely incapable of performing spatial reasoning off-the-shelf; ReCLIP’s second component is therefore a spatial relation resolver that handles several types of spatial relations. We reduce the gap between zero-shot baselines from prior work and supervised models by as much as 29% on RefCOCOg, and on RefGTA (video game imagery), ReCLIP’s relative improvement over supervised ReC models trained on real images is 8%.
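ReCLIP's crop-and-blur region scoring can be sketched as: blur the whole image, then paste the sharp proposal region back, so the scorer's attention is drawn to that region. A minimal NumPy sketch; the mean-filter blur is a stand-in for the actual blur, and the CLIP scoring step itself is omitted:

```python
import numpy as np

def box_blur(img, k=5):
    """Simple k x k mean filter; a stand-in for the blur used to
    de-emphasize everything outside a proposal."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def isolate_proposal(img, box):
    """Blur the image, then restore the sharp proposal region,
    mimicking ReCLIP-style crop-and-blur region isolation."""
    y0, x0, y1, x1 = box
    out = box_blur(img)
    out[y0:y1, x0:x1] = img[y0:y1, x0:x1]
    return out

img = np.zeros((32, 32))
img[8:16, 8:16] = 1.0  # toy grayscale image with one bright object
iso = isolate_proposal(img, (8, 8, 16, 16))
# Inside the box the image is untouched; outside it is smoothed.
print(np.array_equal(iso[8:16, 8:16], img[8:16, 8:16]))  # → True
```

In the real method, each such isolated view would be encoded by CLIP and scored against the referring expression; the proposal with the highest score wins.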
Masked autoencoders employ random masking to effectively reconstruct input images using self-supervised techniques, which allows for efficient training on large datasets. However, the random masking strategy does not adequately tap into information encapsulated within high-dimensional hyperspectral satellite imagery that is used in several domains. We propose a novel masking strategy, HOGMAE, based on the Histogram of Oriented Gradients that incorporates rich information inherent within satellite images during the mask creation step. Our experiments, over a hyperspectral satellite dataset, demonstrate the effectiveness of our methodology.
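The HOG-informed masking idea can be sketched as: score each patch by its histogram-of-oriented-gradients energy, then select patches to mask based on those scores. A minimal single-band sketch in NumPy; the exact selection rule (here, masking the most structured patches) is an assumption, not necessarily HOGMAE's:

```python
import numpy as np

def hog_patch_scores(img, patch=8, n_bins=9):
    """Per-patch HOG energy: a proxy for how much oriented structure
    each patch contains."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientations
    H, W = img.shape
    scores = np.zeros((H // patch, W // patch))
    for i in range(H // patch):
        for j in range(W // patch):
            sl = np.s_[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            hist, _ = np.histogram(ang[sl], bins=n_bins, range=(0, np.pi),
                                   weights=mag[sl])
            scores[i, j] = np.linalg.norm(hist)
    return scores

# Toy image: flat background with a textured square in one corner.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[:8, :8] = rng.random((8, 8))
scores = hog_patch_scores(img)

# Mask the patches with the most gradient structure (assumed rule).
n_mask = 2
flat = np.argsort(scores, axis=None)[::-1][:n_mask]
masked = set(map(tuple, np.array(np.unravel_index(flat, scores.shape)).T))
print(sorted(masked))
```

On this toy input the textured corner patch scores highest, so a structure-aware mask concentrates there rather than falling uniformly at random. A hyperspectral version would aggregate such scores across bands.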

