This content will become publicly available on December 5, 2025

Title: geodl: An R package for geospatial deep learning semantic segmentation using torch and terra
Convolutional neural network (CNN)-based deep learning (DL) methods have transformed the analysis of geospatial, Earth observation, and geophysical data due to their ability to model spatial context information at multiple scales. Such methods are especially applicable to pixel-level classification or semantic segmentation tasks. A variety of R packages have been developed for processing and analyzing geospatial data. However, there are currently no packages available for implementing geospatial DL in the R language and data science environment. This paper introduces the geodl R package, which supports pixel-level classification applied to a wide range of geospatial or Earth science data that can be represented as multidimensional arrays where each channel or band holds a predictor variable. geodl is built on the torch package, which supports the implementation of DL using the R and C++ languages without the need for installing a Python/PyTorch environment. This greatly simplifies the software environment needed to implement DL in R. Using geodl, geospatial raster-based data with varying numbers of bands, spatial resolutions, and coordinate reference systems are read and processed using the terra package, which makes use of C++ and allows for processing raster grids that are too large to fit into memory. Training loops are implemented with the luz package. The geodl package provides utility functions for creating raster masks or labels from vector-based geospatial data and image chips and associated masks from larger files and extents. It also defines a torch dataset subclass for geospatial data for use with torch dataloaders. UNet-based models are provided with a variety of optional ancillary modules or modifications. 
Common assessment metrics (i.e., overall accuracy, class-level recalls or producer’s accuracies, class-level precisions or user’s accuracies, and class-level F1-scores) are implemented along with a modified version of the unified focal loss framework, which allows for defining a variety of loss metrics using one consistent implementation and set of hyperparameters. Users can assess models using standard geospatial and remote sensing metrics and methods and use trained models to generate predictions over large spatial extents. This paper introduces the geodl workflow, design philosophy, and goals for future development.
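
The assessment metrics named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch in plain Python, not geodl's own code or API; the function name `class_metrics`, the row/column convention (rows are reference classes, columns are predicted classes), and the example counts are assumptions made for illustration.

```python
from typing import Dict, List


def class_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute overall accuracy and class-level recall, precision, and F1.

    Assumed convention (hypothetical, for illustration only):
    cm[i][j] = number of pixels with reference class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    recalls, precisions, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        ref_total = sum(cm[k])                          # row sum: reference pixels of class k
        pred_total = sum(cm[i][k] for i in range(n))    # column sum: pixels predicted as class k
        recall = tp / ref_total if ref_total else 0.0        # producer's accuracy
        precision = tp / pred_total if pred_total else 0.0   # user's accuracy
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)

    return {
        "overall_accuracy": correct / total if total else 0.0,
        "recall": recalls,
        "precision": precisions,
        "f1": f1s,
    }


# Made-up two-class example: 60 reference pixels of class 0, 40 of class 1.
cm = [[50, 10],
      [5, 35]]
metrics = class_metrics(cm)
print(metrics)
```

Keeping recall and precision per class, rather than reporting only overall accuracy, shows which classes are being omitted or over-predicted.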
Award ID(s):
2046059
PAR ID:
10584439
Editor(s):
Sun, Xiaoyong
Publisher / Repository:
PLOS One
Journal Name:
PLOS ONE
Volume:
19
Issue:
12
ISSN:
1932-6203
Page Range / eLocation ID:
e0315127
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Kelp species provide many ecosystem services associated with their three‐dimensional structures. Among these, fast‐growth, canopy‐forming species, like giant kelp Macrocystis pyrifera, are the foundation of kelp forests across many temperate reefs. Giant kelp populations have experienced regional declines in different parts of the world. Giant kelp canopy is very dynamic and can take years to recover from disturbance, challenging comparisons of standing biomass with historical baselines. The Santa Barbara Coastal LTER (SBC LTER) curates a time series of Landsat‐sensed surface cover and biomass for giant kelp on the west coast of North America. In the last decade, this resource has been fundamental to understanding the species' population dynamics and drivers. However, simple ready‐to‐use summary statistics aimed at classifying regional kelp decline or recovery are not readily available to stakeholders and coastal managers. To this end, we describe here two simple metrics made available through the R package kelpdecline. First, the proportion of Landsat pixels in decline (PPD), in which current biomass is compared with a historical baseline, and second, a pixel occupancy trend (POT), in which current year pixel occupancy is compared to the time‐series long probability of occupancy. The package produces raster maps and output tables summarizing kelp decline and trends over a 0.25 × 0.25° scale. Using kelpdecline, we show how sensitivity analysis on PPD parameter variation can increase the confidence of kelp decline estimates.
  2. Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data. 
  3. Blanchette, Jasmin; Kovacs, Laura; Pattinson, Dirk (Ed.)
    Definition packages in theorem provers provide users with means of defining and organizing concepts of interest. This system description presents a new definition package for the hybrid systems theorem prover KeYmaera X based on differential dynamic logic (dL). The package adds KeYmaera X support for user-defined smooth functions whose graphs can be implicitly characterized by dL formulas. Notably, this makes it possible to implicitly characterize functions, such as the exponential and trigonometric functions, as solutions of differential equations and then prove properties of those functions using dL's differential equation reasoning principles. Trustworthiness of the package is achieved by minimally extending KeYmaera X's soundness-critical kernel with a single axiom scheme that expands function occurrences with their implicit characterization. Users are provided with a high-level interface for defining functions and non-soundness-critical tactics that automate low-level reasoning over implicit characterizations in hybrid system proofs. 
  4. Evaluating classification accuracy is a key component of the training and validation stages of thematic map production, and the choice of metric has profound implications for both the success of the training process and the reliability of the final accuracy assessment. We explore key considerations in selecting and interpreting loss and assessment metrics in the context of data imbalance, which arises when the classes have unequal proportions within the dataset or landscape being mapped. The challenges involved in calculating single, integrated measures that summarize classification success, especially for datasets with considerable data imbalance, have led to much confusion in the literature. This confusion arises from a range of issues, including a lack of clarity over the redundancy of some accuracy measures, the importance of calculating final accuracy from population-based statistics, the effects of class imbalance on accuracy statistics, and the differing roles of accuracy measures when used for training and final evaluation. In order to characterize classification success at the class level, users typically generate averages from the class-based measures. These averages are sometimes generated at the macro-level, by taking averages of the individual-class statistics, or at the micro-level, by aggregating values within a confusion matrix and then calculating the statistic. We show that the micro-averaged producer’s accuracy (recall), user’s accuracy (precision), and F1-score, as well as weighted macro-averaged statistics where the class prevalences are used as weights, are all equivalent to each other and to the overall accuracy, and thus are redundant and should be avoided. Our experiment, using a variety of loss metrics for training, suggests that the choice of loss metric is not as complex as it might appear to be, despite the range of choices available, which include cross-entropy (CE), weighted CE, and micro- and macro-Dice. The highest, or close to highest, accuracies in our experiments were obtained by using CE loss for models trained with balanced data, and for models trained with imbalanced data, the highest accuracies were obtained by using weighted CE loss. We recommend that, since weighted CE loss used with balanced training is equivalent to CE, weighted CE loss is a good all-round choice. Although Dice loss is commonly suggested as an alternative to CE loss when classes are imbalanced, micro-averaged Dice is similar to overall accuracy, and thus is particularly poor for training with imbalanced data. Furthermore, although macro-Dice resulted in models with high accuracy when the training used balanced data, when the training used imbalanced data, the accuracies were lower than for weighted CE. In summary, the significance of this paper lies in its provision of readers with an overview of accuracy and loss metric terminology, insight regarding the redundancy of some measures, and guidance regarding best practices.
  5. Goslee, Sarah (Ed.)
    1. The geodiv R package calculates gradient surface metrics from imagery and other gridded datasets to provide continuous measures of landscape heterogeneity for landscape pattern analysis. 2. geodiv is the first open-source, command line toolbox for calculating many gradient surface metrics and easily integrates parallel computing for applications with large images or rasters (e.g. remotely sensed data). All functions may be applied either globally to derive a single metric for an entire image or locally to create a texture image over moving windows of a user-defined extent. 3. We present a comprehensive description of the functions available through geodiv. A supplemental vignette provides an example application of geodiv to the fields of landscape ecology and biogeography. 4. geodiv allows users to easily retrieve estimates of spatial heterogeneity for a variety of purposes, enhancing our understanding of how environmental structure influences ecosystem processes. The package works with any continuous imagery and may be widely applied in many fields where estimates of surface complexity are useful.
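
The redundancy described in item 4 above can be checked numerically. The sketch below is an illustration in plain Python, not code from that paper; the function name `averaged_scores` and the imbalanced three-class confusion matrix are made up. It assumes single-label classification, with rows as reference classes and columns as predicted classes, and, for the weighted average, it checks only recall weighted by reference-class prevalence.

```python
import math
from typing import Dict, List


def averaged_scores(cm: List[List[int]]) -> Dict[str, float]:
    """Compare overall accuracy with micro, macro, and prevalence-weighted scores.

    Assumed convention (hypothetical, for illustration only):
    cm[i][j] = number of samples with reference class i predicted as class j.
    Every reference class is assumed to have at least one sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[k][k] for k in range(n)]
    fn = [sum(cm[k]) - cm[k][k] for k in range(n)]                       # omissions
    fp = [sum(cm[i][k] for i in range(n)) - cm[k][k] for k in range(n)]  # commissions

    overall_accuracy = sum(tp) / total

    # Micro-averaging: pool the counts over classes, then compute the statistic.
    micro_recall = sum(tp) / (sum(tp) + sum(fn))
    micro_precision = sum(tp) / (sum(tp) + sum(fp))
    micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

    # Macro-averaging: compute the statistic per class, then average.
    class_recall = [tp[k] / sum(cm[k]) for k in range(n)]
    macro_recall = sum(class_recall) / n

    # Macro-averaging weighted by reference-class prevalence.
    prevalence = [sum(cm[k]) / total for k in range(n)]
    weighted_recall = sum(p * r for p, r in zip(prevalence, class_recall))

    return {
        "overall_accuracy": overall_accuracy,
        "micro_recall": micro_recall,
        "micro_precision": micro_precision,
        "micro_f1": micro_f1,
        "macro_recall": macro_recall,
        "weighted_recall": weighted_recall,
    }


# Made-up imbalanced example: 100, 10, and 10 reference samples per class.
scores = averaged_scores([[90, 5, 5],
                          [3, 6, 1],
                          [2, 1, 7]])
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In a single-label confusion matrix every error is counted once as an omission for one class and once as a commission for another, so the pooled false negatives and false positives are equal. That is why the micro-averaged scores collapse to overall accuracy, while the unweighted macro-averaged recall still reflects performance on the rare classes.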