Title: How far can I extrapolate my species distribution model? Exploring shape, a novel method
Species distribution and ecological niche models (hereafter SDMs) are popular tools with broad applications in ecology, biodiversity conservation, and environmental science. Many SDM applications require projecting models into environmental conditions that are non‐analog to those used for model training (extrapolation), giving predictions that may be statistically unsupported and biologically meaningless. We introduce a novel method, Shape, a model‐agnostic approach that calculates the extrapolation degree for a given projection data point from its multivariate distance to the nearest training data point. Such distances are relativized by a factor that reflects the dispersion of the training data in environmental space. Distinct from other approaches, Shape incorporates an adjustable threshold to control the binary discrimination between acceptable and unacceptable extrapolation degrees. We compared Shape's performance to five extrapolation metrics based on their ability to detect analog environmental conditions in environmental space and to improve SDM suitability predictions. To do so, we used 760 virtual species to define different modeling conditions determined by species niche tolerance, distribution equilibrium condition, sample size, and algorithm. All algorithms had trouble predicting species niches. However, we found a substantial improvement in model predictions when model projections were truncated, regardless of the extrapolation metric used. Shape's performance depended on the extrapolation threshold used to truncate models. Because of this versatility, our approach showed similar or better performance than the previous approaches and could better deal with all modeling conditions and algorithms. Our extrapolation metric is simple to interpret, captures the complex shapes of the data in environmental space, and can use any extrapolation threshold to define whether model predictions are retained based on the extrapolation degrees. These properties make this approach more broadly applicable than existing methods for creating and applying SDMs. We hope this method and accompanying tools help modelers explore, detect, and reduce extrapolation errors to achieve more reliable models.
Keywords: environmental novelty, extrapolation, Mahalanobis distance, model prediction, non‐analog environmental data, transferability
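The idea described in the abstract (distance to the nearest training point, scaled by the dispersion of the training data, then thresholded) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `extrapolation_degree` and `truncate_predictions` are hypothetical, the use of Mahalanobis distance is inferred from the keywords, and the dispersion factor used here (mean distance of training points to their centroid) is an assumption, since the abstract does not define it.

```python
import numpy as np


def extrapolation_degree(train, proj):
    """Illustrative extrapolation degree for each projection point.

    Assumptions (not taken from the paper): distances are Mahalanobis
    distances based on the training covariance, and the dispersion factor
    is the mean distance of training points to their centroid.
    """
    train = np.asarray(train, dtype=float)
    proj = np.asarray(proj, dtype=float)
    # Pseudo-inverse of the training covariance defines the metric.
    vi = np.linalg.pinv(np.atleast_2d(np.cov(train, rowvar=False)))

    def mahalanobis(a, b):
        # Pairwise distances between the rows of a and the rows of b.
        diff = a[:, None, :] - b[None, :, :]
        sq = np.einsum("ijk,kl,ijl->ij", diff, vi, diff)
        return np.sqrt(np.maximum(sq, 0.0))

    # Distance from each projection point to its nearest training point.
    nearest = mahalanobis(proj, train).min(axis=1)
    # Assumed dispersion factor: mean distance to the training centroid.
    centroid = train.mean(axis=0, keepdims=True)
    dispersion = mahalanobis(train, centroid).mean()
    return nearest / dispersion


def truncate_predictions(suitability, degree, threshold):
    """Mask predictions whose extrapolation degree exceeds the threshold."""
    out = np.asarray(suitability, dtype=float).copy()
    out[np.asarray(degree) > threshold] = np.nan
    return out


rng = np.random.default_rng(0)
train = rng.normal(size=(50, 2))             # training environmental data
proj = np.vstack([train[0], [10.0, 10.0]])   # one analog, one novel point
degree = extrapolation_degree(train, proj)
kept = truncate_predictions([0.8, 0.9], degree, threshold=1.0)
```

In this sketch the threshold is a free parameter, mirroring the adjustable threshold the abstract describes: a lower value truncates more of the projection, a higher value retains more extrapolated predictions.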
Award ID(s): 1853697
PAR ID: 10513683
Author(s) / Creator(s):
Publisher / Repository: Nordic Society Oikos
Date Published:
Journal Name: Ecography
Volume: 2024
Issue: 3
ISSN: 0906-7590
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1.
    Predictions from species distribution models (SDMs) are commonly used in support of environmental decision-making to explore potential impacts of climate change on biodiversity. However, because future climates are likely to differ from current climates, there has been ongoing interest in understanding the ability of SDMs to predict species responses under novel conditions (i.e., model transferability). Here, we explore the spatial and environmental limits to extrapolation in SDMs using forest inventory data from 11 model algorithms for 108 tree species across the western United States. Algorithms performed well in predicting occurrence for plots that occurred in the same geographic region in which they were fitted. However, a substantial portion of models performed worse than random when predicting for geographic regions in which algorithms were not fitted. Our results suggest that for transfers in geographic space, no specific algorithm was better than another as there were no significant differences in predictive performance across algorithms. There were significant differences in predictive performance for algorithms transferred in environmental space with GAM performing best. However, the predictive performance of GAM declined steeply with increasing extrapolation in environmental space relative to other algorithms. The results of this study suggest that SDMs may be limited in their ability to predict species ranges beyond the environmental data used for model fitting. When predicting climate-driven range shifts, extrapolation may also not reflect important biotic and abiotic drivers of species ranges, and thus further misrepresent the realized shift in range. Future studies investigating transferability of process based SDMs or relationships between geodiversity and biodiversity may hold promise. 
  2. Abstract: (1) Species distribution models (SDMs) are crucial tools for understanding and predicting biodiversity patterns, yet they often struggle with limited data, biased sampling, and complex species-environment relationships. Here I present NicheFlow, a novel foundation model for SDMs that leverages generative AI to address these challenges and advance our ability to model and predict species distributions across taxa and environments. (2) NicheFlow employs a two-stage generative approach, combining species embeddings with two chained generative models, one to generate a distribution in environmental space, and a second to generate a distribution in geographic space. This architecture allows for the sharing of information across species and captures complex, non-linear relationships in environmental space. I trained NicheFlow on a comprehensive dataset of reptile distributions and evaluated its performance using both standard SDM metrics and zero-shot prediction tasks. (3) NicheFlow demonstrates good predictive performance, particularly for rare and data-deficient species. The model successfully generated plausible distributions for species not seen during training, showcasing its potential for zero-shot prediction. The learned species embeddings captured meaningful ecological information, revealing patterns in niche structure across taxa, latitude and range sizes. (4) As a proof-of-principle foundation model, NicheFlow represents a significant advance in species distribution modeling, offering a powerful tool for addressing pressing questions in ecology, evolution, and conservation biology. Its ability to model joint species distributions and generate hypothetical niches opens new avenues for exploring ecological and evolutionary questions, including ancestral niche reconstruction and community assembly processes. This approach has the potential to transform our understanding of biodiversity patterns and improve our capacity to predict and manage species distributions in the face of global change.
  3. Species distribution models (SDMs), which relate recorded observations (presences) and absences or background points to environmental characteristics, are powerful tools used to generate hypotheses about the biogeography, ecology, and conservation of species. Although many researchers have examined the effects of presence and background point distributions on model outputs, they have not systematically evaluated the effects of various methods of background point sampling on the performance of a single model algorithm across many species. Therefore, a consensus on the preferred methods of background point sampling is lacking. Here, we conducted presence-background SDMs for 20 vertebrate species in North America under a variety of background point conditions, varying the number of background points used, the size of the buffer used to constrain the background points around the occurrences, and the percentage of background points sampled within the buffer (“spatial weighting”). We evaluated the accuracy and transferability of the models using the Boyce index, overlap with expert-generated range maps, and the area overpredicted and underpredicted by the SDM (and AUC for comparability with other studies). SDM performance is highly dependent on the species modelled but is affected by the number and spread of background points. Models with little spatial weighting had high accuracy (overlap values), but extreme extrapolation errors and overprediction. In contrast, SDMs with high transferability (high Boyce index values and low overprediction) had moderate-to-high spatial weighting. These results emphasize the importance of both background points and evaluation metric selection in SDMs. For other, more successful metrics, using many background points with spatial weighting may be preferred for models with large extents. These results can assist researchers in selecting the background point parameters most relevant for their research question, allowing them to fine-tune their hypotheses on the distribution of species through space and time.
  4. Environmental conditions are dynamic, and plants respond to those dynamics on multiple time scales. Disequilibrium occurs when a response occurs more slowly than the driving environmental changes. We review evidence regarding disequilibrium in plant distributions, including their responses to paleoclimate changes, recent climate change and new species introductions. There is strong evidence that plant species distributions are often in some disequilibrium with their environmental conditions. This disequilibrium poses a challenge when projecting future species distributions using species distribution models (SDMs). Classically, SDMs assume that the set of species occurrences is an unbiased sample of the suitable environmental conditions. However, a species in disequilibrium with the environment may have higher‐than‐expected occurrence probabilities (e.g. due to extinction debts) or lower‐than‐expected occurrence probabilities (e.g. due to dispersal limitation) in different areas. If unaccounted for, this will lead to biased estimates of the environmental suitability. We review methods for avoiding such biases in SDMs, ranging from simple thinning of the occurrence dataset to complex dynamic and process‐based models. Such models require large data inputs, natural history knowledge and technical expertise, so implementing them can be challenging. Despite this, we advocate for their increased use, since process‐based models provide the best potential to account for biases in model training data and to then represent the dynamics of species occupancy as ranges shift. Synthesis. Occurrence records for a species are often in disequilibrium with climate. SDMs trained on such data will produce biased estimates of a species' niche unless this disequilibrium is addressed in the modelling. A range of tools, spanning a wide gradient of complexity and realism, can resolve this bias.
  5. Dainton, John (Ed.)
    Improving models of species' distributions is essential for conservation, especially in light of global change. Species distribution models (SDMs) often rely on mean environmental conditions, yet species distributions are also a function of environmental heterogeneity and filtering acting at multiple spatial scales. Geodiversity, which we define as the variation of abiotic features and processes of Earth's entire geosphere (inclusive of climate), has potential to improve SDMs and conservation assessments because geodiversity variables capture multiple abiotic dimensions of species niches; however, they have not been sufficiently tested in SDMs. We tested a range of geodiversity variables computed at varying scales using climate and elevation data. We compared the predictive performance of MaxEnt SDMs generated using CHELSA bioclimatic variables to those also including geodiversity variables for 31 mammalian species in Colombia. Results show that the spatial grain of geodiversity variables affects SDM performance. Some variables consistently exhibited an increasing or decreasing trend in variable importance with spatial grain, showing slight scale-dependence and indicating that some geodiversity variables are more relevant at particular scales for some species. Incorporating geodiversity variables into SDMs, and doing so at the appropriate spatial scales, enhances the ability to model species-environment relationships, thereby contributing to the conservation and management of biodiversity. This article is part of the Theo Murphy meeting issue ‘Geodiversity for science and society’.