

Title: NicheFlow: Towards a foundation model for Species Distribution Modelling
Abstract
1. Species distribution models (SDMs) are crucial tools for understanding and predicting biodiversity patterns, yet they often struggle with limited data, biased sampling, and complex species-environment relationships. Here I present NicheFlow, a novel foundation model for SDMs that leverages generative AI to address these challenges and advance our ability to model and predict species distributions across taxa and environments.
2. NicheFlow employs a two-stage generative approach, combining species embeddings with two chained generative models: one generates a distribution in environmental space, and a second generates a distribution in geographic space. This architecture allows information to be shared across species and captures complex, non-linear relationships in environmental space. I trained NicheFlow on a comprehensive dataset of reptile distributions and evaluated its performance using both standard SDM metrics and zero-shot prediction tasks.
3. NicheFlow demonstrates good predictive performance, particularly for rare and data-deficient species. The model successfully generated plausible distributions for species not seen during training, showcasing its potential for zero-shot prediction. The learned species embeddings captured meaningful ecological information, revealing patterns in niche structure across taxa, latitudes and range sizes.
4. As a proof-of-principle foundation model, NicheFlow represents a significant advance in species distribution modelling, offering a powerful tool for addressing pressing questions in ecology, evolution, and conservation biology. Its ability to model joint species distributions and generate hypothetical niches opens new avenues for exploring ecological and evolutionary questions, including ancestral niche reconstruction and community assembly processes. This approach has the potential to transform our understanding of biodiversity patterns and improve our capacity to predict and manage species distributions in the face of global change.
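The two-stage chain in point 2 can be illustrated with a deliberately simplified sketch. This is not NicheFlow's implementation. Stage 1's learned generator is replaced by a Gaussian niche in a single environmental variable, and stage 2's learned generator is replaced by a kernel-weighted draw over map cells. The landscape `GRID`, the mapping `embed_to_niche`, the kernel `bandwidth` and all other names are hypothetical. The sketch only shows how a species embedding conditions an environmental-space sample, which in turn conditions a geographic-space sample.

```python
import math
import random
from collections import Counter

# Hypothetical 10 x 10 landscape: each geographic cell (x, y) carries one
# standardized environmental value that increases with x (and slightly with y).
GRID = {(x, y): (x - 5) / 2.5 + 0.1 * y for x in range(10) for y in range(10)}


def embed_to_niche(embedding):
    """Hypothetical conditioning step: map a 2-d species embedding to the
    mean and standard deviation of a Gaussian niche in environmental space."""
    return embedding[0], 0.2 + abs(embedding[1])


def sample_environment(embedding, n, rng):
    """Stage 1 (toy stand-in for a learned generator): draw n environmental values."""
    mean, sd = embed_to_niche(embedding)
    return [rng.gauss(mean, sd) for _ in range(n)]


def sample_geography(env_draws, rng, bandwidth=0.25):
    """Stage 2 (toy stand-in): for each environmental draw, pick one map cell with
    probability proportional to a Gaussian kernel on environmental distance."""
    cells = list(GRID)
    counts = Counter()
    for e in env_draws:
        weights = [math.exp(-((GRID[c] - e) ** 2) / (2 * bandwidth**2)) for c in cells]
        if sum(weights) == 0.0:  # draw lies far outside every environment on the map
            continue
        counts[rng.choices(cells, weights=weights, k=1)[0]] += 1
    return counts


def predict_range(embedding, n=500, seed=0):
    """Chain both stages and return a probability for each occupied cell."""
    rng = random.Random(seed)
    counts = sample_geography(sample_environment(embedding, n, rng), rng)
    total = sum(counts.values())
    return {cell: k / total for cell, k in counts.items()} if total else {}
```

Suppose one calls `predict_range((2.0, 0.1))` for a "warm-niche" embedding and `predict_range((-1.5, 0.1))` for a "cold-niche" embedding. The first call concentrates probability in high-x cells and the second in low-x cells. An embedding that was never seen during training is handled the same way, which loosely mirrors the zero-shot idea in point 3.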
Award ID(s): 2329701
PAR ID: 10583022
Publisher / Repository: bioRxiv
Institution: bioRxiv
Sponsoring Org: National Science Foundation
More like this
  1. Abstract Anthropogenic pressures on biodiversity necessitate efficient and highly scalable methods to predict global species distributions. Current species distribution models (SDMs) face limitations with large-scale datasets, complex interspecies interactions, and data quality. Here, we introduce EcoVAE, a framework of autoencoder-based generative models trained separately on nearly 124 million georeferenced occurrences from taxa including plants, butterflies and mammals, to predict their global distributions at both genus and species levels. EcoVAE achieves high precision and speed, captures underlying distribution patterns through unsupervised learning, and reveals interspecies interactions via in silico perturbation analyses. Additionally, it evaluates global sampling efforts and interpolates distributions without relying on environmental variables, offering new applications for biodiversity exploration and monitoring.
  2. Abstract Understanding species distributions is a global priority for mitigating environmental pressures from human activities. Many studies have identified key environmental (climate and habitat) predictors and the spatial scales at which they influence species distributions. However, comparable understanding of human influence is largely lacking. Here, to advance knowledge concerning human influence on species distributions, we systematically reviewed species distribution modelling (SDM) articles and assessed current modelling efforts. We searched 12,854 articles and found only 1,429 articles using human predictors within SDMs. Collectively, these studies of >58,000 species used 2,307 unique human predictors, suggesting that in contrast to environmental predictors, there is no ‘rule of thumb’ for human predictor selection in SDMs. The number of human predictors used across studies also varied (usually one to four per study). Moreover, nearly half the articles projecting to future climates held human predictors constant over time, risking false optimism about the effects of human activities compared with climate change. Advances in using human predictors in SDMs are paramount for accurately informing and advancing policy, conservation, management and ecology. We show considerable gaps in including human predictors to understand current and future species distributions in the Anthropocene, opening opportunities for new inquiries. We pose 15 questions to advance ecological theory, methods and real-world applications.
  3. Abstract Aim: Species distribution models (SDMs) are increasingly applied across macroscales using detection‐nondetection data. These models typically assume that a single set of regression coefficients can adequately describe species–environment relationships and/or population trends. However, such relationships often show nonlinear and/or spatially varying patterns that arise from complex interactions with abiotic and biotic processes that operate at different scales. Spatially varying coefficient (SVC) models can readily account for variability in the effects of environmental covariates. Yet their use in ecology is relatively scarce due to gaps in understanding the inferential benefits that SVC models can provide compared to simpler frameworks. Innovation: Here we demonstrate the inferential benefits of SVC SDMs, with a particular focus on how this approach can be used to generate and test ecological hypotheses regarding the drivers of spatial variability in population trends and species–environment relationships. We illustrate the inferential benefits of SVC SDMs with simulations and two case studies: one that assesses spatially varying trends of 51 forest bird species in the eastern United States over two decades and a second that evaluates spatial variability in the effects of five decades of land cover change on grasshopper sparrow (Ammodramus savannarum) occurrence across the continental United States. Main conclusions: We found strong support for SVC SDMs compared to simpler alternatives in both empirical case studies. Factors operating at fine spatial scales, accounted for by the SVCs, were the primary drivers of spatial variability in forest bird occurrence trends. Additionally, SVCs revealed complex species–habitat relationships with grassland and cropland area for grasshopper sparrow, providing nuanced insights into how future land use change may shape its distribution.
These applications display the utility of SVC SDMs to help reveal the environmental factors that drive species distributions across both local and broad scales. We conclude by discussing the potential applications of SVC SDMs in ecology and conservation.
  4. Abstract The impacts of climate change have re‐energized interest in understanding the role of climate in setting species geographic range edges. Despite the strong focus on species' distributions in ecology and evolution, defining a species range edge is theoretically and empirically difficult. The challenge of determining a range edge and its relationship to climate is in part driven by the nested nature of geography and the multidimensionality of climate, which together generate complex patterns of both climate and biotic distributions across landscapes. Because range‐limiting processes occur in both geographic and climate space, the relationship between these two spaces plays a critical role in setting range limits. With both conceptual and empirical support, we argue that three factors—climate heterogeneity, collinearity among climate variables, and spatial scale—interact to shape the spatial structure of range edges along climate gradients, and we discuss several ways that these factors influence the stability of species range edges with a changing climate. We demonstrate that geographic and climate edges are often not concordant across species ranges. Furthermore, high climate heterogeneity and low climate collinearity across landscapes increase the spectrum of possible relationships between geographic and climatic space, suggesting that geographic range edges and climatic niche limits correspond less frequently than we may expect. More empirical explorations of how the complexity of real landscapes shapes the ecological and evolutionary processes that determine species range edges will advance the development of range limit theory and its applications to biodiversity conservation in the context of changing climate. 
  5. Species distribution and ecological niche models (hereafter SDMs) are popular tools with broad applications in ecology, biodiversity conservation, and environmental science. Many SDM applications require projecting models into environmental conditions non‐analog to those used for model training (extrapolation), giving predictions that may be statistically unsupported and biologically meaningless. We introduce a novel method, Shape, a model‐agnostic approach that calculates the extrapolation degree for a given projection data point by its multivariate distance to the nearest training data point. Such distances are relativized by a factor that reflects the dispersion of the training data in environmental space. Distinct from other approaches, Shape incorporates an adjustable threshold to control the binary discrimination between acceptable and unacceptable extrapolation degrees. We compared Shape's performance to five extrapolation metrics based on their ability to detect analog environmental conditions in environmental space and improve SDM suitability predictions. To do so, we used 760 virtual species to define different modeling conditions determined by species niche tolerance, distribution equilibrium condition, sample size, and algorithm. All algorithms had trouble predicting species niches. However, we found a substantial improvement in model predictions when model projections were truncated, regardless of which extrapolation metric was used. Shape's performance depended on the extrapolation threshold used to truncate models. Because of this versatility, our approach showed similar or better performance than the previous approaches and could better deal with all modeling conditions and algorithms. Our extrapolation metric is simple to interpret, captures the complex shapes of the data in environmental space, and can use any extrapolation threshold to define whether model predictions are retained based on the extrapolation degrees.
These properties make this approach more broadly applicable than existing methods for creating and applying SDMs. We hope this method and accompanying tools support modelers to explore, detect, and reduce extrapolation errors to achieve more reliable models. Keywords: environmental novelty, extrapolation, Mahalanobis distance, model prediction, non‐analog environmental data, transferability
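The Shape metric in item 5 can be sketched as follows. The extrapolation degree is computed as the distance from a projection point to its nearest training point, divided by a dispersion factor for the training data. Predictions whose degree exceeds a chosen threshold are then truncated. This is a hypothetical illustration, not the authors' implementation. It assumes Mahalanobis distance (suggested by the keywords) and assumes that the dispersion factor is the mean Mahalanobis distance of the training points from their centroid; the published method may define that factor differently. For brevity the sketch is limited to two environmental variables so that the 2 x 2 inverse covariance can be written out by hand. All function names are ours.

```python
import math
from statistics import mean


def covariance_2d(points):
    """Sample covariance of 2-d points as (var_x, cov_xy, var_y)."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    var_x = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return var_x, cov_xy, var_y


def mahalanobis_2d(u, v, cov):
    """Mahalanobis distance between u and v under a 2 x 2 covariance matrix."""
    a, b, c = cov
    det = a * c - b * b  # must be > 0, i.e. training data not collinear
    dx, dy = u[0] - v[0], u[1] - v[1]
    # d^T S^{-1} d, using S^{-1} = [[c, -b], [-b, a]] / det
    q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(max(q, 0.0))


def extrapolation_degree(train, projection):
    """Distance from each projection point to its nearest training point,
    divided by an ASSUMED dispersion factor (mean training-to-centroid distance)."""
    cov = covariance_2d(train)
    centroid = (mean(p[0] for p in train), mean(p[1] for p in train))
    dispersion = mean(mahalanobis_2d(p, centroid, cov) for p in train)
    return [min(mahalanobis_2d(q, p, cov) for p in train) / dispersion
            for q in projection]


def truncate(suitability, degrees, threshold):
    """Set suitability to 0 wherever extrapolation exceeds the chosen threshold."""
    return [s if d <= threshold else 0.0 for s, d in zip(suitability, degrees)]
```

A projection point that coincides with a training point gets degree 0. A point far outside the training cloud gets a large degree and is zeroed by `truncate`. Raising `threshold` retains more of the projection, which corresponds to the adjustable threshold the abstract describes.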
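The spatially varying coefficient (SVC) idea in item 3 can be written as a generic single-covariate model for occurrence probability $\psi(\mathbf{s})$ at location $\mathbf{s}$. The sketch below assumes Gaussian-process spatial effects. The notation is ours, and the authors' models (for example, any detection sub-model or trend terms) may be specified differently.

```latex
\operatorname{logit}\big(\psi(\mathbf{s})\big)
  = \beta_0 + \tilde{w}_0(\mathbf{s})
  + \big(\beta_1 + \tilde{w}_1(\mathbf{s})\big)\, x(\mathbf{s}),
\qquad
\tilde{w}_r(\cdot) \sim \mathcal{GP}\big(0,\ \sigma_r^2\, \rho(\cdot,\cdot;\ \phi_r)\big),
\quad r \in \{0, 1\}
```

Here $\beta_1$ is the average effect of covariate $x$, and $\beta_1 + \tilde{w}_1(\mathbf{s})$ is its local effect at $\mathbf{s}$. Setting $\tilde{w}_1 \equiv 0$ recovers the single set of regression coefficients that the abstract describes as the usual assumption.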