Abstract 1. Species distribution models (SDMs) are crucial tools for understanding and predicting biodiversity patterns, yet they often struggle with limited data, biased sampling, and complex species-environment relationships. Here I present NicheFlow, a novel foundation model for SDMs that leverages generative AI to address these challenges and advance our ability to model and predict species distributions across taxa and environments. 2. NicheFlow employs a two-stage generative approach, combining species embeddings with two chained generative models, one to generate a distribution in environmental space, and a second to generate a distribution in geographic space. This architecture allows for the sharing of information across species and captures complex, non-linear relationships in environmental space. I trained NicheFlow on a comprehensive dataset of reptile distributions and evaluated its performance using both standard SDM metrics and zero-shot prediction tasks. 3. NicheFlow demonstrates good predictive performance, particularly for rare and data-deficient species. The model successfully generated plausible distributions for species not seen during training, showcasing its potential for zero-shot prediction. The learned species embeddings captured meaningful ecological information, revealing patterns in niche structure across taxa, latitude and range sizes. 4. As a proof-of-principle foundation model, NicheFlow represents a significant advance in species distribution modeling, offering a powerful tool for addressing pressing questions in ecology, evolution, and conservation biology. Its ability to model joint species distributions and generate hypothetical niches opens new avenues for exploring ecological and evolutionary questions, including ancestral niche reconstruction and community assembly processes. 
This approach has the potential to transform our understanding of biodiversity patterns and improve our capacity to predict and manage species distributions in the face of global change.
This content will become publicly available on December 16, 2025
A generative deep learning approach for global species distribution prediction
Abstract Anthropogenic pressures on biodiversity necessitate efficient and highly scalable methods to predict global species distributions. Current species distribution models (SDMs) face limitations with large-scale datasets, complex interspecies interactions, and data quality. Here, we introduce EcoVAE, a framework of autoencoder-based generative models trained separately on nearly 124 million georeferenced occurrences from taxa including plants, butterflies and mammals, to predict their global distributions at both genus and species levels. EcoVAE achieves high precision and speed, captures underlying distribution patterns through unsupervised learning, and reveals interspecies interactions via in silico perturbation analyses. Additionally, it evaluates global sampling efforts and interpolates distributions without relying on environmental variables, offering new applications for biodiversity exploration and monitoring.
- Award ID(s):
- 2101884
- PAR ID:
- 10643712
- Publisher / Repository:
- bioRxiv
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Species distribution modelling (SDM), also called environmental or ecological niche modelling, has developed over the last 30 years as a widely used tool in core areas of biogeography including historical biogeography, studies of diversity patterns, studies of species ranges, ecoregional classification, conservation assessment and projecting future global change impacts. In the 50th anniversary year of Journal of Biogeography, I reflect on developments in species distribution modelling, illustrate how embedded the methodology has become in all areas of biogeography and speculate on future directions in the field. Challenges to species distribution modelling raised in this journal in 2006 have been addressed to a significant degree. Those challenges are clarification of the niche concept; improved sample design for species occurrence data; model parameterization; predictor selection; assessing model performance and transferability; and integrating correlative and process models of species distributions. SDM is used, often in conjunction with other evidence, to understand past species range dynamics, identify patterns and drivers of biological diversity, identify drivers of species range limits, define and delineate ecoregions, estimate the distributions of biodiversity elements in relation to protected status and to prioritize conservation action, and to forecast species range shifts in response to climate change and other global change scenarios. Areas of progress in SDM that may become more widely accessible and useful tools in biogeography include genetically informed models and community distribution models.
-
Vascular plants are diverse and a major component of terrestrial ecosystems, yet knowledge of their geographic distributions remains incomplete. Here, I present a global database of vascular plant distributions by integrating species distribution models calibrated to species’ dispersal ability and natural habitats to predict native range maps for 201,681 vascular plant species into unsurveyed areas. Using these maps, I uncover unique patterns of native vascular plant diversity, endemism, and phylogenetic diversity revealing hotspots in underdocumented biodiversity-rich regions. These hotspots, based on detailed species-level maps, show a pronounced latitudinal gradient, strongly supporting the theory of increasing diversity toward the equator. I trained random forest models to extrapolate diversity patterns under unbiased global sampling and identified overlaps with modeled estimations, but also unveiled cryptic hotspots that were not captured by modeled estimations. Only 29% to 36% of extrapolated plant hotspots are inside protected areas, leaving more than 60% outside and vulnerable. However, the unprotected hotspots harbor species with unique attributes that make them good candidates for conservation prioritization.
-
Abstract Aim: Accounting for biotic interactions in species distribution models is complicated by the fact that interactions occur at the individual‐level at unknown spatial scales. Standard approaches that ignore individual‐level interactions and focus on aggregate scales are subject to the modifiable areal unit problem (MAUP) in which incorrect inferences may arise about the sign and magnitude of interspecific effects. Location: Global (simulation) and North Carolina, United States (case study). Taxon: None (simulation) and Aves (case study). Methods: We present a hierarchical species distribution model that includes a Markov point process in which the locations of individuals of one species are modelled as a function of both abiotic variables and the locations of individuals of another species. We applied the model to spatial capture‐recapture (SCR) data on two ecologically similar songbird species—hooded warbler (Setophaga citrina) and black‐throated blue warbler (Setophaga caerulescens)—that segregate over a climate gradient in the southern Appalachian Mountains, USA. Results: A simulation study indicated that the model can identify the effects of environmental variation and biotic interactions on co‐occurring species distributions. In the case study, there were strong and opposing effects of climate on spatial variation in population densities, but spatial competition did not influence the two species' distributions. Main Conclusions: Unlike existing species distribution models, the framework proposed here overcomes the MAUP and can be used to investigate how population‐level patterns emerge from individual‐level processes, while also allowing for inference on the spatial scale of biotic interactions. Our finding of minimal spatial competition between black‐throated blue warbler and hooded warbler adds to the growing body of literature suggesting that abiotic factors may be more important than competition at low‐latitude range margins. The model can be extended to accommodate count data and binary data in addition to SCR data.
-
Enrico Pirotta (Ed.) Abstract Aim: Understanding the distribution of marine organisms is essential for effective management of highly mobile marine predators that face a variety of anthropogenic threats. Recent work has largely focused on modelling the distribution and abundance of marine mammals in relation to a suite of environmental variables. However, biotic interactions can largely drive distributions of these predators. We aim to identify how biotic and abiotic variables influence the distribution and abundance of a particular marine predator, the bottlenose dolphin (Tursiops truncatus), using multiple modelling approaches and conducting an extensive literature review. Location: Western North Atlantic continental shelf. Methods: We combined widespread marine mammal and fish and invertebrate surveys in an ensemble modelling approach to assess the relative importance and capacity of the environment and other marine species to predict the distribution of both coastal and offshore bottlenose dolphin ecotypes. We corroborate the modelled results with a systematic literature review on the prey of dolphins throughout the region to help explain patterns driven by prey availability, as well as reveal new ones that may not necessarily be a predator–prey relationship. Results: We find that coastal bottlenose dolphin distributions are associated with one family of fishes, the Sciaenidae, or drum family, and predictions slightly improve when using only fish versus only environmental variables. The literature review suggests that this tight coupling is likely a predator–prey relationship. Comparatively, offshore dolphin distributions are more strongly related to environmental variables, and predictions are better for environmental‐only models. As revealed by the literature review, this may be due to a mismatch between the animals caught in the fish and invertebrate surveys and the predominant prey of offshore dolphins, notably squid. Main Conclusions: Incorporating prey species into distribution models, especially for coastal bottlenose dolphins, can help inform ecological relationships and predict marine predator distributions.
