

Title: A Test of Species Distribution Model Transferability Across Environmental and Geographic Space for 108 Western North American Tree Species
Predictions from species distribution models (SDMs) are commonly used in support of environmental decision-making to explore potential impacts of climate change on biodiversity. However, because future climates are likely to differ from current climates, there has been ongoing interest in understanding the ability of SDMs to predict species responses under novel conditions (i.e., model transferability). Here, we explore the spatial and environmental limits to extrapolation in SDMs using 11 model algorithms fitted to forest inventory data for 108 tree species across the western United States. Algorithms performed well in predicting occurrence for plots that occurred in the same geographic region in which they were fitted. However, a substantial portion of models performed worse than random when predicting for geographic regions in which algorithms were not fitted. Our results suggest that for transfers in geographic space, no specific algorithm was better than another, as there were no significant differences in predictive performance across algorithms. There were significant differences in predictive performance for algorithms transferred in environmental space, with GAM performing best. However, the predictive performance of GAM declined steeply with increasing extrapolation in environmental space relative to other algorithms. The results of this study suggest that SDMs may be limited in their ability to predict species ranges beyond the environmental data used for model fitting. When predicting climate-driven range shifts, extrapolation may also not reflect important biotic and abiotic drivers of species ranges, and thus further misrepresent the realized shift in range. Future studies investigating transferability of process-based SDMs or relationships between geodiversity and biodiversity may hold promise.
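The finding that a model can perform worse than random when transferred to another region can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the study's data or method: a single synthetic temperature covariate, a unimodal (Gaussian) niche, two hypothetical regions that sample opposite sides of the niche optimum, and a one-covariate logistic regression standing in for the 11 algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_region(t_min, t_max, n=2000):
    """Simulate plots with one temperature covariate and presence/absence
    drawn from an assumed unimodal niche centred on 12 degrees."""
    temp = rng.uniform(t_min, t_max, n)
    p_occ = np.exp(-((temp - 12.0) ** 2) / (2 * 3.0 ** 2))
    presence = (rng.random(n) < p_occ).astype(int)
    return temp, presence

def fit_logistic(x, y, steps=5000, lr=0.1):
    """Fit a one-covariate logistic regression by plain gradient descent."""
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))
        b0 -= lr * np.mean(p - y)
        b1 -= lr * np.mean((p - y) * z)
    # Return a predictor that reuses the training-region standardisation.
    return lambda x_new: 1.0 / (1.0 + np.exp(-(b0 + b1 * (x_new - mu) / sd)))

def auc(y_true, score):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Region A samples the cool side of the niche, region B the warm side.
temp_a, pres_a = simulate_region(0.0, 12.0)
temp_b, pres_b = simulate_region(12.0, 24.0)

# Fit on part of region A, then evaluate within region A and in region B.
model = fit_logistic(temp_a[:1500], pres_a[:1500])
auc_within = auc(pres_a[1500:], model(temp_a[1500:]))
auc_transfer = auc(pres_b, model(temp_b))

print(f"AUC within fitting region: {auc_within:.2f}")
print(f"AUC after geographic transfer: {auc_transfer:.2f}")
```

In this toy setup the model learns only the rising limb of the response curve, so its ranking of plots is inverted in the region where occurrence declines with temperature, which is one way an AUC below 0.5 can arise.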
Award ID(s):
1934790
NSF-PAR ID:
10294230
Journal Name:
Frontiers in Ecology and Evolution
Volume:
9
ISSN:
2296-701X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Understanding the ranges of rare and endangered species is central to conserving biodiversity in the Anthropocene. Species distribution models (SDMs) have become a common and powerful tool for analyzing species–environment relationships across geographic space. Although evaluating the distribution of rare species is integral to their conservation, this can be difficult when limited distribution data are available. Community science platforms, such as iNaturalist, have emerged as alternative sources for species occurrence data. Although these observations are often thought to be of lower quality than those of natural history collections, they may have potential for improving SDMs for species with few occurrence records from collections. Here, we investigate the utility of iNaturalist data for developing SDMs for a rare high‐elevation plant, Telesonix jamesii. Because methods for modeling rare species are limited in the literature, five different modeling techniques were considered, including profile methods, statistical models, and machine learning algorithms. The inclusion of iNaturalist data doubled the number of usable records for T. jamesii. We found that a random forest (RF) model using ensemble training data performed best of any model (area under curve = 0.98). We then compared the performance of RF models that use only natural history training data and those that use a combination of natural history (herbarium specimens) and iNaturalist training data. All models relied heavily on climate data (mean temperature of the driest quarter and precipitation of the warmest quarter), indicating that this species is under threat as climate continues to change. Validation datasets affected model fits as well. Models using only herbarium data performed slightly worse when evaluated with cross‐validation than when validated externally with iNaturalist data. This study can serve as a model for future SDM studies of species with similar data limitations.

     
  2. Machine learning algorithms are often used to model and predict animal habitat selection—the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the circumpolar naive model. 
These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection. 
  3. Species distribution models (SDMs), which relate recorded observations (presences) and absences or background points to environmental characteristics, are powerful tools used to generate hypotheses about the biogeography, ecology, and conservation of species. Although many researchers have examined the effects of presence and background point distributions on model outputs, they have not systematically evaluated the effects of various methods of background point sampling on the performance of a single model algorithm across many species. Therefore, a consensus on the preferred methods of background point sampling is lacking. Here, we conducted presence-background SDMs for 20 vertebrate species in North America under a variety of background point conditions, varying the number of background points used, the size of the buffer used to constrain the background points around the occurrences, and the percentage of background points sampled within the buffer (“spatial weighting”). We evaluated the accuracy and transferability of the models using Boyce index, overlap with expert-generated range maps, and area overpredicted and underpredicted by the SDM (and AUC for comparability with other studies). SDM performance is highly dependent on the species modelled but is affected by the number and spread of background points. Models with little spatial weighting had high accuracy (overlap values), but extreme extrapolation errors and overprediction. In contrast, SDMs with high transferability (high Boyce index values and low overprediction) had moderate-to-high spatial weighting. These results emphasize the importance of both background points and evaluation metric selection in SDMs. For other, more successful metrics, using many background points with spatial weighting may be preferred for models with large extents. 
These results can assist researchers in selecting the background point parameters most relevant for their research question, allowing them to fine-tune their hypotheses on the distribution of species through space and time. 
  4. Abstract

    Spatial biases are an intrinsic feature of occurrence data used in species distribution models (SDMs). Thinning species occurrences, where records close in the geographic or environmental space are removed from the modeling procedure, is an approach often used to address these biases. However, thinning occurrence data can also negatively affect SDM performance, given that the benefits of removing spatial biases might be outweighed by the detrimental effects of data loss caused by this approach. We used real and virtual species to evaluate how spatial and environmental thinning affected different performance metrics of four SDM methods. The occurrence data of virtual species were sampled randomly, evenly spaced, and clustered in the geographic space to simulate different types of spatial biases, and several spatial and environmental thinning distances were used to thin the occurrence data. Null datasets were also generated for each thinning distance by randomly removing the same number of occurrences that thinning at that distance removed, and we compared the results of the thinned and null datasets. We found that spatially or environmentally thinning occurrence data is no better than randomly removing records, given that thinned datasets performed similarly to null datasets. Specifically, spatial and environmental thinning led to a general decrease in model performance across all SDM methods. These results were observed for real and virtual species, were positively associated with thinning distance, and were consistent across the different types of spatial biases. Our results suggest that thinning occurrence data usually fails to improve SDM performance and that the use of thinning approaches when modeling species distributions should be considered carefully.

     
  5. Abstract Aim

    Species distribution models (SDMs) are widely used to make predictions on how species distributions may change as a response to climatic change. To assess the reliability of those predictions, they need to be critically validated with respect to what they are used for. While ecologists are typically interested in how and where distributions will change, we argue that SDMs have seldom been evaluated in terms of their capacity to predict such change. Instead, typical retrospective validation methods estimate a model's ability to predict only one static time period in the future. Here, we apply two validation methods, one that predicts and evaluates a static pattern and one that measures change, and compare their estimates of predictive performance.

    Location

    Fennoscandia.

    Methods

    We applied a joint SDM to model the distributions of 120 bird species in four model validation settings. We trained models with a dataset from 1975 to 1999 and predicted species' future occurrence and abundance in two ways: for one static time period (2013–2016, ‘static validation’) and for a change between two time periods (difference between 1996–1999 and 2013–2016, ‘change validation’). We then measured predictive performance using correlation between predicted and observed values. We also related predictive performance to species traits.

    Results

    Even though the static validation method evaluated predictive performance as good, the change validation method indicated very poor performance. Predictive performance was not strongly related to any trait.

    Main Conclusions

    The static validation method might overestimate predictive performance because it does not reveal the model's inability to predict change events. If species' distributions remain mostly stable, then even an unfit model can predict the near future well due to temporal autocorrelation. We urge caution when working with forecasts of changes in spatial patterns of species occupancy or abundance, even for SDMs that are based on time series datasets, unless they are critically validated for forecasting such change.

     
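The contrast between static and change validation described in the fifth abstract can be illustrated with a small synthetic sketch. All quantities below are assumptions for illustration only (site counts, abundance distributions, and a hypothetical model whose predicted change is unrelated to the observed change); none of them come from the Fennoscandian bird data. Predictive performance is measured, as in that abstract, by the correlation between predicted and observed values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sites = 500

# Assumed abundance at each site in the earlier period (e.g. the last
# years of the training window).
abund_t1 = rng.gamma(shape=2.0, scale=5.0, size=n_sites)

# Assumed true change is small relative to the spatial pattern, so the
# distribution stays mostly stable between periods.
observed_change = rng.normal(0.0, 1.0, n_sites)
abund_t2 = abund_t1 + observed_change

# Hypothetical model: reproduces the stable spatial pattern, but its
# predicted change is pure noise (no skill at forecasting change).
predicted_change = rng.normal(0.0, 1.0, n_sites)
pred_t2 = abund_t1 + predicted_change

# Static validation: predicted vs observed values in the later period.
static_r = np.corrcoef(pred_t2, abund_t2)[0, 1]

# Change validation: predicted vs observed difference between periods.
change_r = np.corrcoef(predicted_change, observed_change)[0, 1]

print(f"static validation r = {static_r:.2f}")
print(f"change validation r = {change_r:.2f}")
```

Because both periods share the same underlying spatial pattern, the static correlation is high even though the model carries no information about change, which is the temporal autocorrelation effect the abstract warns about.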