

This content will become publicly available on February 1, 2025

Title: The influence of the number and distribution of background points in presence-background species distribution models
Species distribution models (SDMs), which relate recorded observations (presences) and absences or background points to environmental characteristics, are powerful tools used to generate hypotheses about the biogeography, ecology, and conservation of species. Although many researchers have examined the effects of presence and background point distributions on model outputs, they have not systematically evaluated the effects of various methods of background point sampling on the performance of a single model algorithm across many species. Therefore, a consensus on the preferred methods of background point sampling is lacking. Here, we fitted presence-background SDMs for 20 vertebrate species in North America under a variety of background point conditions, varying the number of background points used, the size of the buffer used to constrain the background points around the occurrences, and the percentage of background points sampled within the buffer (“spatial weighting”). We evaluated the accuracy and transferability of the models using the Boyce index, overlap with expert-generated range maps, and the area overpredicted and underpredicted by the SDM (and AUC for comparability with other studies). SDM performance was highly dependent on the species modelled but was also affected by the number and spread of background points. Models with little spatial weighting had high accuracy (overlap values) but also extreme extrapolation errors and overprediction. In contrast, SDMs with high transferability (high Boyce index values and low overprediction) had moderate-to-high spatial weighting. These results emphasize the importance of selecting both the background points and the evaluation metric in SDMs. For the other, more successful metrics, using many background points with spatial weighting may be preferable for models with large extents. 
These results can assist researchers in selecting the background point parameters most relevant for their research question, allowing them to fine-tune their hypotheses on the distribution of species through space and time.
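The sketch below shows, in Python with NumPy, the three background-point settings the study varied: the number of points, the buffer size and the spatial weighting. It is not the authors' code, and the function and parameter names are hypothetical. It rests on these assumptions:
- coordinates are projected and given in kilometres;
- the study extent is a rectangle;
- the buffer is the union of circles drawn around the presences;
- background points that are not placed inside the buffer come from the rest of the extent.

```python
import numpy as np


def sample_background(presences, extent, n_points, buffer_km, weight, rng=None):
    """Draw background points, a fraction `weight` of them inside a presence buffer.

    presences: (k, 2) array of projected coordinates in km (assumed units).
    extent: (xmin, xmax, ymin, ymax) of a rectangular study region (assumed shape).
    weight: share of points drawn inside the buffer ("spatial weighting"); the
        rest are drawn from the extent outside the buffer (an assumption).
    """
    rng = np.random.default_rng(rng)
    presences = np.asarray(presences, dtype=float)
    xmin, xmax, ymin, ymax = extent

    def in_buffer(pts):
        # Inside the buffer = within buffer_km of at least one presence.
        d = np.linalg.norm(pts[:, None, :] - presences[None, :, :], axis=2)
        return (d <= buffer_km).any(axis=1)

    def draw(n, inside):
        # Rejection sampling: uniform candidates in the extent, keep the wanted side.
        kept = np.empty((0, 2))
        tries = 0
        while len(kept) < n:
            tries += 1
            if tries > 1000:
                raise RuntimeError("target region too small for rejection sampling")
            cand = np.column_stack([rng.uniform(xmin, xmax, 4 * n),
                                    rng.uniform(ymin, ymax, 4 * n)])
            mask = in_buffer(cand)
            kept = np.vstack([kept, cand[mask if inside else ~mask]])
        return kept[:n]

    n_in = int(round(weight * n_points))
    # First n_in rows lie inside the buffer, the remaining rows outside it.
    return np.vstack([draw(n_in, True), draw(n_points - n_in, False)])
```

Setting `weight=1.0` places every background point inside the buffer; lower values spread more points across the wider extent. Rejection sampling keeps the sketch short, but it slows down when the target region is a small fraction of the extent.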
Award ID(s): 1945013
NSF-PAR ID: 10488887
Publisher / Repository: Elsevier B.V.
Journal Name: Ecological Modelling
Volume: 488
Issue: C
ISSN: 0304-3800
Page Range / eLocation ID: 110604
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    The area under the curve (AUC) of the receiver operating characteristic (or certain modifications of it) is almost universally used to assess the performance of species distribution models (SDMs). This is despite well‐recognized problems with the approach, which arise mainly when dealing with presence‐only data.
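    The sketch below shows how a presence-background AUC is commonly computed, as a reference point. The AUC is the probability that a randomly chosen presence receives a higher suitability score than a randomly chosen background point, with ties counted as one half (the Mann–Whitney form). Background points are not confirmed absences, so this value measures how well presences are separated from the available environment, not from true absences. The function name is illustrative.

```python
import numpy as np


def presence_background_auc(presence_scores, background_scores):
    """AUC as P(score of a random presence > score of a random background point).

    Every presence is compared with every background point; ties count 0.5.
    """
    p = np.asarray(presence_scores, dtype=float)[:, None]
    b = np.asarray(background_scores, dtype=float)[None, :]
    return float((p > b).mean() + 0.5 * (p == b).mean())
```

The all-pairs comparison is quadratic in memory. For large background samples, a rank-based computation gives the same value more cheaply.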

    We present a probabilistic treatment of the presence‐only problem and derive a method to assess the performance of SDMs based on the analysis of an area‐presence plot and the SDM outputs represented in both geographic and environmental spaces.

    We show how our method is useful to solve the two main tasks for which the AUC is used: assessing the performance of an SDM and comparing the performance of different SDMs. Our results build on previous work and constitute a rigorous method for assessing the performance of SDMs in relation to a random classifier.

    We establish comparisons with two of the most popular approaches used to assess the performance of an SDM, the AUC and the Boyce index, and identify cases in which our method has advantages over these two approaches.
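    The sketch below computes a simple binned Boyce index. It works in three steps:
    1. Split the suitability scores into equal-width classes.
    2. In each class, compute the predicted-to-expected ratio F = P/E. Here P is the class's share of presences and E is its share of background points.
    3. Take the Spearman rank correlation between F and the class midpoint as the index.
    The continuous Boyce index uses a moving window instead of fixed bins. The bin count and function names here are assumptions for illustration.

```python
import numpy as np


def _average_ranks(a):
    """Ranks starting at 1; tied values share their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def boyce_index(presence_scores, background_scores, n_bins=10):
    """Binned Boyce index: Spearman correlation of P/E ratio with suitability class."""
    pres = np.asarray(presence_scores, dtype=float)
    back = np.asarray(background_scores, dtype=float)
    lo = min(pres.min(), back.min())
    hi = max(pres.max(), back.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p_counts, _ = np.histogram(pres, bins=edges)
    e_counts, _ = np.histogram(back, bins=edges)
    keep = e_counts > 0  # P/E is undefined where no background point falls
    f = (p_counts[keep] / len(pres)) / (e_counts[keep] / len(back))
    mids = ((edges[:-1] + edges[1:]) / 2)[keep]
    # Spearman correlation = Pearson correlation of the ranks.
    return float(np.corrcoef(_average_ranks(f), _average_ranks(mids))[0, 1])
```

A value near +1 means presences become increasingly over-represented as predicted suitability rises. Values near 0 indicate a model no better than random, and negative values indicate a counter-predictive model.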

    We suggest that the performance of an algorithm that classifies presence‐only data can be assessed by two factors: (a) the degree of non‐randomness of the classification at every step in the accumulation curve of presences, and (b) the amount of uninformative niche space used for the classification. The method we developed can be applied to any SDM output by using the R functions available at: https://github.com/LauraJim/SDM-hyperTest.

  2. Abstract

    Aim

    Species distribution models (SDMs) that integrate presence‐only and presence–absence data offer a promising avenue to improve information on species' geographic distributions. The use of such ‘integrated SDMs’ on a species range‐wide extent has been constrained by the often limited presence–absence data and by the heterogeneous sampling of the presence‐only data. Here, we evaluate integrated SDMs for studying species ranges with a novel expert range map‐based evaluation. We build new understanding about how integrated SDMs address issues of estimation accuracy and data deficiency and thereby offer advantages over traditional SDMs.

    Location

    South and Central America.

    Time Period

    1979–2017.

    Major Taxa Studied

    Hummingbirds.

    Methods

    We build integrated SDMs by linking two observation models – one for each data type – to the same underlying spatial process. We validate SDMs with two schemes: (i) cross‐validation with presence–absence data and (ii) comparison with respect to the species' whole range as defined with IUCN range maps. We also compare models relative to the estimated response curves and compute the association between the benefit of the data integration and the number of presence records in each data set.
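    Linking two observation models to one spatial process can be sketched as a joint log-likelihood on a grid. This is a simplified illustration, not the authors' model. It assumes:
    - a log-linear intensity driven by a single environmental covariate;
    - a Poisson model for presence-only counts, thinned by a hypothetical sampling-effort covariate;
    - a complementary log-log model, 1 − exp(−λ · area), for presence–absence sites.
    Spatial random effects and priors are omitted.

```python
import numpy as np
from math import lgamma


def integrated_log_likelihood(beta, alpha, env, bias_cov, cell_area,
                              po_counts, pa_cells, pa_detected, pa_area):
    """Joint log-likelihood of presence-only and presence-absence data on a grid.

    beta: (intercept, slope) of the shared log-intensity on `env` (one value per cell).
    alpha: (intercept, slope) of a hypothetical sampling-effort model on `bias_cov`.
    po_counts: presence-only records per cell; pa_cells: cell index of each
        presence-absence site; pa_detected: 1/0 outcome at each site.
    """
    env = np.asarray(env, dtype=float)
    # Shared latent process: expected individuals per unit area in each cell.
    lam = np.exp(beta[0] + beta[1] * env)

    # Observation model 1 (presence-only): Poisson counts thinned by sampling effort.
    effort = np.exp(alpha[0] + alpha[1] * np.asarray(bias_cov, dtype=float))
    mu = lam * effort * cell_area
    counts = np.asarray(po_counts, dtype=float)
    ll_po = np.sum(counts * np.log(mu) - mu) - sum(lgamma(c + 1.0) for c in counts)

    # Observation model 2 (presence-absence): P(at least one) = 1 - exp(-lam * area).
    p = 1.0 - np.exp(-lam[np.asarray(pa_cells)] * pa_area)
    y = np.asarray(pa_detected, dtype=float)
    ll_pa = np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p))

    return float(ll_po + ll_pa)
```

In the presence-only term, the two intercepts `beta[0]` and `alpha[0]` enter only through their sum, so presence-only data alone cannot separate them. The presence–absence term depends only on the intensity and anchors its absolute level. This is one reason integrating the two data types can help.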

    Results

    The integrated SDM accounting for the spatially varying sampling intensity of the presence‐only data was one of the top‐performing models in both model validation schemes. Presence‐only data alleviated overly large niche estimates. When predicting the species' whole range, data integration was beneficial compared with modelling presence‐only data alone for species that had few presence points. At the community level, integrated models improved the species richness prediction.

    Main Conclusions

    Integrated SDMs combining presence‐only and presence–absence data successfully borrow strengths from both data types and offer improved predictions of species' ranges. Integrated SDMs can potentially alleviate the impacts of taxonomically and geographically uneven sampling and leverage the detailed sampling information in presence–absence data.

  3. Abstract

    Species distribution models (SDMs) that rely on regional‐scale environmental variables will play a key role in forecasting species occurrence in the face of climate change. However, in the Anthropocene, a number of local‐scale anthropogenic variables can override regional‐scale variables to drive patterns of species distribution. These include wildfire history, land‐use change, invasive species, and ecological restoration practices. Incorporating these human‐induced factors into SDMs remains a major research challenge, in part because spatial variability in these factors occurs at fine scales, rendering prediction over regional extents problematic. Here, we used big sagebrush (Artemisia tridentata Nutt.) as a model species to explore whether including human‐induced factors improves the fit of the SDM. We applied a Bayesian hurdle spatial approach to model sagebrush occurrence and cover throughout the Great Basin of North America. The models used 21,753 data points of field‐sampled vegetation obtained from the LANDFIRE program and incorporated fire history metrics and restoration treatments from 1980 to 2015. Models including fire attributes and restoration treatments performed better than those including only climate and topographic variables. The number of fires and fire occurrence had the strongest relative effects on big sagebrush occurrence and cover, respectively. The models predicted that the probability of big sagebrush occurrence decreases by 1.2% (95% CI: −6.9%, 0.6%) when one fire occurs. They also predicted that cover decreases by 44.7% (95% CI: −47.9%, −41.3%) if at least one fire occurred over the 36‐year period of record. Restoration practices increased the probability of big sagebrush occurrence but had minimal effect on cover. Our results demonstrate the potential value of including disturbance and land management along with climate in models to predict species distributions. 
As an increasing number of datasets representing land‐use history become available, we anticipate that our modeling framework will have broad relevance across a range of biomes and species.
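    A hurdle model splits each observation into two parts: whether the species occurs at all, and how much cover it has where it does occur. The sketch below gives the per-observation log-likelihood. It assumes a Beta distribution for cover given presence; the abstract does not state which distribution was used. It also omits the spatial random effects and the covariate links (e.g., number of fires, restoration treatment) of the published model. Function names are hypothetical.

```python
from math import lgamma, log


def hurdle_beta_loglik(cover, pi, mu, phi):
    """Log-likelihood of one proportional-cover observation under a hurdle model.

    cover: observed cover in [0, 1); 0 means the species is absent.
    pi: probability of occurrence (the hurdle).
    mu, phi: mean and precision of the Beta distribution of cover given presence.
    """
    if not 0.0 <= cover < 1.0:
        raise ValueError("cover must lie in [0, 1)")
    if cover == 0.0:
        return log(1.0 - pi)  # absence: failed to cross the hurdle
    a, b = mu * phi, (1.0 - mu) * phi  # mean-precision parameterization
    log_beta_pdf = (lgamma(a + b) - lgamma(a) - lgamma(b)
                    + (a - 1.0) * log(cover) + (b - 1.0) * log(1.0 - cover))
    return log(pi) + log_beta_pdf


def expected_cover(pi, mu):
    """Unconditional expected cover: P(occurrence) times mean cover when present."""
    return pi * mu
```

In a full model, `pi` and `mu` would each be linked to covariates, for example through logit links. A factor such as fire could then affect occurrence and cover separately, as the results above report.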

  4. Abstract

    Aim

    Species distribution models (SDMs) are widely used to predict how species distributions may change in response to climatic change. To assess the reliability of those predictions, they need to be critically validated with respect to what they are used for. While ecologists are typically interested in how and where distributions will change, we argue that SDMs have seldom been evaluated in terms of their capacity to predict such change. Instead, typical retrospective validation methods estimate a model's ability to predict only a single static time point in the future. Here, we apply two validation methods and compare their estimates of predictive performance. One method predicts and evaluates a static pattern, while the other measures change.

    Location

    Fennoscandia.

    Methods

    We applied a joint SDM to model the distributions of 120 bird species in four model validation settings. We trained models with a dataset from 1975 to 1999 and predicted species' future occurrence and abundance in two ways: for one static time period (2013–2016, ‘static validation’) and for a change between two time periods (difference between 1996–1999 and 2013–2016, ‘change validation’). We then measured predictive performance using correlation between predicted and observed values. We also related predictive performance to species traits.
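    The toy example below illustrates the difference between the two validation settings. The numbers are invented for illustration and are not data from the study. Static validation correlates predictions and observations for the later period. Change validation correlates predicted and observed differences between the two periods. Here the model reproduces the spatial gradient but gets every local change backwards, yet it still scores well under static validation.

```python
import numpy as np


def static_and_change_validation(pred_t1, pred_t2, obs_t1, obs_t2):
    """Pearson correlations for a static and a change validation.

    Static: predictions for the later period vs. observations for that period.
    Change: predicted change (t2 - t1) vs. observed change (t2 - t1).
    """
    pred_t1, pred_t2, obs_t1, obs_t2 = (np.asarray(v, dtype=float)
                                        for v in (pred_t1, pred_t2, obs_t1, obs_t2))
    static = np.corrcoef(pred_t2, obs_t2)[0, 1]
    change = np.corrcoef(pred_t2 - pred_t1, obs_t2 - obs_t1)[0, 1]
    return float(static), float(change)


# Toy data for five sites (invented values).
obs_t1 = [10, 20, 30, 40, 50]
obs_t2 = [12, 18, 32, 38, 52]   # observed local changes: +2, -2, +2, -2, +2
pred_t1 = obs_t1                # model reproduces the first period exactly
pred_t2 = [8, 22, 28, 42, 48]   # ...but predicts every change backwards
static, change = static_and_change_validation(pred_t1, pred_t2, obs_t1, obs_t2)
# static ≈ 0.96, change ≈ -1
```

The high static score comes from the spatial gradient that persists between periods. This is the temporal autocorrelation the conclusions warn about.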

    Results

    Even though the static validation method rated predictive performance as good, the change validation method indicated very poor performance. Predictive performance was not strongly related to any trait.

    Main Conclusions

    The static validation method might overestimate predictive performance by not revealing the model's inability to predict change events. If species' distributions remain mostly stable, then even an unfit model can predict the near future well due to temporal autocorrelation. We urge caution when working with forecasts of changes in spatial patterns of species occupancy or abundance. This applies even to SDMs based on time series datasets, unless they are critically validated for forecasting such change.

  5. Abstract

    As geographic range estimates for the IUCN Red List guide conservation actions, accuracy and ecological realism are crucial. IUCN’s extent of occurrence (EOO) is the general region including the species’ range, while area of occupancy (AOO) is the subset of EOO occupied by the species. Data‐poor species with incomplete sampling present particular difficulties, but species distribution models (SDMs) can be used to predict suitable areas. Nevertheless, SDMs typically employ abiotic variables (i.e., climate) and do not explicitly account for biotic interactions that can impose range constraints. We sought to improve range estimates for data‐poor, parapatric species by masking out areas under inferred competitive exclusion. We did so for two South American spiny pocket mice whose ranges appear restricted by competition: Heteromys australis (Least Concern) and Heteromys teleus (Vulnerable due to especially poor sampling). For both species, we estimated EOO using SDMs and AOO with four approaches: occupied grid cells, abiotic SDM prediction, and this prediction masked by approximations of the areas occupied by each species’ congener (in two variants). We made the masks using support vector machines (SVMs) fit with two data types: occurrence coordinates alone; and coordinates along with SDM predictions of suitability. Given the uncertainty in calculating AOO for low‐data species, we made estimates for the lower and upper bounds for AOO. We make recommendations only for H. teleus, as only its full known range was considered. The SVM approaches (especially the second one) had lower classification error and made more ecologically realistic delineations of the contact zone. For H. teleus, the lower AOO bound (a strongly biased underestimate from occupied grid cells) corresponded to Endangered, while the upper bounds (other approaches) led to Near Threatened. 
    As we currently lack data to determine the species’ true occupancy within the post‐processed SDM prediction, we recommend that an updated listing for H. teleus include these bounds for AOO. This study advances methods for estimating the upper bound of AOO and highlights the need for better ways to produce unbiased estimates of lower bounds. More generally, the SVM approaches for post‐processing SDM predictions hold promise for improving range estimates for other uses in biogeography and conservation.
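    The sketch below shows the point-based baselines behind these estimates. Under IUCN guidelines, EOO is commonly measured as the area of the minimum convex polygon around occurrences. AOO is measured as the number of occupied cells on a 2 × 2 km grid multiplied by the cell area (4 km²); the "occupied grid cells" approach above corresponds to this. The sketch makes two assumptions:
    - coordinates are already projected to kilometres;
    - the grid is anchored at the origin, which matters because AOO depends on grid placement.
    It does not reproduce the SDM or SVM steps, and the function names are illustrative.

```python
import numpy as np


def _cross(o, a, b):
    # z-component of (a - o) x (b - o); positive for a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def eoo_km2(points_km):
    """EOO as the area of the minimum convex polygon (shoelace formula)."""
    hull = np.array(convex_hull(points_km), dtype=float)
    if len(hull) < 3:
        return 0.0
    x, y = hull[:, 0], hull[:, 1]
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))


def aoo_km2(points_km, cell_km=2.0):
    """AOO as the number of occupied grid cells times the cell area."""
    cells = {(int(np.floor(x / cell_km)), int(np.floor(y / cell_km)))
             for x, y in points_km}
    return len(cells) * cell_km ** 2
```

Because several records can fall in one cell, AOO from occupied cells can only grow as sampling improves. This is why, for poorly sampled species, it behaves as a lower bound.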
