Title: Integrated species distribution models to account for sampling biases and improve range‐wide occurrence predictions
Abstract

Aim

Species distribution models (SDMs) that integrate presence‐only and presence–absence data offer a promising avenue to improve information on species' geographic distributions. The use of such ‘integrated SDMs’ on a species range‐wide extent has been constrained by the often limited presence–absence data and by the heterogeneous sampling of the presence‐only data. Here, we evaluate integrated SDMs for studying species ranges with a novel expert range map‐based evaluation. We build new understanding about how integrated SDMs address issues of estimation accuracy and data deficiency and thereby offer advantages over traditional SDMs.

Location

South and Central America.

Time Period

1979–2017.

Major Taxa Studied

Hummingbirds.

Methods

We build integrated SDMs by linking two observation models – one for each data type – to the same underlying spatial process. We validate SDMs with two schemes: (i) cross‐validation with presence–absence data and (ii) comparison with respect to the species' whole range as defined with IUCN range maps. We also compare models relative to the estimated response curves and compute the association between the benefit of the data integration and the number of presence records in each data set.
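The idea of linking two observation models to one underlying process can be illustrated with a small sketch. This is not the authors' implementation. It assumes a common textbook formulation: a log-linear species intensity shared by both data types, presence-only records treated as a thinned Poisson process on a grid (with a log-linear sampling-bias term), and presence–absence records treated as Bernoulli outcomes with a complementary log-log link. The function name `joint_neg_log_lik` and all argument names are hypothetical.

```python
import numpy as np

def joint_neg_log_lik(beta, alpha, X_grid, Z_grid, po_counts, cell_area,
                      X_pa, y_pa, site_area):
    """Negative joint log-likelihood of a toy integrated SDM (illustrative only).

    beta      : coefficients of the shared log-intensity, log lambda(s) = x(s) @ beta
    alpha     : coefficients of the assumed presence-only sampling bias, log b(s) = z(s) @ alpha
    X_grid    : (n_cells, p) environmental covariates on the grid
    Z_grid    : (n_cells, q) sampling-bias covariates on the grid
    po_counts : (n_cells,) number of presence-only records per grid cell
    X_pa      : (n_sites, p) environmental covariates at surveyed sites
    y_pa      : (n_sites,) 0/1 presence-absence observations
    """
    beta = np.asarray(beta, dtype=float)
    alpha = np.asarray(alpha, dtype=float)

    # Presence-only part: Poisson counts with mean lambda(s) * b(s) * cell_area.
    # The constant log(y!) term is dropped because it does not depend on the parameters.
    log_mu = X_grid @ beta + Z_grid @ alpha + np.log(cell_area)
    ll_po = np.sum(po_counts * log_mu - np.exp(log_mu))

    # Presence-absence part: P(y = 1) = 1 - exp(-lambda(s) * site_area),
    # driven by the same beta as the presence-only part.
    lam_site = np.exp(X_pa @ beta) * site_area
    p = -np.expm1(-lam_site)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    ll_pa = np.sum(y_pa * np.log(p) + (1 - y_pa) * np.log1p(-p))

    return -(ll_po + ll_pa)
```

Because `beta` appears in both terms, each data type informs the same species–environment relationship, while `alpha` absorbs the uneven sampling intensity of the presence-only records. In practice the function would be passed to a numerical optimizer or embedded in a Bayesian model; spatial random effects, which a range-wide model would likely need, are omitted here.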

Results

The integrated SDM accounting for the spatially varying sampling intensity of the presence‐only data was one of the top performing models in both model validation schemes. Presence‐only data alleviated overly large niche estimates, and data integration was beneficial compared to modelling solely presence‐only data for species which had few presence points when predicting the species' whole range. On the community level, integrated models improved the species richness prediction.

Main Conclusions

Integrated SDMs combining presence‐only and presence–absence data successfully borrow strengths from both data types and offer improved predictions of species' ranges. Integrated SDMs can potentially alleviate the impacts of taxonomically and geographically uneven sampling and leverage the detailed sampling information in presence–absence data.

 
NSF-PAR ID:
10477928
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Global Ecology and Biogeography
Volume:
33
Issue:
3
ISSN:
1466-822X
Format(s):
Medium: X
Size(s):
p. 356-370
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Aim

    Species distribution models (SDMs) are widely used in ecology to predict species occurrence throughout their range. Typically, SDMs are created using presence‐only or presence–absence data. We hypothesize that the continuous metric of temporal occupancy, the proportion of time a species is observed at a given site, provides more detail about species occurrence than binary presence‐based SDMs.

    Location

    North America.

    Methods

    We compared SDMs for 189 focal species using four modelling methods to determine whether North American avian species distributions are better predicted using temporal occupancy over presence–absence. We used the North American Breeding Bird Survey and built SDMs based on all sites sampled consecutively between 2001 and 2015, as well as on a subset of only five time points within the 15‐year sampling window. Each model used the same environmental inputs to predict species range. Each SDM was cross‐validated temporally and spatially.

    Results

    Species distributions were generally better predicted using temporal occupancy rather than presence–absence when using either a five‐year or fifteen‐year sampling window. Species that occurred in a smaller proportion of their predicted range were particularly better predicted with SDMs using temporal occupancy. Temporal occupancy SDMs had lower false discovery and false‐positive rates but higher false‐negative rates than presence–absence models.

    Main conclusions

    Temporal occupancy is a valuable metric that can improve predictions of species occurrence for birds and may improve conservation planning and design efforts.

     
  2. Introduction

    Forecasting range shifts in response to climate change requires accurate species distribution models (SDMs), particularly at the margins of species' ranges. However, most studies producing SDMs rely on sparse species occurrence datasets from herbarium records and public databases, along with random pseudoabsences. While environmental covariates used to fit SDMs are increasingly precise due to satellite data, the availability of species occurrence records is still a large source of bias in model predictions. We developed distribution models for hybridizing sister species of western and eastern Joshua trees (Yucca brevifolia and Y. jaegeriana, respectively), iconic Mojave Desert species that are threatened by climate change and habitat loss.

    Methods

    We conducted an intensive visual grid search of online satellite imagery for 672,043 0.25 km² grid cells to identify the two species' presences and absences on the landscape with exceptional resolution, and field validated 29,050 cells in 15,001 km of driving. We used the resulting presence/absence data to train SDMs for each Joshua tree species, revealing the contemporary environmental gradients (during the past 40 years) with greatest influence on the current distribution of adult trees.

    Results

    While the environments occupied by Y. brevifolia and Y. jaegeriana were similar in total aridity, they differed with respect to seasonal precipitation and temperature ranges, suggesting the two species may have differing responses to climate change. Moreover, the species showed differing potential to occupy each other's geographic ranges: modeled potential habitat for Y. jaegeriana extends throughout the range of Y. brevifolia, while potential habitat for Y. brevifolia is not well represented within the range of Y. jaegeriana.

    Discussion

    By reproducing the current range of the Joshua trees with high fidelity, our dataset can serve as a baseline for future research, monitoring, and management of these species, including an increased understanding of dynamics at the trailing and leading margins of the species' ranges and potential for climate refugia.

     
  3. Abstract

    Aim

    Species distribution models (SDMs) are widely used to predict how species distributions may change in response to climatic change. To assess the reliability of those predictions, they need to be critically validated with respect to what they are used for. While ecologists are typically interested in how and where distributions will change, we argue that SDMs have seldom been evaluated in terms of their capacity to predict such change. Instead, typical retrospective validation methods estimate a model's ability to predict only one static time in the future. Here, we apply two validation methods, one that predicts and evaluates a static pattern and another that measures change, and compare their estimates of predictive performance.

    Location

    Fennoscandia.

    Methods

    We applied a joint SDM to model the distributions of 120 bird species in four model validation settings. We trained models with a dataset from 1975 to 1999 and predicted species' future occurrence and abundance in two ways: for one static time period (2013–2016, ‘static validation’) and for a change between two time periods (difference between 1996–1999 and 2013–2016, ‘change validation’). We then measured predictive performance using correlation between predicted and observed values. We also related predictive performance to species traits.

    Results

    Even though the static validation method evaluated predictive performance as good, the change validation method indicated very poor performance. Predictive performance was not strongly related to any trait.

    Main Conclusions

    The static validation method might overestimate predictive performance by not revealing the model's inability to predict change events. If species' distributions remain mostly stable, then even an unfit model can predict the near future well due to temporal autocorrelation. We urge caution when working with forecasts of changes in spatial patterns of species occupancy or abundance, even for SDMs that are based on time series datasets, unless they are critically validated for forecasting such change.

     
  4. Abstract

    Spatial biases are a common feature of presence–absence data from citizen scientists. Spatial thinning can mitigate errors in species distribution models (SDMs) that use these data. When detections or non‐detections are rare, however, SDMs may suffer from class imbalance or low sample size of the minority (i.e. rarer) class. Poor predictions can result, the severity of which may vary by modelling technique.

    To explore the consequences of spatial bias and class imbalance in presence–absence data, we used eBird citizen science data for 102 bird species from the northeastern USA to compare spatial thinning, class balancing and majority‐only thinning (i.e. retaining all samples of the minority class). We created SDMs using two parametric or semi‐parametric techniques (generalized linear models and generalized additive models) and two machine learning techniques (random forest and boosted regression trees). We tested the predictive abilities of these SDMs using an independent and systematically collected reference dataset with a combination of discrimination (area under the receiver operating characteristic curve; true skill statistic; area under the precision‐recall curve) and calibration (Brier score; Cohen's kappa) metrics.

    We found large variation in SDM performance depending on thinning and balancing decisions. Across all species, there was no single best approach, with the optimal choice of thinning and/or balancing depending on modelling technique, performance metric and the baseline sample prevalence of species in the data. Spatially thinning all the data was often a poor approach, especially for species with baseline sample prevalence <0.1. For most of these rare species, balancing classes improved model discrimination between presence and absence classes using machine learning techniques, but typically hindered model calibration.

    Baseline sample prevalence, sample size, modelling approach and the intended application of SDM output—whether discrimination or calibration—should guide decisions about how to thin or balance data, given the considerable influence of these methodological choices on SDM performance. For prognostic applications requiring good model calibration (vis‐à‐vis discrimination), the match between sample prevalence and true species prevalence may be the overriding feature and warrants further investigation.

     
  5. Abstract

    Aim

    Parasites are a major component of global ecosystems, yet spatial variation in parasite diversity is poorly known, largely because their occurrence data are limited and thus difficult to interpret. Using a recently compiled database of parasite occurrences, we compare different models which we use to infer parasite geographic ranges and parasite species richness across the globe.

    Innovation

    To date, most studies exploring spatial patterns of parasite diversity assumed, with little validation, that the geographic range of a parasite species can be represented by the collective geographic range of its host species. Our study compares this assumption with a suite of other methods to infer parasite distribution from parasite occurrence data (e.g., based on data density, ecoregions and climatic conditions). We highlight diversity hotspots identified by the various methods and compare the effects of sampling intensities in different regions, a crucial factor determining observed parasite diversity.

    Main conclusions

    The type of model used to infer parasite distributions affects estimates of both total species richness and spatial patterns of hotspots of parasite richness. Overall, the models based on reported occurrences share similar areas of high parasite richness that tend to be biased towards areas of high sampling effort. In contrast, the model based on host distributions showed hotspots of parasite diversity that are biased towards areas of high host species richness. Accounting for sampling effort could only help to reconcile the outcome from the different models in some regions. Further, the non‐saturated species accumulation curves even for the best studied regions of the world such as Europe and North America serve as a call for further sampling effort and development of effective analytic tools that can provide robust accounts of global parasite diversity.

     