

Title: Does adding community science observations to museum records improve distribution modeling of a rare endemic plant?
Abstract

Understanding the ranges of rare and endangered species is central to conserving biodiversity in the Anthropocene. Species distribution models (SDMs) have become a common and powerful tool for analyzing species–environment relationships across geographic space. Although evaluating the distribution of rare species is integral to their conservation, this can be difficult when limited distribution data are available. Community science platforms, such as iNaturalist, have emerged as alternative sources for species occurrence data. Although these observations are often thought to be of lower quality than those of natural history collections, they may have potential for improving SDMs for species with few occurrence records from collections. Here, we investigate the utility of iNaturalist data for developing SDMs for a rare high‐elevation plant, Telesonix jamesii. Because methods for modeling rare species are limited in the literature, five different modeling techniques were considered, including profile methods, statistical models, and machine learning algorithms. The inclusion of iNaturalist data doubled the number of usable records for T. jamesii. We found that a random forest (RF) model using ensemble training data performed best of any model (area under curve = 0.98). We then compared the performance of RF models that use only natural history training data and those that use a combination of natural history (herbarium specimens) and iNaturalist training data. All models relied heavily on climate data (mean temperature of driest quarter, and precipitation of the warmest quarter), indicating that this species is under threat as climate continues to change. Validation datasets affected model fits as well. Models using only herbarium data performed slightly worse when evaluated with cross‐validation than when validated externally with iNaturalist data. This study can serve as a model for future SDM studies of species with similar data limitations.
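
The area under curve (AUC) reported above can be read as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background site. The following is a minimal sketch of that rank-based calculation, not the authors' evaluation code; the function name `auc` and the suitability scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auc(presence_scores: Sequence[float], background_scores: Sequence[float]) -> float:
    """Rank-based AUC: the fraction of (presence, background) pairs in which
    the presence site has the higher score; tied pairs count as half."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model scores (assumed values, not data from the study).
presence = [0.9, 0.8, 0.4]
background = [0.3, 0.5, 0.1]
example_auc = auc(presence, background)
print(round(example_auc, 3))
```

The same function could be applied once to held-out cross-validation folds and once to an external dataset, which is the kind of comparison the abstract describes between cross-validation and validation with iNaturalist records.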

 
Award ID(s):
2102974
NSF-PAR ID:
10403824
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Ecosphere
Volume:
14
Issue:
3
ISSN:
2150-8925
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Introduction

    Forecasting range shifts in response to climate change requires accurate species distribution models (SDMs), particularly at the margins of species' ranges. However, most studies producing SDMs rely on sparse species occurrence datasets from herbarium records and public databases, along with random pseudoabsences. While environmental covariates used to fit SDMs are increasingly precise due to satellite data, the availability of species occurrence records is still a large source of bias in model predictions. We developed distribution models for hybridizing sister species of western and eastern Joshua trees (Yucca brevifolia and Y. jaegeriana, respectively), iconic Mojave Desert species that are threatened by climate change and habitat loss.

    Methods

    We conducted an intensive visual grid search of online satellite imagery for 672,043 0.25 km² grid cells to identify the two species' presences and absences on the landscape with exceptional resolution, and field validated 29,050 cells in 15,001 km of driving. We used the resulting presence/absence data to train SDMs for each Joshua tree species, revealing the contemporary environmental gradients (during the past 40 years) with greatest influence on the current distribution of adult trees.

    Results

    While the environments occupied by Y. brevifolia and Y. jaegeriana were similar in total aridity, they differed with respect to seasonal precipitation and temperature ranges, suggesting the two species may have differing responses to climate change. Moreover, the species showed differing potential to occupy each other's geographic ranges: modeled potential habitat for Y. jaegeriana extends throughout the range of Y. brevifolia, while potential habitat for Y. brevifolia is not well represented within the range of Y. jaegeriana.

    Discussion

    By reproducing the current range of the Joshua trees with high fidelity, our dataset can serve as a baseline for future research, monitoring, and management of this species, including an increased understanding of dynamics at the trailing and leading margins of the species' ranges and potential for climate refugia.

     
  2. Abstract

    Species distribution models (SDMs) that rely on regional‐scale environmental variables will play a key role in forecasting species occurrence in the face of climate change. However, in the Anthropocene, a number of local‐scale anthropogenic variables, including wildfire history, land‐use change, invasive species, and ecological restoration practices can override regional‐scale variables to drive patterns of species distribution. Incorporating these human‐induced factors into SDMs remains a major research challenge, in part because spatial variability in these factors occurs at fine scales, rendering prediction over regional extents problematic. Here, we used big sagebrush (Artemisia tridentata Nutt.) as a model species to explore whether including human‐induced factors improves the fit of the SDM. We applied a Bayesian hurdle spatial approach using 21,753 data points of field‐sampled vegetation obtained from the LANDFIRE program to model sagebrush occurrence and cover by incorporating fire history metrics and restoration treatments from 1980 to 2015 throughout the Great Basin of North America. Models including fire attributes and restoration treatments performed better than those including only climate and topographic variables. Number of fires and fire occurrence had the strongest relative effects on big sagebrush occurrence and cover, respectively. The models predicted that the probability of big sagebrush occurrence decreases by 1.2% (95% CI: −6.9%, 0.6%) when one fire occurs and cover decreases by 44.7% (95% CI: −47.9%, −41.3%) if at least one fire occurred over the 36-year period of record. Restoration practices increased the probability of big sagebrush occurrence but had minimal effect on cover. Our results demonstrate the potential value of including disturbance and land management along with climate in models to predict species distributions. As an increasing number of datasets representing land‐use history become available, we anticipate that our modeling framework will have broad relevance across a range of biomes and species.
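
    For readers unfamiliar with hurdle models, the generic two-part structure can be written as follows. This is a textbook form, not the specification used in the study: the covariate vectors $x_i$, the coefficients $\beta$ and $\gamma$, the spatial effect $w(s_i)$, and the positive-valued distribution $f$ are all assumed symbols introduced here for illustration.

    ```latex
    % Part 1 (occurrence): is sagebrush present at site i?
    \Pr(Y_i > 0) = p_i, \qquad
    \operatorname{logit}(p_i) = x_i^\top \beta + w(s_i)

    % Part 2 (cover, given presence): a distribution f on positive values
    Y_i \mid Y_i > 0 \;\sim\; f(\mu_i, \phi), \qquad
    g(\mu_i) = x_i^\top \gamma

    % Unconditional expected cover combines both parts
    \mathbb{E}[Y_i] = p_i \, \mu_i
    ```

    Because occurrence and cover have separate linear predictors, a covariate such as fire history can have a small effect on one part and a large effect on the other, which is consistent with the contrast the abstract reports between occurrence and cover.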

     
  3. Qin, Hong (Ed.)

    iNaturalist has the potential to be an extremely rich source of organismal occurrence data. Launched in 2008, it contained over 150 million uploaded observations as of May 2023. Based on the findings of a limited number of past studies assessing the taxonomic accuracy of participatory science-driven sources of occurrence data such as iNaturalist, there has been concern that some portion of these records might be misidentified in certain taxonomic groups. In this case study, we compare Research Grade iNaturalist observations with digitized herbarium specimens, both of which are currently available for combined download from large data aggregators and are therefore the primary sources of occurrence data for large-scale biodiversity/biogeography studies. Our comparisons were confined regionally to the southeastern United States (Florida, Georgia, North Carolina, South Carolina, Texas, Tennessee, Kentucky, and Virginia). Occurrence records from ten plant families (Gentianaceae, Ericaceae, Melanthiaceae, Ulmaceae, Fabaceae, Asteraceae, Fagaceae, Cyperaceae, Juglandaceae, Apocynaceae) were downloaded and scored on taxonomic accuracy. We found a comparable and relatively low rate of misidentification among both digitized herbarium specimens and Research Grade iNaturalist observations within the study area. This finding illustrates the utility and high quality of iNaturalist data for future research in the region, but also points to key differences between data types, giving each a respective advantage, depending on applications of the data.

     
  4. Climate change poses a threat to biodiversity, and it is unclear whether species can adapt to or tolerate new conditions, or migrate to areas with suitable habitats. Reconstructions of range shifts that occurred in response to environmental changes since the last glacial maximum (LGM) from species distribution models (SDMs) can provide useful data to inform conservation efforts. However, different SDM algorithms and climate reconstructions often produce contrasting patterns, and validation methods typically focus on accuracy in recreating current distributions, limiting their relevance for assessing predictions to the past or future. We modeled historically suitable habitat for the threatened North American tree green ash Fraxinus pennsylvanica using 24 SDMs built using two climate models, three calibration regions, and four modeling algorithms. We evaluated the SDMs using contemporary data with spatial block cross‐validation and compared the relative support for alternative models using a novel integrative method based on coupled demographic‐genetic simulations. We simulated genomic datasets using habitat suitability of each of the 24 SDMs in a spatially‐explicit model. Approximate Bayesian computation (ABC) was then used to evaluate the support for alternative SDMs through comparisons to an empirical population genomic dataset. Models had very similar performance when assessed with contemporary occurrences using spatial cross‐validation, but ABC model selection analyses consistently supported SDMs based on the CCSM climate model, an intermediate calibration extent, and the generalized linear modeling algorithm. Finally, we projected the future range of green ash under four climate change scenarios. Future projections using the SDMs selected via ABC suggest only minor shifts in suitable habitat for this species, while some of those that were rejected predicted dramatic changes. Our results highlight the different inferences that may result from the application of alternative distribution modeling algorithms and provide a novel approach for selecting among a set of competing SDMs with independent data.
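
Spatial block cross-validation, mentioned above, holds out whole geographic blocks rather than individual points so that nearby, spatially autocorrelated records do not end up in both the training and the test set. The sketch below shows one simple way to do this under stated assumptions: square blocks on a planar grid and round-robin assignment of blocks to folds. The function names, block size, and coordinates are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence, Tuple


def assign_spatial_folds(
    points: Sequence[Tuple[float, float]], block_size: float, n_folds: int
) -> List[int]:
    """Give every point a fold index so that all points falling in the same
    square block (side length block_size) share one fold."""
    block_ids = [
        (math.floor(x / block_size), math.floor(y / block_size)) for x, y in points
    ]
    # Sort the occupied blocks so the assignment is deterministic,
    # then deal them out to folds in round-robin order.
    unique_blocks = sorted(set(block_ids))
    fold_of_block = {block: i % n_folds for i, block in enumerate(unique_blocks)}
    return [fold_of_block[block] for block in block_ids]


def train_test_indices(folds: Sequence[int], held_out: int) -> Tuple[List[int], List[int]]:
    """Split point indices into training and test sets for one held-out fold."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    test = [i for i, f in enumerate(folds) if f == held_out]
    return train, test


# Hypothetical occurrence coordinates in arbitrary planar units (assumed values).
occurrences = [(0.1, 0.2), (0.4, 0.9), (5.2, 0.3), (5.9, 0.8), (0.5, 5.5)]
folds = assign_spatial_folds(occurrences, block_size=1.0, n_folds=3)
train_idx, test_idx = train_test_indices(folds, held_out=0)
print(folds, train_idx, test_idx)
```

Round-robin assignment is only one design choice; assigning blocks to folds at random or balancing folds by record count are common alternatives.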

     
  5. Abstract

    Aim

    Species distribution models (SDMs) that integrate presence‐only and presence–absence data offer a promising avenue to improve information on species' geographic distributions. The use of such ‘integrated SDMs’ on a species range‐wide extent has been constrained by the often limited presence–absence data and by the heterogeneous sampling of the presence‐only data. Here, we evaluate integrated SDMs for studying species ranges with a novel expert range map‐based evaluation. We build new understanding about how integrated SDMs address issues of estimation accuracy and data deficiency and thereby offer advantages over traditional SDMs.

    Location

    South and Central America.

    Time Period

    1979–2017.

    Major Taxa Studied

    Hummingbirds.

    Methods

    We build integrated SDMs by linking two observation models – one for each data type – to the same underlying spatial process. We validate SDMs with two schemes: (i) cross‐validation with presence–absence data and (ii) comparison with respect to the species' whole range as defined with IUCN range maps. We also compare models relative to the estimated response curves and compute the association between the benefit of the data integration and the number of presence records in each data set.
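
    One common way to link two observation models to a shared spatial process is a joint likelihood built on a single intensity surface. The formulation below is a generic sketch of that idea and may differ from the model fitted in the study; the intensity $\lambda(s)$, the sampling-effort term $b(s)$, the covariates $x(s)$ and $z(s)$, the domain $D$, and the survey areas $A_i$ are assumed symbols introduced here.

    ```latex
    % Shared latent process: species intensity over space
    \log \lambda(s) = x(s)^\top \beta

    % Presence-only records s_1, ..., s_m: a thinned Poisson point process,
    % where b(s) captures spatially varying sampling intensity
    \log b(s) = z(s)^\top \alpha, \qquad
    \ell_{\mathrm{PO}} = \sum_{j=1}^{m} \log\!\big[\lambda(s_j)\, b(s_j)\big]
                         - \int_{D} \lambda(s)\, b(s)\, ds

    % Presence-absence surveys y_1, ..., y_n over areas A_i
    \Pr(y_i = 1) = 1 - \exp\!\Big(-\int_{A_i} \lambda(s)\, ds\Big), \qquad
    \ell_{\mathrm{PA}} = \sum_{i=1}^{n} \log \Pr(y_i)

    % Joint log-likelihood: both data types inform the same \beta
    \ell(\beta, \alpha) = \ell_{\mathrm{PO}} + \ell_{\mathrm{PA}}
    ```

    In this form the presence-absence data help separate the species' intensity from the sampling effort in the presence-only data, since only $\lambda(s)$ appears in both terms.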

    Results

    The integrated SDM accounting for the spatially varying sampling intensity of the presence‐only data was one of the top performing models in both model validation schemes. Presence‐only data alleviated overly large niche estimates, and data integration was beneficial compared to modelling solely presence‐only data for species which had few presence points when predicting the species' whole range. On the community level, integrated models improved the species richness prediction.

    Main Conclusions

    Integrated SDMs combining presence‐only and presence–absence data are successfully able to borrow strengths from both data types and offer improved predictions of species' ranges. Integrated SDMs can potentially alleviate the impacts of taxonomically and geographically uneven sampling and leverage the detailed sampling information in presence–absence data.

     