

Title: Challenges and opportunities for using natural history collections to estimate insect population trends
Abstract

Natural history collections (NHC) provide a wealth of information that can be used to understand the impacts of global change on biodiversity. As such, there is growing interest in using NHC data to estimate changes in species' distributions and abundance trends over historic time horizons when contemporary survey data are limited or unavailable.

However, museum specimens were not collected for the purpose of estimating population trends and thus can exhibit spatiotemporal and collector‐specific biases that can severely limit the use of NHC data for evaluating population trajectories.

Here we review the challenges associated with using museum records to track long‐term insect population trends, including spatiotemporal biases in sampling effort and sparse temporal coverage within and across years. We highlight recent methodological advancements that aim to overcome these challenges and discuss emerging research opportunities.

Specifically, we examine the potential of integrating museum records and other contemporary data sources (e.g. collected via structured, designed surveys and opportunistic citizen science programs) in a unified analytical framework that accounts for the sampling biases associated with each data source. The emerging field of integrated modelling provides a promising framework for leveraging the wealth of collections data to accurately estimate long‐term trends of insect populations and identify cases where that is not possible using existing data sources.

 
Award ID(s):
2010698 1954406
NSF-PAR ID:
10381657
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Journal of Animal Ecology
Volume:
92
Issue:
2
ISSN:
0021-8790
Page Range / eLocation ID:
p. 237-249
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Data deficiencies among rare or cryptic species preclude assessment of community‐level processes using many existing approaches, limiting our understanding of the trends and stressors for large numbers of species. Yet evaluating the dynamics of whole communities, not just common or charismatic species, is critical to understanding the responses of biodiversity to ongoing environmental pressures.

    A recent surge in both public science and government‐funded data collection efforts has led to a wealth of biodiversity data. However, these data collection programmes use a wide range of sampling protocols (from unstructured, opportunistic observations of wildlife to well‐structured, design‐based programmes) and record information at a variety of spatiotemporal scales. As a result, available biodiversity data vary substantially in quantity and information content, which must be carefully reconciled for meaningful ecological analysis.

    Hierarchical modelling, including single‐species integrated models and hierarchical community models, has improved our ability to assess and predict biodiversity trends and processes. Here, we highlight the emerging ‘integrated community modelling’ framework that combines both data integration and community modelling to improve inferences on species‐ and community‐level dynamics.

    We illustrate the framework with a series of worked examples. Our three case studies demonstrate how integrated community models can be used to extend the geographic scope when evaluating species distributions and community‐level richness patterns; discern population and community trends over time; and estimate demographic rates and population growth for communities of sympatric species. We implemented these worked examples using multiple software methods through the R platform via packages with formula‐based interfaces and through development of custom code in JAGS, NIMBLE and Stan.

    Integrated community models provide an exciting approach to model biological and observational processes for multiple species using multiple data types and sources simultaneously, thus accounting for uncertainty and sampling error within a unified framework. By leveraging the combined benefits of both data integration and community modelling, integrated community models can produce valuable information about both common and rare species as well as community‐level dynamics, allowing for holistic evaluation of the effects of global change on biodiversity.

     
  2. Abstract

    Historical museum records provide potentially useful data for identifying drivers of change in species occupancy. However, because museum records are typically obtained via many collection methods, methodological developments are needed to enable robust inferences. Occupancy–detection models, a relatively new and powerful suite of statistical methods, are a potentially promising avenue because they can account for changes in collection effort through space and time.

    We use simulated datasets to identify how and when patterns in data and/or modelling decisions can bias inference. We focus primarily on the consequences of contrasting methodological approaches for dealing with species' ranges and inferring species' non‐detections in both space and time.

    We find that not all datasets are suitable for occupancy–detection analysis but, under the right conditions (namely, datasets that are broken into more time periods for occupancy inference and that contain a high fraction of community‐wide collections, or collection events that focus on communities of organisms), models can accurately estimate trends. Finally, we present a case study on eastern North American odonates where we calculate long‐term trends of occupancy using our most robust workflow.

    These results indicate that occupancy–detection models are a suitable framework for some research cases and expand the suite of tools available to researchers for macroecological analysis, especially where structured datasets are unavailable.

     
  3. Abstract

    Identifying patterns of pathogen infection in natural systems is crucial to understanding mechanisms of host–pathogen interactions. In this study, we explored how Junonia coenia densovirus (JcDV) infection varies over space and time in populations of the Melissa blue butterfly (Lycaeides melissa: Lycaenidae) using two different host plants. Collections of L. melissa adults from multiple populations and years, along with host plant tissue and community samples of arthropods found on host plants, were screened to determine JcDV prevalence and load. Additionally, we sampled at multiple time points within a single L. melissa flight season to investigate intra‐annual variation in infection patterns.

    We found population‐specific variation in viral prevalence of L. melissa across collection years, with historical samples potentially having higher viral prevalence than contemporary samples, although host plant diet was not informative for these patterns. Patterns of infection across multiple generations within a flight season showed that late‐season samples had a higher proportion of JcDV‐positive individuals, suggesting an accumulation of virus over the season. Sequence data from a segment of the JcDV capsid gene showed a lack of viral genetic diversity between L. melissa collected from different localities, and little to no viral particles were found in the surrounding environment.

    Our discovery of temporal variation in infection suggests that multiple sampling efforts must be made when describing pathogen prevalence in multivoltine hosts. Our findings represent an important first step towards further exploration of the ecological factors mediating disease prevalence and host‐specific variability of infection in wild insect populations.

     
  4. Abstract

    Citizen and community science datasets are typically collected using flexible protocols. These protocols enable large volumes of data to be collected globally every year; however, the consequence is that these protocols typically lack the structure necessary to maintain consistent sampling across years. This can result in complex and pronounced interannual changes in the observation process, which can complicate the estimation of population trends because population changes over time are confounded with changes in the observation process.

    Here we describe a novel modelling approach designed to estimate spatially explicit species population trends while controlling for the interannual confounding common in citizen science data. The approach is based on double machine learning, a statistical framework that uses machine learning (ML) methods to estimate population change and the propensity scores used to adjust for confounding discovered in the data. ML makes it possible to use large sets of features to control for confounding and to model spatial heterogeneity in trends. Additionally, we present a simulation method to identify and adjust for residual confounding missed by the propensity scores.

    To illustrate the approach, we estimated species trends using data from the citizen science project eBird. We used a simulation study to assess the ability of the method to estimate spatially varying trends when faced with realistic confounding and temporal correlation. Results demonstrated the ability to distinguish between spatially constant and spatially varying trends. There were low error rates on the estimated direction of population change (increasing/decreasing) at each location and high correlations on the estimated magnitude of population change.

    The ability to estimate spatially explicit trends while accounting for confounding inherent in citizen science data has the potential to fill important information gaps, helping to estimate population trends for species and/or regions lacking rigorous monitoring data.

     
  5. Abstract

    Environmental and anthropogenic factors affect the population dynamics of migratory species throughout their annual cycles. However, identifying the spatiotemporal drivers of migratory species' abundances is difficult because of extensive gaps in monitoring data. The collection of unstructured opportunistic data by volunteer (citizen science) networks provides a solution to address data gaps for locations and time periods during which structured, design‐based data are difficult or impossible to collect.

    To estimate population abundance and distribution at broad spatiotemporal extents, we developed an integrated model that incorporates unstructured data during time periods and spatial locations when structured data are unavailable. We validated our approach through simulations and then applied the framework to the eastern North American migratory population of monarch butterflies during their spring breeding period in eastern Texas. Spring climate conditions have been identified as a key driver of monarch population sizes during subsequent summer and winter periods. However, low monarch densities during the spring combined with very few design‐based surveys in the region have limited the ability to isolate effects of spring weather variables on monarchs.

    Simulation results confirmed the ability of our integrated model to accurately and precisely estimate abundance indices and the effects of covariates for locations and time periods in which structured sampling is lacking. In our case study, we combined opportunistic monarch observations during the spring migration and breeding period with structured data from the summer Midwestern breeding grounds. Our model revealed a nonstationary relationship between weather conditions and local monarch abundance during the spring, driven by spatially varying vegetation and temperature conditions.

    Data for widespread and migratory species are often fragmented across multiple monitoring programs, potentially requiring the use of both structured and unstructured data sources to obtain complete geographic coverage. Our integrated model can estimate population abundance at broad spatiotemporal extents despite structured data gaps during the annual cycle by leveraging opportunistic data.

     