
Title: Species traits and observer behaviors that bias data assimilation and how to accommodate them

Datasets that monitor biodiversity capture information differently depending on their design, which influences observer behavior and can lead to biases across observations and species. Combining different datasets can improve our ability to identify and understand threats to biodiversity, but this requires an understanding of the observation bias in each. Two datasets widely used to monitor bird populations exemplify these general concerns: eBird is a citizen science project with high spatiotemporal resolution but variation in distribution, effort, and observers, whereas the Breeding Bird Survey (BBS) is a structured survey of specific locations over time. Analyses using these two datasets can identify contradictory population trends. To understand these discrepancies and facilitate data fusion, we quantify species‐level reporting differences across eBird and the BBS in three regions across the United States by jointly modeling bird abundances using data from both datasets. First, we fit a joint Species Distribution Model that accounts for environmental conditions and effort to identify reporting differences across the datasets. We then examine how these differences in reporting are related to species traits. Finally, we analyze species reported to one dataset but not the other and determine whether traits differ between reported and unreported species. We find that most species are reported more in the BBS than eBird. Specifically, we find that compared to eBird, BBS observers tend to report higher counts of common species and species that are usually detected by sound. We also find that species associated with water are reported less in the BBS. Species typically identified by sound are reported more at sunrise than later in the morning. Our results quantify reporting differences in eBird and the BBS to enhance our understanding of how each captures information and how they should be used. 
The reporting rates we identify can also be incorporated into observation models through detectability or effort to improve analyses across species and datasets. The method demonstrated here can be used to compare reporting rates across any two or more datasets to examine biases.

Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Journal Name: Ecological Applications
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Silva, Daniel de (Ed.)
    Biodiversity loss is a global ecological crisis that is both a driver of and response to environmental change. Understanding the connections between species declines and other components of human-natural systems extends across the physical, life, and social sciences. From an analysis perspective, this requires integration of data from different scientific domains, which often have heterogeneous scales and resolutions. Community science projects such as eBird may help to fill spatiotemporal gaps and enhance the resolution of standardized biological surveys. Comparisons between eBird and the more comprehensive North American Breeding Bird Survey (BBS) have found these datasets can produce consistent multi-year abundance trends for bird populations at national and regional scales. Here we investigate the reliability of these datasets for estimating patterns at finer resolutions, inter-annual changes in abundance within town boundaries. Using a case study of 14 focal species within Massachusetts, we calculated four indices of annual relative abundance using eBird and BBS datasets, including two different modeling approaches within each dataset. We compared the correspondence between these indices in terms of multi-year trends, annual estimates, and inter-annual changes in estimates at the state and town-level. We found correspondence between eBird and BBS multi-year trends, but this was not consistent across all species and diminished at finer, inter-annual temporal resolutions. We further show that standardizing modeling approaches can increase index reliability even between datasets at coarser temporal resolutions. Our results indicate that multiple datasets and modeling methods should be considered when estimating species population dynamics at finer temporal resolutions, but standardizing modeling approaches may improve estimate correspondence between abundance datasets. 
    In addition, reliability of these indices at finer spatial scales may depend on habitat composition, which can impact survey accuracy.
  2. Abstract

    Aircraft collisions with birds span the entire history of human aviation, including fatal collisions during some of the first powered human flights. Much effort has been expended to reduce such collisions, but increased knowledge about bird movements and species occurrence could dramatically improve decision support and proactive measures to reduce them. Migratory movements of birds pose a unique, often overlooked threat to aviation that is particularly difficult for individual airports to monitor and predict, because the occurrence of birds varies extensively in space and time at the local scales of airport responses.

    We use two publicly available datasets, radar data from the US NEXRAD network characterizing migration movements and eBird data collected by citizen scientists to map bird movements and species composition with low human effort expenditures but high temporal and spatial resolution relative to other large‐scale bird survey methods. As a test case, we compare results from weather radar distributions and eBird species composition with detailed bird strike records from three major New York airports.

    We show that weather radar‐based estimates of migration intensity can accurately predict the probability of bird strikes, with 80% of the variation in bird strikes across the year explained by the average amount of migratory movements captured on weather radar. We also show that eBird‐based estimates of species occurrence can, using species’ body mass and flocking propensity, accurately predict when most damaging strikes occur.

    Synthesis and applications. By better understanding when and where different bird species occur, airports across the world can predict seasonal periods of collision risks with greater temporal and spatial resolution; such predictions include potential to predict when the most severe and damaging strikes may occur. Our results highlight the power of federating datasets with bird movement and distribution data for developing better and more taxonomically and ecologically tuned models of likelihood of strikes occurring and severity of strikes.

  3. Abstract

    Aim

    Understanding and addressing the global biodiversity crisis requires ecological information compiled continuously from across the globe. Data from citizen science initiatives are useful for quantifying species' ecological niches and geographical distributions but can be difficult to apply towards biodiversity monitoring. The presence of fixed geographical locations reduces the opportunistic nature of citizen science data, allowing for more reliable and nuanced trend estimation. The eBird citizen‐science program contains predefined locations whose bird assemblages are sampled across years (‘hotspots’). For hotspots to function as a biodiversity monitoring resource, issues related to data coverage, biases, and trends need to be addressed.




    We estimated the survey completeness of species richness at 300,500 eBird hotspots during 2002–2022. We documented sampling biases at eBird hotspot and non‐hotspot locations during 2022 based on protection status, temperature, precipitation, and landcover.

    A total of 10,410 bird species (ca. 96.9% of total) were recorded at hotspots. The number of hotspots, checklists, and participants and the quality of species richness estimates increased worldwide with the Nearctic containing the strongest and most consistent trends. Compared to non‐hotspots, hotspots oversampled areas with higher protection status. Hotspots and non‐hotspots oversampled warmer and wetter locations in the Antarctic, Nearctic, and Palearctic, and cooler locations in the Afrotropics, Australasia, and the Neotropics. Hotspots and especially non‐hotspots oversampled urban areas. Hotspots and non‐hotspots undersampled shrublands in Australasia. Hotspots and especially non‐hotspots undersampled forests in the Afrotropics, Indomalaya, Neotropics, and Oceania.

    Main Conclusions

    Hotspots have captured a large component of the world's avian diversity but have done so inconsistently across space and time. Data quantity and quality are increasing in many regions, but the presence of regionally specific sampling biases and spatial uncertainty in hotspot locations should be addressed when applying the data.

  4.
    The growth of biodiversity data sets generated by citizen scientists continues to accelerate. The availability of such data has greatly expanded the scale of questions researchers can address. Yet, error, bias, and noise continue to be serious concerns for analysts, particularly when data being contributed to these giant online data sets are difficult to verify. Counts of birds contributed to eBird, the world’s largest biodiversity online database, present a potentially useful resource for tracking trends over time and space in species’ abundances. We quantified counting accuracy in a sample of 1,406 eBird checklists by comparing numbers contributed by birders (N = 246) who visited a popular birding location in Oregon, USA, with numbers generated by a professional ornithologist engaged in a long-term study creating benchmark (reference) measurements of daily bird counts. We focused on waterbirds, which are easily visible at this site. We evaluated potential predictors of count differences, including characteristics of contributed checklists, of each species, and of time of day and year. Count differences were biased toward undercounts, with more than 75% of counts being below the daily benchmark value. Median count discrepancies were −29.1% (range: 0 to −42.8%; N = 20 species). Model sets revealed an important influence of each species’ reference count, which varied seasonally as waterbird numbers fluctuated, and of percent of species known to be present each day that were included on each checklist. That is, checklists indicating a more thorough survey of the species richness at the site also had, on average, smaller count differences. However, even on checklists with the most thorough species lists, counts were biased low and exceptionally variable in their accuracy. To improve utility of such bird count data, we suggest three strategies to pursue in the future. 
    (1) Assess additional options for analytically determining how to select checklists that include less biased count data, as well as exploring options for correcting bias during the analysis stage. (2) Add options for users to provide additional information that helps analysts choose checklists, such as an option for users to tag checklists where they focused on obtaining accurate counts. (3) Explore opportunities to effectively calibrate citizen-science bird count data by establishing a formalized network of marquee sites where dedicated observers regularly contribute carefully collected benchmark data.
  5. Abstract

    Fishery observers are prevalent actors in the global effort to reduce discards in fisheries, but there remains considerable uncertainty about how effective they are. We analyzed high-resolution logbook records of individual hauls (n = 127 415) across five-and-a-half-years (2012–2018) for all of Greenland’s large-scale fisheries to determine if onboard fishery observers influence the mandatory reporting of discards. To do so, we used exact matching to compare reported discards for observed and unobserved hauls (each time a catch is recorded), thus controlling for systematic differences between monitored and unmonitored practices. After adjusting for variables that represent species caught, gear, vessel, owner, year, license, and location, we found that skippers systematically underreport discards when no observers are on board. Systematic underreporting was most pronounced in less valuable fisheries, in contrast to theoretical arguments in previous studies. The differences between reported discards from observed and unobserved fishing lead us to assume that onboard observers encourage more faithful logbook records. Thus, onboard observers play a vital role in improving information on the environmental impact of fishing and, in turn, make a key contribution to sustainable fisheries management.
