
Title: Benchmark Bird Surveys Help Quantify Counting Accuracy in a Citizen-Science Database
The growth of biodiversity data sets generated by citizen scientists continues to accelerate. The availability of such data has greatly expanded the scale of questions researchers can address. Yet, error, bias, and noise continue to be serious concerns for analysts, particularly when data being contributed to these giant online data sets are difficult to verify. Counts of birds contributed to eBird, the world’s largest biodiversity online database, present a potentially useful resource for tracking trends over time and space in species’ abundances. We quantified counting accuracy in a sample of 1,406 eBird checklists by comparing numbers contributed by birders (N = 246) who visited a popular birding location in Oregon, USA, with numbers generated by a professional ornithologist engaged in a long-term study creating benchmark (reference) measurements of daily bird counts. We focused on waterbirds, which are easily visible at this site. We evaluated potential predictors of count differences, including characteristics of contributed checklists, of each species, and of time of day and year. Count differences were biased toward undercounts, with more than 75% of counts being below the daily benchmark value. Median count discrepancies were −29.1% (range: 0 to −42.8%; N = 20 species). Model sets revealed an important influence of each species’ reference count, which varied seasonally as waterbird numbers fluctuated, and of percent of species known to be present each day that were included on each checklist. That is, checklists indicating a more thorough survey of the species richness at the site also had, on average, smaller count differences. However, even on checklists with the most thorough species lists, counts were biased low and exceptionally variable in their accuracy. To improve utility of such bird count data, we suggest three strategies to pursue in the future. 
(1) Assess additional options for analytically determining how to select checklists that include less biased count data, as well as exploring options for correcting bias during the analysis stage. (2) Add options for users to provide additional information that helps analysts choose checklists, such as an option for users to tag checklists where they focused on obtaining accurate counts. (3) Explore opportunities to effectively calibrate citizen-science bird count data by establishing a formalized network of marquee sites where dedicated observers regularly contribute carefully collected benchmark data.
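The core comparison in the abstract above can be sketched in a few lines: each checklist count is expressed as a percent difference from that day's benchmark (reference) count, and the per-species differences are then summarized by their median. This is a minimal illustration with invented numbers, not the study's data or analysis code.

```python
# Sketch of the benchmark comparison: percent difference of each checklist
# count from the daily benchmark count, summarized as a median.
# Negative values indicate undercounts. All counts below are illustrative.
from statistics import median

def percent_difference(checklist_count, benchmark_count):
    """Percent difference from the benchmark; assumes benchmark_count > 0."""
    return 100.0 * (checklist_count - benchmark_count) / benchmark_count

# Illustrative (checklist count, same-day benchmark count) pairs for one species.
pairs = [(40, 60), (55, 60), (30, 50), (50, 50), (20, 45)]
diffs = [percent_difference(c, b) for c, b in pairs]

undercount_rate = sum(d < 0 for d in diffs) / len(diffs)
print(round(median(diffs), 1))  # median percent difference
print(undercount_rate)          # fraction of counts below the benchmark
```

With these toy numbers most counts fall below the benchmark, mirroring the undercount bias the study reports.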
Journal Name:
Frontiers in ecology and evolution
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Datasets that monitor biodiversity capture information differently depending on their design, which influences observer behavior and can lead to biases across observations and species. Combining different datasets can improve our ability to identify and understand threats to biodiversity, but this requires an understanding of the observation bias in each. Two datasets widely used to monitor bird populations exemplify these general concerns: eBird is a citizen science project with high spatiotemporal resolution but variation in distribution, effort, and observers, whereas the Breeding Bird Survey (BBS) is a structured survey of specific locations over time. Analyses using these two datasets can identify contradictory population trends. To understand these discrepancies and facilitate data fusion, we quantify species‐level reporting differences across eBird and the BBS in three regions across the United States by jointly modeling bird abundances using data from both datasets. First, we fit a joint Species Distribution Model that accounts for environmental conditions and effort to identify reporting differences across the datasets. We then examine how these differences in reporting are related to species traits. Finally, we analyze species reported to one dataset but not the other and determine whether traits differ between reported and unreported species. We find that most species are reported more in the BBS than eBird. Specifically, we find that compared to eBird, BBS observers tend to report higher counts of common species and species that are usually detected by sound. We also find that species associated with water are reported less in the BBS. Species typically identified by sound are reported more at sunrise than later in the morning. Our results quantify reporting differences in eBird and the BBS to enhance our understanding of how each captures information and how they should be used. 
The reporting rates we identify can also be incorporated into observation models through detectability or effort to improve analyses across species and datasets. The method demonstrated here can be used to compare reporting rates across any two or more datasets to examine biases.
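The abstract's species-level comparison can be illustrated in simplified form. The study fits a joint species distribution model that controls for environment and effort; the sketch below computes only raw reporting rates (the fraction of checklists on which each species appears) and flags species reported more often in one dataset than the other. All species names and checklists are invented.

```python
# Much-simplified sketch of comparing species-level reporting rates across
# two bird datasets (cf. eBird vs. BBS). Raw rates only; no effort or
# environmental correction as in the joint model described in the abstract.
from collections import Counter

def reporting_rates(checklists):
    """Fraction of checklists on which each species was reported."""
    n = len(checklists)
    counts = Counter(sp for cl in checklists for sp in set(cl))
    return {sp: c / n for sp, c in counts.items()}

ebird = [{"robin", "wren"}, {"robin"}, {"robin", "duck"}, {"duck"}]
bbs = [{"robin", "wren"}, {"robin", "wren"}, {"wren"}, {"robin"}]

r_ebird = reporting_rates(ebird)
r_bbs = reporting_rates(bbs)

# Species reported more in the BBS than in eBird (the abstract finds this
# pattern for most species, especially those usually detected by sound).
higher_in_bbs = {sp for sp in r_bbs if r_bbs[sp] > r_ebird.get(sp, 0.0)}
print(sorted(higher_in_bbs))
```

The same rate comparison extends directly to any two or more datasets, which is the generalization the abstract closes on.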

  2. Abstract

    Aim

    Understanding and addressing the global biodiversity crisis requires ecological information compiled continuously from across the globe. Data from citizen science initiatives are useful for quantifying species' ecological niches and geographical distributions but can be difficult to apply towards biodiversity monitoring. The presence of fixed geographical locations reduces the opportunistic nature of citizen science data, allowing for more reliable and nuanced trend estimation. The eBird citizen‐science program contains predefined locations whose bird assemblages are sampled across years (‘hotspots’). For hotspots to function as a biodiversity monitoring resource, issues related to data coverage, biases, and trends need to be addressed.

    Methods

    We estimated the survey completeness of species richness at 300,500 eBird hotspots during 2002–2022. We documented sampling biases at eBird hotspot and non‐hotspot locations during 2022 based on protection status, temperature, precipitation, and landcover.

    Results

    A total of 10,410 bird species (ca. 96.9% of total) were recorded at hotspots. The number of hotspots, checklists, and participants and the quality of species richness estimates increased worldwide with the Nearctic containing the strongest and most consistent trends. Compared to non‐hotspots, hotspots oversampled areas with higher protection status. Hotspots and non‐hotspots oversampled warmer and wetter locations in the Antarctic, Nearctic, and Palearctic, and cooler locations in the Afrotropics, Australasia, and the Neotropics. Hotspots and especially non‐hotspots oversampled urban areas. Hotspots and non‐hotspots undersampled shrublands in Australasia. Hotspots and especially non‐hotspots undersampled forests in the Afrotropics, Indomalaya, Neotropics, and Oceania.

    Main Conclusions

    Hotspots have captured a large component of the world's avian diversity but have done so inconsistently across space and time. Data quantity and quality are increasing in many regions, but the presence of regionally specific sampling biases and spatial uncertainty in hotspot locations should be addressed when applying the data.

  3. Abstract

    Spatial biases are a common feature of presence–absence data from citizen scientists. Spatial thinning can mitigate errors in species distribution models (SDMs) that use these data. When detections or non‐detections are rare, however, SDMs may suffer from class imbalance or low sample size of the minority (i.e. rarer) class. Poor predictions can result, the severity of which may vary by modelling technique.

    To explore the consequences of spatial bias and class imbalance in presence–absence data, we used eBird citizen science data for 102 bird species from the northeastern USA to compare spatial thinning, class balancing and majority‐only thinning (i.e. retaining all samples of the minority class). We created SDMs using two parametric or semi‐parametric techniques (generalized linear models and generalized additive models) and two machine learning techniques (random forest and boosted regression trees). We tested the predictive abilities of these SDMs using an independent and systematically collected reference dataset with a combination of discrimination (area under the receiver operating characteristic curve; true skill statistic; area under the precision‐recall curve) and calibration (Brier score; Cohen's kappa) metrics.

    We found large variation in SDM performance depending on thinning and balancing decisions. Across all species, there was no single best approach, with the optimal choice of thinning and/or balancing depending on modelling technique, performance metric and the baseline sample prevalence of species in the data. Spatially thinning all the data was often a poor approach, especially for species with baseline sample prevalence <0.1. For most of these rare species, balancing classes improved model discrimination between presence and absence classes using machine learning techniques, but typically hindered model calibration.

    Baseline sample prevalence, sample size, modelling approach and the intended application of SDM output—whether discrimination or calibration—should guide decisions about how to thin or balance data, given the considerable influence of these methodological choices on SDM performance. For prognostic applications requiring good model calibration (vis‐à‐vis discrimination), the match between sample prevalence and true species prevalence may be the overriding feature and warrants further investigation.
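One of the balancing strategies compared above can be sketched concretely: downsampling the majority class of a presence-absence dataset until the classes are equal, which raises the sample prevalence a model trains on. This is only one of the options the study evaluates (alongside spatial thinning and majority-only thinning), and the data and seed below are illustrative.

```python
# Hedged sketch of class balancing by downsampling the majority class in
# presence-absence records. Illustrates how balancing changes sample
# prevalence for a rare species (baseline prevalence < 0.1).
import random

def downsample_majority(records, seed=0):
    """records: list of (features, label), label 1 = presence, 0 = absence.
    Returns a class-balanced sample by downsampling the majority class."""
    rng = random.Random(seed)
    presences = [r for r in records if r[1] == 1]
    absences = [r for r in records if r[1] == 0]
    minority, majority = sorted((presences, absences), key=len)
    return minority + rng.sample(majority, len(minority))

# A rare species: 5 presences among 100 records (prevalence 0.05).
records = [((i,), 1) for i in range(5)] + [((i,), 0) for i in range(95)]
prevalence = sum(lbl for _, lbl in records) / len(records)
balanced = downsample_majority(records)
balanced_prev = sum(lbl for _, lbl in balanced) / len(balanced)
print(prevalence, balanced_prev)  # 0.05 -> 0.5
```

As the abstract notes, shifting sample prevalence this way can improve discrimination for machine learning SDMs while degrading calibration, since the model no longer sees the true base rate.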

  4. Abstract

    Animal behaviors are often modified in urban settings due to changes in species assemblages and interactions. The ability of prey to respond to a predator is a critical behavior, but urban populations may experience altered predation pressure, food supplementation, and other human‐mediated disturbances that modify their responsiveness to predation risk and promote habituation.

    Citizen‐science programs generally focus on the collection and analysis of observational data (e.g., bird checklists), but there has been increasing interest in the engagement of citizen scientists for ecological experimentation.

    Our goal was to implement a behavioral experiment in which citizen scientists recorded antipredator behaviors in wild birds occupying urban areas. In North America, increasing populations of Accipiter hawks have colonized suburban and urban areas and regularly prey upon birds that frequent backyard bird feeders. This scenario, of an increasingly common avian predator hunting birds near human dwellings, offers a unique opportunity to characterize antipredator behaviors within urban passerines.

    For two winters, we engaged citizen scientists in Chicago, IL, USA to deploy a playback experiment and record antipredator behaviors in backyard birds. If backyard birds maintained their antipredator behaviors, we hypothesized that birds would decrease foraging behaviors and increase vigilance in response to a predator cue (hawk playback) but that these responses would be mediated by flock size, presence of sentinel species, body size, tree cover, and amount of surrounding urban area.

    Using a randomized control–treatment design, citizen scientists at 15 sites recorded behaviors from 3891 individual birds representing 22 species. Birds were more vigilant and foraged less during the playback of a hawk call, and these responses were strongest for individuals within larger flocks and weakest in larger‐bodied birds. We did not find effects of sentinel species, tree cover, or urbanization.

    By deploying a behavioral experiment, we found that backyard birds inhabiting urban landscapes largely maintained antipredator behaviors of increased vigilance and decreased foraging in response to predator cues. Experimentation in citizen science poses challenges (e.g., observation bias, sample size limitations, and reduced complexity in protocol design), but unlike programs focused solely on observational data, experimentation allows researchers to disentangle the complex factors underlying animal behavior and species interactions.

  5. Abstract

    An occupancy model makes use of data that are structured as sets of repeated visits to each of many sites, in order to estimate the actual probability of occupancy (i.e. proportion of occupied sites) after correcting for imperfect detection using the information contained in the sets of repeated observations. We explore the conditions under which preexisting, volunteer-collected data from the citizen science project eBird can be used for fitting occupancy models. Because the majority of eBird’s data are not collected in the form of repeated observations at individual locations, we explore 2 ways in which the single-visit records could be used in occupancy models. First, we assess the potential for space-for-time substitution: aggregating single-visit records from different locations within a region into pseudo-repeat visits. On average, eBird’s observers did not make their observations at locations that were representative of the habitat in the surrounding area, which would lead to biased estimates of occupancy probabilities when using space-for-time substitution. Thus, the use of space-for-time substitution is not always appropriate. Second, we explored the utility of including data from single-visit records to supplement sets of repeated-visit data. In a simulation study we found that inclusion of single-visit records increased the precision of occupancy estimates, but only when detection probability was high. When detection probability was low, the addition of single-visit records exacerbated biases in estimates of occupancy probability. We conclude that subsets of data from eBird, and likely from similar projects, can be used for occupancy modeling either using space-for-time substitution or supplementing repeated-visit data with data from single-visit records. The appropriateness of either alternative will depend on the goals of a study and on the probabilities of detection and occupancy of the species of interest.
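The detection-correction logic behind occupancy models can be shown with a minimal simulation: with K independent visits, the probability of at least one detection at an occupied site is 1 − (1 − p)^K, so the naive proportion of sites with detections understates true occupancy and can be corrected when detection probability p is known. This is a toy illustration with invented parameter values, not the estimation machinery (which fits psi and p jointly by likelihood) used in the study.

```python
# Minimal simulation of imperfect detection over K repeated visits:
# naive occupancy (fraction of sites with >= 1 detection) is biased low,
# and dividing by 1 - (1 - p)**K recovers true occupancy when p is known.
import random

rng = random.Random(42)
psi, p, n_sites, K = 0.6, 0.4, 20000, 4  # illustrative values

detected = 0
for _ in range(n_sites):
    occupied = rng.random() < psi
    if occupied and any(rng.random() < p for _ in range(K)):
        detected += 1

naive = detected / n_sites              # biased low by missed detections
corrected = naive / (1 - (1 - p) ** K)  # detection-corrected estimate
print(round(naive, 2), round(corrected, 2))
```

The gap between the naive and corrected estimates shrinks as p or K grows, which is why supplementing with single-visit records helps mainly when detection probability is high.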
