Title: Mass production of unvouchered records fails to represent global biodiversity patterns
The ever-increasing human footprint, even in very remote places on Earth, has inspired efforts to document biodiversity vigorously in case organisms go extinct. However, the data commonly gathered come from either primary voucher specimens in a natural history collection or from direct field observations that are not traceable to tangible material in a museum or herbarium. Although both datasets are crucial for assessing how anthropogenic drivers affect biodiversity, they have widespread coverage gaps and biases that may render them inefficient in representing patterns of biodiversity. Using a large global dataset of around 1.9 billion occurrence records of terrestrial plants, butterflies, amphibians, birds, reptiles and mammals, we quantify coverage and biases of expected biodiversity patterns by voucher and observation records. We show that the mass production of observation records does not lead to higher coverage of expected biodiversity patterns but is disproportionately biased toward certain regions, clades, functional traits and time periods. Such coverage patterns are driven by the ease of accessibility to air and ground transportation, level of security and extent of human modification at each sampling site. Conversely, voucher records are vastly infrequent in occurrence data but, in the few places where they are sampled, show relative congruence with expected biodiversity patterns for all dimensions. The differences in coverage and bias by voucher and observation records have important implications for the utility of these records for research in ecology, evolution and conservation.
Award ID(s):
2113424 2031928
PAR ID:
10417001
Publisher / Repository:
Nature Ecology & Evolution
Journal Name:
Nature Ecology & Evolution
ISSN:
2397-334X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Primary biodiversity data records that are open access and available in a standardised format are essential for conservation planning and research on policy-relevant time-scales. We created a dataset to document all known occurrence data for the Federally Endangered Poweshiek skipperling butterfly [Oarisma poweshiek (Parker, 1870; Lepidoptera: Hesperiidae)]. The Poweshiek skipperling was a historically common species in prairie systems across the upper Midwest, United States and Manitoba, Canada. Rapid declines have reduced the number of verified extant sites to six. Aggregating and curating Poweshiek skipperling occurrence records documents and preserves all known distributional data, which can be used to address questions related to Poweshiek skipperling conservation, ecology and biogeography. Over 3500 occurrence records were aggregated over a temporal coverage from 1872 to present. Occurrence records were obtained from 37 data providers in the conservation and natural history collection community using both “HumanObservation” and “PreservedSpecimen” as an acceptable basisOfRecord. Data were obtained in different formats and with differing degrees of quality control. During the data aggregation and cleaning process, we transcribed specimen label data, georeferenced occurrences, adopted a controlled vocabulary, removed duplicates and standardised formatting. We examined the dataset for inconsistencies with known Poweshiek skipperling biogeography and phenology and we verified or removed inconsistencies by working with the original data providers. In total, 12 occurrence records were removed because we identified them to be the western congener Oarisma garita (Reakirt, 1866). This resulting dataset enhances the permanency of Poweshiek skipperling occurrence data in a standardised format. This is a validated and comprehensive dataset of occurrence records for the Poweshiek skipperling (Oarisma poweshiek) utilising both observation and specimen-based records. 
Occurrence data are preserved and available for continued research and conservation projects using standardised Darwin Core formatting where possible. Prior to this project, many of these occurrence records were not mobilised and were being stored in individual institutional databases, researcher datasets and personal records. This dataset aggregates presence data from state conservation agencies, natural heritage programmes, natural history collections, citizen scientists, researchers and the U.S. Fish & Wildlife Service. The data include opportunistic observations and collections, research vouchers, observations collected for population monitoring and observations collected using standardised research methodologies. The aggregated occurrence records underwent cleaning efforts that improved data interoperability, removed transcription errors and verified or removed uncertain data. This dataset enhances available information on the spatiotemporal distribution of this Federally Endangered species. As part of this aggregation process, we discovered and verified Poweshiek skipperling occurrence records from two previously unknown states, Nebraska and Ohio. 
  2. Growing threats to biodiversity demand timely, detailed information on species occurrence, diversity and abundance at large scales. Camera traps (CTs), combined with computer vision models, provide an efficient method to survey species of certain taxa with high spatio-temporal resolution. We test the potential of CTs to close biodiversity knowledge gaps by comparing CT records of terrestrial mammals and birds from the recently released Wildlife Insights platform to publicly available occurrences from many observation types in the Global Biodiversity Information Facility. In locations with CTs, we found they sampled a greater number of days (mean = 133 versus 57 days) and documented additional species (mean increase of 1% of expected mammals). For species with CT data, we found CTs provided novel documentation of their ranges (93% of mammals and 48% of birds). Countries with the largest boost in data coverage were in the historically underrepresented southern hemisphere. Although embargoes increase data providers' willingness to share data, they cause a lag in data availability. Our work shows that the continued collection and mobilization of CT data, especially when combined with data sharing that supports attribution and privacy, has the potential to offer a critical lens into biodiversity. This article is part of the theme issue ‘Detecting and attributing the causes of biodiversity change: needs, gaps and solutions’. 
  3. The availability of citizen science data has resulted in growing applications in biodiversity science. One widely used platform, iNaturalist, provides millions of digitally vouchered observations submitted by a global user base. These observation records include a date and a location but otherwise do not contain any information about the sampling process. As a result, sampling biases must be inferred from the data themselves. In the present article, we examine spatial and temporal biases in iNaturalist observations from the platform's launch in 2008 through the end of 2019. We also characterize user behavior on the platform in terms of individual activity level and taxonomic specialization. We found that, at the level of taxonomic class, the users typically specialized on a particular group, especially plants or insects, and rarely made observations of the same species twice. Biodiversity scientists should consider whether user behavior results in systematic biases in their analyses before using iNaturalist data. 
  4. The State of Arizona in the south-western United States supports a high diversity of insects. Digitised occurrence records, especially from preserved specimens in natural history collections, are an important and growing resource to understand biodiversity and biogeography. Underlying bias in how insects are collected and what that means for interpreting patterns of insect diversity is largely untested. To explore the effects of insect collecting bias in Arizona, the State was regionalised into specific areas. First, the entire State was divided into broad biogeographic areas by ecoregion. Second, the 81 tallest mountain ranges were mapped onto the State. The distribution of digitised records across these areas was then examined. A case study of surveying the beetles (Insecta, Coleoptera) of the Sand Tank Mountains is presented. The Sand Tanks are a low-elevation range in the Lower Colorado River Basin subregion of the Sonoran Desert from which a single beetle record was published before this study. The number of occurrence records and collecting events are very unevenly distributed throughout Arizona and do not strongly correlate with the geographic size of areas. Species richness is estimated for regions in Arizona using rarefaction and extrapolation. Digitised records from the disproportionately highly collected areas in Arizona represent at best 70% of the total insect diversity within them. We report a total of 141 species of Coleoptera from the Sand Tank Mountains, based on 914 digitised voucher specimens. These specimens add important new records for taxa that were previously unavailable in digitised data and highlight important biogeographic ranges. Possible underlying mechanisms causing bias are discussed and recommendations are made for future targeted collecting of under-sampled regions. Insect species diversity is apparently at best 70% documented for the State of Arizona with many thousands of species not yet recorded. 
The Chiricahua Mountains are the most densely sampled region of Arizona and likely contain at least 2,000 species not yet vouchered in online data. Preliminary estimates for species richness of Arizona are at least 21,000 and likely much higher. Limitations to analyses are discussed which highlight the strong need for more insect occurrence data. 
  5. Whether cities are more or less diverse than surrounding environments, and the extent to which non‐native species in cities impact regional species pools, remain two fundamental yet unanswered questions in urban ecology. Here we offer a unifying framework for understanding the mechanisms that generate biodiversity patterns across taxonomic groups and spatial scales in urban systems. One commonality between existing frameworks is the collective recognition that species co‐occurrence locally is not simply a function of natural colonization and extinction processes. Instead, it is largely a consequence of human actions that are governed by a myriad of social processes occurring across groups, institutions, and stakeholders. Rather than challenging these frameworks, we expand upon them to explicitly consider how human and non‐human mechanisms interact to control urban biodiversity and influence species composition over space and time. We present a comprehensive theory of the processes that drive biodiversity within cities, between cities and surrounding non‐urbanized areas and across cities, using the general perspective of metacommunity ecology. Armed with this approach, we embrace the fact that humans substantially influence β‐diversity by creating a variety of different habitats in urban areas, and by influencing dispersal processes and rates, and suggest ways in which these influences can be accommodated within existing metacommunity paradigms. Since patterns in urban biodiversity have been extensively described at the local or regional scale, we argue that the basic premises of the theory can be validated by studying the β‐diversity across spatial scales within and across urban areas. 
By explicitly integrating the myriad of processes that drive native and non‐native urban species co‐occurrence, the proposed theory not only helps reconcile contrasting views on whether urban ecosystems are biodiversity hotspots or biodiversity sinks, but also provides a mechanistic understanding to better predict when and why alternative biodiversity patterns might emerge. 