

This content will become publicly available on April 1, 2024

Title: Mass production of unvouchered records fails to represent global biodiversity patterns
The ever-increasing human footprint, even in very remote places on Earth, has inspired vigorous efforts to document biodiversity in case organisms go extinct. However, the data commonly gathered come either from primary voucher specimens in a natural history collection or from direct field observations that are not traceable to tangible material in a museum or herbarium. Although both datasets are crucial for assessing how anthropogenic drivers affect biodiversity, they have widespread coverage gaps and biases that may render them ineffective at representing patterns of biodiversity. Using a large global dataset of around 1.9 billion occurrence records of terrestrial plants, butterflies, amphibians, birds, reptiles and mammals, we quantify how well voucher and observation records cover expected biodiversity patterns and how biased they are. We show that the mass production of observation records does not lead to higher coverage of expected biodiversity patterns but is disproportionately biased toward certain regions, clades, functional traits and time periods. Such coverage patterns are driven by ease of access to air and ground transportation, the level of security and the extent of human modification at each sampling site. Conversely, voucher records are far less frequent in occurrence data but, in the few places where they are sampled, show relative congruence with expected biodiversity patterns for all dimensions. The differences in coverage and bias between voucher and observation records have important implications for the utility of these records for research in ecology, evolution and conservation.
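Below is a minimal sketch of what "coverage of expected biodiversity patterns" can mean along a single dimension (species per grid cell). It is not the authors' method. The grid cells, species and the helper `coverage_by_type` are hypothetical. Record types use the Darwin Core `basisOfRecord` values `HumanObservation` and `PreservedSpecimen`. Coverage here is simply the fraction of species expected in a cell that at least one record of a given type documents.

```python
from collections import defaultdict

# Hypothetical inputs: species expected in each grid cell (e.g. from range maps)
# and occurrence records as (cell, species, basisOfRecord) tuples.
EXPECTED = {
    "cell_A": {"sp1", "sp2", "sp3", "sp4"},
    "cell_B": {"sp2", "sp5"},
}
RECORDS = [
    ("cell_A", "sp1", "HumanObservation"),
    ("cell_A", "sp1", "HumanObservation"),  # repeat record of the same species
    ("cell_A", "sp2", "HumanObservation"),
    ("cell_B", "sp2", "HumanObservation"),
    ("cell_A", "sp3", "PreservedSpecimen"),
]


def coverage_by_type(expected, records):
    """Fraction of expected species documented per cell, for each record type."""
    found = defaultdict(lambda: defaultdict(set))
    for cell, species, basis in records:
        # Ignore records of species not expected in that cell.
        if species in expected.get(cell, set()):
            found[basis][cell].add(species)
    return {
        basis: {cell: len(by_cell[cell]) / len(exp) for cell, exp in expected.items()}
        for basis, by_cell in found.items()
    }
```

With these toy inputs, the four observation records reach 50% coverage in both cells, and the repeated `sp1` record adds nothing. The single specimen record covers 25% of `cell_A` and none of `cell_B`. Counting records and counting covered species are therefore different measures.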
Award ID(s): 2113424, 2031928
NSF-PAR ID: 10417001
Author(s) / Creator(s):
Date Published:
Journal Name: Nature Ecology & Evolution
ISSN: 2397-334X
Sponsoring Org: National Science Foundation
More Like this
  1. Large systematic revisionary projects incorporating data for hundreds or thousands of taxa require an integrative approach, with a strong biodiversity-informatics core for efficient data management to facilitate research on the group. Our original biodiversity informatics platform, 3i (Internet-accessible Interactive Identification), combined a customized MS Access database backend with ASP-based web interfaces to support revisionary syntheses of several large genera of leafhoppers (Hemiptera: Auchenorrhyncha: Cicadellidae). More recently, for our National Science Foundation-sponsored project, “GoLife: Collaborative Research: Integrative genealogy, ecology and phenomics of deltocephaline leafhoppers (Hemiptera: Cicadellidae), and their microbial associates”, we selected the new open-source platform TaxonWorks as the cyberinfrastructure. In the scope of the project, the original “3i World Auchenorrhyncha Database” was imported into TaxonWorks. TaxonWorks now has many tools to automatically import nomenclature, citations, and specimen-based collection data. At the time of the initial migration of the 3i database, however, many of those tools were still under development, and the complexity of the data required a custom migration script. Such a script is probably still the most efficient solution for importing datasets with a long development history. The World Auchenorrhyncha Database currently covers the nomenclature of the group comprehensively and includes data on 70 valid families, 6,816 valid genera and 47,064 valid species, as well as synonymy and subsequent combinations (Fig. 1). In addition, many taxon records include the original citation, bibliography, type information, etymology, etc. The bibliography of the group includes 37,579 sources, about one-third of which are associated with PDF files.
Species have distribution records, either derived from individual specimens or asserted at the country and state level, as well as biological associations indicating host plants, predators, and parasitoids. Observation matrices in TaxonWorks are designed to handle morphological data associated with taxa or specimens. The matrices may be used to automatically generate interactive identification keys and taxon descriptions. They can also be downloaded, for example to be imported into Lucid builder or to perform phylogenetic analysis in an external application. At present, 36 matrices are associated with the project. The observation matrix from the GoLife project covers 798 taxa by 210 descriptors, most of which are qualitative multi-state morphological descriptors (Fig. 2). Illustrations are provided for 9,886 taxa. They are organized in a specialized image matrix that could be used as a pictorial key for identifying species and higher taxa. For the phylogenetic analysis, a dataset was constructed for 730 terminal taxa and >160,000 nucleotide positions obtained using anchored hybrid enrichment of genomic DNA. The sample comprised leafhoppers from the subfamily Deltocephalinae and outgroups. The probe kit targets leafhopper genes, as well as some bacterial genes (endosymbionts and plant pathogens transmitted by leafhoppers). Maximum likelihood analyses of concatenated nucleotide and amino acid sequences, as well as coalescent gene tree analysis, yielded well-resolved phylogenetic trees (Cao et al. 2022). Raw sequence data have been uploaded to the NCBI Sequence Read Archive. Occurrence and morphological data, as well as diagnostic images, for voucher specimens have been incorporated into TaxonWorks. Data in TaxonWorks can be exported in raw format, accessed via an Application Programming Interface (API), or shared with external data aggregators such as Catalogue of Life, GBIF and iDigBio.
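An observation matrix can drive an interactive (multi-access) identification key, roughly as sketched below. This is an illustrative assumption, not TaxonWorks code or its API. The matrix layout, the descriptors `crown_shape` and `color`, and the helpers `remaining_taxa` and `most_informative` are hypothetical. Taxa not scored for a descriptor are kept rather than excluded, which is one common convention for missing data in such keys.

```python
# Hypothetical observation matrix: taxon -> descriptor -> set of recorded states.
MATRIX = {
    "Taxon A": {"crown_shape": {"rounded"}, "color": {"green", "yellow"}},
    "Taxon B": {"crown_shape": {"produced"}, "color": {"brown"}},
    "Taxon C": {"crown_shape": {"rounded"}, "color": {"brown"}},
    "Taxon D": {"crown_shape": {"produced"}},  # not scored for color
}


def remaining_taxa(matrix, observed):
    """Return (sorted) taxa consistent with every observed descriptor state.

    An unscored descriptor is treated as unknown, so it never excludes a taxon.
    """
    matches = []
    for taxon, scores in matrix.items():
        if all(state in scores.get(descriptor, {state})
               for descriptor, state in observed.items()):
            matches.append(taxon)
    return sorted(matches)


def most_informative(matrix, taxa, used):
    """Pick the unused descriptor with the most distinct states among `taxa`.

    Returns None when no remaining descriptor can separate the taxa.
    """
    best, best_count = None, 1
    candidates = {d for t in taxa for d in matrix[t]} - set(used)
    for descriptor in sorted(candidates):
        states = set()
        for t in taxa:
            states |= matrix[t].get(descriptor, set())
        if len(states) > best_count:
            best, best_count = descriptor, len(states)
    return best
```

Choosing the descriptor with the most distinct states among the remaining taxa is only one simple heuristic for ordering questions. Real key software may use other criteria.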
  2. Primary biodiversity data records that are open access and available in a standardised format are essential for conservation planning and research on policy-relevant time-scales. We created a dataset to document all known occurrence data for the Federally Endangered Poweshiek skipperling butterfly [Oarisma poweshiek (Parker, 1870) (Lepidoptera: Hesperiidae)]. The Poweshiek skipperling was historically a common species in prairie systems across the upper Midwest of the United States and in Manitoba, Canada. Rapid declines have reduced the number of verified extant sites to six. Aggregating and curating Poweshiek skipperling occurrence records documents and preserves all known distributional data. These data can be used to address questions related to Poweshiek skipperling conservation, ecology and biogeography. Over 3500 occurrence records were aggregated, with a temporal coverage from 1872 to the present. Occurrence records were obtained from 37 data providers in the conservation and natural history collection community, with both “HumanObservation” and “PreservedSpecimen” accepted as values of basisOfRecord. Data were obtained in different formats and with differing degrees of quality control. During the data aggregation and cleaning process, we transcribed specimen label data, georeferenced occurrences, adopted a controlled vocabulary, removed duplicates and standardised formatting. We examined the dataset for inconsistencies with known Poweshiek skipperling biogeography and phenology. We then verified or removed inconsistent records by working with the original data providers. In total, 12 occurrence records were removed because we identified them as the western congener Oarisma garita (Reakirt, 1866). The resulting dataset enhances the permanency of Poweshiek skipperling occurrence data in a standardised format. It is a validated and comprehensive dataset of occurrence records for the Poweshiek skipperling (Oarisma poweshiek), utilising both observation-based and specimen-based records.
Occurrence data are preserved and available for continued research and conservation projects, using standardised Darwin Core formatting where possible. Prior to this project, many of these occurrence records had not been mobilised and were stored in individual institutional databases, researcher datasets and personal records. This dataset aggregates presence data from state conservation agencies, natural heritage programmes, natural history collections, citizen scientists, researchers and the U.S. Fish & Wildlife Service. The data include opportunistic observations and collections, research vouchers, observations collected for population monitoring and observations collected using standardised research methodologies. The aggregated occurrence records underwent cleaning that improved data interoperability, removed transcription errors and verified or removed uncertain data. This dataset enhances available information on the spatiotemporal distribution of this Federally Endangered species. As part of this aggregation process, we discovered and verified Poweshiek skipperling occurrence records from Nebraska and Ohio, two states where the species was previously unknown.
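Two of the cleaning steps described above, adopting a controlled vocabulary and removing duplicates, can be sketched with Darwin Core terms (`basisOfRecord`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`). Several parts are hypothetical: the provider labels in `BASIS_VOCAB`, the duplicate rule (same name, date and coordinates rounded to four decimals) and the function `clean_records`. The authors also verified records with the data providers, and no script replaces that step.

```python
# Darwin Core basisOfRecord values accepted in this kind of dataset.
ACCEPTED_BASIS = {"HumanObservation", "PreservedSpecimen"}

# Hypothetical mapping from provider-specific labels to the controlled vocabulary.
BASIS_VOCAB = {
    "humanobservation": "HumanObservation",
    "observation": "HumanObservation",
    "preservedspecimen": "PreservedSpecimen",
    "specimen": "PreservedSpecimen",
}


def clean_records(records, precision=4):
    """Standardise basisOfRecord, drop unaccepted records and remove duplicates.

    Two records count as duplicates (a hypothetical rule) when they share
    scientificName, eventDate and coordinates rounded to `precision` decimals.
    """
    seen = set()
    cleaned = []
    for rec in records:
        raw_basis = str(rec.get("basisOfRecord", "")).strip().lower()
        basis = BASIS_VOCAB.get(raw_basis)
        if basis not in ACCEPTED_BASIS:
            continue  # unmapped or unaccepted record type
        key = (
            " ".join(rec["scientificName"].split()),  # collapse stray whitespace
            rec["eventDate"],
            round(float(rec["decimalLatitude"]), precision),
            round(float(rec["decimalLongitude"]), precision),
        )
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append({**rec, "basisOfRecord": basis})
    return cleaned
```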
  3. Growing threats to biodiversity demand timely, detailed information on species occurrence, diversity and abundance at large scales. Camera traps (CTs), combined with computer vision models, provide an efficient method to survey species of certain taxa with high spatio-temporal resolution. We test the potential of CTs to close biodiversity knowledge gaps. To do so, we compare CT records of terrestrial mammals and birds from the recently released Wildlife Insights platform with publicly available occurrences from many observation types in the Global Biodiversity Information Facility. In locations with CTs, we found that CTs sampled a greater number of days (mean = 133 versus 57 days) and documented additional species (mean increase of 1% of expected mammals). For species with CT data, CTs provided novel documentation of their ranges (93% of mammals and 48% of birds). Countries with the largest boost in data coverage were in the historically underrepresented southern hemisphere. Although embargoes increase data providers' willingness to share data, they cause a lag in data availability. Our work shows that the continued collection and mobilization of CT data, especially when combined with data sharing that supports attribution and privacy, has the potential to offer a critical lens into biodiversity. This article is part of the theme issue ‘Detecting and attributing the causes of biodiversity change: needs, gaps and solutions’.
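The two per-location comparisons reported above reduce to simple operations. Sampling days come from counting distinct dates, and species documented by CTs but absent from other sources come from a set difference. The sketch below assumes records are already matched to one location and reduced to `(date, species)` pairs. The record values and the helper names `summarize` and `compare_sources` are hypothetical, and the study's own spatial matching is not reproduced.

```python
from datetime import date

# Hypothetical (date, species) records for a single location.
CT_RECORDS = [
    (date(2020, 1, 1), "Tapirus terrestris"),
    (date(2020, 1, 2), "Puma concolor"),
    (date(2020, 1, 2), "Tapirus terrestris"),
    (date(2020, 1, 5), "Cuniculus paca"),
]
GBIF_RECORDS = [
    (date(2020, 1, 1), "Puma concolor"),
    (date(2020, 3, 1), "Puma concolor"),
]


def summarize(records):
    """Return (number of distinct sampling days, set of species recorded)."""
    days = {day for day, _ in records}
    species = {sp for _, sp in records}
    return len(days), species


def compare_sources(ct_records, other_records):
    """Compare sampling days and list species found only in the CT records."""
    ct_days, ct_species = summarize(ct_records)
    other_days, other_species = summarize(other_records)
    return {
        "ct_days": ct_days,
        "other_days": other_days,
        "novel_species": sorted(ct_species - other_species),
    }
```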
  4. The State of Arizona in the south-western United States supports a high diversity of insects. Digitised occurrence records, especially from preserved specimens in natural history collections, are an important and growing resource for understanding biodiversity and biogeography. Underlying bias in how insects are collected, and what that bias means for interpreting patterns of insect diversity, remains largely untested. To explore the effects of insect collecting bias in Arizona, the State was regionalised into specific areas. First, the entire State was divided into broad biogeographic areas by ecoregion. Second, the 81 tallest mountain ranges were mapped onto the State. The distribution of digitised records across these areas was then examined.

    A case study of surveying the beetles (Insecta, Coleoptera) of the Sand Tank Mountains is presented. The Sand Tanks are a low-elevation range in the Lower Colorado River Basin subregion of the Sonoran Desert from which a single beetle record was published before this study.

    Occurrence records and collecting events are very unevenly distributed throughout Arizona, and their numbers do not strongly correlate with the geographic size of areas. Species richness is estimated for regions in Arizona using rarefaction and extrapolation. Digitised records from the disproportionately highly collected areas in Arizona represent at best 70% of the total insect diversity within them. We report a total of 141 species of Coleoptera from the Sand Tank Mountains, based on 914 digitised voucher specimens. These specimens add important new records for taxa that were previously unavailable in digitised data and highlight important biogeographic ranges.

    Possible underlying mechanisms causing bias are discussed, and recommendations are made for future targeted collecting in under-sampled regions. Insect species diversity is apparently at best 70% documented for the State of Arizona, with many thousands of species not yet recorded. The Chiricahua Mountains are the most densely sampled region of Arizona and likely contain at least 2,000 species not yet vouchered in online data. Preliminary estimates for species richness of Arizona are at least 21,000 and likely much higher. Limitations of the analyses are discussed; they highlight the strong need for more insect occurrence data.
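Two classic formulas illustrate rarefaction and extrapolation estimates of the kind mentioned above. Individual-based rarefaction (Hurlbert 1971) gives the expected number of species in a random subsample of n individuals. The Chao1 estimator, with the (n − 1)/n sample-size correction, gives asymptotic richness. This sketch assumes abundance data for a single region. The study may have used different estimators or software (for example, Hill-number-based rarefaction and extrapolation), and the abundance vectors in the example are hypothetical.

```python
from math import comb


def rarefied_richness(abundances, n):
    """Expected species richness in a random subsample of n individuals."""
    total_individuals = sum(abundances)
    if not 0 <= n <= total_individuals:
        raise ValueError("n must lie between 0 and the total number of individuals")
    total = comb(total_individuals, n)
    # Each species contributes its probability of appearing in the subsample;
    # comb() returns 0 when the species cannot be missed.
    return sum(1 - comb(total_individuals - a, n) / total for a in abundances)


def chao1(abundances):
    """Chao1 estimate of asymptotic richness with the (n - 1)/n correction.

    Falls back to the bias-corrected form when there are no doubletons.
    """
    counts = [a for a in abundances if a > 0]
    n = sum(counts)
    s_obs = len(counts)
    f1 = counts.count(1)  # singletons
    f2 = counts.count(2)  # doubletons
    factor = (n - 1) / n
    if f2 > 0:
        f0 = factor * f1 ** 2 / (2 * f2)
    else:
        f0 = factor * f1 * (f1 - 1) / 2
    return s_obs + f0
```

For example, `rarefied_richness([5, 3, 1, 1], 1)` is 1 up to floating-point rounding, because one individual always represents exactly one species. Dividing observed richness by `chao1(...)` gives a rough sample-completeness figure, similar in spirit to the "at best 70%" statement above.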

     
  5. BACKGROUND: Madagascar is one of the world’s foremost biodiversity hotspots. Its unique assemblage of plants, animals, and fungi, the majority of which evolved on the island and occur nowhere else, is both diverse and threatened. After human arrival, the island’s entire megafauna became extinct, and large portions of the current flora and fauna may be on track for a similar fate. Conditions for the long-term survival of many Malagasy species are not currently met because of multiple anthropogenic threats.
ADVANCES: We review the extinction risk and threats to biodiversity in Madagascar. We use available international assessment data as well as a machine learning analysis to predict the extinction risks and threats to plant species lacking assessments. Our compilation of global International Union for Conservation of Nature (IUCN) Red List assessments shows that overexploitation and unsustainable agricultural practices affect 62.1% and 56.8% of vertebrate species, respectively, and that each affects nearly 90% of all plant species. Other threats have a relatively minor effect today but are expected to increase in coming decades. Because only one-third (4652) of all Malagasy plant species have been formally assessed, we carried out a neural network analysis to predict the putative status and threats for 5887 unassessed species and to evaluate biases in current assessments. The percentage of plant species currently assessed as threatened is probably representative of actual numbers. The exception is ferns and lycophytes, where significantly more species are estimated to be threatened. We find that Madagascar is home to a disproportionately high number of Evolutionarily Distinct and Globally Endangered (EDGE) species. This further highlights the urgency of evidence-based and effective in situ and ex situ conservation.
Despite these alarming statistics and trends, we find that 10.4% of Madagascar’s land area is protected. The network of protected areas (PAs) covers at least part of the range of 97.1% of terrestrial and freshwater vertebrates with known distributions (amphibians, freshwater fishes, reptiles, birds, and mammal species combined) and 67.7% of plant species. For threatened species, the percentages are 97.7% for vertebrates and 79.6% for plants. Complementary to this, ex situ collections hold 18% of vertebrate species and 23% of plant species. Nonetheless, many threatened species still do not occur within PAs and are absent from ex situ collections. These include one amphibian, three mammals, seven reptiles and 559 plants, plus others among species not yet assessed. Based on our updated vegetation map, we find that the current PA network provides good coverage of the major habitats, particularly mangroves, spiny forest, humid forest, and tapia. However, subhumid forest and grassland-woodland mosaic have very low areas under protection (5.7% and 1.8%, respectively).
OUTLOOK: Madagascar is among the world’s poorest countries, and its biodiversity is a key resource for the sustainable future and well-being of its citizens. Current threats to Madagascar’s biodiversity are deeply rooted in historical and present social contexts, including widespread inequalities. We therefore propose five opportunities for action to further conservation in a just and equitable way. First, investment in conservation and restoration must be based on evidence and effectiveness and be tailored to meet future challenges through inclusive solutions. Second, expanded biodiversity monitoring, including increased dataset production and availability, is key. Third, improving the effectiveness of existing PAs (for example, through community engagement, training, and income opportunities) is more important than creating new ones.
Fourth, conservation and restoration should not focus solely on the PA network but should also include the surrounding landscapes and communities. And finally, conservation actions must address the root causes of biodiversity loss, including poverty and food insecurity. In the eyes of much of the world, Madagascar’s biodiversity is a unique global asset that needs saving; in the daily lives of many of the Malagasy people, it is a rapidly diminishing source of the most basic needs for subsistence. Protecting Madagascar’s biodiversity while promoting social development for its people is a matter of the utmost urgency.
Figure: Visual representation of five key opportunities for conserving and restoring Madagascar’s rapidly declining biodiversity identified in this Review. The dashed lines point to representative vegetation types where these recommendations could have tangible effects, but the opportunities are applicable across Madagascar. ILLUSTRATION: INESSA VOET
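The EDGE designation mentioned above combines a species' evolutionary distinctiveness (ED) with its Red List category. The sketch below shows only the original scoring of Isaac et al. (2007), EDGE = ln(1 + ED) + GE × ln(2), where GE runs from 0 (Least Concern) to 4 (Critically Endangered). That is an assumption: the review may have used a later EDGE protocol (such as EDGE2, which replaces the fixed GE weights with extinction probabilities). The function name `edge_score` and any ED values passed to it are hypothetical.

```python
import math

# GE weights for IUCN Red List categories in the original EDGE formulation.
GE_WEIGHTS = {"LC": 0, "NT": 1, "VU": 2, "EN": 3, "CR": 4}


def edge_score(ed_my, category):
    """EDGE = ln(1 + ED) + GE * ln(2), following Isaac et al. (2007).

    `ed_my` is evolutionary distinctiveness in millions of years; categories
    outside GE_WEIGHTS (e.g. DD or EX) raise KeyError.
    """
    return math.log1p(ed_my) + GE_WEIGHTS[category] * math.log(2)
```

Because each step up the Red List adds ln(2), moving a species one category closer to extinction has the same effect on its score as doubling (1 + ED).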