

Title: Announcing Big-Bee: An initiative to promote understanding of bees through image and trait digitization
While bees are critical to sustaining a large proportion of global food production, as well as pollinating both wild and cultivated plants, they are decreasing in both numbers and diversity. Our understanding of the factors driving these declines is limited, in part, because we lack sufficient data on the distribution of bee species to predict changes in their geographic range under climate change scenarios. Additionally lacking is adequate data on the behavioral and anatomical traits that may make bees either vulnerable or resilient to human-induced environmental changes, such as habitat loss and climate change. Fortunately, a wealth of associated attributes can be extracted from the specimens deposited in natural history collections for over 100 years. Extending Anthophila Research Through Image and Trait Digitization (Big-Bee) is a newly funded US National Science Foundation Advancing Digitization of Biodiversity Collections project. Over the course of three years, we will create over one million high-resolution 2D and 3D images of bee specimens (Fig. 1), representing over 5,000 worldwide bee species, including most of the major pollinating species. We will also develop tools to measure bee traits from images and generate comprehensive bee trait and image datasets to measure changes through time. The Big-Bee network of participating institutions includes 13 US institutions (Fig. 2) and partnerships with US government agencies. We will develop novel mechanisms for sharing image datasets and datasets of bee traits that will be available through an open, Symbiota-Light (Gilbert et al. 2020) data portal called the Bee Library. In addition, biotic interaction and species association data will be shared via Global Biotic Interactions (Poelen et al. 2014). The Big-Bee project will engage the public in research through community science via crowdsourcing trait measurements and data transcription from images using Notes from Nature (Hill et al. 2012). 
Training and professional development for natural history collection staff, researchers, and university students in data science will be provided through the creation and implementation of workshops focusing on bee traits and species identification. We are also planning a short, artistic college radio segment called "the Buzz" to get people excited about bees, biodiversity, and the wonders of our natural world.
Award ID(s):
2102006 2101851
NSF-PAR ID:
10344374
Journal Name:
Biodiversity Information Science and Standards
Volume:
5
ISSN:
2535-0897
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    A wealth of information about how parasites interact with their hosts already exists in collections, scientific publications, specialized databases, and grey literature. The US National Science Foundation-funded Terrestrial Parasite Tracker Thematic Collection Network (TPT) project began in 2019 to help build a comprehensive picture of arthropod ectoparasites including the evolution of these parasite-host biotic associations, distributions, and the ecological interactions of disease vectors. TPT is a network of biodiversity collections whose data can assist scientists, educators, land managers, and policymakers to better understand the complex relationship between hosts and parasites including emergent properties that may explain the causes and frequency of human and wildlife pathogens. TPT member collections make their association information easier to access via Global Biotic Interactions (GloBI, Poelen et al. 2014), which is periodically archived through Zenodo to track progress in the TPT project. TPT leverages GloBI's ability to index biotic associations from specimen occurrence records that come from existing management systems (e.g., Arctos, Symbiota, EMu, Excel, MS Access) to avoid having to completely rework existing, or build new, cyber-infrastructures before collections can share data. TPT-affiliated collection managers use collection-specific translation tables to connect their verbatim (or original) terms used to describe associations (e.g., "ex", "found on", "host") to their interpreted, machine-readable terms in the OBO Relations Ontology (RO). These interpreted terms enable searches across previously siloed association record sets, while the original verbatim values remain accessible to help retain provenance and allow for interpretation improvements. TPT is an ambitious project, with the goal to database label data from over 1.2 million specimens of arthropod parasites of vertebrates coming from 22 collections across North America. 
In the first year of the project, the TPT collections created over 73,700 new records and 41,984 images. In addition, 17 TPT data providers and three other collaborators shared datasets that are now indexed by GloBI, visible on the TPT GloBI project page. These datasets came from collection specimen occurrence records and literature sources. Two TPT data archives that capture and preserve the changes in the data coming from TPT to GloBI were published through Zenodo (Poelen et al. 2020a, Poelen et al. 2020b). The archives document the changes in how data are shared by collections including the biotic association data format and quantity of data captured. The Poelen et al. 2020b report included all TPT collections and biotic interactions from Arctos collections in VertNet and the Symbiota Collection of Arthropods Network (SCAN). The total number of interactions included in this report was 376,671 records (500,000 interactions is the overall goal for TPT). In addition, close coordination with TPT collection data managers including many one-on-one conversations, a workshop, and a webinar (Sullivan et al. 2020) was conducted to help guide the data capture of biotic associations. GloBI is an effective tool to help integrate biotic association data coming from occurrence records into an openly accessible, global, linked view of existing species interaction records. The results gleaned from the TPT workshop and Zenodo data archives demonstrate that minimizing changes to existing workflows allows for custom interpretation of collection-specific interaction terms. In addition, including collection data managers in the development of the interaction term vocabularies is an important part of the process that may improve data sharing and the overall downstream data quality.
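The collection-specific translation tables described in this abstract can be pictured as a simple lookup that keeps the verbatim term alongside its interpretation. The following is a minimal sketch, not the TPT or GloBI implementation: the three verbatim terms ("ex", "found on", "host") come from the text, while the table name, the function name, the output field names, and the choice of RO target are assumptions made for illustration and should be checked against the Relations Ontology before any real use.

```python
# Hypothetical translation table: verbatim label term -> (RO label, RO identifier).
# The verbatim terms come from the abstract; the RO targets are illustrative
# assumptions, not a mapping published by any TPT collection.
TRANSLATION_TABLE = {
    "ex": ("has host", "RO:0002454"),
    "found on": ("has host", "RO:0002454"),
    "host": ("has host", "RO:0002454"),
}


def interpret(verbatim_term):
    """Return a record that keeps the verbatim term next to its interpretation."""
    key = verbatim_term.strip().lower()
    label, ro_id = TRANSLATION_TABLE.get(key, (None, None))
    return {
        "verbatimInteractionType": verbatim_term,  # original value, kept for provenance
        "interactionTypeName": label,              # None when no mapping exists yet
        "interactionTypeId": ro_id,
    }
```

Because the verbatim value is carried through unchanged, an unmapped or wrongly mapped term can be reinterpreted later by editing the table alone, which matches the provenance point made in the abstract.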
  2. PLEASE CONTACT AUTHORS IF YOU CONTRIBUTE AND WOULD LIKE TO BE LISTED AS A CO-AUTHOR. (this message will be removed some time weeks/months after the first publication)

    Terrestrial Parasite Tracker indexed biotic interactions and review summary.

    The Terrestrial Parasite Tracker (TPT) project began in 2019 and is funded by the National Science Foundation to mobilize data from vector and ectoparasite collections to data aggregators (e.g., iDigBio, GBIF) to help build a comprehensive picture of arthropod host-association evolution, distributions, and the ecological interactions of disease vectors which will assist scientists, educators, land managers, and policy makers. Arthropod parasites often are important to human and wildlife health and safety as vectors of pathogens, and it is critical to digitize these specimens so that they, and their biotic interaction data, will be available to help understand and predict the spread of human and wildlife disease.

    This data publication contains versioned TPT associated datasets and related data products that were tracked, reviewed and indexed by Global Biotic Interactions (GloBI) and associated tools. GloBI provides open access to finding species interaction data (e.g., predator-prey, pollinator-plant, pathogen-host, parasite-host) by combining existing open datasets using open source software.

    If you have questions or comments about this publication, please open an issue at https://github.com/ParasiteTracker/tpt-reporting or contact the authors by email.

    Funding:
    The creation of this archive was made possible by the National Science Foundation award "Collaborative Research: Digitization TCN: Digitizing collections to trace parasite-host associations and predict the spread of vector-borne disease," Award numbers DBI:1901932 and DBI:1901926

    References:
    Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.

    GloBI Data Review Report

    Datasets under review:
     - University of Michigan Museum of Zoology Insect Division. Full Database Export 2020-11-20 provided by Erika Tucker and Barry Oconner. accessed via https://github.com/EMTuckerLabUMMZ/ummzi/archive/6731357a377e9c2748fc931faa2ff3dc0ce3ea7a.zip on 2022-06-24T14:02:48.801Z
     - Academy of Natural Sciences Entomology Collection for the Parasite Tracker Project accessed via https://github.com/globalbioticinteractions/ansp-para/archive/5e6592ad09ec89ba7958266ad71ec9d5d21d1a44.zip on 2022-06-24T14:04:22.091Z
     - Bernice Pauahi Bishop Museum, J. Linsley Gressitt Center for Research in Entomology accessed via https://github.com/globalbioticinteractions/bpbm-ent/archive/c085398dddd36f8a1169b9cf57de2a572229341b.zip on 2022-06-24T14:04:37.692Z
     - Texas A&M University, Biodiversity Teaching and Research Collections accessed via https://github.com/globalbioticinteractions/brtc-para/archive/f0a718145b05ed484c4d88947ff712d5f6395446.zip on 2022-06-24T14:06:40.154Z
     - Brigham Young University Arthropod Museum accessed via https://github.com/globalbioticinteractions/byu-byuc/archive/4a609ac6a9a03425e2720b6cdebca6438488f029.zip on 2022-06-24T14:06:51.420Z
     - California Academy of Sciences Entomology accessed via https://github.com/globalbioticinteractions/cas-ent/archive/562aea232ec74ab615f771239451e57b057dc7c0.zip on 2022-06-24T14:07:16.371Z
     - Clemson University Arthropod Collection accessed via https://github.com/globalbioticinteractions/cu-cuac/archive/6cdcbbaa4f7cec8e1eac705be3a999bc5259e00f.zip on 2022-06-24T14:07:40.925Z
     - Denver Museum of Nature and Science (DMNS) Parasite specimens (DMNS:Para) accessed via https://github.com/globalbioticinteractions/dmns-para/archive/a037beb816226eb8196533489ee5f98a6dfda452.zip on 2022-06-24T14:08:00.730Z
     - Field Museum of Natural History IPT accessed via https://github.com/globalbioticinteractions/fmnh/archive/6bfc1b7e46140e93f5561c4e837826204adb3c2f.zip on 2022-06-24T14:18:51.995Z
     - Illinois Natural History Survey Insect Collection accessed via https://github.com/globalbioticinteractions/inhs-insects/archive/38692496f590577074c7cecf8ea37f85d0594ae1.zip on 2022-06-24T14:19:37.563Z
     - UMSP / University of Minnesota / University of Minnesota Insect Collection accessed via https://github.com/globalbioticinteractions/min-umsp/archive/3f1b9d32f947dcb80b9aaab50523e097f0e8776e.zip on 2022-06-24T14:20:27.232Z
     - Milwaukee Public Museum Biological Collections Data Portal accessed via https://github.com/globalbioticinteractions/mpm/archive/9f44e99c49ec5aba3f8592cfced07c38d3223dcd.zip on 2022-06-24T14:20:46.185Z
     - Museum for Southern Biology (MSB) Parasite Collection accessed via https://github.com/globalbioticinteractions/msb-para/archive/178a0b7aa0a8e14b3fe953e770703fe331eadacc.zip on 2022-06-24T15:16:07.223Z
     - The Albert J. Cook Arthropod Research Collection accessed via https://github.com/globalbioticinteractions/msu-msuc/archive/38960906380443bd8108c9e44aeff4590d8d0b50.zip on 2022-06-24T16:09:40.702Z
     - Ohio State University Acarology Laboratory accessed via https://github.com/globalbioticinteractions/osal-ar/archive/876269d66a6a94175dbb6b9a604897f8032b93dd.zip on 2022-06-24T16:10:00.281Z
     - Frost Entomological Museum, Pennsylvania State University accessed via https://github.com/globalbioticinteractions/psuc-ento/archive/30b1f96619a6e9f10da18b42fb93ff22cc4f72e2.zip on 2022-06-24T16:10:07.741Z
     - Purdue Entomological Research Collection accessed via https://github.com/globalbioticinteractions/pu-perc/archive/e0909a7ca0a8df5effccb288ba64b28141e388ba.zip on 2022-06-24T16:10:26.654Z
     - Texas A&M University Insect Collection accessed via https://github.com/globalbioticinteractions/tamuic-ent/archive/f261a8c192021408da67c39626a4aac56e3bac41.zip on 2022-06-24T16:10:58.496Z
     - University of California Santa Barbara Invertebrate Zoology Collection accessed via https://github.com/globalbioticinteractions/ucsb-izc/archive/825678ad02df93f6d4469f9d8b7cc30151b9aa45.zip on 2022-06-24T16:12:29.854Z
     - University of Hawaii Insect Museum accessed via https://github.com/globalbioticinteractions/uhim/archive/53fa790309e48f25685e41ded78ce6a51bafde76.zip on 2022-06-24T16:12:41.408Z
     - University of New Hampshire Collection of Insects and other Arthropods UNHC-UNHC accessed via https://github.com/globalbioticinteractions/unhc/archive/f72575a72edda8a4e6126de79b4681b25593d434.zip on 2022-06-24T16:12:59.500Z
     - Scott L. Gardner and Gabor R. Racz (2021). University of Nebraska State Museum - Parasitology. Harold W. Manter Laboratory of Parasitology. University of Nebraska State Museum. accessed via https://github.com/globalbioticinteractions/unl-nsm/archive/6bcd8aec22e4309b7f4e8be1afe8191d391e73c6.zip on 2022-06-24T16:13:06.914Z
     - Data were obtained from specimens belonging to the United States National Museum of Natural History (USNM), Smithsonian Institution, Washington DC and digitized by the Walter Reed Biosystematics Unit (WRBU). accessed via https://github.com/globalbioticinteractions/usnmentflea/archive/ce5cb1ed2bbc13ee10062b6f75a158fd465ce9bb.zip on 2022-06-24T16:13:38.013Z
     - US National Museum of Natural History Ixodes Records accessed via https://github.com/globalbioticinteractions/usnm-ixodes/archive/c5fcd5f34ce412002783544afb628a33db7f47a6.zip on 2022-06-24T16:13:45.666Z
     - Price Institute of Parasite Research, School of Biological Sciences, University of Utah accessed via https://github.com/globalbioticinteractions/utah-piper/archive/43da8db550b5776c1e3d17803831c696fe9b8285.zip on 2022-06-24T16:13:54.724Z
     - University of Wisconsin Stevens Point, Stephen J. Taft Parasitological Collection accessed via https://github.com/globalbioticinteractions/uwsp-para/archive/f9d0d52cd671731c7f002325e84187979bca4a5b.zip on 2022-06-24T16:14:04.745Z
     - Giraldo-Calderón, G. I., Emrich, S. J., MacCallum, R. M., Maslen, G., Dialynas, E., Topalis, P., … Lawson, D. (2015). VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic acids research, 43(Database issue), D707–D713. doi:10.1093/nar/gku1117. accessed via https://github.com/globalbioticinteractions/vectorbase/archive/00d6285cd4e9f4edd18cb2778624ab31b34b23b8.zip on 2022-06-24T16:14:11.965Z
     - WIRC / University of Wisconsin Madison WIS-IH / Wisconsin Insect Research Collection accessed via https://github.com/globalbioticinteractions/wis-ih-wirc/archive/34162b86c0ade4b493471543231ae017cc84816e.zip on 2022-06-24T16:14:29.743Z
     - Yale University Peabody Museum Collections Data Portal accessed via https://github.com/globalbioticinteractions/yale-peabody/archive/43be869f17749d71d26fc820c8bd931d6149fe8e.zip on 2022-06-24T16:23:29.289Z

    Generated on:
    2022-06-24

    by:
    GloBI's Elton 0.12.4 
    (see https://github.com/globalbioticinteractions/elton).

    Note that all files ending with .tsv are files formatted 
    as UTF8 encoded tab-separated values files.

    https://www.iana.org/assignments/media-types/text/tab-separated-values


    Included in this review archive are:

    README:
      This file.

    review_summary.tsv:
      Summary across all reviewed collections of total number of distinct review comments.

    review_summary_by_collection.tsv:
      Summary by reviewed collection of total number of distinct review comments.

    indexed_interactions_by_collection.tsv: 
      Summary of number of indexed interaction records by institutionCode and collectionCode.

    review_comments.tsv.gz:
      All review comments by collection.

    indexed_interactions_full.tsv.gz:
      All indexed interactions for all reviewed collections.

    indexed_interactions_simple.tsv.gz:
      All indexed interactions for all reviewed collections selecting only sourceInstitutionCode, sourceCollectionCode, sourceCatalogNumber, sourceTaxonName, interactionTypeName and targetTaxonName.

    datasets_under_review.tsv:
      Details on the datasets under review.

    elton.jar: 
      Program used to update datasets and generate the review reports and associated indexed interactions.

    datasets.zip:
      Source datasets used by elton.jar in process of executing the generate_report.sh script.

    generate_report.sh:
      Program used to generate the report

    generate_report.log:
      Log file generated as part of running the generate_report.sh script
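As a rough illustration of how the gzipped, UTF-8, tab-separated files listed above might be read, the sketch below counts interaction rows per institution and collection code, similar in spirit to indexed_interactions_by_collection.tsv. It assumes the file starts with a header row using the column names listed for indexed_interactions_simple.tsv.gz; the function name and the sample rows (including the codes "XYZ" and "ABC") are made up for this example and are not taken from the archive.

```python
import csv
import gzip
import io
from collections import Counter


def count_by_collection(tsv_gz_bytes):
    """Count interaction rows per (sourceInstitutionCode, sourceCollectionCode)."""
    counts = Counter()
    with gzip.open(io.BytesIO(tsv_gz_bytes), mode="rt",
                   encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters in taxon names as ordinary text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[(row["sourceInstitutionCode"], row["sourceCollectionCode"])] += 1
    return counts


# Made-up sample rows, only for illustration; not data from the archive.
SAMPLE = (
    "sourceInstitutionCode\tsourceCollectionCode\tsourceCatalogNumber\t"
    "sourceTaxonName\tinteractionTypeName\ttargetTaxonName\n"
    "XYZ\tEnt\t0001\tIxodes scapularis\thas host\tOdocoileus virginianus\n"
    "XYZ\tEnt\t0002\tIxodes scapularis\thas host\tPeromyscus leucopus\n"
    "ABC\tPara\t0417\tPulex irritans\thas host\tCanis lupus\n"
)

sample_counts = count_by_collection(gzip.compress(SAMPLE.encode("utf-8")))
```

For a file on disk, the same function could be called with the bytes read from it, for example `count_by_collection(open(path, "rb").read())`, where `path` points at a local copy of the archive file.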
     

     
  3. Over 300 million arthropod specimens are housed in North American natural history collections. These collections represent a “vast hidden treasure trove” of biodiversity: 95% of the specimen label data have yet to be transcribed for research, and less than 2% of the specimens have been imaged. Specimen labels contain crucial information to determine species distributions over time and are essential for understanding patterns of ecology and evolution, which will help assess the growing biodiversity crisis driven by global change impacts. Specimen images offer indispensable insight and data for analyses of traits, and ecological and phylogenetic patterns of biodiversity. Here, we review North American arthropod collections using two key metrics, specimen holdings and digitization efforts, to assess the potential for collections to provide needed biodiversity data. We include data from 223 arthropod collections in North America, with an emphasis on the United States. Our specific findings are as follows: (1) The majority of North American natural history collections (88%) and specimens (89%) are located in the United States. Canada has comparable holdings to the United States relative to its estimated biodiversity. Mexico has made the furthest progress in terms of digitization, but its specimen holdings should be increased to reflect the estimated higher Mexican arthropod diversity. The proportion of North American collections that has been digitized, and the number of digital records available per species, are both much lower for arthropods when compared to chordates and plants. (2) The National Science Foundation’s decade-long ADBC program (Advancing Digitization of Biological Collections) has been transformational in promoting arthropod digitization. However, even if this program became permanent, at current rates, by the year 2050 only 38% of the existing arthropod specimens would be digitized, and less than 1% would have associated digital images.
(3) The number of specimens in collections has increased by approximately 1% per year over the past 30 years. We propose that this rate of increase is insufficient to provide enough data to address biodiversity research needs, and that arthropod collections should aim to triple their rate of new specimen acquisition. (4) The collections we surveyed in the United States vary broadly in a number of indicators. Collectively, there is depth and breadth, with smaller collections providing regional depth and larger collections providing greater global coverage. (5) Increased coordination across museums is needed for digitization efforts to target taxa for research and conservation goals and address long-term data needs. Two key recommendations emerge: collections should significantly increase both their specimen holdings and their digitization efforts to empower continental and global biodiversity data pipelines, and stimulate downstream research. 
  4. Large systematic revisionary projects incorporating data for hundreds or thousands of taxa require an integrative approach, with a strong biodiversity-informatics core for efficient data management to facilitate research on the group. Our original biodiversity informatics platform, 3i (Internet-accessible Interactive Identification) combined a customized MS Access database backend with ASP-based web interfaces to support revisionary syntheses of several large genera of leafhoppers (Hemiptera: Auchenorrhyncha: Cicadellidae). More recently, for our National Science Foundation sponsored project, “GoLife: Collaborative Research: Integrative genealogy, ecology and phenomics of deltocephaline leafhoppers (Hemiptera: Cicadellidae), and their microbial associates”, we selected the new open-source platform TaxonWorks as the cyberinfrastructure. In the scope of the project, the original “3i World Auchenorrhyncha Database” was imported into TaxonWorks. At the present time, TaxonWorks has many tools to automatically import nomenclature, citations, and specimen based collection data. At the time of the initial migration of the 3i database, many of those tools were still under development, and the complexity of the data in the database required a custom migration script, which is still probably the most efficient solution for importing datasets with a long development history. At the moment, the World Auchenorrhyncha Database comprehensively covers nomenclature of the group and includes data on 70 valid families, 6,816 valid genera, 47,064 valid species as well as synonymy and subsequent combinations (Fig. 1). In addition, many taxon records include the original citation, bibliography, type information, etymology, etc. The bibliography of the group includes 37,579 sources, about 1/3 of which are associated with PDF files.
Species have distribution records, either derived from individual specimens or as country and state level asserted distribution, as well as biological associations indicating host plants, predators, and parasitoids. Observation matrices in TaxonWorks are designed to handle morphological data associated with taxa or specimens. The matrices may be used to automatically generate interactive identification keys and taxon descriptions. They can also be downloaded to be imported, for example, into Lucid builder, or to perform phylogenetic analysis using an external application. At the moment there are 36 matrices associated with the project. The observation matrix from the GoLife project covers 798 taxa by 210 descriptors (most of which are qualitative multi-state morphological descriptors) (Fig. 2). Illustrations are provided for 9,886 taxa, organized in a specialized image matrix, and can be used as a pictorial key for determining species and taxa of a higher rank. For the phylogenetic analysis, a dataset was constructed for 730 terminal taxa and >160,000 nucleotide positions obtained using anchored hybrid enrichment of genomic DNA for a sample of leafhoppers from the subfamily Deltocephalinae and outgroups. The probe kit targets leafhopper genes, as well as some bacterial genes (endosymbionts and plant pathogens transmitted by leafhoppers). The maximum likelihood analyses of concatenated nucleotide and amino acid sequences as well as coalescent gene tree analysis yielded well-resolved phylogenetic trees (Cao et al. 2022). Raw sequence data have been uploaded to the Sequence Read Archive on GenBank. Occurrence and morphological data, as well as diagnostic images, for voucher specimens have been incorporated into TaxonWorks. Data in TaxonWorks can be exported in raw format, accessed via an Application Programming Interface (API), or shared with external data aggregators such as Catalogue of Life, GBIF, and iDigBio.
  5. Native bee species in the United States provide invaluable pollination services. Concerns about native bee declines are growing, and there are calls for a national monitoring program. Documenting species ranges at ecologically meaningful scales through coverage completeness analysis is a fundamental step to track bees from species to communities. It may take decades before all existing bee specimens are digitized, so projections are needed now to focus future research and management efforts. From 1.923 million records, we created range maps for nearly 88% (3158 species) of bee species in the contiguous United States, provided the first analysis of inventory completeness for digitized specimens of a major insect clade, and perhaps most important, estimated spatial completeness accounting for all known bee specimens in USA collections, including undigitized bee specimens. Completeness analyses were very low (3–37%) across four examined spatial resolutions when using the currently available bee specimen records. Adding a subset of observations from community science data sources did not significantly increase completeness, and adding a projected 4.7 million undigitized specimens increased completeness by only an additional 12–13%. Assessments of data, including projected specimen records, indicate persistent taxonomic and geographic deficiencies. In conjunction with expedited digitization, new inventories that integrate community science data with specimen‐based documentation will be required to close these gaps. A combined effort involving both strategic inventories and accelerated digitization campaigns is needed for a more complete understanding of USA bee distributions. 