The Global Biodiversity Information Facility (GBIF 2022a) has indexed more than 2 billion occurrence records from 70,147 datasets. These datasets often include "hidden" biotic interaction data because biodiversity communities use the Darwin Core standard (DwC, Wieczorek et al. 2012) in different ways to document biotic interactions. In this study, we extracted biotic interactions from GBIF data using an approach similar to that employed by Global Biotic Interactions (GloBI; Poelen et al. 2014) and summarized the results. Here we present an estimate of the interaction data available in GBIF and show that biotic interaction claims can be found and extracted from GBIF automatically. Our results suggest that much can be gained by focusing more on the development of tools that help index and curate biotic interaction data in existing datasets. Combined with data standardization and best practices for sharing biotic interactions, such as the initiative on plant-pollinator interactions (Salim 2022), this approach can rapidly help meet open data principles (Wilkinson 2016). We used Preston (Elliott et al. 2020), open-source software that versions biodiversity datasets, to copy all GBIF-indexed datasets. The biodiversity data graph version (Poelen 2020) of the GBIF-indexed datasets used in this study contains 58,504 datasets in Darwin Core Archive (DwC-A) format, totaling 574,715,196 records. After retrieval and verification, the datasets were processed using Elton. Elton extracts biotic interaction data and supports more than 20 existing file formats, including various types of data elements in DwC records. Elton also helps align interaction claims (e.g., host of, parasite of, associated with) to the Relations Ontology (RO, Mungall 2022), making it easier to discover interaction data across a heterogeneous collection of datasets.
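Preston's role in the workflow is to let a dataset copy be tracked by its content rather than by its download location. The minimal Python sketch below shows the idea of content addressing. It assumes identifiers of the form `hash://sha256/<hex>`, similar to the content hashes Preston reports. It does not call Preston itself, and the sample bytes are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a content-based identifier for a blob of bytes.

    Assumption: identifiers take the form hash://sha256/<hex digest>,
    similar to the content hashes Preston reports. Identical bytes always
    yield the same identifier, and any change yields a different one.
    """
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def has_changed(recorded_id: str, data: bytes) -> bool:
    """Return True if the bytes no longer match a previously recorded identifier."""
    return content_id(data) != recorded_id


if __name__ == "__main__":
    # Hypothetical stand-ins for two retrievals of the same dataset.
    first_copy = b"occurrenceID\tassociatedTaxa\n1\thost: Apis mellifera\n"
    second_copy = first_copy + b"2\tfound on: Quercus alba\n"

    recorded = content_id(first_copy)
    print(recorded)
    print(has_changed(recorded, first_copy))   # False: same bytes, same identifier
    print(has_changed(recorded, second_copy))  # True: the dataset changed
```

The hash depends only on the bytes, so the same copy can be verified later no matter where it was retrieved from.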
Using a specific mapping from interaction claims found in the DwC records to terms in RO, Elton found 30,167,984 potential records (with non-empty values for the scanned DwC terms) and 15,248,478 records with recognized interaction types. Taxonomic name validation was performed using Nomer, which maps input names to names found in a variety of taxonomic catalogs. We considered an interaction record valid only if the interaction type could be mapped to a term in RO and Nomer found a valid name for both the source and the target taxon. Following the workflow described in Fig. 1, we found 7,947,822 valid interaction records (52% of the records with recognized interaction types). Most of them were generic interactions (interacts_with, 87.5%), but the remaining 12.5% (993,477 records) included host-parasite and plant-animal interactions. Most of the interaction records found involved plants (78%), followed by animals (14%) and fungi (6%). In conclusion, many biotic interactions are embedded in existing datasets registered in large biodiversity data indexers and aggregators such as iDigBio, GBIF, and BioCASE. We exposed these biotic interaction claims using the combined functionality of the biodiversity data tools Elton (for interaction data extraction), Preston (for reliable dataset tracking) and Nomer (for taxonomic name alignment). Nonetheless, new vocabularies, standards and best-practice guides would facilitate the aggregation of interaction data. This includes diversifying the GBIF data model (GBIF 2022b) to share biodiversity data beyond occurrence data. That is the aim of the TDWG Interest Group on Biological Interactions Data (TDWG 2022).
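The validity rule above has two conditions. The interaction type must map to RO, and both taxon names must resolve. The Python sketch below counts potential, recognized and valid claims and reproduces the reported percentages. Its assumptions are as follows. The three RO mappings are a small illustrative subset, not Elton's actual mapping table. `resolve` is a hypothetical stand-in for a Nomer lookup, and the sketch does not call Nomer. All claims passed in are treated as "potential", meaning they are assumed to be already filtered to non-empty DwC values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

RO = "http://purl.obolibrary.org/obo/"

# Illustrative subset of a verbatim-term -> RO mapping (assumption); the
# mapping tables actually used by Elton are larger and maintained separately.
INTERACTION_TERMS: Dict[str, str] = {
    "interacts with": RO + "RO_0002437",
    "parasite of": RO + "RO_0002444",
    "host of": RO + "RO_0002453",
}


@dataclass(frozen=True)
class Claim:
    source_name: str
    verbatim_type: str
    target_name: str


def interaction_iri(claim: Claim) -> Optional[str]:
    """Map the verbatim interaction type to an RO IRI, or None if unrecognized."""
    return INTERACTION_TERMS.get(claim.verbatim_type.strip().lower())


def summarize(claims: Iterable[Claim],
              resolve: Callable[[str], Optional[str]]) -> Dict[str, int]:
    """Count potential, recognized (RO-mapped) and valid (names resolved) claims.

    `resolve` is a hypothetical stand-in for a Nomer lookup: it returns a
    matched name, or None when no valid name is found.
    """
    claims = list(claims)
    recognized = [c for c in claims if interaction_iri(c) is not None]
    valid = [
        c for c in recognized
        if resolve(c.source_name) is not None and resolve(c.target_name) is not None
    ]
    return {"potential": len(claims), "recognized": len(recognized), "valid": len(valid)}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    known = {"Apis mellifera", "Quercus alba", "Ixodes scapularis", "Odocoileus virginianus"}

    def resolve(name: str) -> Optional[str]:
        return name if name in known else None

    claims = [
        Claim("Apis mellifera", "interacts with", "Quercus alba"),
        Claim("Ixodes scapularis", "parasite of", "Odocoileus virginianus"),
        Claim("Ixodes scapularis", "found near", "Odocoileus virginianus"),  # unmapped type
        Claim("Unknownus dubius", "host of", "Apis mellifera"),  # unresolved name
    ]
    print(summarize(claims, resolve))  # {'potential': 4, 'recognized': 3, 'valid': 2}
    # Reported totals: valid records relative to recognized and to potential records.
    print(round(percent(7_947_822, 15_248_478)))  # 52
    print(round(percent(7_947_822, 30_167_984)))  # 26
```

The last two lines show that the reported 52% is relative to the 15,248,478 records with recognized interaction types. Relative to all 30,167,984 potential records, the share is about 26%.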
Seltmann, Katja; Poelen, Jorrit; Sullivan, Kathryn; Zaspel, Jennifer
(Biodiversity Information Science and Standards)
A wealth of information about how parasites interact with their hosts already exists in collections, scientific publications, specialized databases, and grey literature. The US National Science Foundation-funded Terrestrial Parasite Tracker Thematic Collection Network (TPT) project began in 2019 to help build a comprehensive picture of arthropod ectoparasites, including the evolution of these parasite-host biotic associations, their distributions, and the ecological interactions of disease vectors. TPT is a network of biodiversity collections whose data can help scientists, educators, land managers, and policymakers better understand the complex relationship between hosts and parasites. This includes emergent properties that may explain the causes and frequency of human and wildlife pathogens. TPT member collections make their association information easier to access via Global Biotic Interactions (GloBI, Poelen et al. 2014), which is periodically archived through Zenodo to track progress in the TPT project. TPT leverages GloBI's ability to index biotic associations from specimen occurrence records that come from existing management systems (e.g., Arctos, Symbiota, EMu, Excel, MS Access). This means collections do not have to completely rework existing cyberinfrastructures, or build new ones, before they can share data. TPT-affiliated collection managers use collection-specific translation tables to connect their verbatim (or original) terms used to describe associations (e.g., "ex", "found on", "host") to their interpreted, machine-readable terms in the OBO Relations Ontology (RO). These interpreted terms enable searches across previously siloed association record sets, while the original verbatim values remain accessible to help retain provenance and allow for interpretation improvements. TPT is an ambitious project, with the goal of databasing label data from over 1.2 million specimens of arthropod parasites of vertebrates from 22 collections across North America.
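The translation-table approach has two requirements. Verbatim label phrases are mapped to interpreted terms, and the original wording is never overwritten. The Python sketch below assumes a hypothetical per-collection table. Its phrase-to-term pairings are illustrative and are not taken from any TPT collection. The sketch also shows how phrases that still lack an interpretation can be listed for curators.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical translation table for a single collection: verbatim phrases
# found on labels -> interpreted RO term labels. The pairings are
# illustrative only; each collection curates its own table.
TRANSLATION_TABLE: Dict[str, str] = {
    "ex": "has host",
    "found on": "interacts with",
    "host": "has host",
}


@dataclass(frozen=True)
class Association:
    verbatim: str               # original wording, kept for provenance
    interpreted: Optional[str]  # None until the table covers this wording


def interpret(verbatim: str, table: Dict[str, str] = TRANSLATION_TABLE) -> Association:
    """Look up a verbatim association term without discarding the original."""
    return Association(verbatim=verbatim, interpreted=table.get(verbatim.strip().lower()))


def unmapped(terms: Iterable[str], table: Dict[str, str] = TRANSLATION_TABLE) -> List[str]:
    """List distinct verbatim terms that still need an entry in the table."""
    return sorted({t for t in terms if interpret(t, table).interpreted is None})


if __name__ == "__main__":
    for term in ["ex", "Found on", "in nest of"]:
        print(interpret(term))
    print(unmapped(["ex", "in nest of", "under bark", "in nest of"]))  # ['in nest of', 'under bark']
```

Each record keeps both values, so an improved table can be applied later to reinterpret old records without re-transcribing labels.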
In the first year of the project, the TPT collections created over 73,700 new records and 41,984 images. In addition, 17 TPT data providers and three other collaborators shared datasets that are now indexed by GloBI and visible on the TPT GloBI project page. These datasets came from collection specimen occurrence records and literature sources. Two TPT data archives that capture and preserve the changes in the data flowing from TPT to GloBI were published through Zenodo (Poelen et al. 2020a, Poelen et al. 2020b). The archives document changes in how collections share data, including the biotic association data format and the quantity of data captured. The Poelen et al. 2020b report included all TPT collections as well as biotic interactions from Arctos collections in VertNet and the Symbiota Collection of Arthropods Network (SCAN). This report included a total of 376,671 interaction records (the overall TPT goal is 500,000 interactions). In addition, TPT coordinated closely with collection data managers to help guide the data capture of biotic associations. This coordination included many one-on-one conversations, a workshop, and a webinar (Sullivan et al. 2020). GloBI is an effective tool for integrating biotic association data from occurrence records into an openly accessible, global, linked view of existing species interaction records. The results from the TPT workshop and the Zenodo data archives demonstrate that minimizing changes to existing workflows allows for custom interpretation of collection-specific interaction terms. In addition, including collection data managers in the development of interaction term vocabularies is an important part of the process that may improve data sharing and overall downstream data quality.
Introduction
This archive includes a tab-delimited (tsv) and a comma-delimited (csv) version of the Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila). Discover Life is an important resource for bee species names, and this update is from Draft-55, November 2020. Data were accessed and transformed into a tsv file in August 2023 using the Global Biotic Interactions (GloBI) nomer software. GloBI now incorporates the Discover Life bee species guide and world checklist in its functionality for searching for bee interactions.

Update! New Dataset also includes Subgenera Names
A new, tab-delimited version of the Discover Life taxonomy, as derived from Dorey et al. (2023), can be found via Zenodo at https://doi.org/10.5281/zenodo.10463762. This version of the Discover Life world species guide and checklist includes subgeneric names.

Citation
Please cite the original source for this data as: Ascher, J. S. and J. Pickering. 2022. Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila). http://www.discoverlife.org/mp/20q?guide=Apoidea_species Draft-56, 21 August, 2022

nomer
nomer is a command-line application for working with taxonomic resources offline. nomer incorporates many existing taxonomic catalogs (e.g., Catalogue of Life, ITIS, EOL, NCBI) and provides simple tools for comparing resources or resolving taxonomic names against one or more taxonomic name catalogs. Discover Life is included in nomer version 0.5.1. This full dataset can be recreated by installing nomer from https://github.com/globalbioticinteractions/nomer and running:

$ nomer list discoverlife > discoverlife.tsv

Data Columns
Discover Life provides a world name checklist and includes other names (synonyms and homonyms) that refer to the same species. In the tsv file, the providedName is either the accepted (checklist) name or an "other name"; all names are listed as a providedName.
The transformed data contain the following columns:
- providedExternalId: link to the name on Discover Life
- providedName: an accepted name or an "other name" in the Discover Life bee checklist; "other names" can be synonyms or homonyms
- providedAuthorship: authorship of the providedName
- providedRank: rank of the providedName
- providedPath: higher taxonomy of the providedName; this is the same as that of the accepted name (resolvedName)
- relationName: relationship between the "other name" and the bee name in the Discover Life checklist; a name may be related to itself
- resolvedName: an accepted name in the Discover Life bee checklist
- resolvedExternalId: link to the name on Discover Life
- resolvedAuthorship: authorship of the accepted (checklist) name
- resolvedRank: rank of the accepted (checklist) name
- resolvedPath: higher taxonomy of the accepted (checklist) name

Changes
No major changes to format in this version.

References
Jorrit Poelen & José Augusto Salim. (2022). globalbioticinteractions/nomer: (0.2.11). Zenodo. https://doi.org/10.5281/zenodo.6128011
Poelen JH, Simons JD and Mungall CH. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005
Seltmann KC, Allen J, Brown BV, Carper A, Engel MS, Franz N, Gilbert E, Grinter C, Gonzalez VH, Horsley P, Lee S, Maier C, Miko I, Morris P, Oboyski P, Pierce NE, Poelen J, Scott VL, Smith M, Talamas EJ, Tsutsui ND, Tucker E (2021) Announcing Big-Bee: An initiative to promote understanding of bees through image and trait digitization. Biodiversity Information Science and Standards 5: e74037. https://doi.org/10.3897/biss.5.74037
Dorey JB, Fischer EE, Chesshire PR, et al. (2023) A globally synthesised and flagged bee occurrence dataset and cleaning workflow. Scientific Data 10: 747. https://doi.org/10.1038/s41597-023-02626-w
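The column descriptions above imply a simple use of the file: gather every "other name" under the accepted name it resolves to. The Python sketch below shows that grouping. It assumes the tsv has a header row containing the providedName and resolvedName columns. If the output of `nomer list discoverlife` has no header, pass the column names to `csv.DictReader` via `fieldnames`. The sample names are hypothetical placeholders, not Discover Life records.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def other_names_by_accepted(tsv: TextIO) -> Dict[str, List[str]]:
    """Group "other names" (synonyms/homonyms) under their accepted name.

    Assumes a header row that includes providedName and resolvedName;
    all other columns are ignored.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE):
        provided, accepted = row["providedName"], row["resolvedName"]
        if provided != accepted:  # skip rows that relate a name to itself
            grouped[accepted].append(provided)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical rows; real names come from `nomer list discoverlife`.
    sample = (
        "providedName\tresolvedName\n"
        "Examplea alpha\tExamplea alpha\n"
        "Examplea alfa\tExamplea alpha\n"
        "Examplea beta\tExamplea beta\n"
    )
    print(other_names_by_accepted(io.StringIO(sample)))  # {'Examplea alpha': ['Examplea alfa']}
```

For the full file, open `discoverlife.tsv` with `newline=""` and `encoding="utf-8"`, then pass the file object to the same function. A homonym that resolves to more than one accepted name appears under each of them.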
Extending Anthophila research through image and trait digitization (Big-Bee) indexed biotic interactions and review summary. Declining populations of bees impact plant-pollinator interactions in both natural and agricultural systems. While bees and other insects pollinate most wild plants and are critical to sustaining a large proportion of global food production, they are decreasing in both numbers and diversity. Our understanding of the factors driving these declines is limited because we lack sufficient data on the distribution of bee species, and on the behavioral and anatomical traits that may make them either vulnerable or resilient to human-induced environmental changes, such as habitat loss and climate change. Fortunately, wild bees have been collected by researchers and deposited in natural history collections for over 100 years, retaining a wealth of associated attributes that can be extracted from specimen images. This project will digitally capture data and images from these historic specimens, develop tools to measure bee traits from these images and generate a comprehensive bee trait and image dataset to measure changes through time. This will increase our understanding of specific traits that put bee species at risk of decline - a critical need for both sustaining our agricultural economy and the conservation of our natural resources. In addition, the large image datasets created by this project can be used for new artificial intelligence identification tools that will help improve our future pollinator observation and monitoring efforts. The Big-Bee project began in 2021 and is funded by the National Science Foundation to mobilize data about worldwide bee species to data aggregators (e.g., iDigBio, GBIF).
The Big-Bee Thematic Collection Network (Big-Bee) will create over one million high-resolution 2D and 3D images of bee specimens, representing over 5,000 worldwide bee species, including all of the major pollinating species of the United States. The Big-Bee network includes 13 institutions and partnerships with US government agencies. Novel mechanisms for sharing image datasets will be developed, and datasets of bee traits will be available for research and education through an open data portal, the Bee Library. The Big-Bee project will engage the general public in research through community science, via crowdsourced trait measurements and data transcription from images. In addition, workshops focusing on bee traits and species identification will provide training and professional development in data science for natural history collection staff, researchers, and university students. All data resulting from this award will be shared with, and publicly available through, the national digitized biocollections resource, iDigBio.org. This is the first archive of Big-Bee data indexed by Global Biotic Interactions (GloBI). GloBI provides open access to species interaction data (e.g., predator-prey, pollinator-plant, pathogen-host, parasite-host) by combining existing open datasets using open-source software. This version of the Big-Bee dataset also includes interactions that do not involve bees. In addition, the datasets in this publication are limited to those from institutions in the Big-Bee project network and do not represent all bee interaction data available through Global Biotic Interactions.

Bee Library Information - Statistics about Big Bee data providers
The specimens indexed by GloBI are also found in the Bee Library. The numbers of specimens and images in the library to date are listed below.
The Bee Library taxonomic backbone is not yet complete, so information regarding the number of species is not yet available. Further summary statistics are available in the Big Bee Metrics from the Bee Library and GloBI - July 24, 2023.pdf file.

From Bee Library (partner indexed records):
- 1,234,107 occurrence records
- 993,692 (81%) georeferenced
- 351,592 (28%) occurrences imaged
- 986,323 (80%) identified to species
- 9 families
- 526 genera
- 10,700 species
- 11,386 total taxa (including subsp. and var.)

Statistics Per Collection

| Collection | Occurrences | Georeferenced | Imaged | Interactions Indexed in GloBI (all) | Interactions Indexed in GloBI (bees) |
|---|---|---|---|---|---|
| ASU Hasbrouck Insect Collection - Bee Records | 13223 | 13221 | 2352 | 21300 | 3834 |
| Bee Biology and Systematics Laboratory, USDA-ARS Pollinating Insect-Biology, Management, Systematics Research | 561820 | 547461 | 0 | 0 | 0 |
| California Academy of Sciences | 884 | 300 | 3 | 16984 | 117 |
| California Academy of Sciences - Type Collection | 1838 | 59 | 83 | 0 | 0 |
| Essig Museum of Entomology, University of California Berkeley | 58551 | 55028 | incomplete in source: 0, 0 | | |
| Florida State Collection of Arthropods | 17134 | 12349 | incomplete in source: 7816, 559 | | |
| Museum of Comparative Zoology, Harvard University | 22020 | 21099 | 11595 | 6777 | 1535 |
| Natural History Museum of Los Angeles County | 24685 | 7421 | 3480 | 0 | 0 |
| San Diego Natural History Museum Entomology Department | 4065 | 1690 | 1982 | 8688 | 90 |
| University of California Santa Barbara Invertebrate Zoology Collection | 8674 | 8410 | 2751 | 1940 | 660 |
| University of Colorado Museum of Natural History, Entomology Collection | 18043 | 18043 | 0 | 9589 | 4723 |
| University of Kansas Natural History Museum Entomology Division | 464927 | 275200 | 304415 | 119963 | 112677 |
| University of Michigan Museum of Zoology Division of Insects | 17764 | 15305 | 15269 | 53755 | 4134 |
| University of New Hampshire, Donald S. Chandler Entomological Collection | 17685 | 17393 | 0 | 3137 | 3137 |
| USGS Native Bee Inventory and Monitoring Lab | 101 | 101 | 0 | 0 | 0 |

GloBI Data Review Report - Datasets in Review from Global Biotic Interactions

Datasets under review:
- University of Michigan Museum of Zoology, Division of Insects accessed via https://github.com/globalbioticinteractions/ummz-ummzi/archive/d9282e51f29f3157af2e5869a09ea8a111ddea34.zip on 2023-07-24T22:06:08.671Z
- Arizona State University Hasbrouck Insect Collection accessed via https://github.com/globalbioticinteractions/asu-asuhic/archive/4ed77cb9ca8e526269d4678692e2844c950022f8.zip on 2023-07-24T22:07:09.630Z
- California Academy of Sciences Entomology and Entomology Type Collection accessed via https://github.com/globalbioticinteractions/cas-ent/archive/47d385b73a63aa379cd5e6d3615005ba78b0ffc1.zip on 2023-07-24T22:08:13.753Z
- University of California Berkeley, Essig Museum of Entomology accessed via https://github.com/globalbioticinteractions/emec/archive/93b17a3db566baa001ce9190e6fbdb60fa99dda4.zip on 2023-07-24T22:08:24.495Z
- Florida State Collection of Arthropods accessed via https://github.com/globalbioticinteractions/fsca/archive/2cdcf9475b7e0ef2a728a96535608bc0ce2ac5ca.zip on 2023-07-24T22:08:49.972Z
- University of Kansas Natural History Museum accessed via https://github.com/globalbioticinteractions/ku-semc/archive/a9c7cb81050eef68b4428667206a219da458f517.zip on 2023-07-24T22:09:17.016Z
- Natural History Museum of Los Angeles County accessed via https://github.com/globalbioticinteractions/lacm-lacmec/archive/dafbf532c53fbadba126c81186c26d52677aa781.zip on 2023-07-24T22:11:11.442Z
- Harvard University M, Morris P J (2021). Museum of Comparative Zoology, Harvard University. Museum of Comparative Zoology, Harvard University. accessed via https://github.com/globalbioticinteractions/mcz/archive/b33635a9fc75fd7931ad968cbc11180e6467bfd7.zip on 2023-07-24T22:21:32.961Z
- San Diego Natural History Museum accessed via https://github.com/globalbioticinteractions/sdnhm-sdmc/archive/7238d8b804f543250eb487b43144e1125fb3688a.zip on 2023-07-24T22:26:25.503Z
- University of Colorado Museum of Natural History Entomology Collection accessed via https://github.com/globalbioticinteractions/ucm-ucmc/archive/60530dcc82d33c9675a4026ad60dc40bea8f2a91.zip on 2023-07-24T22:26:50.178Z
- University of California Santa Barbara Invertebrate Zoology Collection accessed via https://github.com/globalbioticinteractions/ucsb-izc/archive/66a4e39589d1dfa299d07985546c4be522ff60d8.zip on 2023-07-24T22:27:13.801Z
- University of New Hampshire Donald S. Chandler Entomological Collection accessed via https://github.com/globalbioticinteractions/unhc-unhc/archive/d7668a6bb4545dc4da0645ecc383169ba547b0f5.zip on 2023-07-24T22:27:28.670Z

Generated on: 2023-07-24 by: GloBI's Elton 0.12.6 (see https://github.com/globalbioticinteractions/elton).

Note that all files ending with .tsv are UTF-8-encoded tab-separated values files (https://www.iana.org/assignments/media-types/text/tab-separated-values).

Included in this review archive are:
- README: This file.
- review_summary.tsv: Summary across all reviewed collections of total number of distinct review comments.
- review_summary_by_collection.tsv: Summary by reviewed collection of total number of distinct review comments.
- indexed_interactions_by_collection.tsv: Summary of number of indexed interaction records by institutionCode and collectionCode.
- review_comments.tsv.gz: All review comments by collection.
- indexed_interactions_full.tsv.gz: All indexed interactions for all reviewed collections.
- indexed_interactions_simple.tsv.gz: All indexed interactions for all reviewed collections selecting only sourceInstitutionCode, sourceCollectionCode, sourceCatalogNumber, sourceTaxonName, interactionTypeName and targetTaxonName.
- datasets_under_review.tsv: Details on the datasets under review.
- elton.jar: Program used to update datasets and generate the review reports and associated indexed interactions.
- indexed_interactions_bees.tsv: All indexed bee interactions.
- datasets.zip: All datasets reviewed for this publication.
- Big Bee Metrics from the Bee Library and GloBI - July 24, 2023.pdf: Summary statistics from the Bee Library and GloBI about data partners.

If you have questions or comments about this publication, please open an issue at https://github.com/Big-Bee-Network/issues-observations-and-questions/discussions or contact the authors by email.

Funding: The creation of this archive was made possible by the National Science Foundation award Collaborative Research: Digitization TCN: Extending Anthophila research through image and trait digitization (Big-Bee). Award numbers: DBI:2102006, DBI:2101929, DBI:2101908, DBI:2101876, DBI:2101875, DBI:2101851, DBI:2101345, DBI:2101913, DBI:2101891 and DBI:2101850.

References:
Poelen JH, Simons JD and Mungall CH. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005
Seltmann KC, Allen J, Brown BV, Carper A, Engel MS, Franz N, Gilbert E, Grinter C, Gonzalez VH, Horsley P, Lee S, Maier C, Miko I, Morris P, Oboyski P, Pierce NE, Poelen J, Scott VL, Smith M, Talamas EJ, Tsutsui ND, Tucker E (2021) Announcing Big-Bee: An initiative to promote understanding of bees through image and trait digitization. Biodiversity Information Science and Standards 5: e74037. https://doi.org/10.3897/biss.5.74037
Jorrit Poelen, Tobias Kuhn, & Katrin Leinweber. (2022). globalbioticinteractions/elton: 0.12.5 (0.12.5). Zenodo.
https://doi.org/10.5281/zenodo.7267926
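The indexed_interactions_simple.tsv.gz file described in this archive lists one interaction per row, identified by sourceInstitutionCode and sourceCollectionCode. The Python sketch below tallies interactions per collection, which is roughly the kind of summary indexed_interactions_by_collection.tsv provides. It assumes the file is gzip-compressed UTF-8 with a header row naming the six listed columns. The sample rows, including the interactionTypeName values, are hypothetical.

```python
import csv
import gzip
import io
from collections import Counter
from typing import TextIO


def count_by_collection(tsv: TextIO) -> Counter:
    """Count interaction rows per (sourceInstitutionCode, sourceCollectionCode).

    Assumes a header row naming the columns listed for
    indexed_interactions_simple.tsv.gz.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE):
        counts[(row["sourceInstitutionCode"], row["sourceCollectionCode"])] += 1
    return counts


def count_file(path: str) -> Counter:
    """Open the gzip-compressed, UTF-8 TSV file and count rows per collection."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as tsv:
        return count_by_collection(tsv)


if __name__ == "__main__":
    header = ("sourceInstitutionCode\tsourceCollectionCode\tsourceCatalogNumber\t"
              "sourceTaxonName\tinteractionTypeName\ttargetTaxonName\n")
    # Hypothetical rows standing in for the real archive contents.
    rows = ("KU\tSEMC\t1\tExample bee\tinteractsWith\tExample plant\n"
            "KU\tSEMC\t2\tExample bee\tinteractsWith\tOther plant\n"
            "UMMZ\tUMMZI\t3\tExample bee\tinteractsWith\tExample plant\n")
    print(count_by_collection(io.StringIO(header + rows)).most_common())
    # [(('KU', 'SEMC'), 2), (('UMMZ', 'UMMZI'), 1)]
```

With the archive unpacked, `count_file("indexed_interactions_simple.tsv.gz")` streams the compressed file without decompressing it to disk first.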
Extending Anthophila research through image and trait digitization (Big-Bee) indexed biotic interactions and review summary. Declining populations of bees impact plant-pollinator interactions in both natural and agricultural systems. While bees and other insects pollinate most wild plants and are critical to sustaining a large proportion of global food production, they are decreasing in both numbers and diversity. Our understanding of the factors driving these declines is limited because we lack sufficient data on the distribution of bee species, and on the behavioral and anatomical traits that may make them either vulnerable or resilient to human-induced environmental changes, such as habitat loss and climate change. Fortunately, wild bees have been collected by researchers and deposited in natural history collections for over 100 years, retaining a wealth of associated attributes that can be extracted from specimen images. This project will digitally capture data and images from these historic specimens, develop tools to measure bee traits from these images and generate a comprehensive bee trait and image dataset to measure changes through time. This will increase our understanding of specific traits that put bee species at risk of decline - a critical need for both sustaining our agricultural economy and the conservation of our natural resources. In addition, the large image datasets created by this project can be used for new artificial intelligence identification tools that will help improve our future pollinator observation and monitoring efforts. The Big-Bee project began in 2021 and is funded by the National Science Foundation to mobilize data about worldwide bee species to data aggregators (e.g., iDigBio, GBIF).
The Big-Bee Thematic Collection Network (Big-Bee) will create over one million high-resolution 2D and 3D images of bee specimens, representing over 5,000 worldwide bee species, including all of the major pollinating species of the United States. The Big-Bee network includes 13 institutions and partnerships with US government agencies. Novel mechanisms for sharing image datasets will be developed, and datasets of bee traits will be available for research and education through an open data portal, the Bee Library. The Big-Bee project will engage the general public in research through community science, via crowdsourced trait measurements and data transcription from images. In addition, workshops focusing on bee traits and species identification will provide training and professional development in data science for natural history collection staff, researchers, and university students. All data resulting from this award will be shared with, and publicly available through, the national digitized biocollections resource, iDigBio.org. This is the first archive of Big-Bee data indexed by Global Biotic Interactions (GloBI). GloBI provides open access to species interaction data (e.g., predator-prey, pollinator-plant, pathogen-host, parasite-host) by combining existing open datasets using open-source software. This version of the Big-Bee dataset also includes interactions that do not involve bees. In addition, the datasets in this publication are limited to those from institutions in the Big-Bee project network and do not represent all bee interaction data available through Global Biotic Interactions.

Bee Library Information - Statistics about Big Bee data providers
The specimens indexed by GloBI are also found in the Bee Library. The numbers of specimens and images in the library to date are listed below.
The Bee Library taxonomic backbone is not yet complete, so information regarding the number of species is not yet available. Further summary statistics are available in the Big Bee Metrics from the Bee Library and GloBI - July 27 2022.pdf file.

From Bee Library (partner indexed records):
- 1,218,256 occurrence records
- 992,776 (81%) georeferenced
- 350,105 (29%) occurrences imaged
- 1,004,491 (82%) identified to species
- 9 families
- 523 genera
- 10,808 species
- 11,492 total taxa (including subsp. and var.)

Statistics per Collection (partner collections)

| Collection | Occurrences | Georeferenced | Imaged | Interactions Indexed in GloBI (all) | Interactions Indexed in GloBI (bees) |
|---|---|---|---|---|---|
| ASU Hasbrouck Insect Collection - Bee Records | 13219 | 13217 | 2047 | 19774 | 3834 |
| Bee Biology and Systematics Laboratory, USDA-ARS Pollinating Insect-Biology, Management, Systematics Research | 561820 | 547461 | 0 | 0 | 0 |
| California Academy of Sciences | 873 | 286 | 3 | 16957 | 117 |
| California Academy of Sciences - Type Collection | 1838 | 59 | 83 | 0 | 0 |
| Essig Museum of Entomology, University of California Berkeley | 58548 | 55022 | 0 | 0 | 0 |
| Florida State Collection of Arthropods | 12290 | 12246 | 8979 | 0 | 0 |
| Museum of Comparative Zoology, Harvard University | 22020 | 21099 | 11595 | 6476 | 1535 |
| Natural History Museum of Los Angeles County | 16442 | 7420 | 3372 | 0 | 0 |
| San Diego Natural History Museum Entomology Department | 4065 | 1690 | 1980 | 8678 | 90 |
| University of California Santa Barbara Invertebrate Zoology Collection | 8678 | 8416 | 2646 | 1940 | 659 |
| University of Colorado Museum of Natural History, Entomology Collection | 18043 | 18043 | 0 | 9589 | 4723 |
| University of Kansas Natural History Museum Entomology Division | 464896 | 275180 | 304405 | 119947 | 112674 |
| University of Michigan Museum of Zoology Division of Insects | 17738 | 15143 | 14995 | 53602 | 4120 |
| University of New Hampshire, Donald S. Chandler Entomological Collection | 17685 | 17393 | 0 | 3137 | 3137 |
| USGS Native Bee Inventory and Monitoring Lab | 101 | 101 | 0 | 0 | 0 |

GloBI Data Review Report - Datasets in Review from Global Biotic Interactions

Datasets under review:
- University of Michigan Museum of Zoology, Division of Insects accessed via https://github.com/globalbioticinteractions/ummz-ummzi/archive/d9282e51f29f3157af2e5869a09ea8a111ddea34.zip on 2023-04-25T19:48:17.288Z
- Arizona State University Hasbrouck Insect Collection accessed via https://github.com/globalbioticinteractions/asu-asuhic/archive/4ed77cb9ca8e526269d4678692e2844c950022f8.zip on 2023-04-25T19:49:18.649Z
- California Academy of Sciences Entomology and Entomology Type Collection accessed via https://github.com/globalbioticinteractions/cas-ent/archive/47d385b73a63aa379cd5e6d3615005ba78b0ffc1.zip on 2023-04-25T19:50:01.820Z
- University of California Berkeley, Essig Museum of Entomology accessed via https://github.com/globalbioticinteractions/emec/archive/93b17a3db566baa001ce9190e6fbdb60fa99dda4.zip on 2023-04-25T19:50:38.682Z
- Florida State Collection of Arthropods accessed via https://github.com/globalbioticinteractions/fsca/archive/682f11686317ae81959a043bd6b493ddfc06c438.zip on 2023-04-25T19:51:09.435Z
- University of Kansas Natural History Museum accessed via https://github.com/globalbioticinteractions/ku-semc/archive/a9c7cb81050eef68b4428667206a219da458f517.zip on 2023-04-25T19:51:51.861Z
- Natural History Museum of Los Angeles County accessed via https://github.com/globalbioticinteractions/lacm-lacmec/archive/dafbf532c53fbadba126c81186c26d52677aa781.zip on 2023-04-25T19:53:50.488Z
- Harvard University M, Morris P J (2021). Museum of Comparative Zoology, Harvard University. Museum of Comparative Zoology, Harvard University. accessed via https://github.com/globalbioticinteractions/mcz/archive/b33635a9fc75fd7931ad968cbc11180e6467bfd7.zip on 2023-04-25T20:05:19.619Z
- San Diego Natural History Museum accessed via https://github.com/globalbioticinteractions/sdnhm-sdmc/archive/7238d8b804f543250eb487b43144e1125fb3688a.zip on 2023-04-25T20:11:18.816Z
- University of Colorado Museum of Natural History Entomology Collection accessed via https://github.com/globalbioticinteractions/ucm-ucmc/archive/60530dcc82d33c9675a4026ad60dc40bea8f2a91.zip on 2023-04-25T20:11:45.143Z
- University of California Santa Barbara Invertebrate Zoology Collection accessed via https://github.com/globalbioticinteractions/ucsb-izc/archive/66a4e39589d1dfa299d07985546c4be522ff60d8.zip on 2023-04-25T20:12:06.236Z
- University of New Hampshire Donald S. Chandler Entomological Collection accessed via https://github.com/globalbioticinteractions/unhc-unhc/archive/d7668a6bb4545dc4da0645ecc383169ba547b0f5.zip on 2023-04-25T20:12:21.404Z

Generated on: 2023-04-25 by: GloBI's Elton 0.12.6 (see https://github.com/globalbioticinteractions/elton).

Note that all files ending with .tsv are UTF-8-encoded tab-separated values files (https://www.iana.org/assignments/media-types/text/tab-separated-values).
- indexed_interactions_bees.tsv: All indexed bee interactions.
- datasets.zip: All datasets reviewed for this publication.
- Big Bee Metrics from the Bee Library and GloBI - Apr 25, 2023.pdf: Summary statistics from the Bee Library and GloBI about data partners.

If you have questions or comments about this publication, please open an issue at https://github.com/Big-Bee-Network/issues-observations-and-questions/discussions or contact the authors by email.

Funding: The creation of this archive was made possible by the National Science Foundation award Collaborative Research: Digitization TCN: Extending Anthophila research through image and trait digitization (Big-Bee).
Award numbers: DBI:2102006, DBI:2101929, DBI:2101908, DBI:2101876, DBI:2101875, DBI:2101851, DBI:2101345, DBI:2101913, DBI:2101891 and DBI:2101850.

References:
- Poelen JH, Simons JD, Mungall CJ (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005
- Seltmann KC, Allen J, Brown BV, Carper A, Engel MS, Franz N, Gilbert E, Grinter C, Gonzalez VH, Horsley P, Lee S, Maier C, Miko I, Morris P, Oboyski P, Pierce NE, Poelen J, Scott VL, Smith M, Talamas EJ, Tsutsui ND, Tucker E (2021). Announcing Big-Bee: An initiative to promote understanding of bees through image and trait digitization. Biodiversity Information Science and Standards 5: e74037. https://doi.org/10.3897/biss.5.74037
- Poelen J, Kuhn T, Leinweber K (2022). globalbioticinteractions/elton: 0.12.5. Zenodo. https://doi.org/10.5281/zenodo.7267926
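Because the archive's `.tsv` files are plain UTF-8 tab-separated text, they can be loaded with standard tooling. The Python sketch below is illustrative only: it assumes the file has a header row and an interaction-type column named `interactionTypeName` (the other column names and the three sample rows are invented for the example), so check the actual header of `indexed_interactions_bees.tsv` before relying on it.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO

# Assumed column name; verify against the header row of the actual file.
TYPE_COLUMN = "interactionTypeName"


def read_tsv(stream: TextIO) -> list[dict[str, str]]:
    """Parse a tab-separated stream with a header row into a list of dicts."""
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def count_interaction_types(rows: Iterable[dict[str, str]]) -> Counter:
    """Tally rows by the (assumed) interaction type column."""
    return Counter(row.get(TYPE_COLUMN, "") for row in rows)


if __name__ == "__main__":
    # Tiny invented sample standing in for indexed_interactions_bees.tsv.
    sample = (
        "sourceTaxonName\tinteractionTypeName\ttargetTaxonName\n"
        "Bombus impatiens\tvisitsFlowersOf\tSolidago\n"
        "Apis mellifera\tvisitsFlowersOf\tTrifolium repens\n"
        "Osmia lignaria\tinteractsWith\tMalus domestica\n"
    )
    rows = read_tsv(io.StringIO(sample))
    print(count_interaction_types(rows).most_common())
    # To read the real file instead:
    # with open("indexed_interactions_bees.tsv", encoding="utf-8", newline="") as f:
    #     rows = read_tsv(f)
```

Passing `quoting=csv.QUOTE_NONE` tells the reader not to treat quote characters specially, so a stray `"` in a taxon name or remark is kept as literal text rather than swallowing the following tabs, which is usually the right choice for TSV files without an escaping convention.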
Poelen, J. H. (iDigBio Communications Luncheon, 12 April 2021)
Abstract: As the web of biodiversity knowledge continues to grow and become more complex, practical questions arise: How do we publish and review works that use big and complex datasets? How do we keep track of data use across biodiversity data networks? How do we keep our digital data available for the next 50 years? In this iDigBio lunch seminar, Jorrit Poelen works towards answering these questions through use cases taken from Global Biotic Interactions (GloBI, https://globalbioticinteractions.org), Terrestrial Parasite Tracker TCN (TPT, https://parasitetracker.org) and Preston (https://preston.guoda.bio), a biodiversity data tracker.
Salim, José Augusto, Seltmann, Katja, Poelen, Jorrit, and Saraiva, Antonio. Indexing Biotic Interactions in GBIF data. Retrieved from https://par.nsf.gov/biblio/10437333. Biodiversity Information Science and Standards 6. Web. doi:10.3897/biss.6.93565.
@article{osti_10437333,
title = {Indexing Biotic Interactions in GBIF data},
url = {https://par.nsf.gov/biblio/10437333},
DOI = {10.3897/biss.6.93565},
abstractNote = {The Global Biodiversity Information Facility (GBIF 2022a) has indexed more than 2 billion occurrence records from 70,147 datasets. These datasets often include "hidden" biotic interaction data because biodiversity communities use the Darwin Core standard (DwC, Wieczorek et al. 2012) in different ways to document biotic interactions. In this study, we extracted biotic interactions from GBIF data using an approach similar to that employed by Global Biotic Interactions (GloBI; Poelen et al. 2014) and summarized the results. Here we present an estimate of the interaction data available in GBIF and show that biotic interaction claims can be automatically found and extracted from GBIF. Our results suggest that much can be gained by an increased focus on developing tools that help to index and curate biotic interaction data in existing datasets. Combined with data standardization and best practices for sharing biotic interactions, such as the initiative on plant-pollinator interactions (Salim 2022), this approach can rapidly help meet open data principles (Wilkinson et al. 2016). We used Preston (Elliott et al. 2020), open-source software that versions biodiversity datasets, to copy all GBIF-indexed datasets. The biodiversity data graph version (Poelen 2020) of the GBIF-indexed datasets used during this study contains 58,504 datasets in Darwin Core Archive (DwC-A) format, totaling 574,715,196 records. After retrieval and verification, the datasets were processed using Elton. Elton extracts biotic interaction data and supports 20+ existing file formats, including various types of data elements in DwC records. Elton also helps align interaction claims (e.g., host of, parasite of, associated with) to the Relations Ontology (RO, Mungall 2022), making it easier to discover interaction data across a heterogeneous collection of datasets. 
Using a specific mapping from interaction claims found in the DwC records to terms in RO*1, Elton found 30,167,984 potential records (with non-empty values for the scanned DwC terms) and 15,248,478 records with recognized interaction types. Taxonomic name validation was performed using Nomer, which maps input names to names found in a variety of taxonomic catalogs. We considered an interaction record valid only if the interaction type could be mapped to a term in RO and Nomer found a valid name for both source and target taxa. Based on the workflow described in Fig. 1, we found 7,947,822 interaction records (52% of the records with recognized interaction types). Most of them were generic interactions (interacts_with, 87.5%), but the remaining 12.5% (993,477 records) included host-parasite and plant-animal interactions. Most of the interaction records found involved plants (78%), animals (14%) and fungi (6%). In conclusion, there are many biotic interactions embedded in existing datasets registered in large biodiversity data indexers and aggregators like iDigBio, GBIF, and BioCASE. We exposed these biotic interaction claims using the combined functionality of biodiversity data tools Elton (for interaction data extraction), Preston (for reliable dataset tracking) and Nomer (for taxonomic name alignment). Nonetheless, the development of new vocabularies, standards and best practice guides would facilitate aggregation of interaction data, including the diversification of the GBIF data model (GBIF 2022b) for sharing biodiversity data beyond occurrence data. That is the aim of the TDWG Interest Group on Biological Interactions Data (TDWG 2022).},
journal = {Biodiversity Information Science and Standards},
volume = {6},
year = {2022},
author = {Salim, José Augusto and Seltmann, Katja and Poelen, Jorrit and Saraiva, Antonio},
}
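The validity rule in the abstract can be read as a filter: a record is kept only when its interaction claim maps to an RO term and both taxon names resolve. The Python sketch below illustrates that logic under stated assumptions; it is not Elton's or Nomer's implementation, `RO_MAPPING` is a hypothetical, heavily abbreviated mapping to RO term labels, and `resolve` is a hypothetical stand-in for a taxonomic catalog lookup. The final lines recompute shares from the abstract's counts, which shows that the 52% figure is the ratio to the 15,248,478 records with recognized interaction types, not to the 30,167,984 potential records.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical, abbreviated mapping from free-text interaction claims to
# Relations Ontology (RO) term labels; the mapping used by Elton is far larger.
RO_MAPPING = {
    "host of": "host of",
    "parasite of": "parasite of",
    "associated with": "interacts with",
    "interacts with": "interacts with",
}

# A name resolver returns a matched name, or None when no match is found.
Resolver = Callable[[str], Optional[str]]


@dataclass
class Record:
    source_name: str
    claim: str
    target_name: str


def map_interaction(claim: str) -> Optional[str]:
    """Return the RO label for a claim, or None if the claim is not recognized."""
    return RO_MAPPING.get(claim.strip().lower())


def is_valid(record: Record, resolve: Resolver) -> bool:
    """Valid only if the claim maps to RO and both taxon names resolve."""
    return (
        map_interaction(record.claim) is not None
        and resolve(record.source_name) is not None
        and resolve(record.target_name) is not None
    )


def valid_records(records: Iterable[Record], resolve: Resolver) -> list[Record]:
    """Keep only the records that pass the validity rule."""
    return [r for r in records if is_valid(r, resolve)]


def share(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    known = {"Apis mellifera", "Varroa destructor"}

    def resolve(name: str) -> Optional[str]:
        # Hypothetical lookup: accepts only names in a tiny local set.
        return name if name in known else None

    records = [
        Record("Varroa destructor", "Parasite of", "Apis mellifera"),
        Record("Varroa destructor", "lives near", "Apis mellifera"),  # unmapped claim
        Record("Unknown sp.", "host of", "Apis mellifera"),  # unresolved name
    ]
    print(len(valid_records(records, resolve)))  # 1

    # Counts reported in the abstract.
    print(share(7_947_822, 15_248_478))  # 52.1 (valid / recognized interaction types)
    print(share(7_947_822, 30_167_984))  # 26.3 (valid / potential records)
    print(share(993_477, 7_947_822))     # 12.5 (non-generic / valid)
```

Keeping the interaction mapping and the name lookup as separate inputs mirrors the division of labor described in the abstract, where interaction-type alignment (Elton) and taxonomic name alignment (Nomer) are handled by different tools.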