Biodiversity datasets, or descriptions of biodiversity datasets, are increasingly available through open digital data infrastructures such as the Global Biodiversity Information Facility (GBIF, https://gbif.org), Integrated Digitized Biocollections (iDigBio, https://www.idigbio.org) and the Biological Collection Access Service (BioCASE, http://www.biocase.org).

However, little is known about how these networks, and the data accessed through them, change over time. This dataset provides snapshots of the biodiversity dataset graphs as tracked by Preston (https://github.com/bio-guoda/preston, https://doi.org/10.5281/zenodo.1410543).

The RDF (N-Quads) and TSV snapshots were generated with the following commands, respectively:

preston ls | bzip2 > preston-ls.nq.bz2

preston ls --log tsv | bzip2 > preston-ls.tsv.bz2

For convenience, the first 100 entries of each file are also included in uncompressed form, together with the SHA-256 hashes of the content of the files.
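The sketch below shows one way a reader might check a downloaded snapshot against its published SHA-256 hash and peek at its first entries without decompressing the whole file. It is a minimal Python sketch using only the standard library, not part of Preston; the function names are hypothetical, and "entries" are assumed to be lines. Because the description does not say whether the published hashes cover the compressed `.bz2` files or their decompressed content, `sha256_hex` can hash either.

```python
import bz2
import hashlib
import tempfile
from itertools import islice
from pathlib import Path


def sha256_hex(path: Path, decompressed: bool = False, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, streaming it in chunks.

    With decompressed=True the digest covers the bzip2-decompressed content
    instead of the raw .bz2 bytes (which of the two the published hashes
    refer to is an assumption to check against the dataset).
    """
    opener = bz2.open if decompressed else open
    digest = hashlib.sha256()
    with opener(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def head_bz2(path: Path, n: int = 100) -> list[str]:
    """Return the first n lines of a bzip2 file, decompressing only what is needed."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def verify(path: Path, expected_hex: str, decompressed: bool = False) -> bool:
    """Check a file against a published SHA-256 hex digest (case-insensitive)."""
    return sha256_hex(path, decompressed) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Self-contained demo on a tiny made-up file; with real data, point `path`
    # at preston-ls.nq.bz2 or preston-ls.tsv.bz2 and use the published hash.
    raw = b"line 1\nline 2\nline 3\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.tsv.bz2"
        path.write_bytes(bz2.compress(raw))
        print(head_bz2(path, 2))  # ['line 1', 'line 2']
        print(verify(path, hashlib.sha256(raw).hexdigest(), decompressed=True))  # True
```

Streaming in fixed-size chunks keeps memory use flat even for multi-gigabyte snapshots, and `islice` stops decompression after the requested number of lines.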
To connect is to preserve: on frugal data integration and preservation solutions (DOI: 10.17605/OSF.IO/A2V8G)
The deluge of digital biodiversity datasets unleashed through institutional, national and global infrastructures brings up an inconvenient truth: internet-connected infrastructures are in a constant state of flux, while the preservation and integration of digital knowledge are often afterthoughts. Rather than taking digital amnesia for granted, we examine examples of durable and frugal methods for digital data preservation and integration. Examples include tracking external datasets, creating verifiable data citations, cross-publishing and cross-linking datasets, reproducing data-integration processes, and distributing large data archives over poor or nonexistent internet connections. Topics include cryptographic hashes, the Provenance Ontology, content-addressed storage, the Unix philosophy, and offline-first design, as applied in projects like Preston (https://preston.guoda.bio) and Global Biotic Interactions (https://globalbioticinteractions.org). The examples are then related to best practices applied by proven knowledge-preservation experts: librarians and curators.
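Among the topics listed above, cryptographic hashes and content-addressed storage are what make data citations verifiable: content is stored under an identifier derived from its own bytes, so the identifier both locates and checks the data. The following is a minimal Python sketch of that idea, assuming a local directory as the store. The `hash://sha256/` identifier form is modeled on the content identifiers Preston uses, while the class name, method names and two-level directory layout are hypothetical choices for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

# Identifier scheme modeled on Preston-style content ids (assumption for this sketch).
PREFIX = "hash://sha256/"


class ContentStore:
    """Minimal content-addressed store: bytes are filed under their own SHA-256 digest."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, hex_digest: str) -> Path:
        # Hypothetical layout: two levels of 2-character prefixes keep directories small.
        return self.root / hex_digest[:2] / hex_digest[2:4] / hex_digest

    def put(self, content: bytes) -> str:
        """Store content and return its identifier; identical bytes are stored once."""
        hex_digest = hashlib.sha256(content).hexdigest()
        path = self._path(hex_digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
        return PREFIX + hex_digest

    def get(self, content_id: str) -> bytes:
        """Retrieve content by identifier and re-hash it, so corrupted copies are detected."""
        if not content_id.startswith(PREFIX):
            raise ValueError(f"unsupported identifier: {content_id}")
        hex_digest = content_id[len(PREFIX):]
        content = self._path(hex_digest).read_bytes()
        if hashlib.sha256(content).hexdigest() != hex_digest:
            raise ValueError(f"content does not match {content_id}")
        return content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ContentStore(Path(tmp))
        cid = store.put(b"occurrence records, version 1")
        print(cid)
        print(store.get(cid) == b"occurrence records, version 1")  # True
```

Because the identifier depends only on the bytes, any copy held on a USB drive, mirror or archive can serve a request and be checked locally, and a changed dataset necessarily receives a new identifier.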
- Award ID(s): 1839201
- PAR ID: 10192214
- Journal Name: Society for Preservation of Natural History Collections (SPNHC) Annual Meeting, Chicago
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
- Global Biotic Interactions (GloBI, https://globalbioticinteractions.org) uses frugal and pragmatic methods to make openly available species interaction datasets (e.g., parasite-host, predator-prey, plant-pollinator) easier to find and reuse. Since 2013, GloBI has increased the reach of existing datasets, facilitated research, improved data integration methods and provided dataset reviews. In this talk, GloBI is introduced and various reuse examples are presented to discuss the question: Why should we bother to reuse existing (species-interaction) datasets?
- As the web of biodiversity knowledge continues to grow and become more complex, practical questions arise: How do we publish and review works that use big and complex datasets? How do we keep track of data use across biodiversity data networks? How do we keep our digital data available for the next 50 years? In this iDigBio lunch seminar, Jorrit Poelen works towards answering these questions through use cases taken from Global Biotic Interactions (GloBI, https://globalbioticinteractions.org), Terrestrial Parasite Tracker TCN (TPT, https://parasitetracker.org) and Preston (https://preston.guoda.bio), a biodiversity data tracker.
- Data for Policy (dataforpolicy.org), a trans-disciplinary community of research and practice, has emerged around the application and evaluation of data technologies and analytics for policy and governance. Research in this area has involved cross-sector collaborations, but the areas of emphasis have previously been unclear. Within the Data for Policy framework of six focus areas, this report offers a landscape review of Focus Area 2: Technologies and Analytics. Taking stock of recent advancements and challenges can help shape research priorities for this community. We highlight four commonly used technologies for prediction and inference that leverage datasets from the digital environment: machine learning (ML) and artificial intelligence systems, the internet-of-things, digital twins, and distributed ledger systems. We review innovations in research evaluation and discuss future directions for policy decision-making.
- From smart devices to homes to cities, Internet of Things (IoT) technologies have become embedded within everyday objects on a global scale. We understand IoT technologies as a form of infrastructure that bridges the gaps between offline spaces and online networks as they track, transmit, and construct digital data from and of the physical world. We examine the social construction of IoT network technologies through their technological design and corporate discourses. In this article, we explore the methodological challenges and opportunities of studying IoT as an emerging network technology. We draw on a case study of a low-power wide-area network (LPWAN), a cost-effective radio frequency network that is designed to connect sensors across long distances. Reflecting on our semi-structured interviews with LPWAN users and advocates, participant observation at conferences about LPWAN, as well as a community-based LPWAN project, we examine the intersections of methods and practices as related to space, data, and infrastructures. We identify three key methodological obstacles involved in studying the social construction of networked technologies that straddle physical and digital environments. These include (a) transcending the invisibility and abstraction of network infrastructures, (b) managing practical and conceptual boundaries to sample key cases and participants, and (c) negotiating competing technospatial imaginaries between participants and researchers. Through our reflection, we demonstrate that these challenges also serve as generative methodological opportunities, extending existing tools to study the ways data connects online and offline spaces.