Title: 2nd Workshop on Digital Infrastructures for Scholarly Content Objects (DISCO'22)
The goal of the Digital Infrastructures for Scholarly Content Objects (DISCO) workshop is to raise awareness of quality issues, improved discovery, and re-use challenges in digital infrastructures for scholarly content, and to collect potential solutions among an audience of diverse expertise.
Award ID(s):
2046454
PAR ID:
10436343
Journal Name:
Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
Page Range / eLocation ID:
1 - 2
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sharing, reuse, and synthesis of knowledge are central to the research process. These core functions are in theory served by the system of monographs, abstracts, and papers in journals and proceedings, with citation indices and search databases that comprise the core of our formal scholarly communication infrastructure; yet, converging lines of empirical and anecdotal evidence suggest that this system does not adequately act as infrastructure for synthesis. Emerging developments in new institutions for science, along with new technical infrastructures and tooling for decentralized knowledge work, offer new opportunities to prototype new technical infrastructures on top of a different installed base than the publish-or-perish, neoliberal academy. This workshop aims to integrate these developments and communities with CSCW’s deep roots in knowledge infrastructures and collaborative and distributed sensemaking, with new developments in science institutions and tooling, to stimulate and accelerate progress towards prototyping new scholarly communication infrastructures that are actually optimized for sharing, reusing, and synthesizing knowledge.
  2. The deluge of digital biodiversity datasets unleashed through institutional, national and global infrastructures brings up an inconvenient truth: internet-connected infrastructures are in a constant state of flux while preservation and integration of digital knowledge are often afterthoughts. Rather than taking digital amnesia for granted, we examine examples of durable and frugal digital data preservation and integration methods. Examples include tracking external datasets, creating verifiable data citations, cross-publishing and cross-linking datasets, reproducing data-integration processes, and distributing large data archives across poor, or nonexistent, internet connections. Topics include cryptographic hashes, Provenance Ontology, content-addressed storage, Unix philosophy, and offline first design as applied in projects like Preston (https://preston.guoda.bio) and Global Biotic Interactions (https://globalbioticinteractions.org). The examples are then related to best practices applied by proven knowledge-preservation experts: librarians and curators. 
  3. The volume of scholarly data has been growing exponentially over the last 50 years. The total size of the open access documents is estimated to be 35 million by 2022. The total amount of data to be handled, including crawled documents, production repository, metadata, extracted content, and their replications, can be as high as 350TB. Academic digital library search engines face significant challenges in maintaining sustainable services. We discuss these challenges and propose feasible solutions to key modules in the digital library architecture including the document storage, data extraction, database and index. We use CiteSeerX as a case study. 