Title: Curating for Convergence: Data Stewardship for Interdisciplinary Inquiry
Advances in data infrastructure are often led by disciplinary initiatives aimed at innovation in federation and sharing of data and related research materials. In library and information science (LIS), the data services area has focused on data curation and stewardship to support description and deposit of data for access, reuse, and preservation. At the same time, solutions to societal grand challenges are thought to lie in convergence research, characterized by a problem-focused orientation and deep cross-disciplinary integration, requiring access to highly varied data sources with differing resolutions or scales. We argue that data curation and stewardship work in LIS should expand to foster convergence research based on a robust understanding of the dynamics of disciplinary and interdisciplinary research methods and practices. Highlighting unique contributions by Dr. Linda C. Smith to the field of LIS, we outline how her work illuminates problems that are core to current directions in convergence research. Drawing on advances in data infrastructure in the earth and geosciences and trends in qualitative domains, we emphasize the importance of metastructures and the necessary influence of disciplinary practice on principles, standards, and provisions for ethical use across the evolving data ecosystem.
Award ID(s):
1928208
PAR ID:
10470846
Author(s) / Creator(s):
;
Publisher / Repository:
Johns Hopkins University Press
Date Published:
Journal Name:
Library Trends
Volume:
71
Issue:
1
ISSN:
1559-0682
Page Range / eLocation ID:
113 to 131
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. When it comes to climate crisis research, current debates are increasingly thematizing the needs but also the challenges of collaborative, transdisciplinary work. Geophysical characterizations of climate change are increasingly deemed insufficient to respond to the challenges that vulnerable communities face worldwide. In this paper, I describe the work of studying‐while‐caring for an environmental data infrastructure in order to address this issue. I suggest framing “data management” anthropologically as a question of collective stewardship that is better conceived as a “knowledge infrastructure” (Edwards 2010) instead of a formal approach to automated data curation. To examine the sociotechnical blindspots of data management, I elaborate on the anthropological concept of “infrastructural blues” based on the data engineering work I conducted. For the conclusion, I discuss the concept of “common” as a substitute for “open” technologies and address the broader implications of the proposed shift toward community stewardship and self‐determination as guiding practices for socio‐environmental data governance. 
  2. Data sharing and reuse are becoming the norm in quantitative research. At the same time, significant skepticism still accompanies the sharing and reuse of qualitative research data on both ethical and epistemological grounds. Nevertheless, there is growing interest in the reuse of qualitative data, as demonstrated by the range of contributions in this special issue. In this research note, we address epistemological critiques of reusing qualitative data and argue that careful curation of data can enable what we term “epistemologically responsible reuse” of qualitative data. We begin by briefly defining qualitative data and summarizing common epistemological objections to their shareability or usefulness for secondary analysis. We then introduce the concept of curation as enabling epistemologically responsible reuse and a potential way to address such objections. We discuss three recent trends that we believe are enhancing curatorial practices and thus expand the opportunities for responsible reuse: improvements in data management practices among researchers, the development of collaborative curation practices at repositories focused on qualitative data and technological advances that support sharing rich qualitative data. Using three examples of successful reuse of qualitative data, we illustrate the potential of these three trends to further improve the availability of reusable data projects. 
  3. OpenMSIStream provides seamless connection of scientific data stores with streaming infrastructure to allow researchers to leverage the power of decoupled, real-time data streaming architectures. Data streaming is the process of transmitting, ingesting, and processing data continuously rather than in batches. Access to streaming data has revolutionized many industries in the past decade and created entirely new standards of practice and types of analytics. While not yet commonly used in scientific research, data streaming has the potential to become a key technology to drive rapid advances in scientific data collection (e.g., Brookhaven National Lab (2022)). This paucity of streaming infrastructures linking complex scientific systems is due to a lack of tools that facilitate streaming in the diverse and distributed systems common in modern research. OpenMSIStream closes this gap between underlying streaming systems and common scientific infrastructure. Closing this gap empowers novel streaming applications for scientific data including automation of data curation, reduction, and analysis; real-time experiment monitoring and control; and flexible deployment of AI/ML to guide autonomous research. Streaming data generally refers to data continuously generated from multiple sources and passed in small packets (termed messages). Streaming data messages are typically organized in groups called topics and persist for periods of time conducive to processing for multiple uses either sequentially or in small groups. The resulting flows of raw data, metadata, and processing results form “ecosystems” that automate varied data-driven tasks. A strength of data streaming ecosystems is the use of publish-subscribe (“pub/sub”) messaging backbones that decouple data senders (publishers) and recipients (subscribers). 
     Popular message-focused middleware solutions such as RabbitMQ (VMware, 2022), Apache Pulsar (Apache Software Foundation, 2022b), and Apache Kafka (Apache Software Foundation, 2022a) all provide differing capabilities as backbones. OpenMSIStream provides robust and efficient, yet easy, access to the rich data streaming systems of Apache Kafka.
  4. With dramatic advancements in biological data generation, genetic rescue and reproductive technologies, and inter-institutional coordination of care across entire animal populations, zoos, aquariums, and their collaborators are uniquely positioned to lead population-wide research benefiting animal wellbeing and species survival. However, procedural and inter-institutional barriers make it exceedingly difficult to access existing zoological biospecimens and data at scale. To address this, the Zoonomics Working Group, representing diverse roles across three zoological associations (AZA, EAZA, WAZA), proposes a biodiversity biobank alliance that develops and delivers shared resources to support the collection, storage, and sharing of biological samples and associated data across the zoological and conservation community. By biobank alliance, we mean a community-guided effort that develops shared resources, standards, ethos, and practices for collecting, storing, and sharing biological samples and associated data voluntarily through transparent processes, consistent with professional accreditation standards and international best practices. While initially focused on addressing the needs and regulatory landscape of U.S. institutions, the alliance is designed to create frameworks that are adaptable and adoptable for international expansion. Such a framework would help the zoological community navigate the ethical, legal, and practical challenges of managing biospecimen collections, making access more efficient, reliable, and robust. Achieving this vision requires collective agreement on ethical principles such as reciprocity, transparency, and data stewardship, ensuring that research is both feasible and proactively supported. Such coordination will drive advances in fundamental biology and accelerate progress in animal health, welfare, management, and biodiversity conservation. 
  5. Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation—where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary—a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.
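
Illustrative sketch for item 3: that abstract describes publish-subscribe backbones in which messages are grouped into topics, persist for a time, and decouple senders from recipients. The toy in-memory broker below is one way to picture that pattern. It is an assumption-laden illustration only: `InMemoryBroker`, its methods, and the topic name `instrument-readings` are hypothetical and are not the API of OpenMSIStream, Apache Kafka, RabbitMQ, or Apache Pulsar.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Toy pub/sub backbone (hypothetical; not a real streaming library)."""

    def __init__(self) -> None:
        # Messages are retained per topic so later consumers can re-read them.
        self._log: DefaultDict[str, List[bytes]] = defaultdict(list)
        # Callbacks registered per topic; publishers never see this list.
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        """Register a recipient for all future messages on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: bytes) -> None:
        """Store the message, then hand it to every current subscriber."""
        self._log[topic].append(message)
        for callback in list(self._subscribers[topic]):
            callback(message)

    def replay(self, topic: str) -> List[bytes]:
        """Return a copy of everything retained on a topic so far."""
        return list(self._log[topic])


# Two independent recipients consume the same topic; the sender knows neither.
broker = InMemoryBroker()
received_by_monitor: List[bytes] = []
received_by_archiver: List[bytes] = []
broker.subscribe("instrument-readings", received_by_monitor.append)
broker.subscribe("instrument-readings", received_by_archiver.append)

broker.publish("instrument-readings", b"temp=21.5")
broker.publish("instrument-readings", b"temp=21.7")
```

In this sketch the publisher only names a topic, which is the decoupling the abstract refers to. A production backbone would add networking, partitioning, retention limits, and delivery guarantees that this toy omits.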