Title: (Hyper)active data curation: A video case study from behavioral science
Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation, where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary, a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.
Award ID(s):
2032713
PAR ID:
10288032
Journal Name:
Journal of eScience Librarianship
Volume:
10
Issue:
3
ISSN:
2161-3974
Sponsoring Org:
National Science Foundation
More Like this
  1. Data sharing and reuse are becoming the norm in quantitative research. At the same time, significant skepticism still accompanies the sharing and reuse of qualitative research data on both ethical and epistemological grounds. Nevertheless, there is growing interest in the reuse of qualitative data, as demonstrated by the range of contributions in this special issue. In this research note, we address epistemological critiques of reusing qualitative data and argue that careful curation of data can enable what we term “epistemologically responsible reuse” of qualitative data. We begin by briefly defining qualitative data and summarizing common epistemological objections to their shareability or usefulness for secondary analysis. We then introduce the concept of curation as enabling epistemologically responsible reuse and a potential way to address such objections. We discuss three recent trends that we believe are enhancing curatorial practices and thus expand the opportunities for responsible reuse: improvements in data management practices among researchers, the development of collaborative curation practices at repositories focused on qualitative data and technological advances that support sharing rich qualitative data. Using three examples of successful reuse of qualitative data, we illustrate the potential of these three trends to further improve the availability of reusable data projects. 
  2. Incomplete and inconsistent connections between institutional repository holdings and the global data infrastructure inhibit research data discovery and reusability. Preventing metadata loss on the path from institutional repositories to the global research infrastructure can substantially improve research data reusability. The Realities of Academic Data Sharing (RADS) Initiative, funded by the National Science Foundation, is investigating institutional processes for improving research data FAIRness. Focal points of the RADS inquiry are to understand where researchers are sharing their data and to assess metadata quality, i.e., completeness, at six Data Curation Network (DCN) academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Washington University in St. Louis, and Virginia Tech. RADS is examining where researchers are storing their data, considering local institutional repositories and other popular repositories, and analyzing the completeness of the research data metadata stored in these institutional and other repositories. Metadata FAIRness (Findable, Accessible, Interoperable, Reusable) is used as the metric to assess metadata quality as FAIR complete. Research findings show significant content loss when metadata from local institutional repositories are compared to metadata found in DataCite. After examining the factors contributing to this metadata loss, RADS investigators are developing a set of recommended best practices for institutions to increase the quality of their scholarly metadata. Further, documentation such as README files is of particular importance not only for data reuse but also as a source of valuable metadata such as Persistent Identifiers (PIDs). DOIs and related PIDs such as ORCID and ROR are still rarely used in institutional repositories. More frequent use would have a positive effect on discoverability, interoperability, and reusability, especially when transferring to global infrastructure.
  3. Advances in data infrastructure are often led by disciplinary initiatives aimed at innovation in federation and sharing of data and related research materials. In library and information science (LIS), the data services area has focused on data curation and stewardship to support description and deposit of data for access, reuse, and preservation. At the same time, solutions to societal grand challenges are thought to lie in convergence research, characterized by a problem-focused orientation and deep cross-disciplinary integration, requiring access to highly varied data sources with differing resolutions or scales. We argue that data curation and stewardship work in LIS should expand to foster convergence research based on a robust understanding of the dynamics of disciplinary and interdisciplinary research methods and practices. Highlighting unique contributions by Dr. Linda C. Smith to the field of LIS, we outline how her work illuminates problems that are core to current directions in convergence research. Drawing on advances in data infrastructure in the earth and geosciences and trends in qualitative domains, we emphasize the importance of metastructures and the necessary influence of disciplinary practice on principles, standards, and provisions for ethical use across the evolving data ecosystem. 
  4. This workshop report tackles one of the most significant barriers to progress in making research data publicly accessible: the hurdles faced by researchers in producing and reusing publicly accessible research data, both in their research practice and in the surrounding ecosystem shaped by external stakeholders. The central challenge in high quality data sharing is to understand how researchers can increase the downstream value of shared data while reducing burden for both data producers and reusers. The report summarizes recommendations and actions from an NSF-sponsored virtual workshop series on Fostering Data Reusability: Increasing Impact and Ease in Data Sharing and Reuse held in June 2021. The series explored what context data reusers need to evaluate and appropriately reuse the data, identified practices that will improve data reusability and reduce the burden in producing and sharing research data, and used a stakeholder alignment approach to identify actions stakeholders could take to foster progress in reducing burden and increasing impact in data sharing and reuse.
  5. The growing push for open data has resulted in an abundance of data for coastal researchers, which can create data discoverability problems for individual researchers. One solution is to explicitly develop services that help coastal researchers curate data for discovery, host discussions around reuse, build community, and find collaborators. To develop the idea of a coastal data curation service, we investigate aspects of the UNESCO International Coastal Atlas Network member sites that could be used to build a curation service. We develop a minimal example of a coastal data curation service, deploy this as a website, and describe the next steps to move beyond the prototype phase. We envision a coastal data curation service as a way to cultivate a community focused on coastal data discovery and reuse.