Title: (Hyper)active data curation: A video case study from behavioral science
Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation, where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary, a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.
Award ID(s):
2032713
PAR ID:
10288032
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of escience librarianship
Volume:
10
Issue:
3
ISSN:
2161-3974
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Data sharing and reuse are becoming the norm in quantitative research. At the same time, significant skepticism still accompanies the sharing and reuse of qualitative research data on both ethical and epistemological grounds. Nevertheless, there is growing interest in the reuse of qualitative data, as demonstrated by the range of contributions in this special issue. In this research note, we address epistemological critiques of reusing qualitative data and argue that careful curation of data can enable what we term “epistemologically responsible reuse” of qualitative data. We begin by briefly defining qualitative data and summarizing common epistemological objections to their shareability or usefulness for secondary analysis. We then introduce the concept of curation as enabling epistemologically responsible reuse and a potential way to address such objections. We discuss three recent trends that we believe are enhancing curatorial practices and thus expand the opportunities for responsible reuse: improvements in data management practices among researchers, the development of collaborative curation practices at repositories focused on qualitative data and technological advances that support sharing rich qualitative data. Using three examples of successful reuse of qualitative data, we illustrate the potential of these three trends to further improve the availability of reusable data projects. 
  2. Incomplete and inconsistent connections between institutional repository holdings and the global data infrastructure inhibit research data discovery and reusability. Preventing metadata loss on the path from institutional repositories to the global research infrastructure can substantially improve research data reusability. The Realities of Academic Data Sharing (RADS) Initiative, funded by the National Science Foundation, is investigating institutional processes for improving research data FAIRness. Focal points of the RADS inquiry are to understand where researchers are sharing their data and to assess metadata quality, i.e., completeness, at six Data Curation Network (DCN) academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Washington University in St. Louis, and Virginia Tech. RADS is examining where researchers are storing their data, considering local institutional repositories and other popular repositories, and analyzing the completeness of the research data metadata stored in these institutional and other repositories. Metadata FAIRness (Findable, Accessible, Interoperable, Reusable) is used as the metric to assess metadata quality as FAIR complete. Research findings show significant content loss when metadata from local institutional repositories are compared to metadata found in DataCite. After examining the factors contributing to this metadata loss, RADS investigators are developing a set of recommended best practices for institutions to increase the quality of their scholarly metadata. Further, documentation such as README files is of particular importance not only for data reuse but also as a source of valuable metadata such as Persistent Identifiers (PIDs). DOIs and related PIDs such as ORCID and ROR are still rarely used in institutional repositories. More frequent use would have a positive effect on discoverability, interoperability and reusability, especially when transferring to global infrastructure.
  3. Sharing high-quality research data specifically for reuse in future work helps the scientific community progress by enabling researchers to build upon existing work and explore new research questions without duplicating data collection efforts. Because current discussions about research artifacts in Computer Security focus on reproducibility and availability of source code, the reusability of data is unclear. We examine data sharing practices in Computer Security and Measurement to provide resources and recommendations for sharing reusable data. Our study covers five years (2019–2023) and seven conferences in Computer Security and Measurement, identifying 948 papers that create a dataset as one of their contributions. We analyze the 265 accessible datasets, evaluating their understandability and level of reuse. Our findings reveal inconsistent practices in data sharing structure and documentation, causing some datasets to not be shared effectively. Additionally, reuse of datasets is low, especially in fields where the nature of the data does not lend itself to reuse. Based on our findings, we offer data-driven recommendations and resources for improving data sharing practices in our community. Furthermore, we encourage authors to be intentional about their data sharing goals and align their sharing strategies with those goals.
  4. Advances in data infrastructure are often led by disciplinary initiatives aimed at innovation in federation and sharing of data and related research materials. In library and information science (LIS), the data services area has focused on data curation and stewardship to support description and deposit of data for access, reuse, and preservation. At the same time, solutions to societal grand challenges are thought to lie in convergence research, characterized by a problem-focused orientation and deep cross-disciplinary integration, requiring access to highly varied data sources with differing resolutions or scales. We argue that data curation and stewardship work in LIS should expand to foster convergence research based on a robust understanding of the dynamics of disciplinary and interdisciplinary research methods and practices. Highlighting unique contributions by Dr. Linda C. Smith to the field of LIS, we outline how her work illuminates problems that are core to current directions in convergence research. Drawing on advances in data infrastructure in the earth and geosciences and trends in qualitative domains, we emphasize the importance of metastructures and the necessary influence of disciplinary practice on principles, standards, and provisions for ethical use across the evolving data ecosystem. 
  5. This workshop report tackles one of the most significant barriers to progress in making research data publicly accessible: the hurdles faced by researchers in producing and reusing publicly accessible research data, both in their research practice and in the surrounding ecosystem shaped by external stakeholders. The central challenge in high quality data sharing is to understand how researchers can increase the downstream value of shared data while reducing burden for both data producers and reusers. The report summarizes recommendations and actions from an NSF-sponsored virtual workshop series on Fostering Data Reusability: Increasing Impact and Ease in Data Sharing and Reuse held in June 2021. The series explored what context data reusers need to evaluate and appropriately reuse the data, identified practices that will improve data reusability and reduce the burden in producing and sharing research data, and used a stakeholder alignment approach to identify actions stakeholders could take to foster progress in reducing burden and increasing impact in data sharing and reuse.