skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: How Data Curation Enables Epistemically Responsible Reuse of Qualitative Data
Data sharing and reuse are becoming the norm in quantitative research. At the same time, significant skepticism still accompanies the sharing and reuse of qualitative research data on both ethical and epistemological grounds. Nevertheless, there is growing interest in the reuse of qualitative data, as demonstrated by the range of contributions in this special issue. In this research note, we address epistemological critiques of reusing qualitative data and argue that careful curation of data can enable what we term “epistemologically responsible reuse” of qualitative data. We begin by briefly defining qualitative data and summarizing common epistemological objections to their shareability or usefulness for secondary analysis. We then introduce the concept of curation as enabling epistemologically responsible reuse and a potential way to address such objections. We discuss three recent trends that we believe are enhancing curatorial practices and thus expand the opportunities for responsible reuse: improvements in data management practices among researchers, the development of collaborative curation practices at repositories focused on qualitative data and technological advances that support sharing rich qualitative data. Using three examples of successful reuse of qualitative data, we illustrate the potential of these three trends to further improve the availability of reusable data projects.  more » « less
Award ID(s):
1823950
PAR ID:
10319965
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
The Qualitative Report
ISSN:
2160-3715
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Advances in data infrastructure are often led by disciplinary initiatives aimed at innovation in federation and sharing of data and related research materials. In library and information science (LIS), the data services area has focused on data curation and stewardship to support description and deposit of data for access, reuse, and preservation. At the same time, solutions to societal grand challenges are thought to lie in convergence research, characterized by a problem-focused orientation and deep cross-disciplinary integration, requiring access to highly varied data sources with differing resolutions or scales. We argue that data curation and stewardship work in LIS should expand to foster convergence research based on a robust understanding of the dynamics of disciplinary and interdisciplinary research methods and practices. Highlighting unique contributions by Dr. Linda C. Smith to the field of LIS, we outline how her work illuminates problems that are core to current directions in convergence research. Drawing on advances in data infrastructure in the earth and geosciences and trends in qualitative domains, we emphasize the importance of metastructures and the necessary influence of disciplinary practice on principles, standards, and provisions for ethical use across the evolving data ecosystem. 
    more » « less
  2. null (Ed.)
    Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation—where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary—a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process. 
    more » « less
  3. As funder, journal, and disciplinary norms and mandates have foregrounded obligations of data sharing and opportunities for data reuse, the need to plan for and curate data sets that can reach researchers and end-users with disabilities has become even more urgent. We begin by exploring the disability studies literature, describing the need for advocacy and representation of disabled scholars as data creators, subjects, and users. We then survey the landscape of data repositories, curation guidelines, and research-data-related standards, finding little consideration of accessibility for people with disabilities. We suggest three sets of minimal good practices for moving toward truly accessible research data: 1) ensuring Web accessibility for data repositories; 2) ensuring accessibility of common text formats, including those used in documentation; and 3) enhancement of visual and audiovisual materials. We point to some signs of progress in regard to truly accessible data by highlighting exemplary practices by repositories, standards, and data professionals. Accessibility needs to become a mainstream component of curation practice included in every training, manual, and primer. 
    more » « less
  4. Sharing high-quality research data specifically for reuse in future work helps the scientific community progress by enabling researchers to build upon existing work and explore new research questions without duplicating data collection efforts. Because current discussions about research artifacts in Computer Security focus on reproducibility and availability of source code, the reusability of data is unclear. We examine data sharing practices in Computer Security and Measurement to provide resources and recommendations for sharing reusable data. Our study covers five years (2019–2023) and seven conferences in Computer Security and Measurement, identifying 948 papers that create a dataset as one of their contributions. We analyze the 265 accessible datasets, evaluating their under-standability and level of reuse. Our findings reveal inconsistent practices in data sharing structure and documentation, causing some datasets to not be shared effectively. Additionally, reuse of datasets is low, especially in fields where the nature of the data does not lend itself to reuse. Based on our findings, we offer data-driven recommendations and resources for improving data sharing practices in our community. Furthermore, we encourage authors to be intentional about their data sharing goals and align their sharing strategies with those goals. 
    more » « less
  5. This paper describes a machine learning approach for annotating and analyzing data curation work logs at ICPSR, a large social sciences data archive. The systems we studied track curation work and coordinate team decision-making at ICPSR. Archive staff use these systems to organize, prioritize, and document curation work done on datasets, making them promising resources for studying curation work and its impact on data reuse, especially in combination with data usage analytics. A key challenge, however, is classifying similar activities so that they can be measured and associated with impact metrics. This paper contributes: 1) a set of data curation activities; 2) a computational model for identifying curation actions in work log descriptions; and 3) an analysis of frequent data curation activities at ICPSR over time. We first propose a set of data curation actions to help us analyze the impact of curation work. We then use this set to annotate a set of data curation logs, which contain records of data transformations and project management decisions completed by archive staff. Finally, we train a text classifier to detect the frequency of curation actions in a large set of work logs. Our approach supports the analysis of curation work documented in work log systems as an important step toward studying the relationship between research data curation and data reuse. 
    more » « less