Title: Fostering Data Reusability: Increasing Impact and Ease in Sharing and Reusing Research Data - Workshop Report and Action Steps
This workshop report tackles one of the most significant barriers to progress in making research data publicly accessible: the hurdles researchers face in producing and reusing publicly accessible research data, both in their research practice and in the surrounding ecosystem shaped by external stakeholders. The central challenge in high-quality data sharing is to understand how researchers can increase the downstream value of shared data while reducing the burden on both data producers and reusers. The report summarizes recommendations and actions from an NSF-sponsored virtual workshop series, Fostering Data Reusability: Increasing Impact and Ease in Data Sharing and Reuse, held in June 2021. The series explored what context data reusers need to evaluate and appropriately reuse data, identified practices that will improve data reusability and reduce the burden of producing and sharing research data, and used a stakeholder alignment approach to identify actions stakeholders could take to reduce burden and increase impact in data sharing and reuse.
Award ID(s):
2039677
NSF-PAR ID:
10293361
Journal Name:
Fostering Data Reusability: Increasing Impact and Ease in Sharing and Reusing Research Data
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Incomplete and inconsistent connections between institutional repository holdings and the global data infrastructure inhibit research data discovery and reusability. Preventing metadata loss on the path from institutional repositories to the global research infrastructure can substantially improve research data reusability. The Realities of Academic Data Sharing (RADS) Initiative, funded by the National Science Foundation, is investigating institutional processes for improving research data FAIRness. Focal points of the RADS inquiry are to understand where researchers are sharing their data and to assess metadata quality, i.e., completeness, at six Data Curation Network (DCN) academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Washington University in St. Louis, and Virginia Tech. RADS is examining where researchers are storing their data, considering local institutional repositories and other popular repositories, and analyzing the completeness of the research data metadata stored in these institutional and other repositories. Metadata FAIRness (Findable, Accessible, Interoperable, Reusable) is the metric used to assess metadata quality, i.e., whether the metadata are FAIR complete. Research findings show significant content loss when metadata from local institutional repositories are compared to metadata found in DataCite. After examining the factors contributing to this metadata loss, RADS investigators are developing a set of recommended best practices for institutions to increase the quality of their scholarly metadata. Further, documentation such as README files is of particular importance not only for data reuse but also as a source of valuable metadata such as Persistent Identifiers (PIDs). DOIs and related PIDs such as ORCID and ROR are still rarely used in institutional repositories. More frequent use would have a positive effect on discoverability, interoperability and reusability, especially when metadata are transferred to the global infrastructure.
  2. The project mission was to organize a workshop exploring how the US data science community can cooperate with and benefit from collaborations with partners in Serbia and the West Balkan region. The scope included fundamental data science methods and high-impact applications related to big data processing, security and privacy in critical infrastructures, biomedical informatics, and computational archeology. The workshop helped close the gap between data science research in the US and in Serbia and the region, and brought together data scientists with researchers from disciplines that until recently had little exposure to data science methods, potentially enabling collaborative breakthroughs in those scientific fields. A large fraction of participants from both sides were early career researchers, including advanced graduate students, postdoctoral research associates, and assistant/associate professors within 10 years of obtaining their Ph.D. The participants included a large fraction of female and minority scientists. The workshop pursued three inter-related objectives: (1) establishing new multidisciplinary international collaborations between data science, mathematics, and sciences that generate big data and require advanced methods; (2) reinforcing collaboration mechanisms between the NSF and Serbia’s Ministry of Education, Science and Technological Development and organizing joint research projects; and (3) widening the impact of the workshop by involving researchers and stakeholders from the West Balkan region. The workshop consisted of four tracks, each co-chaired by 3 investigators from the US, Serbia and another West Balkan country. Tangible outcomes from the workshop include a report describing workshop activities for each of the four tracks and a proposal recommending research collaboration areas of interest to all parties and identifying mechanisms and programs to facilitate collaboration.
  3. This paper reports on a project funded through the Engineering Education and Centers (EEC) Division of the National Science Foundation. Since 2010, EEC has funded more than 500 proposals totaling over $150 million through engineering education research (EER) programs such as Research in Engineering Education (REE) and Research in the Formation of Engineers (RFE), to enhance understanding and improve practice. The resulting archive of robust qualitative and quantitative data represents a vast untapped potential to exponentially increase the impact of EEC funding and transform engineering education. But tapping this potential has thus far been an intractable problem, despite ongoing calls for data sharing by public funders of research. Changing the paradigm of single-use data collection requires actionable, proven practices for effective, ethical data sharing, coupled with sufficient incentives to both share and use existing data. To that end, this project draws together a team of experts to overcome substantial obstacles in qualitative data sharing by building a framework to guide secondary analysis in engineering education research (EER), and to test this framework using pioneering data sets. Herein, we report on accomplishments within the first year of the project during which time we gathered a group of 13 expert qualitative researchers to engage in the first of a series of working meetings intended to meet our project goals. We came into this first workshop with a potentially limiting definition of secondary data analysis and the idea that people would want to share existing datasets if we could find ways around anticipated hurdles. However, the workshop yielded a broader definition of secondary data analysis and revealed a stronger interest in creating new datasets designed for sharing rather than sharing existing datasets. Thus, we have reconceived our second phase as one that is a cohesive effort based on an inclusive “open cohort model” to pilot projects related to secondary data analysis.
  4. Social media provides unique opportunities for researchers to learn about a variety of phenomena—it is often publicly available, highly accessible, and affords more naturalistic observation. However, as research using social media data has increased, so too has public scrutiny, highlighting the need to develop ethical approaches to social media data use. Prior work in this area has explored users’ perceptions of researchers’ use of social media data in the context of a single platform. In this paper, we expand on that work, exploring how platforms and their affordances impact how users feel about social media data reuse. We present results from three factorial vignette surveys, each focusing on a different platform—dating apps, Instagram, and Reddit—to assess users’ comfort with research data use scenarios across a variety of contexts. Although our results highlight different expectations between platforms depending on the research domain, purpose of research, and content collected, we find that the factor with the greatest impact across all platforms is consent—a finding which presents challenges for big data researchers. We conclude by offering a sociotechnical approach to ethical decision-making. This approach provides recommendations on how researchers can interpret and respond to platform norms and affordances to predict potential data use sensitivities. The approach also recommends that researchers respond to the predominant expectation of notification and consent for research participation by bolstering awareness of data collection on digital platforms. 
  5. The FAIR Hackathon Workshop for Mathematics and the Physical Sciences (MPS), held February 27-28, 2019 in Alexandria, Virginia, brought together forty-four stakeholders in the physical sciences community to share skills, tools and techniques to FAIRify research data. As one of the first efforts of its kind in the US, the workshop offered participants a way to engage with the FAIR (Findable, Accessible, Interoperable and Reusable) data principles and metrics in the context of a hackathon. The workshop was designed to address issues of public access to data and to give researchers hands-on experience with FAIR tools. Existing FAIR tools and infrastructure were introduced. Hands-on hackathon breakout time was devoted to testing FAIR metrics and tools against physical sciences data. The hackathon invited MPS research data management stakeholders to react to the FAIR principles and to jointly consider gaps in the MPS data sharing ecosystem in the context of researchers’ actual projects. FAIR Gap analysis was introduced as a way to identify community-specific tools or infrastructure that could dramatically enhance the ability of domain scientists to make their data more FAIR.