Sharing high-quality research data specifically for reuse in future work helps the scientific community progress by enabling researchers to build upon existing work and explore new research questions without duplicating data collection efforts. Because current discussions about research artifacts in Computer Security focus on reproducibility and availability of source code, the reusability of data is unclear. We examine data sharing practices in Computer Security and Measurement to provide resources and recommendations for sharing reusable data. Our study covers five years (2019–2023) and seven conferences in Computer Security and Measurement, identifying 948 papers that create a dataset as one of their contributions. We analyze the 265 accessible datasets, evaluating their understandability and level of reuse. Our findings reveal inconsistent practices in data sharing structure and documentation, causing some datasets to not be shared effectively. Additionally, reuse of datasets is low, especially in fields where the nature of the data does not lend itself to reuse. Based on our findings, we offer data-driven recommendations and resources for improving data sharing practices in our community. Furthermore, we encourage authors to be intentional about their data sharing goals and align their sharing strategies with those goals.
                            Fostering Data Reusability: Increasing Impact and Ease in Sharing and Reusing Research Data - Workshop Report and Action Steps
                        
                    
    
            This workshop report tackles one of the most significant barriers to progress in making research data publicly accessible: the hurdles faced by researchers in producing and reusing publicly accessible research data, both in their research practice and in the surrounding ecosystem shaped by external stakeholders. The central challenge in high quality data sharing is to understand how researchers can increase the downstream value of shared data while reducing burden for both data producers and reusers. The report summarizes recommendations and actions from an NSF-sponsored virtual workshop series on Fostering Data Reusability: Increasing Impact and Ease in Data Sharing and Reuse held in June 2021. The series explored what context data reusers need to evaluate and appropriately reuse the data, identified practices that will improve data reusability and reduce the burden in producing and sharing research data, and used a stakeholder alignment approach to identify actions stakeholders could take to foster progress in reducing burden and increasing impact in data sharing and reuse. 
                            - Award ID(s):
- 2039677
- PAR ID:
- 10293361
- Date Published:
- Journal Name:
- Fostering Data Reusability: Increasing Impact and Ease in Sharing and Reusing Research Data
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Incomplete and inconsistent connections between institutional repository holdings and the global data infrastructure inhibit research data discovery and reusability. Preventing metadata loss on the path from institutional repositories to the global research infrastructure can substantially improve research data reusability. The Realities of Academic Data Sharing (RADS) Initiative, funded by the National Science Foundation, is investigating institutional processes for improving research data FAIRness. Focal points of the RADS inquiry are to understand where researchers are sharing their data and to assess metadata quality, i.e., completeness, at six Data Curation Network (DCN) academic institutions: Cornell University, Duke University, University of Michigan, University of Minnesota, Washington University in St. Louis, and Virginia Tech. RADS is examining where researchers are storing their data, considering local institutional repositories and other popular repositories, and analyzing the completeness of the research data metadata stored in these institutional and other repositories. Metadata FAIRness (Findable, Accessible, Interoperable, Reusable) is used as the metric to assess metadata quality as FAIR complete. Research findings show significant content loss when metadata from local institutional repositories are compared to metadata found in DataCite. After examining the factors contributing to this metadata loss, RADS investigators are developing a set of recommended best practices for institutions to increase the quality of their scholarly metadata. Further, documentation such as README files are of particular importance not only for data reuse, but as sources containing valuable metadata such as Persistent Identifiers (PIDs). DOIs and related PIDs such as ORCID and ROR are still rarely used in institutional repositories. 
More frequent use would have a positive effect on discoverability, interoperability and reusability, especially when transferring to global infrastructure.
- 
            The project mission was to organize a workshop aimed to explore how the US data science community can cooperate with and benefit from collaborations with partners in Serbia and the West Balkan region. The scope included fundamental data science methods and high-impact applications related to big data processing, security and privacy in critical infrastructures, biomedical informatics, and computational archeology. The proposed workshop facilitated closing the gap between data science research in the US and Serbia and the region and brought together data scientists with researchers from disciplines that until recently had little exposure to data science methods, potentially enabling collaborative breakthroughs in those scientific fields. A large fraction of participants from both sides were early career researchers including advanced level graduate students, postdoctoral research associates, and assistant/associate professors within 10 years of obtaining their Ph.D. The participants included a large fraction of female and minority scientists. The workshop objective was achieved by including the following inter-related objectives: (1) Establishing new multidisciplinary international collaborations between data science, mathematics, and sciences that generate big data and require advanced methods; (2) Reinforcing collaboration mechanisms between the NSF and Serbia’s Ministry of Education, Science and Technological Development and organize joint research projects; and (3) Widening the impact of the workshop, by involving researchers and stakeholders from the West Balkan region. The workshop consisted of four tracks, each co-chaired by 3 investigators from the US, Serbia and another West Balkan country. 
Tangible outcomes from the workshop include a report describing workshop activities for each of four tracks and a proposal recommending research collaboration areas of interest for all parties and determining collaboration mechanisms and programs to facilitate collaboration.
- 
            Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation, where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary, a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.
- 
            As funder, journal, and disciplinary norms and mandates have foregrounded obligations of data sharing and opportunities for data reuse, the need to plan for and curate data sets that can reach researchers and end-users with disabilities has become even more urgent. We begin by exploring the disability studies literature, describing the need for advocacy and representation of disabled scholars as data creators, subjects, and users. We then survey the landscape of data repositories, curation guidelines, and research-data-related standards, finding little consideration of accessibility for people with disabilities. We suggest three sets of minimal good practices for moving toward truly accessible research data: 1) ensuring Web accessibility for data repositories; 2) ensuring accessibility of common text formats, including those used in documentation; and 3) enhancement of visual and audiovisual materials. We point to some signs of progress in regard to truly accessible data by highlighting exemplary practices by repositories, standards, and data professionals. Accessibility needs to become a mainstream component of curation practice included in every training, manual, and primer.
 An official website of the United States government