Award ID contains: 2226408

  1. Data leakage remains a pervasive issue in machine learning (ML), especially in scientific applications, where it leads to overly optimistic performance estimates and irreproducible findings. Despite its prevalence, data leakage receives limited attention in ML education, in part because accessible, hands-on teaching resources are lacking. To address this gap, we developed interactive learning modules in which students reproduce examples from academic publications that are affected by data leakage, then repeat the evaluation without the leakage error to see how the finding changes. The authors deployed these modules in two introductory machine learning courses, enabling students to explore common forms of leakage and their impact on model reliability. After students worked through these materials, their feedback highlighted increased awareness of subtle pitfalls that can compromise machine learning workflows.
    Free, publicly-accessible full text available July 29, 2026
  2. Thanks to increasing awareness of the importance of reproducibility in computer science research, initiatives such as artifact review and badging have been introduced to encourage reproducible research in this field. However, making "practical reproducibility" truly widespread requires more than incentives. It requires greater capacity for reproducible research among computer scientists: more tools, workflows, and exemplar artifacts, and more people trained in reproducibility best practices. In this paper, we describe our experiences in the first two years of the Summer of Reproducibility (SoR), a mentoring program that seeks to build global capacity by enabling students around the world to work with expert mentors while producing reproducibility artifacts, tools, and educational materials. We give an overview of the program, report preliminary outcomes, and discuss plans for its evolution.
    Free, publicly-accessible full text available July 29, 2026
  3. Free, publicly-accessible full text available April 30, 2026
  4. With increasing recognition of the importance of reproducibility in computer science research, many efforts to promote reproducible research have been introduced across the subdisciplines of computer science, including artifact review and badging processes and dedicated reproducibility tracks at conferences. However, these initiatives primarily engage active researchers and students who are already involved in research in their respective areas. In this paper, we argue for expanding the scope of these efforts to a much larger audience by introducing more reproducibility content into computer science courses. We describe various ways to integrate reproducibility content into the curriculum, drawing on our own experiences as well as on published experience reports from several subdisciplines of computer science and computational science.
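To make the leakage pattern described in item 1 concrete, here is a minimal illustrative sketch (not code from the modules themselves). It assumes one common form of leakage: selecting features on the full dataset before cross-validation. The data are pure random noise, so any accuracy well above 50% can only come from leakage. All function names and parameter values (`select_features`, 100 samples, 1000 features, 20 selected features, 5 folds) are hypothetical choices made for illustration.

```python
import random


def make_noise_data(n_samples, n_features, seed):
    """Features and labels are independent random noise: no real signal exists."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced 0/1 labels
    rng.shuffle(y)
    return X, y


def select_features(X, y, rows, k):
    """Return the k features whose class means differ most on the given rows."""
    ones = [r for r in rows if y[r] == 1]
    zeros = [r for r in rows if y[r] == 0]
    scores = []
    for j in range(len(X[0])):
        m1 = sum(X[r][j] for r in ones) / len(ones)
        m0 = sum(X[r][j] for r in zeros) / len(zeros)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def centroid(X, rows, feats):
    return [sum(X[r][j] for r in rows) / len(rows) for j in feats]


def nearest_centroid_accuracy(X, y, train, test, feats):
    """Fit class centroids on train rows; score predictions on test rows."""
    c1 = centroid(X, [r for r in train if y[r] == 1], feats)
    c0 = centroid(X, [r for r in train if y[r] == 0], feats)
    hits = 0
    for r in test:
        x = [X[r][j] for j in feats]
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        hits += (1 if d1 < d0 else 0) == y[r]
    return hits / len(test)


def cross_validate(X, y, k_feats, n_folds, leaky, seed):
    rows = list(range(len(X)))
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    # LEAK: when leaky=True, features are chosen using every row,
    # including rows that will later serve as test data.
    feats_all = select_features(X, y, rows, k_feats) if leaky else None
    accs = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        # Correct version: select features from the training fold only.
        feats = feats_all if leaky else select_features(X, y, train, k_feats)
        accs.append(nearest_centroid_accuracy(X, y, train, test, feats))
    return sum(accs) / len(accs)


X, y = make_noise_data(n_samples=100, n_features=1000, seed=0)
leaky_acc = cross_validate(X, y, k_feats=20, n_folds=5, leaky=True, seed=1)
clean_acc = cross_validate(X, y, k_feats=20, n_folds=5, leaky=False, seed=1)
print(f"with leakage:    {leaky_acc:.2f}")
print(f"without leakage: {clean_acc:.2f}")
```

The two runs differ only in where `select_features` is called. Repeating the evaluation with selection moved inside each training fold, as the modules in item 1 have students do, should bring accuracy on this noise data back toward chance.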