The primary goal of the authentic learning approach is to engage and motivate students in solving real-world problems. We report our experience in developing a k-nearest neighbor (KNN) classification module for anomalous user behavior detection, one of the authentic machine learning for cybersecurity (ML4Cybr) learning modules based on 10 cybersecurity (CybrS) cases with machine learning (ML) solutions. All portable labs are available on Google CoLab, so students can access and practice these hands-on labs anywhere and anytime without installing or configuring software. This lets students engage with the concepts immediately and gain more hands-on problem-solving experience.
This content will become publicly available on July 29, 2026
Learning from Irreproducibility: Introducing Data Leakage Case Studies for Machine Learning Education
Data leakage remains a pervasive issue in machine learning (ML), especially when applied to science, leading to overly optimistic performance estimates and irreproducible findings. Despite its prevalence, data leakage receives limited attention in ML education, in part due to the lack of accessible, hands-on teaching resources. To address this gap, we developed interactive learning modules in which students reproduce examples from academic publications that are affected by data leakage, then repeat the evaluation without the data leakage error to see how the finding is affected. These modules were deployed by the authors in two introductory machine learning courses, enabling students to explore common forms of leakage and their impact on model reliability. Following their engagement with these materials, student feedback highlighted increased awareness of subtle pitfalls that can compromise machine learning workflows.
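One common form of leakage can be sketched in a few lines. The example below is a hypothetical illustration, not taken from the modules or from any of the publications they reproduce: it selects features using all samples (including the test split) before evaluating, then repeats the evaluation with selection restricted to the training split. The data are pure noise, so an honest evaluation should score near chance. All function names, sizes, and the nearest-centroid classifier are assumptions chosen for brevity.

```python
import random


def make_noise_data(n_samples, n_features, rng):
    """Features are pure Gaussian noise; labels alternate 0/1 and carry no signal."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_mean_gap(X, y, rows, j):
    """Absolute difference between the class means of feature j over the given rows."""
    pos = [X[i][j] for i in rows if y[i] == 1]
    neg = [X[i][j] for i in rows if y[i] == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_features(X, y, rows, k):
    """Keep the k features whose class means differ most on the given rows."""
    ranked = sorted(range(len(X[0])),
                    key=lambda j: class_mean_gap(X, y, rows, j),
                    reverse=True)
    return ranked[:k]


def nearest_centroid_accuracy(X, y, train_rows, test_rows, features):
    """Fit class centroids on the training rows, then score the test rows."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train_rows if y[i] == label]
        centroids[label] = [sum(X[i][j] for i in members) / len(members)
                            for j in features]
    correct = 0
    for i in test_rows:
        dists = {label: sum((X[i][j] - c) ** 2
                            for j, c in zip(features, centroids[label]))
                 for label in (0, 1)}
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(test_rows)


def run_demo(seed=0, n_samples=60, n_features=2000, k=20):
    rng = random.Random(seed)
    X, y = make_noise_data(n_samples, n_features, rng)

    # Stratified 50/50 split so both classes appear in train and test.
    train_rows, test_rows = [], []
    for label in (0, 1):
        rows = [i for i in range(n_samples) if y[i] == label]
        rng.shuffle(rows)
        half = len(rows) // 2
        train_rows += rows[:half]
        test_rows += rows[half:]

    # Leaky: feature selection sees the test rows and their labels.
    leaky_features = select_features(X, y, train_rows + test_rows, k)
    # Clean: feature selection sees the training rows only.
    clean_features = select_features(X, y, train_rows, k)

    leaky = nearest_centroid_accuracy(X, y, train_rows, test_rows, leaky_features)
    clean = nearest_centroid_accuracy(X, y, train_rows, test_rows, clean_features)
    return leaky, clean


leaky_acc, clean_acc = run_demo()
print(f"test accuracy with leaky feature selection: {leaky_acc:.2f}")
print(f"test accuracy with train-only selection:    {clean_acc:.2f}")
```

Because the labels are unrelated to the features, any accuracy well above 0.5 in the leaky run is an artifact of the evaluation procedure rather than evidence of a working model, which is the kind of overly optimistic estimate the abstract describes.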
- Award ID(s):
- 2226408
- PAR ID:
- 10636853
- Publisher / Repository:
- ACM
- Date Published:
- Format(s):
- Medium: X
- Location:
- Vancouver, BC, Canada
- Sponsoring Org:
- National Science Foundation
More Like this
-
This research paper systematically identifies students' perceptions of learning machine learning (ML) topics. To keep up with the ever-increasing need for professionals with ML expertise, for-profit and non-profit organizations offer a wide range of ML-related courses at undergraduate and graduate levels. Despite the availability of ML-related education materials, there is a lack of understanding of how students perceive ML-related topics and their dissemination. A systematic categorization of students' perceptions of these courses can help educators understand the challenges that students face and use that understanding to better disseminate ML-related topics in courses. The goal of this paper is to help educators teach ML topics by providing an experience report of students' perceptions related to learning ML. We accomplish our research goal by conducting an empirical study in which we deploy a survey with 83 students across five academic institutions. These students are recruited from a mixture of undergraduate and graduate courses. We apply a qualitative analysis technique called open coding to identify challenges that students encounter while studying ML-related topics. Using the same qualitative analysis technique, we identify the quality aspects that students prioritize in ML-related topics. From our survey, we identify 11 challenges that students face when learning about ML topics, among which data quality is the most frequent, followed by hardware-related challenges. We observe that the majority of the students prefer hands-on projects over theoretical lectures. Furthermore, we find that the surveyed students consider ethics, security, privacy, correctness, and performance to be essential considerations when developing ML-based systems.
Based on our findings, we recommend that educators who teach ML-related courses (i) incorporate hands-on projects to teach ML-related topics, (ii) dedicate course materials to data quality, (iii) use lightweight virtualization tools to showcase computationally intensive topics, such as deep neural networks, and (iv) empirically evaluate how large language models can be used in ML-related education.
-
This paper presents the findings of action research conducted to evaluate new modules created to teach learners how to apply machine learning (ML) and artificial intelligence (AI) techniques to malware data sets. The trend in the data suggests that learners with cybersecurity competencies may be better prepared to complete the AI/ML modules’ exercises than learners with AI/ML competencies. We describe the challenge of identifying prerequisites that could be used to determine learner readiness, report our findings, and conclude with the implications for instructional design and teaching practice.
-
Recent discoveries by neutrino telescopes, such as the IceCube Neutrino Observatory, relied extensively on machine learning (ML) tools to infer physical quantities from the raw photon hits detected. Neutrino telescope reconstruction algorithms are limited by the sparse sampling of photons by the optical modules due to the relatively large spacing (10–100 m) between them. In this Letter, we propose a novel technique that learns photon transport through the detector medium through the use of deep-learning-driven superresolution of data events. These “improved” events can then be reconstructed using traditional or ML techniques, resulting in improved resolution. Our strategy arranges additional “virtual” optical modules within an existing detector geometry and trains a convolutional neural network to predict the hits on these virtual optical modules. We show that this technique improves the angular reconstruction of muons in a generic ice-based neutrino telescope. Our results readily extend to water-based neutrino telescopes and other event morphologies. Published by the American Physical Society, 2025.
-
Artificial Intelligence and Machine Learning continue to increase in popularity. As a result, several new approaches to machine learning education have emerged in recent years. Many existing interactive techniques utilize text, image, and video data to engage students with machine learning. However, the use of physiological sensors for machine learning education activities remains largely unexplored. This paper presents findings from a study exploring students’ experiences learning basic machine learning concepts while using physiological sensors to control an interactive game. In particular, the sensors measured electrical activity generated from students’ arm muscles. Activities featuring physiological sensors produced similar outcomes when compared to exercises that leveraged image data. While students’ machine learning self-efficacy increased in both conditions, students seemed more curious about machine learning after working with the physiological sensor. These results suggest that PhysioML may provide learning support similar to traditional ML education approaches while engaging students with novel interactive physiological sensors. We discuss these findings and reflect on ways physiological sensors may be used to augment traditional data types during classroom activities focused on machine learning.
