Title: Customizing Feedback for Introductory Programming Courses Using Semantic Clusters
The number of introductory programming learners is increasing worldwide. Delivering feedback to these learners is important to support their progress; however, traditional methods of delivering feedback do not scale to thousands of programs. We identify several opportunities to improve a recent data-driven technique that analyzes individual program statements. These statements are grouped based on their semantic intent, and usually differ in their actual implementation and syntax. The existing technique groups statements that are semantically close and treats as outliers those statements that reduce the cohesiveness of the clusters. Unfortunately, this approach causes many statements to be considered outliers. We propose to reduce the number of outliers through a new clustering algorithm that processes vertices based on density. Our experiments over six real-world introductory programming assignments show that we are able to reduce the number of outliers and, therefore, increase the total coverage of the programs under evaluation.
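For illustration only, a density-based grouping in this spirit can be sketched with an off-the-shelf algorithm such as DBSCAN, which labels low-density points as noise (the role outliers play here). The toy vectors, parameters, and library choice below are assumptions for the sketch, not the authors' algorithm, which operates on graph vertices.

```python
# Minimal sketch, assuming statements have been encoded as feature
# vectors; the authors' density-based algorithm is not reproduced here.
import numpy as np
from sklearn.cluster import DBSCAN

# Toy vectors standing in for semantic encodings of statements.
statement_vectors = np.array([
    [0.10, 0.20], [0.12, 0.19], [0.11, 0.21],  # e.g., loop-counter updates
    [0.80, 0.90], [0.82, 0.88],                # e.g., accumulator updates
    [0.50, 0.10],                              # a lone, dissimilar statement
])

clustering = DBSCAN(eps=0.05, min_samples=2).fit(statement_vectors)

# DBSCAN assigns density-reachable points a cluster id and marks
# low-density points as noise (-1), i.e., outliers in this sketch.
for vec, label in zip(statement_vectors, clustering.labels_):
    kind = "outlier" if label == -1 else f"cluster {label}"
    print(vec, "->", kind)
```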
Marin, Victor J.; Contractor, Maheen Riaz; Rivero, Carlos R. (Lecture notes in computer science). Cristea, Alexandra I.; Troussas, Christos (Eds.)
Supporting novice programming learners at scale has become a necessity. Such support generally consists of delivering automated feedback on what learners did incorrectly and why. Existing approaches cast the problem as automatically repairing learners' incorrect programs; specifically, data-driven approaches assume there exists a correct program, provided by another learner, that can be extrapolated to repair an incorrect program. Unfortunately, their repair potential, i.e., their capability of providing feedback, is hindered by how they compare programs. In this paper, we propose a flexible program alignment based on program dependence graphs, which we enrich with semantic information extracted from the programs, i.e., operations and calls. Given a correct and an incorrect graph, we exploit approximate graph alignment to find correspondences between them at the statement level. Each correspondence has an attached similarity that reflects the matching affinity between two statements based on topology (control and data flow information) and semantics (operations and calls). Repair suggestions are discovered based on this similarity. We evaluate our flexible approach against rigid schemes over correct and incorrect programs belonging to nine real-world introductory programming assignments. We show that our flexible program alignment is feasible in practice, achieves better performance than rigid program comparisons, and is more resilient when limiting the number of available correct programs.
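A minimal sketch of the blended-similarity-plus-greedy-alignment idea follows; the function names, the `neighbours`/`operations` fields, and the equal weighting are illustrative assumptions, not the paper's code.

```python
# Sketch: score candidate statement pairs by combining topological and
# semantic similarity, then align greedily (hypothetical names/weights).
from itertools import product

def jaccard(xs, ys):
    """Set overlap in [0, 1]; 0 when both sets are empty."""
    xs, ys = set(xs), set(ys)
    return len(xs & ys) / len(xs | ys) if xs | ys else 0.0

def similarity(stmt_a, stmt_b, alpha=0.5):
    """Blend topology (shared control/data-flow neighbours) and
    semantics (shared operations/calls) into one score in [0, 1]."""
    topo = jaccard(stmt_a["neighbours"], stmt_b["neighbours"])
    sem = jaccard(stmt_a["operations"], stmt_b["operations"])
    return alpha * topo + (1 - alpha) * sem

def greedy_align(correct_stmts, incorrect_stmts):
    """Repeatedly pick the highest-scoring unmatched pair."""
    pairs = sorted(product(correct_stmts, incorrect_stmts),
                   key=lambda p: similarity(*p), reverse=True)
    used_a, used_b, alignment = set(), set(), []
    for a, b in pairs:
        if id(a) not in used_a and id(b) not in used_b:
            alignment.append((a, b, similarity(a, b)))
            used_a.add(id(a)); used_b.add(id(b))
    return alignment

# Tiny usage example with toy statement records:
a = [{"neighbours": [1, 2], "operations": ["add"]}]
b = [{"neighbours": [1, 2], "operations": ["add"]},
     {"neighbours": [3], "operations": ["call:print"]}]
print(greedy_align(a, b))  # pairs the matching 'add' statements first
```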
Wang, Wengran; Zhang, Chenhao; Stahlbauer, Andreas; Fraser, Gordon; Price, Thomas W. (ITiCSE '21: Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education)
Programming environments such as Snap, Scratch, and Processing engage learners by allowing them to create programming artifacts such as apps and games with visual and interactive output. Learning programming in such a media-focused context has been shown to increase retention and success rates. However, assessing these visual, interactive projects requires time and laborious manual effort, and it is therefore difficult to offer automated or real-time feedback to students as they work. In this paper, we introduce SnapCheck, a dynamic testing framework for Snap that enables instructors to author test cases with Condition-Action templates. The goal of SnapCheck is to allow instructors or researchers to author property-based test cases that can automatically assess students' interactive programs with high accuracy. Our evaluation of SnapCheck on 162 code snapshots from a Pong game assignment in an introductory programming course shows that our automated testing framework achieves at least 98% accuracy over all rubric items, showing the potential of using SnapCheck for auto-grading and for providing formative feedback to students.
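A minimal sketch of the Condition-Action idea follows; the class, field names, and Pong rubric item are hypothetical illustrations written for this summary, not SnapCheck's actual API.

```python
# Sketch: a Condition-Action test pairs a trigger over the program
# state with an expected state change that must follow it.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ConditionActionTest:
    name: str
    condition: Callable[[dict], bool]     # predicate over one snapshot
    action: Callable[[dict, dict], bool]  # expected change: (before, after)

# Hypothetical rubric item for a Pong game: when the ball touches the
# top edge, its vertical velocity should flip sign.
bounce_test = ConditionActionTest(
    name="ball bounces off top wall",
    condition=lambda s: s["ball_y"] <= 0,
    action=lambda before, after: after["ball_vy"] == -before["ball_vy"],
)

def check(test, trace):
    """Scan consecutive state snapshots recorded from a program run."""
    for before, after in zip(trace, trace[1:]):
        if test.condition(before) and not test.action(before, after):
            return False
    return True

# A run where the ball hits the top edge and correctly bounces:
trace = [{"ball_y": 5, "ball_vy": -1},
         {"ball_y": 0, "ball_vy": -1},
         {"ball_y": 1, "ball_vy": 1}]
print(check(bounce_test, trace))  # True
```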
Contractor, Maheen Riaz; Rivero, Carlos R. (Springer). Crossley, Scott; Popescu, Elvira (Eds.)
Automated program repair is a promising approach to delivering feedback to novice learners at scale. CLARA is an effective repairer that uses a correct program to fix an incorrect program. CLARA suffers from two main issues: rigid matching and lack of support for typical constructs and tasks in introductory programming assignments. We present several modifications to CLARA to overcome these problems. We propose approximate graph matching based on semantic and topological information of the compared programs, and we modify CLARA's abstract syntax tree processor and interpreter to support new constructs and tasks, like reading from and writing to the console. Our experiments show that, thanks to our modifications, we can apply CLARA to real-world programs. Moreover, our approximate graph matching allows us to repair many incorrect programs that are not repaired using rigid program matching.
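A short sketch of how an approximate alignment could drive repair suggestions is shown below; the threshold, tuple shape, and message format are assumptions for illustration, not CLARA's actual repair procedure.

```python
# Sketch: aligned statement pairs with low matching affinity are likely
# defect sites; propose the correct program's statement as replacement.
def suggest_repairs(alignment, threshold=0.8):
    """alignment: (correct_stmt, incorrect_stmt, score) triples,
    e.g. as produced by an approximate graph matcher."""
    suggestions = []
    for correct_stmt, incorrect_stmt, score in alignment:
        if score < threshold:
            suggestions.append(
                f"consider replacing `{incorrect_stmt}` with "
                f"`{correct_stmt}` (match similarity {score:.2f})")
    return suggestions

# Example: a pair matched with low affinity triggers a suggestion.
print(suggest_repairs([("total += n", "total = n", 0.55)]))
```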
Kiral, Isabell; Roy, Subhrajit; Mummert, Todd; Braz, Alan; Tsay, Jason; Tang, Jianbin; Asif, Umar; Schaffter, Thomas; Mehmet, Eren; Picone, Joseph; et al. (Challenges in Machine Learning Competitions for All (CiML))
The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem
Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer†
* These authors contributed equally to this work
† Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com
◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section
J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in the USA, Israel, and Australia.
Introduction
This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend has reached the medical domain, with applications ranging from cancer diagnosis [1] to the development of brain-machine interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is restricted even further in the context of any challenge run on confidential use cases or with sensitive data. Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally at IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating.
The challenge: enabling wide participation
With the goal of running a challenge that mobilises the largest possible pool of participants from IBM (globally), we designed a use case around previous work in epileptic seizure prediction [3]. In this “Deep Learning Epilepsy Detection Challenge”, participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as the basis for algorithm evaluation [5].
In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles:
1. No participant should need to have in-depth knowledge of the specific domain (i.e., no participant should need to be a neuroscientist or epileptologist).
2. No participant should need to be an expert data scientist.
3. No participant should need more than basic programming knowledge (i.e., no participant should need to learn how to process fringe data formats and stream data efficiently).
4. No participant should need to provide their own computing resources.
In addition to the above, our platform should further:
• guide participants through the entire process from sign-up to model submission,
• facilitate collaboration, and
• provide instant feedback to the participants through data visualisation and intermediate online leaderboards.
The platform
The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components.

(1) A web portal serves as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge.

(2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant, allowing users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which are described in more detail in further sections.

(3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFlow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded the necessary Python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources. The starter kit also enabled submission of the final code to a data storage location to which only the challenge team had access.

(4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage, from which it requested recorded data and to which it stored the participants' code and trained models.

(5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms.

(6) Utility functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to all the IBM services used.

Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted through the starter kit, minimising the burden on the challenge organising team.
Figure 1: High-level architecture of the challenge platform
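To make the starter-kit flow described above concrete, the following minimal sketch mirrors its structure under stated assumptions: every helper name here is a toy placeholder invented for illustration, not the actual Watson Studio, WML, COS, or NutsFlow API.

```python
# Sketch of the notebook flow: load data, participant-written
# pre-processing and model cells, then hand off for evaluation.

def load_training_data():
    """Placeholder for the kit's COS-backed streaming loader."""
    return [[0.0, 0.1, 0.2]]  # toy EEG-like windows

def preprocess(raw):
    """Participant-written pre-processing cell."""
    return [[x * 2 for x in window] for window in raw]

def train_model(data):
    """Participant-written model cell (TensorFlow/PyTorch in the real
    kit); a trivial threshold 'model' keeps this sketch runnable."""
    return lambda window: sum(window) > 0.5

def submit(model, data):
    """Placeholder for the kit's submission hook, which bundled code
    and trained models and stored them for automated evaluation."""
    print("predictions:", [model(w) for w in data])

data = preprocess(load_training_data())
submit(train_model(data), data)
```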
Measuring success
The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, with a total of 87 scientists and software engineers from 14 global locations, participated. All participants made use of the starter kit we provided and ran algorithms on IBM's WML infrastructure. Seven teams persisted until the end of the challenge and submitted final solutions. The best performing solutions reached seizure detection performance that allows a hundred-fold reduction in the time epileptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication.
Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists.
Figure 2: Primary occupation as reported by challenge participants
Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being Software Engineers, and 2 had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No single participant had deep knowledge and experience in all three of data science, software engineering, and neuroscience.
Conclusion
Given the growing complexity of data science problems and increasing dataset sizes, solving these problems requires enabling collaboration between people with different areas of expertise, with a focus on inclusiveness and a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. 87 IBM employees from several business units, including but not limited to IBM Research, with a variety of skills, including sales and design, participated in this highly technical challenge.
There have been many calls recently for computing for all across the nation. While there are many opportunities to study and use computing to advance the fields of computer science, software development, and information technology, computing is also needed in a wide range of other disciplines, including engineering. Most engineering programs require students to take an introductory programming course, which covers many of the same topics as an introductory course for computing majors (and at times may be the same course). However, statistics about student success in introductory programming courses are sobering: approximately half of the students fail, forcing them to either repeat the course or leave their chosen field of study if passing the course is required.
This NSF IUSE project incorporates instructional techniques identified through educational psychology research as effective ways to improve student learning and retention in introductory programming. The research team has developed worked examples of problems that incorporate subgoal labels, which are explanations that describe the function of steps in the problem solution to the learner and highlight the problem-solving process. Using subgoal labels within worked examples, an approach that has been effective in other STEM fields, students are able to see an expert's problem-solving process, which helps them learn to solve problems before they can solve problems themselves. Experts, including instructors, teaching introductory-level courses are often unable to explain the problem-solving process they use at a level that learners can grasp, because they have automated much of that process through many years of practice. This submission presents the results of the first part of the development of subgoals and explains how to integrate them into classroom lessons in introductory computing classes.
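For concreteness, a worked example with subgoal labels might look like the following Python sketch; the problem and the label wording are invented for illustration and are not drawn from the project's materials.

```python
# Hypothetical worked example with subgoal labels: each comment names
# the *function* of the step rather than restating the syntax.
# Problem: compute the average of the positive numbers in a list.

numbers = [4, -2, 7, 0, -5, 3]

# Subgoal: initialise accumulator variables
total = 0
count = 0

# Subgoal: examine each input value
for n in numbers:
    # Subgoal: filter values relevant to the problem
    if n > 0:
        # Subgoal: update the accumulators
        total += n
        count += 1

# Subgoal: compute and report the result, guarding the edge case
average = total / count if count > 0 else 0
print("Average of positive numbers:", average)
```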
Marin, Victor J., Hosseini, Hadi, and Rivero, Carlos R. Customizing Feedback for Introductory Programming Courses Using Semantic Clusters. Lecture notes in computer science 12677. Web. doi:10.1007/978-3-030-80421-3_30. Retrieved from https://par.nsf.gov/biblio/10279878.
Marin, Victor J., Hosseini, Hadi, & Rivero, Carlos R. Customizing Feedback for Introductory Programming Courses Using Semantic Clusters. Lecture notes in computer science, 12677. Retrieved from https://par.nsf.gov/biblio/10279878. https://doi.org/10.1007/978-3-030-80421-3_30
@article{osti_10279878,
title = {Customizing Feedback for Introductory Programming Courses Using Semantic Clusters},
url = {https://par.nsf.gov/biblio/10279878},
DOI = {10.1007/978-3-030-80421-3_30},
journal = {Lecture notes in computer science},
volume = {12677},
author = {Marin, Victor J. and Hosseini, Hadi and Rivero, Carlos R.},
editor = {Cristea, Alexandra I. and Troussas, Christos}
}