skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Flexible Program Alignment to Deliver Data-Driven Feedback to Novice Programmers
Supporting novice programming learners at scale has become a necessity. Such a support generally consists of delivering automated feedback on what and why learners did incorrectly. Existing approaches cast the problem as automatically repairing learners’ incorrect programs; specifically, data-driven approaches assume there exists a correct program provided by other learner that can be extrapolated to repair an incorrect program. Unfortunately, their repair potential, i.e., their capability of providing feedback, is hindered by how they compare programs. In this paper, we propose a flexible program alignment based on program dependence graphs, which we enrich with semantic information extracted from the programs, i.e., operations and calls. Having a correct and an incorrect graphs, we exploit approximate graph alignment to find correspondences at the statement level between them. Each correspondence has a similarity attached to it that reflects the matching affinity between two statements based on topology (control and data flow information) and semantics (operations and calls). Repair suggestions are discovered based on this similarity. We evaluate our flexible approach with respect to rigid schemes over correct and incorrect programs belonging to nine real-world introductory programming assignments. We show that our flexible program alignment is feasible in practice, achieves better performance than rigid program comparisons, and is more resilient when limiting the number of available correct programs.  more » « less
Award ID(s):
1915404
PAR ID:
10279876
Author(s) / Creator(s):
; ;
Editor(s):
Cristea, Alexandra I.; Troussas, Christos
Date Published:
Journal Name:
Lecture notes in computer science
Volume:
12677
ISSN:
0302-9743
Page Range / eLocation ID:
247-258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Crossley, Scott; Popescu, Elvira (Ed.)
    Automated program repair is a promising approach to deliver feedback to novice learners at scale. CLARA is an effective repairer that uses a correct program to fix an incorrect program. CLARA suffers from two main issues: rigid matching and lack of support for typical constructs and tasks in introductory programming assignments. We present several modifications to CLARA to overcome these problems. We propose approximate graph matching based on semantic and topological information of the programs compared, and modify CLARA’s abstract syntax tree processor and interpreter to support new constructs and tasks like reading from/writing to console. Our experiments show that, thanks to our modifications, we can apply CLARA to real-world programs. Also, our approximate graph matching allows us to repair many incorrect programs that are not repaired using rigid program matching. 
    more » « less
  2. To repair a program does not mean to make it (absolutely) correct; it only means to make it more-correct than it was originally. This is not a mundane academic distinction: Given that programs typically have about a dozen faults per KLOC, it is important for program repair methods and tools to be designed in such a way that they map an incorrect program into a more-correct, albeit still potentially incorrect, program. Yet in the absence of a concept of relative correctness, many program repair methods and tools resort to approximations of absolute correctness; since these methods and tools are often validated against programs with a single fault, making them absolutely correct is indistinguishable from making them more-correct; this has contributed to conceal/ obscure the absence of (and the need for) relative correctness. In this paper we propose a theory of program repair based on a concept of relative correctness. We aspire to encourage researchers in program repair to explicitly specify what concept of relative correctness their method or tool is based upon; and to validate their method or tool by proving that it does enhance relative correctness, as defined. 
    more » « less
  3. Cristea, Alexandra I.; Troussas, Christos (Ed.)
    The number of introductory programming learners is increasing worldwide. Delivering feedback to these learners is important to support their progress; however, traditional methods to deliver feedback do not scale to thousands of programs. We identify several opportunities to improve a recent data-driven technique to analyze individual program statements. These statements are grouped based on their semantic intent and usually differ on their actual implementation and syntax. The existing technique groups statements that are semantically close, and considers outliers those statements that reduce the cohesiveness of the clusters. Unfortunately, this approach leads to many statements to be considered outliers. We propose to reduce the number of outliers through a new clustering algorithm that processes vertices based on density. Our experiments over six real-world introductory programming assignments show that we are able to reduce the number of outliers and, therefore, increase the total coverage of the programs that are under evaluation. 
    more » « less
  4. Fault localization is essential to software maintenance tasks such as testing and automated program repair. Many fault localization techniques have been developed, the most common of which are spectrum-based. Most techniques have been designed for traditional programming paradigms that map passing and failing test cases to lines or branches of code, hence specialized programming paradigms which utilize different code abstractions may fail to localize well. In this paper, we study fault localization in the context of a class of programs, molecular programs. Recent research has designed automated testing and repair frameworks for these programs but has ignored the importance of fault localization. As we demonstrate, using existing spectrum-based approaches may not provide much information. Instead we propose a novel approach, Traceback, that leverages temporal trace data. In an empirical study on a set of 89 faulty program variants, we demonstrate that Traceback provides between a 32-90\% improvement in localization over reaction-based mapping, a direct translation of spectrum-based localization. We see little difference in parameter tuning of Traceback when all tests, or only code-based (invariant) tests are used, however the best depth and weight parameters vary when using specification based tests, which can be either functional or metamorphic. Overall, invariant-based tests provide the best localization results (either alone or in combination with others), followed by metamorphic and then functional tests. 
    more » « less
  5. Call graph or caller-callee relationships have been used for various kinds of static program analysis, performance analysis and profiling, and for program safety or security analysis such as detecting anomalies of program execution or code injection attacks. However, different tools generate call graphs in different formats, which prevents efficient reuse of call graph results. In this paper, we present an approach of using ontology and resource description framework (RDF) to create knowledge graphs for specifying call graphs to facilitate the construction of full-fledged and complex call graphs of computer programs, realizing more interoperable and scalable program analyses than conventional approaches. We create a formal ontology-based specification of call graph information to capture concepts and properties of both static and dynamic call graphs so different tools can collaboratively contribute to more comprehensive analysis results. Our experiments show that ontology enables merging of call graphs generated from different tools and flexible queries using a standard query interface. 
    more » « less