Patch Generation with Language Models: Feasibility and Scaling Behavior

Kolak, Sophia D.; Martins, Ruben; Le Goues, Claire; Hellendoorn, Vincent Josua


Large language models have shown a propensity for generating correct, multi-line programs from natural language prompts. Given past findings highlighting that bugs and patches can be distinguished by predictability according to simple language models, it is natural to ask if modern, large neural models lend themselves especially well to program repair without any calibration. We study this in the context of one-line bugs: we provide a series of models of varying scales (from 160M to 12B parameters) with the context preceding a buggy line in 72 Java and Python programs and analyze the rank at which the correct patch (and the original buggy line) is generated, if at all. Our results highlight a noticeable correlation of model size with test-passing accuracy and patch ranking quality, as well as several other findings related to differences between the two languages and the propensity of the largest models in particular to generate candidate patches that closely resemble (if not exactly match) the original developer patch.
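The rank analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace-normalized matching, and the candidate lines and scores below are all hypothetical, and in practice the scores would come from a language model conditioned on the context preceding the buggy line.

```python
from typing import Dict, List, Optional


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Order candidate lines from most to least likely (higher score first)."""
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda line: (-scores[line], line))


def rank_of(target: str, ranked: List[str]) -> Optional[int]:
    """Return the 1-based rank of `target`, ignoring whitespace differences.

    Returns None if the target line was never generated.
    """
    normalized = " ".join(target.split())
    for position, candidate in enumerate(ranked, start=1):
        if " ".join(candidate.split()) == normalized:
            return position
    return None


# Hypothetical log-probabilities for candidate replacements of one buggy line.
# These numbers are invented for illustration only.
candidate_scores = {
    "if (index < items.size()) {": -1.2,
    "if (index <= items.size()) {": -1.9,
    "if (index > 0) {": -3.4,
}

ranked = rank_candidates(candidate_scores)
# Rank of the (assumed) developer patch and of the (assumed) original buggy line.
patch_rank = rank_of("if (index <  items.size()) {", ranked)
bug_rank = rank_of("if (index <= items.size()) {", ranked)
```

Comparing `patch_rank` with `bug_rank` across many bugs is one way to ask whether a model finds the fix more predictable than the bug; a result of `None` corresponds to the "if at all" case in which the line is never generated.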

Award ID(s): 1762363

PAR ID: 10340618

Author(s) / Creator(s): Kolak, Sophia D.; Martins, Ruben; Le Goues, Claire; Hellendoorn, Vincent Josua

Date Published: 2022-04-01

Journal Name: Deep Learning for Code Workshop

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.
