
Patch Generation with Language Models: Feasibility and Scaling Behavior

Kolak, Sophia D.; Martins, Ruben; Le Goues, Claire; Hellendoorn, Vincent Josua (April 2022, Deep Learning for Code Workshop)

Large language models have shown a propensity for generating correct, multi-line programs from natural language prompts. Given past findings that bugs and patches can be distinguished by their predictability under simple language models, it is natural to ask whether modern, large neural models lend themselves especially well to program repair without any calibration. We study this in the context of one-line bugs: we provide a series of models of varying scales (from 160M to 12B parameters) with the context preceding a buggy line in 72 Java and Python programs, and analyze the rank at which the correct patch (and the original buggy line) is generated, if at all. Our results highlight a noticeable correlation of model size with test-passing accuracy and patch ranking quality, as well as several other findings related to the differences between the two languages and the propensity of the largest models in particular to generate candidate patches that closely resemble (if not exactly match) the original developer patch.
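The rank analysis described in the abstract can be illustrated with a minimal sketch. It assumes the model's candidate lines are already available as a list ordered from most to least likely, and that a match means equality after collapsing whitespace. Both the helper names (`normalize`, `patch_rank`) and the sample lines are hypothetical and are not taken from the paper, which also evaluates candidates by running tests.

```python
from typing import Optional, Sequence


def normalize(line: str) -> str:
    """Collapse whitespace so formatting differences do not block a match (an assumption)."""
    return " ".join(line.split())


def patch_rank(candidates: Sequence[str], target: str) -> Optional[int]:
    """Return the 1-based rank of the first candidate equal to target, or None if absent."""
    wanted = normalize(target)
    for rank, candidate in enumerate(candidates, start=1):
        if normalize(candidate) == wanted:
            return rank
    return None


# Hypothetical candidates, ordered from most to least likely by some model.
candidates = [
    "if (i <= n) {",
    "if (i <  n) {",
    "if (i != n) {",
]
developer_patch = "if (i < n) {"   # hypothetical correct one-line patch
buggy_line = "if (i <= n) {"       # hypothetical original buggy line

patch_rank_result = patch_rank(candidates, developer_patch)
buggy_rank_result = patch_rank(candidates, buggy_line)
```

Comparing the two ranks shows whether a model prefers the patch over the bug. In this made-up case the buggy line is ranked ahead of the developer patch.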

