Learning type annotation: is big data enough?

Jesse, Kevin; Devanbu, Premkumar T.; Ahmed, Toufique

doi:10.1145/3468264.3473135


TypeScript is a widely used, optionally typed language in which developers can adopt “pay as you go” typing: they can add types as desired and benefit from static typing. The “type annotation tax”, or manual effort required to annotate new or existing TypeScript, can be reduced by a variety of automatic methods. Probabilistic machine-learning (ML) approaches work quite well. ML approaches use different inductive biases, ranging from simple token sequences to complex graph neural network (GNN) models capturing syntactic and semantic relations. More sophisticated inductive biases are hand-engineered to exploit the formal nature of software. Rather than deploying fancy inductive biases for code, can we just use “big data” to learn natural patterns relevant to typing? We find evidence suggesting that this is the case. We present TypeBert, demonstrating that even with the simple token-sequence inductive bias used in BERT-style models, given enough data, the type-annotation performance of the most sophisticated models can be surpassed.
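As a rough illustration of what a "simple token-sequence inductive bias" means for this task, the sketch below treats type annotation as token-level labeling: annotations are stripped from TypeScript source, and each declared identifier is labeled with the type that was removed. This is not TypeBert's actual preprocessing, which this record does not describe. The helper names `tokenize` and `toExample` are hypothetical, and the regex tokenizer assumes only simple `name: Type` annotations with a single-identifier type (no generics, arrays, unions, or return types); it would mislabel constructs such as object literals or ternaries.

```typescript
// Hypothetical sketch: frame type annotation as token-level labeling.
// Assumes only simple `name: Type` annotations with a single-identifier type.

type Example = { tokens: string[]; labels: (string | null)[] };

const IDENT = /^[A-Za-z_$][\w$]*$/;

function isIdent(t: string | undefined): boolean {
  return t !== undefined && IDENT.test(t);
}

// Crude tokenizer: identifiers, integers, double-quoted strings, single symbols.
function tokenize(src: string): string[] {
  return src.match(/[A-Za-z_$][\w$]*|\d+|"[^"]*"|\S/g) ?? [];
}

// Remove `: Type` after an identifier and record Type as that identifier's label.
function toExample(annotated: string): Example {
  const raw = tokenize(annotated);
  const tokens: string[] = [];
  const labels: (string | null)[] = [];
  for (let i = 0; i < raw.length; i++) {
    const tok = raw[i] as string;
    const typeTok = raw[i + 2];
    if (isIdent(tok) && raw[i + 1] === ":" && isIdent(typeTok)) {
      tokens.push(tok);
      labels.push(typeTok as string);
      i += 2; // skip the ':' and the type name
    } else {
      tokens.push(tok);
      labels.push(null); // no type to predict at this position
    }
  }
  return { tokens, labels };
}

const ex = toExample('let count: number = 0; const name: string = "ts";');
console.log(ex.tokens.join(" ")); // let count = 0 ; const name = "ts" ;
console.log(ex.labels.filter((l) => l !== null)); // [ 'number', 'string' ]
```

In this framing a model sees only `tokens` and must recover the non-null `labels`; no syntax tree or graph is supplied, which is the contrast the abstract draws with GNN-based approaches.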

Award ID(s): 1934568

PAR ID: 10349460

Author(s) / Creator(s): Jesse, Kevin; Devanbu, Premkumar T.; Ahmed, Toufique

Date Published: 2021-08-18

Published in: Proceedings of the ESEC/FSE Conference

Page Range: 1483 to 1486


Sponsoring Org: National Science Foundation

Free publicly accessible full text: Accepted Manuscript 1.0 (conference paper)
https://doi.org/10.1145/3468264.3473135
