skip to main content


This content will become publicly available on April 12, 2025

Title: Data-Driven Evidence-Based Syntactic Sugar Design
Programming languages are essential tools for developers, and their evolution plays a crucial role in supporting the activities of developers. One instance of programming language evolution is the introduction of syntactic sugars, which are additional syntax elements that provide alternative, more readable code constructs. However, the process of designing and evolving a programming language has traditionally been guided by anecdotal experiences and intuition. Recent advances in tools and methodologies for mining open-source repositories have enabled developers to make data-driven software engineering decisions. In light of this, this paper proposes an approach for motivating data-driven programming evolution by applying frequent subgraph mining techniques to a large dataset of 166,827,154 open-source Java methods. The dataset is mined by generalizing Java control-flow graphs to capture broad programming language usages and instances of duplication. Frequent subgraphs are then extracted to identify potentially impactful opportunities for new syntactic sugars. Our diverse results demonstrate the benefits of the proposed technique by identifying new syntactic sugars involving a variety of programming constructs that could be implemented in Java, thus simplifying frequent code idioms. This approach can potentially provide valuable insights for Java language designers, and serve as a proof-of-concept for data-driven programming language design and evolution.  more » « less
Award ID(s):
2223812 2120448 2152117
PAR ID:
10543737
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Association for Computing Machinery
Date Published:
ISBN:
9798400702174
Subject(s) / Keyword(s):
syntactic sugars, data-driven language design, subgraph mining
Format(s):
Medium: X
Location:
ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal
Sponsoring Org:
National Science Foundation
More Like this
  1. With Kotlin becoming a viable language replacement for Java, there is a need for translators and data flow analysis libraries to create maintainable and readable source code. Instagram, Uber, and Gradle are only a few of the large corporations that have either switched from Java to Kotlin completely or started to use it in internal tools in order to reduce code base size. Developers have claimed that Kotlin is fun to use in comparison to Java and much of the boilerplate code is reduced. With Java being the main language for the open source organization, PhenoApps, there is a need to support both Java and Kotlin to increase the maintainability of the code. Fortunately, JetBrains has an open-source IDE plugin for translating Java to Kotlin; however, the translation has some fundamental issues which shall be discussed further in this paper. Introducing, j2k, a CLI translation tool which includes various anti-pattern detection for syntactical formatting, performance, and other Android requirements. The new tool introduced within this paper, j2kCLI allows users to directly translate strings of Java code to Kotlin, or entire directories. This facilitates the maintainability of a large open source code base. 
    more » « less
  2. null (Ed.)
    Polyglot programming, the use of multiple programming languages during the development process, is common practice in modern software development. This study investigates this practice through a randomized controlled trial conducted under the context of database programming. Participants in the study were given coding tasks written in Java and one of three SQL-like embedded languages. One was plain SQL in strings, one was in Java only, and the third was a hybrid embedded language that was closer to the host language. We recorded 109 valid data points. Results showed significant differences in how developers of different experience levels code using polyglot techniques. Notably, less experienced programmers wrote correct programs faster in the hybrid condition (frequent, but less severe, switches), while more experienced developers that already knew both languages performed better in traditional SQL (less frequent, but more complete, switches). The results indicate that the productivity impact of polyglot programming is complex and experience level dependent. 
    more » « less
  3. Abstract

    Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language modelsurprisalas a guiding mechanism. While surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a small granularity—do programmers prefer more predictable expressions in code? Usingmeaning‐preserving transformations, we produce equivalent alternatives to developer‐written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developerschooseto write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and transformed). We find that programmersdoprefer more predictable variants, and that stronger language models like the transformer align more often and more consistently with these preferences.

     
    more » « less
  4. Worked examples (solutions to typical programming problems presented as a source code in a certain language and are used to explain the topics from a programming class) are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for a large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generates a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach. 
    more » « less
  5. Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision. 
    more » « less