In-app privacy notices can help smartphone users make informed privacy decisions. However, they are rarely used in real-world apps, since developers often lack the knowledge, time, and resources to design and implement them well. We present Honeysuckle, a programming tool that helps Android developers build in-app privacy notices using an annotation-based code generation approach facilitated by an IDE plugin, a build system plugin, and a library. We conducted a within-subjects study with 12 Android developers to evaluate Honeysuckle. Each participant was asked to implement privacy notices for two popular open-source apps using the Honeysuckle library as a baseline as well as the annotation-based approach. Our results show that the annotation-based approach helps developers accomplish the task faster with significantly lower cognitive load. Developers preferred the annotation-based approach over the library approach because it was much easier to learn and use and allowed developers to achieve various types of privacy notices using a unified code format, which can enhance code readability and benefit team collaboration.
more »
« less
CYCLE: Learning to Self-Refine the Code Generation
Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by the existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. For the cases when code LMs fail to implement the correct program, developers actually find it hard to debug and fix the faulty prediction since it is not written by the developers themselves. Unfortunately, our study reveals that code LMs cannot efficiently self-refine their faulty generations as well. In this paper, we propose CYCLE framework, learning to self-refine the faulty generation according to the available feedback, such as the execution results reported by the test suites. We evaluate CYCLE on three popular code generation benchmarks, HumanEval, MBPP, and APPS. The results reveal that CYCLE successfully maintains, sometimes improves, the quality of one-time code generation, while significantly improving the self-refinement capability of code LMs. We implement four variants of CYCLE with varied numbers of parameters across 350M, 1B, 2B, and 3B, and the experiments show that CYCLE consistently boosts the code generation performance, by up to 63.5
more »
« less
- PAR ID:
- 10523085
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 8
- Issue:
- OOPSLA1
- ISSN:
- 2475-1421
- Page Range / eLocation ID:
- 392 to 418
- Subject(s) / Keyword(s):
- CodeLanguageModels SourceCodeModeling CodeGeneration Iterative Programming
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Large Language Models (LLMs) have demonstrated unprecedented capability in code generation. However, LLM-generated code is still plagued with a wide range of functional errors, especially for complex programming tasks that LLMs have not seen before. Recent studies have shown that developers often struggle with inspecting and fixing incorrect code generated by LLMs, diminishing their productivity and trust in LLM-based code generation. Inspired by the mutual grounding theory in communication, we propose an interactive approach that leverages code comments as a medium for developers and LLMs to establish a shared understanding. Our approach facilitates iterative grounding by interleaving code generation, inline comment generation, and contextualized user feedback through editable comments to align generated code with developer intent. We evaluated our approach on two popular benchmarks and demonstrated that our approach significantly improved multiple state-of-the-art LLMs, e.g., 17.1% pass@1 improvement for code-davinci-002 on HumanEval. Furthermore, we conducted a user study with 12 participants in comparison to two baselines: (1) interacting with GitHub Copilot, and (2) interacting with a multi-step code generation paradigm called Multi-Turn Program Synthesis. Participants completed the given programming tasks 16.7% faster and with 10.5% improvement in task success rate when using our approach. Both results show that interactively refining code comments enables the collaborative establishment of mutual grounding, leading to more accurate code generation and higher developer confidence.more » « less
-
We present Roots, a full-stack monitoring and analysis system for performance anomaly detection and bottleneck identification in cloud platform-as-a-service (PaaS) systems. Roots facilitates application performance monitoring as a core capability of PaaS clouds, and relieves the developers from having to instrument application code. Roots tracks HTTP/S requests to hosted cloud applications and their use of PaaS services. To do so it employs lightweight monitoring of PaaS service interfaces. Roots processes this data in the background using multiple statistical techniques that in combination detect performance anomalies (i.e. violations of service-level objectives). For each anomaly, Roots determines whether the event was caused by a change in the request workload or by a performance bottleneck in a PaaS service. By correlating data collected across different layers of the PaaS, Roots is able to trace high-level performance anomalies to bottlenecks in specific components in the cloud platform. We implement Roots using the AppScale PaaS and evaluate its overhead and accuracy.more » « less
-
Recent advances in large language models (LMs) have facilitated their ability to synthesize programming code. However, they have also raised concerns about intellectual property (IP) rights violations. Despite the significance of this issue, it has been relatively less explored. In this paper, we aim to bridge the gap by presenting CODEIPPROMPT, a platform for automatic evaluation of the extent to which code language models may reproduce licensed programs. It comprises two key components: prompts constructed from a licensed code database to elicit LMs to generate IP-violating code, and a measurement tool to evaluate the extent of IP violation of code LMs. We conducted an extensive evaluation of existing open-source code LMs and commercial products, and revealed the prevalence of IP violations in all these models. We further identified that the root cause is the substantial proportion of training corpus subject to restrictive licenses, resulting from both intentional inclusion and inconsistent license practice in the real world. To address this issue, we also explored potential mitigation strategies, including fine-tuning and dynamic token filtering. Our study provides a testbed for evaluating the IP violation issues of the existing code generation platforms and stresses the need for a better mitigation strategy.more » « less
-
Abstract Pre-trained transformer language models (LMs) on large unlabeled corpus have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns for the generative design of material compositions. Here we train a series of seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) for materials design using the expanded formulas of the ICSD, OQMD, and Materials Projects databases. Six different datasets with/out non-charge-neutral or EB samples are used to benchmark the generative design performances and uncover the biases of modern transformer models for the generative design of materials compositions. Our experiments show that the materials transformers based on causal LMs can generate chemically valid material compositions with as high as 97.61% to be charge neutral and 91.22% to be electronegativity balanced, which has more than six times higher enrichment compared to the baseline pseudo-random sampling algorithm. Our LMs also demonstrate high generation novelty and their potential in new materials discovery is proved by their capability to recover the leave-out materials. We also find that the properties of the generated compositions can be tailored by training the models with selected training sets such as high-bandgap samples. Our experiments also show that different models each have their own preference in terms of the properties of the generated samples and their running time complexity varies a lot. We have applied our materials transformers to discover a set of new materials as validated using density functional theory calculations. All our trained materials transformer models and code can be accessed freely at http://www.github.com/usccolumbia/MTransformer .more » « less
An official website of the United States government

