NSF PAR Search | NSF Public Access Repository

APPATCH: Automated Adaptive Prompting Large Language Models for Real-World Software Vulnerability Patching

Nong, Yu; Yang, Haoran; Cheng, Long; Hu, Hongxin; Cai, Haipeng (August 2025, USENIX)

Free, publicly-accessible full text available August 13, 2026
Code Speaks Louder: Exploring Security and Privacy Relevant Regional Variations in Mobile Applications

https://doi.org/10.1109/SP61157.2025.00225

Guo, Jiawei; Nong, Yu; Lin, Zhiqiang; Cai, Haipeng (May 2025, IEEE)

Free, publicly-accessible full text available May 12, 2026
Towards LLM-Assisted Vulnerability Detection and Repair for Open-Source 5G UE Implementations

https://doi.org/10.14722/futureg.2025.23021

Patir, Rupam; Huang, Qiqing; Guo, Keyan; Guo, Wanda; Gu, Guofei; Cai, Haipeng; Hu, Hongxin (January 2025, Internet Society)

Free, publicly-accessible full text available January 1, 2026
Learning to Detect and Localize Multilingual Bugs

https://doi.org/10.1145/3660804

Yang, Haoran; Nong, Yu; Zhang, Tao; Luo, Xiapu; Cai, Haipeng (July 2024, Proceedings of the ACM on Software Engineering)

Increasing studies have shown bugs in multi-language software as a critical loophole in modern software quality assurance, especially those induced by language interactions (i.e., multilingual bugs). Yet existing tool support for bug detection/localization remains largely limited to single-language software, despite the long-standing prevalence of multi-language systems in various real-world software domains. Extant static/dynamic analysis and deep learning (DL) based approaches all face major challenges in addressing multilingual bugs. In this paper, we present xLoc, a DL-based technique/tool for detecting and localizing multilingual bugs. Motivated by results of our bug-characteristics study on top locations of multilingual bugs, xLoc first learns the general knowledge relevant to differentiating various multilingual control-flow structures. This is achieved by pre-training a Transformer model with customized position encoding against novel objectives. Then, xLoc learns task-specific knowledge for the task of multilingual bug detection/localization, through another new position encoding scheme (based on cross-language API vicinity) that allows for the model to attend particularly to control-flow constructs that bear most multilingual bugs during fine-tuning. We have implemented xLoc for Python-C software and curated a dataset of 3,770 buggy and 15,884 non-buggy Python-C samples, which enabled our extensive evaluation of xLoc against two state-of-the-art baselines: fine-tuned CodeT5 and zero-shot ChatGPT. Our results show that xLoc achieved 94.98% F1 and 87.24%@Top-1 accuracy, which are significantly (up to 162.88% and 511.75%) higher than the baselines. Ablation studies further confirmed significant contributions of each of the novel design elements in xLoc. With respective bug-location characteristics and labeled bug datasets for fine-tuning, our design may be applied to other language combinations beyond Python-C.
Full Text Available
Multi-Language Software Development: Issues, Challenges, and Solutions

https://doi.org/10.1109/TSE.2024.3358258

Yang, Haoran; Nong, Yu; Wang, Shaowei; Cai, Haipeng (March 2024, IEEE Transactions on Software Engineering)

Developing software projects that incorporate multiple languages has been a prevalent practice for many years. However, the issues encountered by developers during the development process, the underlying challenges causing these issues, and the solutions provided to developers remain unknown. In this paper, our objective is to provide answers to these questions by conducting a study on developer discussions on Stack Overflow (SO). Through a manual analysis of 586 highly relevant posts spanning 14 years, we revealed that multilingual development is a highly and sustainably active topic on SO, with older questions becoming inactive and newer ones getting first asked (and then mostly remaining active for more than one year). From these posts, we observed a diverse array of issues (11 categories), primarily centered around interfacing and data handling across different languages. Our analysis suggests that error/exception handling issues were the most difficult to resolve among those issue categories, while security related issues were most likely to receive an accepted answer. The primary challenge faced by developers was the complexity and diversity inherent in building multilingual code and ensuring interoperability. Additionally, developers often struggled due to a lack of technical expertise on the varied features of different programming languages (e.g., threading and memory management mechanisms). In addition, properly handling message passing across languages constituted a key challenge with using implicit language interfacing. Notably, Stack Overflow emerged as a crucial source of solutions to these challenges, with the majority (73%) of the posts receiving accepted answers, most within a week (36.5% within 24 hours and 25% in the following six days). Based on our analysis results, we have formulated actionable insights and recommendations that can be utilized by researchers and developers in this field.
Full Text Available
Understanding GDPR Non-Compliance in Privacy Policies of Alexa Skills in European Marketplaces

https://doi.org/10.1145/3589334.3645409

Liao, Song; Aldeen, Mohammed; Yan, Jingwen; Cheng, Long; Luo, Xiapu; Cai, Haipeng; Hu, Hongxin (May 2024, ACM)

Full Text Available
How Are Multilingual Systems Constructed: Characterizing Language Use and Selection in Open-Source Multilingual Software

https://doi.org/10.1145/3631967

Li, Wen; Marino, Austin; Yang, Haoran; Meng, Na; Li, Li; Cai, Haipeng (March 2024, ACM Transactions on Software Engineering and Methodology)

For many years now, modern software is known to be developed in multiple languages (hence termed as multilingual or multi-language software). Yet, to date, we still only have very limited knowledge about how multilingual software systems are constructed. For instance, it is not yet really clear how different languages are used, selected together, and why they have been so in multilingual software development. Given the fact that using multiple languages in a single software project has become a norm, understanding language use and selection (i.e., language profile) as a basic element of the multilingual construction in contemporary software engineering is an essential first step. In this article, we set out to fill this gap with a large-scale characterization study on language use and selection in open-source multilingual software. We start with presenting an updated overview of language use in 7,113 GitHub projects spanning the 5 past years by characterizing overall statistics of language profiles, followed by a deeper look into the functionality relevance/justification of language selection in these projects through association rule mining. We proceed with an evolutionary characterization of 1,000 GitHub projects for each of the 10 past years to provide a longitudinal view of how language use and selection have changed over the years, as well as how the association between functionality and language selection has been evolving. Among many other findings, our study revealed a growing trend of using three to five languages in one multilingual software project and the noticeable stableness of top language selections. We found a non-trivial association between language selection and certain functionality domains, which was less stable than that with individual languages over time. In a historical context, we also have observed major shifts in these characteristics of multilingual systems both in contrast to earlier peer studies and along the evolutionary timeline.
Our findings offer essential knowledge on the multilingual construction in modern software development. Based on our results, we also provide insights and actionable suggestions for both researchers and developers of multilingual systems.
Full Text Available
SkillScanner: Detecting Policy-Violating Voice Applications Through Static Analysis at the Development Phase

https://doi.org/10.1145/3576915.3616650

Liao, Song; Cheng, Long; Cai, Haipeng; Guo, Linke; Hu, Hongxin (November 2023, ACM)

Full Text Available
PyRTFuzz: Detecting Bugs in Python Runtimes via Two-Level Collaborative Fuzzing

https://doi.org/10.1145/3576915.3623166

Li, Wen; Yang, Haoran; Luo, Xiapu; Cheng, Long; Cai, Haipeng (November 2023, ACM SIGSAC Conference on Computer and Communications Security (CCS))

Full Text Available
PolyFuzz: Holistic Greybox Fuzzing of Multi-Language Systems

Li, Wen; Ruan, Jinyang; Yi, Guangbei; Cheng, Long; Luo, Xiapu; Cai, Haipeng (August 2023, USENIX Security Symposium)

Full Text Available
