NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Reinforest: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models

https://doi.org/10.1109/SCAM63643.2024.00026

Saieva, Anthony; Chakraborty, Saikat; Kaiser, Gail (October 2024, IEEE)

This paper introduces a novel code-to-code search technique that enhances the performance of Large Language Models (LLMs) by including both static and dynamic features as well as utilizing both similar and dissimilar examples during training. We present the first-ever code search method that encodes dynamic runtime information during training without the need to execute either the corpus under search or the search query at inference time and the first code search technique that trains on both positive and negative reference samples. To validate the efficacy of our approach, we perform a set of studies demonstrating the capability of enhanced LLMs to perform cross-language code-to-code search. Our evaluation demonstrates that the effectiveness of our approach is consistent across various model architectures and programming languages. We outperform the state-of-the-art crosslanguage search tool by up to 44.7%. Moreover, our ablation studies reveal that even a single positive and negative reference sample in the training process results in substantial performance improvements demonstrating both similar and dissimilar references are important parts of code search. Importantly, we show that enhanced well-crafted, fine-tuned models consistently outperform enhanced larger modern LLMs without fine tuning, even when enhancing the largest available LLMs highlighting the importance for open-sourced models. To ensure the reproducibility and extensibility of our research, we present an open-sourced implementation of our tool and training procedures called REINFOREST.
more » « less
Full Text Available
Elucidating the Role of Ferroelectric in Memory Window Expansion of Ferroelectric FETs With Gate-Side Injection

https://doi.org/10.1109/TED.2025.3552013

Qin, Yixin; Chakraborty, Saikat; Zhao, Zijian; Ma, Sizhe; Jung, Moonyoung; Kim, Kijoon; Lim, Suhwan; Seo, Kwangyou; Kim, Kwangsoo; Kim, Wanki; et al (May 2025, IEEE Transactions on Electron Devices)

Free, publicly-accessible full text available May 1, 2026
Retention Analysis of Ferroelectric FETs with Gate-Side Injection for Vertical NAND Storage

https://doi.org/10.1109/IRPS48204.2025.10983219

Qin, Yixin; Chakraborty, Saikat; Zhao, Zijian; Ma, Sizhe; Jung, Moonyoung; Kim, Kijoon; Lim, Suhwan; Seo, Kwangyou; Kim, Kwangsoo; Kim, Wanki; et al (March 2025, IEEE)

Free, publicly-accessible full text available March 30, 2026
Investigating Read-After-Write Delay in Ferroelectric FET with Gate-Side Injection

https://doi.org/10.1109/IRPS48204.2025.10983406

Ma, Sizhe; Chakraborty, Saikat; Qin, Yixin; Zhao, Zijian; Duan, Jiahui; Jung, Moonyoung; Kim, Kijoon; Lim, Suhwan; Seo, Kwangyou; Kim, Kwangsoo; et al (March 2025, IEEE)

Free, publicly-accessible full text available March 30, 2026
Clarifying the Role of Ferroelectric in Expanding the Memory Window of Ferroelectric FETs with Gate-Side Injection: Isolating Contributions from Polarization and Charge Trapping

https://doi.org/10.1109/IEDM50854.2024.10873569

Qin, Yixin; Chakraborty, Saikat; Zhao, Zijian; Kim, Kijoon; Lim, Suhwan; Woo, Jongho; Kim, Kwangsoo; Kim, Wanki; Ha, Daewon; Gong, Xiao; et al (December 2024, IEEE)

Full Text Available
Towards Causal Deep Learning for Vulnerability Detection

https://doi.org/10.1145/3597503.3639170

Rahman, Md Mahbubur; Ceka, Ira; Mao, Chengzhi; Chakraborty, Saikat; Ray, Baishakhi; Le, Wei (April 2024, ACM)

Full Text Available
CONCORD: Clone-Aware Contrastive Learning for Source Code

https://doi.org/10.1145/3597926.3598035

Ding, Yangruibo; Chakraborty, Saikat; Buratti, Luca; Pujar, Saurabh; Morari, Alessandro; Kaiser, Gail; Ray, Baishakhi (July 2023, 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA))

Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection. While previous work successfully learned from different code abstractions (e.g., token, AST, graph), we argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. On the one hand, human developers tend to write repetitive programs referencing existing code snippets from the current codebase or online resources (e.g., Stack Overflow website) rather than implementing functions from scratch; such behaviors result in a vast number of code clones. In contrast, a deviant clone by mistake might trigger malicious program behaviors. Thus, as a proxy to incorporate developers' coding behavior into the pre-training scheme, we propose to include code clones and their deviants. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart. We show that CONCORD's clone-aware contrastive learning drastically reduces the need for expensive pre-training resources while improving the performance of downstream SE tasks. We also empirically demonstrate that CONCORD can improve existing pre-trained models to learn better representations that consequently become more efficient in both identifying semantically equivalent programs and differentiating buggy from non-buggy code.
more » « less
Full Text Available
NatGen: generative pre-training by “naturalizing” source code

https://doi.org/10.1145/3540250.3549162

Chakraborty, Saikat; Ahmed, Toufique; Ding, Yangruibo; Devanbu, Premkumar T.; Ray, Baishakhi (November 2022, ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering)

Full Text Available
Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

https://doi.org/10.18653/v1/2022.acl-long.436

Ding, Yangruibo; Buratti, Luca; Pujar, Saurabh; Morari, Alessandro; Ray, Baishakhi; Chakraborty, Saikat (April 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available
On Multi-Modal Learning of Editing Source Code

https://doi.org/10.1109/ASE51524.2021.9678559

Chakraborty, Saikat; Ray, Baishakhi (January 2021, The 36th IEEE/ACM International Conference on Automated Software Engineering)

Full Text Available

« Prev Next »

Search for: All records