The growing sophistication of Artificial Intelligence (AI) has driven the rapid proliferation of Large Language Models (LLMs) in software development. These models are increasingly employed to generate functionally correct code, solve complex computational problems, and debug existing software systems. However, LLM-generated code often suffers from inherent inefficiencies, including redundant logical structures, factually inconsistent content (hallucinations), and programming errors. To address this issue, our research rigorously evaluated the computational efficiency of Python code generated by three prominent LLMs: GPT-4o-Mini, GPT-3.5-Turbo, and GPT-4-Turbo. The evaluation metrics encompass execution time, memory utilization, and peak memory consumption, while maintaining the functional correctness of the generated code. Leveraging the EffiBench benchmark datasets within the Google Vertex AI Workbench environment, across a spectrum of machine configurations, the study applied a consistent seed parameter to ensure experimental reproducibility. Furthermore, we investigated the impact of two distinct optimization strategies: Chain-of-Thought (CoT) prompting and model fine-tuning. Our findings reveal a significant improvement in efficiency metrics for GPT-4o-Mini and GPT-3.5-Turbo under CoT prompting; this trend was not observed for GPT-4-Turbo. Based on its promising performance with CoT prompting, we selected GPT-4o-Mini for subsequent fine-tuning, aiming to further improve both its computational efficiency and accuracy. Contrary to our expectations, however, fine-tuning GPT-4o-Mini led to a discernible degradation in both accuracy and computational efficiency. In conclusion, this study provides empirical evidence that high-CPU machine configurations, combined with the GPT-4o-Mini model and CoT prompting, yield demonstrably more efficient and accurate LLM-generated Python code, particularly in computationally intensive application scenarios.
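The efficiency metrics named above (execution time, memory utilization, and peak memory consumption) can be collected for a candidate Python solution with the standard library alone. Below is a minimal sketch using `time.perf_counter` and `tracemalloc`; the exact harness used in the study is not specified, and `candidate_solution` is a hypothetical stand-in for LLM-generated code:

```python
import time
import tracemalloc

def measure_efficiency(func, *args, **kwargs):
    """Run func once; return its result, wall-clock time (seconds),
    and current/peak traced memory (bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, current, peak

# Hypothetical stand-in for a function produced by an LLM.
def candidate_solution(n):
    return sum(i * i for i in range(n))

result, seconds, current_b, peak_b = measure_efficiency(candidate_solution, 10**6)
print(f"time={seconds:.4f}s  current={current_b/1024:.1f} KiB  peak={peak_b/1024:.1f} KiB")
```

For the generation step itself, the kind of reproducibility the abstract describes can be approximated via the `seed` parameter of the OpenAI chat completions API, though the study's exact request configuration is not given here.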
-
Current software applications may contain numerous vulnerabilities. Attackers attempt to exploit these vulnerabilities, leading to security breaches, unauthorized access, data theft, or the incapacitation of computer systems. Rather than addressing software or hardware vulnerabilities at a later stage, it is better to address them immediately, during the development phase. Tools such as AIBugHunter tackle software issues by predicting, classifying, and repairing coding vulnerabilities: developers can see where their code is susceptible to attack and obtain details about the nature and severity of those vulnerabilities. AIBugHunter incorporates VulRepair to detect and repair vulnerabilities. VulRepair currently achieves 44% perfect prediction of patches for vulnerable functions; to be truly effective, this number needs to be increased. This study examines whether VulRepair's 44% perfect prediction rate can be improved. VulRepair is based on T5, a text-to-text transfer transformer with an encoder-decoder neural network, and uses both natural language and programming languages during its pre-training phase, along with byte pair encoding. It outperforms other models such as VRepair and CodeBERT. However, its hyperparameters may no longer be optimal given the development of new optimizers. We examined a deep neural network (DNN) optimizer developed by Google in 2023: Evolved Sign Momentum (Lion), which has a PyTorch implementation. We applied Lion to VulRepair and tuned the associated hyperparameters. After this adjustment, we obtained a 56% perfect prediction rate, exceeding the 44% reported for VulRepair. This means that VulRepair can repair more vulnerabilities and prevent more attacks. To our knowledge, replacing AdamW, the standard optimizer, with an alternative has not previously been applied to enhance VulRepair and similar models.
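The optimizer swap the abstract describes can be illustrated in a few lines. Below is a minimal sketch, assuming the `lion-pytorch` package and a CodeT5-style checkpoint of the kind VulRepair builds on; the model name, learning rate, and weight decay are illustrative placeholders, not the study's actual settings:

```python
import torch
from transformers import T5ForConditionalGeneration, RobertaTokenizer
from lion_pytorch import Lion  # pip install lion-pytorch

# Illustrative checkpoint; VulRepair builds on a CodeT5-style T5 model.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Replace the usual AdamW with Lion. The Lion paper suggests a learning
# rate several times smaller than AdamW's, paired with a larger weight
# decay; these values are placeholders, not the tuned hyperparameters.
optimizer = Lion(model.parameters(), lr=1e-5, weight_decay=0.01)

# One illustrative training step on a (vulnerable function -> repaired
# function) pair; real fine-tuning would loop over a batched dataset.
src = tokenizer("fix: <vulnerable function source>", return_tensors="pt")
tgt = tokenizer("<repaired function source>", return_tensors="pt")

model.train()
outputs = model(input_ids=src.input_ids,
                attention_mask=src.attention_mask,
                labels=tgt.input_ids)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Because Lion updates weights using only the sign of the momentum term, its effective step size differs from AdamW's, which is why the surrounding hyperparameters must be re-tuned rather than carried over unchanged.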