Automatic software system optimization can improve software speed, reduce operating costs, and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to generalize across diverse codebases and system contexts. Recent methods using Large Language Models (LLMs) offer automation to address these limitations, but often fail to scale to the complexity of real-world software systems and applications. We present SysLLMatic, a system that integrates LLMs with profiling-guided feedback and system performance insights to automatically optimize software code. We evaluate it on three benchmark suites: HumanEval CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapoBench (large-scale software systems in Java). Results show that SysLLMatic can improve system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization. It consistently outperforms state-of-the-art LLM baselines on microbenchmarks. On large-scale application codes, it surpasses traditional compiler optimizations, achieving average relative improvements of 1.85× in latency and 2.24× in throughput. Our findings demonstrate that LLMs, guided by principled systems thinking and appropriate performance diagnostics, can serve as viable software system optimizers. We further identify limitations of our approach and the challenges involved in handling complex applications. This work provides a foundation for generating optimized code across various languages, benchmarks, and program sizes in a principled manner.
Index Terms—Software Engineering, Automatic Programming,
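The feedback loop this abstract describes (profile, ask the model for a rewrite, verify, keep only measured wins) can be illustrated with a short loop. Below is a minimal sketch in Python; the helpers `rewrite` (the LLM call), `compile_and_bench`, and `passes_tests` are hypothetical stand-ins, not SysLLMatic's actual interfaces:

```python
from typing import Callable

def optimize(source: str,
             rewrite: Callable[[str, str], str],        # hypothetical LLM call: (code, diagnostics) -> new code
             compile_and_bench: Callable[[str], float], # hypothetical: builds the code, returns latency in seconds
             passes_tests: Callable[[str], bool],       # hypothetical: functional correctness check
             rounds: int = 5) -> str:
    """Profiling-guided optimization loop: keep an LLM rewrite only if it
    passes the test suite and measurably beats the current best version."""
    best, best_latency = source, compile_and_bench(source)
    for _ in range(rounds):
        # Real feedback would include profiler hotspots, counters, and memory stats.
        diagnostics = f"current latency: {best_latency:.4f}s"
        candidate = rewrite(best, diagnostics)
        if not passes_tests(candidate):
            continue  # reject rewrites that change behavior
        latency = compile_and_bench(candidate)
        if latency < best_latency:
            best, best_latency = candidate, latency
    return best
```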
Large Language Models for Energy-Efficient Code: Emerging Results and Future Directions
Energy-efficient software helps improve mobile device experiences and reduce the carbon footprint of data centers. However, energy goals are often de-prioritized in order to meet other requirements. We take inspiration from recent work exploring the use of large language models (LLMs) for different software engineering activities. We propose a novel application of LLMs: as code optimizers for energy efficiency. We describe and evaluate a prototype, finding that across six small programs our system can improve energy efficiency in three of them, by up to 2× over compiler optimizations alone. From our experience, we identify some of the challenges of energy-efficient LLM code optimization and propose a research agenda.
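Making a claim like "up to 2× over compiler optimizations" requires an energy-measurement harness. A minimal sketch, assuming an Intel Linux machine that exposes package energy through the standard powercap/RAPL sysfs interface; the binary names in the usage comment are hypothetical:

```python
import subprocess
from pathlib import Path

# Package-level RAPL counter via Linux powercap (Intel CPUs; needs read access).
RAPL = Path("/sys/class/powercap/intel-rapl:0/energy_uj")
MAX_RANGE = Path("/sys/class/powercap/intel-rapl:0/max_energy_range_uj")

def energy_joules(cmd: list[str]) -> float:
    """Energy consumed (in joules) while cmd runs, handling one counter wrap.
    Measures the whole package, so run on an otherwise idle machine."""
    before = int(RAPL.read_text())
    subprocess.run(cmd, check=True)
    after = int(RAPL.read_text())
    if after < before:                      # the counter wrapped around
        after += int(MAX_RANGE.read_text())
    return (after - before) / 1e6

# Hypothetical comparison of a -O3 baseline against an LLM-rewritten binary:
# print(energy_joules(["./baseline_O3"]), energy_joules(["./llm_optimized"]))
```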
- Award ID(s):
- 2343596
- PAR ID:
- 10621259
- Publisher / Repository:
- arXiv
- Date Published:
- Subject(s) / Keyword(s):
- Energy efficiency; Research agenda; Software optimization; Large language models
- Format(s):
- Medium: X
- Institution:
- Purdue University; Loyola University Chicago
- Sponsoring Org:
- National Science Foundation
More Like this
-
Computing systems are consuming an increasing and unsustainable fraction of society’s energy footprint, notably in data centers. Meanwhile, energy-efficient software engineering techniques are often absent from undergraduate curricula. We propose to develop a learning module for energy-efficient software, suitable for incorporation into an undergraduate software engineering class. There is one major problem with such an endeavor: undergraduate curricula have limited space for mastering energy-related systems programming aspects. To address this problem, we propose to leverage the domain expertise afforded by large language models (LLMs). In our preliminary studies, we observe that LLMs can generate energy-efficient variations of basic linear algebra codes tailored to both ARM64 and AMD64 architectures, as well as unit tests and energy measurement harnesses. On toy examples suitable for classroom use, this approach reduces energy expenditure by 30–90%. These initial experiences give rise to our vision of LLM-based metacompilers as a tool for students to transform high-level algorithms into efficient, hardware-specific implementations. Complementing this tooling, we will incorporate systems thinking concepts into the learning module so that students can reason both locally and globally about the effects of energy optimizations.
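As one illustration of the metacompiler idea, a single step could prompt a model for an architecture-specific variant and gate it behind generated unit tests. A minimal sketch; the prompt wording and the `ask_llm` and `passes_unit_tests` helpers are assumptions, not the module's actual tooling:

```python
from typing import Callable

PROMPT = """Rewrite the following C kernel to minimize energy on {arch}.
Preserve the function signature and exact numerical behavior.
Favor cache-friendly access patterns and loops the compiler can vectorize.

{code}"""

def metacompile(code: str, arch: str,
                ask_llm: Callable[[str], str],               # hypothetical model call
                passes_unit_tests: Callable[[str], bool]) -> str:
    """One metacompiler step: request an architecture-specific variant and
    accept it only if the (LLM-generated) unit tests still pass."""
    candidate = ask_llm(PROMPT.format(arch=arch, code=code))
    return candidate if passes_unit_tests(candidate) else code

# e.g. metacompile(daxpy_c_source, "ARM64", ask_llm, passes_unit_tests)
```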
-
Large Language Models (LLMs) are increasingly used to automate software development, yet most prior evaluations focus on functional correctness or high-level languages such as Python. As one of the first systematic explorations of LLM-assisted software performance engineering, we present a comprehensive study of LLMs’ ability to generate efficient C implementations of graph-analysis routines—code that must satisfy stringent runtime and memory constraints. This emerging field of LLM-assisted algorithm engineering holds significant promise, as these models may possess the capability to design novel approaches that improve existing algorithms and their implementations. Eight state-of-the-art models (OpenAI ChatGPT o3 and o4-mini-high, Anthropic Claude 4 Sonnet and Sonnet Extended, Google Gemini 2.5 Flash and Pro, xAI Grok 3-Think, and DeepSeek DeepThink R1) are benchmarked using two distinct approaches. The first approach evaluates the ability of LLMs to generate algorithms that outperform existing benchmarks. The second approach assesses their capability to generate graph algorithms for integration into performance-critical systems. The results show that Claude Sonnet 4 Extended achieves superior performance in ready-to-use code generation and efficiency, outperforming human-written baselines in triangle counting. Although our findings demonstrate that contemporary LLMs excel in optimizing and integrating established algorithms, the potential for these models to eventually invent transformative algorithmic techniques represents a compelling frontier for future research. We provide prompts, generated code, and measurement scripts to promote reproducible research in this rapidly evolving domain. All of the source code is available on GitHub at https://github.com/Bader-Research/LLM-triangle-counting/.
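For readers unfamiliar with the triangle-counting task mentioned above: a standard exact baseline orients each edge along a degree ordering and intersects out-neighbor sets. A small Python sketch of that baseline (the study itself benchmarks C implementations, so this only illustrates the algorithm):

```python
from collections import defaultdict

def triangle_count(edges: list[tuple[int, int]]) -> int:
    """Count triangles in an undirected simple graph: orient each edge by a
    (degree, id) ordering, then intersect out-neighbor sets per edge."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    rank = {v: (len(adj[v]), v) for v in adj}   # low-degree vertices first
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    # Each triangle is counted exactly once, at its lowest-ranked vertex.
    return sum(len(out[u] & out[v]) for u in out for v in out[u])

# A 4-clique contains exactly 4 triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
assert triangle_count(k4) == 4
```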
-
We propose a method for optimizing the energy efficiency of software code running on small computing devices in the Internet of Things (IoT) that are powered exclusively by electricity harvested from ambient energy in the environment. Due to the weak and unstable nature of the energy source, it is challenging for developers to manually optimize the software code to deal with mismatch between the intermittent power supply and the computation demand. Our method overcomes the challenge by using a combination of three techniques. First, we use static program analysis to automatically identify opportunities for precomputation, i.e., computation that may be performed ahead of time as opposed to just in time. Second, we optimize the precomputation policy, i.e., a way to split and reorder steps of a computation task in the original software to match the intermittent power supply while satisfying a variety of system requirements; this is accomplished by formulating energy optimization as a constraint satisfiability problem and then solving the problem using an off-the-shelf SMT solver. Third, we use a state-of-the-art compiler platform (LLVM) to automate the program transformation to ensure that the optimized software code is correct by construction. We have evaluated our method on a large number of benchmark programs, which are C programs implementing secure communication protocols that are popular for energy-harvesting IoT devices. Our experimental results show that the method is efficient in optimizing all benchmark programs. Furthermore, the optimized programs significantly outperform the original programs in terms of energy efficiency and latency, and the overall improvement ranges from 2.3X to 36.7X.
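The second technique, deciding which steps to precompute under energy constraints, maps naturally onto an SMT query. A toy sketch using Z3's Python bindings (its optimization extension) with made-up energy numbers; the paper's actual encoding, which splits and reorders steps inside LLVM, is considerably richer:

```python
# pip install z3-solver
from z3 import Bool, If, Not, Optimize, Sum, is_true, sat

# Toy instance (all numbers assumed): choose which task steps to precompute
# while harvested energy is plentiful, so the just-in-time remainder fits
# the energy available at the deadline.
costs = [3, 5, 2, 4]      # energy units per step
pre_budget = 9            # energy bankable ahead of time
jit_budget = 6            # energy available just in time
must_be_jit = {3}         # steps that depend on late-arriving input

pre = [Bool(f"pre_{i}") for i in range(len(costs))]
opt = Optimize()
for i in must_be_jit:
    opt.add(Not(pre[i]))  # dependency: cannot run ahead of time
opt.add(Sum([If(p, c, 0) for p, c in zip(pre, costs)]) <= pre_budget)
opt.add(Sum([If(p, 0, c) for p, c in zip(pre, costs)]) <= jit_budget)
opt.maximize(Sum([If(p, c, 0) for p, c in zip(pre, costs)]))  # push work early

if opt.check() == sat:
    m = opt.model()
    print("precompute steps:", [i for i, p in enumerate(pre) if is_true(m[p])])
```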
-
Recent advances in large language models (LLMs) have transformed software development by automatically generating code from natural language. Yet challenges remain in generating fully correct code that aligns with user intent. Our study reveals that LLMs tend to pay less attention to user prompts as more code tokens are generated. We hypothesize that this attention dilution issue is an important reason for code generation errors. To mitigate this issue, we propose Selective Prompt Anchoring (SPA) to guide code LLMs to pay more attention to user intent when generating code. We evaluate SPA using six base LLMs across six benchmarks. Our results demonstrate that SPA enhances Pass@1 by up to 12.9%, consistently outperforming SOTA methods in all settings. Our code is available at https://github.com/magic-YuanTian/Selective-Prompt-Anchoring.
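One way to read "anchoring" at the logit level is to contrast a forward pass that keeps the user prompt with one that masks it, then amplify the difference. A minimal sketch assuming a Hugging Face causal LM; this illustrates the general idea and may not match SPA's exact formulation:

```python
import torch

def anchored_logits(model, full_ids, masked_ids, omega: float = 1.0):
    """Contrastive anchoring sketch: boost whatever the user prompt contributes
    to the next-token distribution. full_ids keeps the prompt; masked_ids is
    the same sequence with prompt tokens replaced by mask/pad tokens.
    (Illustrative only; SPA's published formulation may differ.)"""
    with torch.no_grad():
        with_prompt = model(full_ids).logits[:, -1, :]
        without_prompt = model(masked_ids).logits[:, -1, :]
    # omega = 0 recovers ordinary decoding; omega > 0 amplifies prompt influence.
    return with_prompt + omega * (with_prompt - without_prompt)
```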