Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full-text articles may not yet be available free of charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
-
Free, publicly-accessible full text available April 24, 2026
-
We introduce and study spatiotemporal online allocation with deadline constraints (SOAD), a new online problem motivated by emerging challenges in sustainability and energy. In SOAD, an online player completes a workload by allocating and scheduling it on the points of a metric space (X, d) while subject to a deadline T. At each time step, a service cost function is revealed that represents the cost of servicing the workload at each point, and the player must irrevocably decide the current allocation of work to points. Whenever the player moves this allocation, they incur a movement cost defined by the distance metric d(⋅, ⋅) that captures, e.g., an overhead cost. SOAD formalizes the open problem of combining general metrics and deadline constraints in the online algorithms literature, unifying problems such as metrical task systems and online search. We propose a competitive algorithm for SOAD along with a matching lower bound establishing its optimality. Our main algorithm, ST-CLIP, is a learning-augmented algorithm that takes advantage of predictions (e.g., forecasts of relevant costs) and achieves an optimal consistency-robustness trade-off. We evaluate our proposed algorithms in a simulated case study of carbon-aware spatiotemporal workload management, an application in sustainable computing that schedules a delay-tolerant batch compute job on a distributed network of data centers. In these experiments, we show that ST-CLIP substantially improves on heuristic baseline methods.
Free, publicly-accessible full text available March 6, 2026
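A minimal sketch of the SOAD cost accounting described in this abstract: a unit workload must finish by the deadline, each step's work is charged the revealed service cost at the chosen point, and relocating the allocation is charged the metric distance. The function and variable names, the toy instance, and the baseline policy are all illustrative assumptions; this is not the paper's ST-CLIP algorithm.

```python
# Illustrative sketch of the SOAD cost model (not ST-CLIP).
import numpy as np

def soad_cost(service_costs, points, rates, d, start=None):
    """service_costs[t][x]: cost per unit of work at point x in step t.
    points[t]: point chosen in step t; rates[t]: fraction of work done then.
    d: (n, n) distance matrix; start: initial point (None = no initial move)."""
    total, prev = 0.0, start
    for t, (x, r) in enumerate(zip(points, rates)):
        total += service_costs[t][x] * r        # service cost for this step
        if prev is not None and x != prev:
            total += d[prev][x]                 # movement (switching) cost
        prev = x
    assert abs(sum(rates) - 1.0) < 1e-9, "workload must be completed by the deadline"
    return total

# toy instance: 3 data centers, deadline T = 4 steps
rng = np.random.default_rng(0)
costs = rng.uniform(0.2, 1.0, size=(4, 3))      # revealed online in the real problem
d = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], float)

# naive baseline: stay at point 0 and spread the work evenly over the horizon
print(soad_cost(costs, points=[0, 0, 0, 0], rates=[0.25] * 4, d=d))
```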
-
Free, publicly-accessible full text available December 16, 2025
-
To address the challenges of the sim-to-real gap and sample efficiency in reinforcement learning (RL), this work studies distributionally robust Markov decision processes (RMDPs), which optimize the worst-case performance when the deployed environment lies within an uncertainty set around some nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained largely undetermined. While the statistical implications of distributional robustness in RL have been explored in some specific cases, the generalizability of the existing findings remains unclear, especially in comparison to standard RL. Assuming access to a generative model that samples from the nominal MDP, we examine the sample complexity of RMDPs using a class of generalized norms as the 'distance' function for the uncertainty set, under the two commonly adopted sa-rectangular and s-rectangular conditions. Our results imply that RMDPs can be more sample-efficient to solve than standard MDPs using generalized norms in both sa- and s-rectangular cases, potentially inspiring more empirical research. We provide a near-optimal upper bound and a matching minimax lower bound for the sa-rectangular scenarios. For s-rectangular cases, we improve the state-of-the-art upper bound and also derive a lower bound, using a specific norm, that verifies its tightness.
Free, publicly-accessible full text available September 25, 2025
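A small sketch of the sa-rectangular robust backup this abstract refers to, instantiated with an ℓ∞-ball uncertainty set around each nominal transition row as one example of a norm-based 'distance'. The inner worst-case expectation is solved as a linear program. The function names, the ball radius sigma, and the toy MDP are assumptions for illustration; this is not the authors' algorithm or code.

```python
# Illustrative sa-rectangular robust value iteration with an l_inf-ball uncertainty set.
import numpy as np
from scipy.optimize import linprog

def worst_case_return(p_nominal, values, sigma):
    """min_q q.values over distributions q with |q - p_nominal|_inf <= sigma."""
    n = len(values)
    bounds = [(max(0.0, p - sigma), min(1.0, p + sigma)) for p in p_nominal]
    res = linprog(c=values, A_eq=np.ones((1, n)), b_eq=[1.0], bounds=bounds)
    return res.fun

def robust_value_iteration(P, R, gamma=0.9, sigma=0.1, iters=200):
    """P: (S, A, S) nominal transitions, R: (S, A) rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * worst_case_return(P[s, a], V, sigma)
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)      # greedy over actions, worst case over transitions
    return V

# toy 2-state, 2-action MDP
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
print(robust_value_iteration(P, R))
```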
-
The robust 𝜙-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust 𝜙-regularized fitted Q-iteration for learning an 𝜖-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (satisfying a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of 𝜙-divergences achieving robust optimal policies in high-dimensional systems with arbitrarily large state spaces and general function approximation. Second, we introduce the hybrid robust 𝜙-regularized reinforcement learning framework, which learns an optimal robust policy using both historical data and online sampling. For this framework, we propose a model-free algorithm called Hybrid Robust Total-Variation-regularized Q-iteration. To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems with arbitrarily large state spaces and general function approximation under the hybrid robust 𝜙-regularized reinforcement learning framework.
Free, publicly-accessible full text available July 29, 2025
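A rough sketch of what a 𝜙-regularized robust fitted Q-iteration step on offline data can look like, using the KL divergence as 𝜙 (one member of the class of divergences). Under KL regularization the inner infimum has the well-known closed form inf_Q E_Q[V] + λ·KL(Q‖P) = −λ·log E_P[exp(−V/λ)], so the robust target can be estimated directly from sampled next states. The tabular averaging, parameter names, and synthetic dataset below are assumptions for illustration; this is not the authors' algorithm or implementation.

```python
# Illustrative KL-regularized robust fitted Q-iteration on an offline dataset.
import numpy as np
from collections import defaultdict

def robust_fqi(dataset, n_states, n_actions, gamma=0.9, lam=1.0, iters=50):
    """dataset: list of (s, a, r, s_next) tuples collected by a behavior policy."""
    Q = np.zeros((n_states, n_actions))
    by_sa = defaultdict(list)
    for s, a, r, s_next in dataset:
        by_sa[(s, a)].append((r, s_next))
    for _ in range(iters):
        V = Q.max(axis=1)
        for (s, a), transitions in by_sa.items():
            rewards = np.array([r for r, _ in transitions])
            next_vals = np.array([V[s_next] for _, s_next in transitions])
            # KL-regularized robust backup via the log-sum-exp dual form
            robust_next = -lam * np.log(np.mean(np.exp(-next_vals / lam)))
            Q[s, a] = rewards.mean() + gamma * robust_next
        # (a parametric regression step would replace this tabular average
        #  under general function approximation)
    return Q

# tiny synthetic dataset on a 3-state, 2-action chain
rng = np.random.default_rng(1)
data = [(s, a, float(s == 2), int(rng.integers(3)))
        for s in range(3) for a in range(2) for _ in range(20)]
print(robust_fqi(data, n_states=3, n_actions=2))
```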
-
We study the smoothed online quadratic optimization (SOQO) problem where, at each round t, a player plays an action xₜ in response to a quadratic hitting cost and an additional squared ℓ2-norm cost for switching actions. This problem class has strong connections to a wide range of application domains, including smart grid management, adaptive control, and data center management, where switching-efficient algorithms are highly sought after. We study the SOQO problem in both adversarial and stochastic settings, and in this process perform the first stochastic analysis of this class of problems. We provide the online optimal algorithm when the minimizers of the hitting cost function evolve as a general stochastic process; for the case of a martingale process, this takes the form of a distribution-agnostic dynamic interpolation algorithm that we call Lazy Adaptive Interpolation (LAI). Next, we present the stochastic-adversarial trade-off by proving an Ω(T) expected regret for the adversarial optimal algorithm in the literature (ROBD) with respect to LAI, and a sub-optimal competitive ratio for LAI in the adversarial setting. Finally, we present a best-of-both-worlds algorithm that obtains robust adversarial performance while simultaneously achieving near-optimal stochastic performance.
Free, publicly-accessible full text available July 21, 2025
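A minimal sketch of the SOQO cost structure with an ROBD-style closed-form update (the adversarial baseline from the prior literature that the abstract compares against), not the paper's LAI algorithm. The hitting cost is taken as (α/2)(x − vₜ)² and the switching cost as (1/2)(x − xₜ₋₁)²; the parameter names and the toy martingale instance are assumptions for illustration.

```python
# Illustrative SOQO play with a regularized one-step update (ROBD-style), not LAI.
import numpy as np

def soqo_play(minimizers, alpha=1.0, lam=1.0, x0=0.0):
    """x_t = argmin_x (alpha/2)(x - v_t)^2 + (lam/2)(x - x_prev)^2."""
    xs, x_prev, total = [], x0, 0.0
    for v in minimizers:
        x = (alpha * v + lam * x_prev) / (alpha + lam)   # closed-form minimizer
        total += 0.5 * alpha * (x - v) ** 2              # hitting cost
        total += 0.5 * (x - x_prev) ** 2                 # switching cost
        xs.append(x)
        x_prev = x
    return np.array(xs), total

# toy instance: hitting-cost minimizers follow a random walk (a martingale),
# matching the stochastic setting the abstract analyzes
rng = np.random.default_rng(2)
v = np.cumsum(rng.normal(size=20))
actions, cost = soqo_play(v)
print(f"total cost: {cost:.3f}")
```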
-
Free, publicly-accessible full text available July 15, 2025
-
To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments the problem remains understudied, despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DRNVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
Free, publicly-accessible full text available July 21, 2025
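A hypothetical sketch of one ingredient behind robust equilibria in such games: fixing the opponent's policy induces a robust MDP for a single agent, whose robust best-response value can be computed with a worst-case Bellman backup over a total-variation ball around the nominal transitions. This is not the DRNVI algorithm or the authors' code; the TV uncertainty set, function names, and toy game are assumptions for illustration.

```python
# Illustrative robust best-response backup for agent 1 in a two-agent robust Markov game.
import numpy as np

def tv_worst_case(p, values, sigma):
    """min_q q.values over distributions q with TV(q, p) <= sigma (greedy LP solution)."""
    q, lo, budget = p.copy(), int(np.argmin(values)), sigma
    for i in np.argsort(values)[::-1]:          # strip mass from high-value states
        if i == lo or budget <= 0:
            continue
        move = min(q[i], budget)
        q[i] -= move
        q[lo] += move                           # dump it on the lowest-value state
        budget -= move
    return float(q @ values)

def robust_best_response(P, R, pi_opp, gamma=0.9, sigma=0.1, iters=200):
    """P: (S, A1, A2, S) nominal transitions, R: (S, A1, A2) agent-1 rewards,
    pi_opp: (S, A2) fixed opponent policy. Returns agent 1's robust value function."""
    S, A1, A2, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.zeros((S, A1))
        for s in range(S):
            for a1 in range(A1):
                for a2 in range(A2):
                    backup = R[s, a1, a2] + gamma * tv_worst_case(P[s, a1, a2], V, sigma)
                    Q[s, a1] += pi_opp[s, a2] * backup
        V = Q.max(axis=1)
    return V

# toy 2-state game with 2 actions per agent and a uniform opponent policy
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(2), size=(2, 2, 2))
R = rng.uniform(size=(2, 2, 2))
print(robust_best_response(P, R, pi_opp=np.full((2, 2), 0.5)))
```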
-
Free, publicly-accessible full text available July 21, 2025
-
Free, publicly-accessible full text available September 1, 2025