NSF PAR Search | NSF Public Access Repository


Encouraging Inferable Behavior for Autonomy: Repeated Bimatrix Stackelberg Games with Observations

https://doi.org/10.23919/ACC60939.2024.10644936

Karabag, Mustafa O; Smith, Sophia; Fridovich-Keil, David; Topcu, Ufuk (July 2024, Proceedings of the American Control Conference)

When interacting with other non-competitive decision-making agents, it is critical for an autonomous agent to have inferable behavior: its actions must convey its intention and strategy. For example, an autonomous car's strategy must be inferable by the pedestrians interacting with the car. We model the inferability problem using a repeated bimatrix Stackelberg game with observations, in which a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed, potentially mixed strategy. The follower, on the other hand, does not know the leader's strategy and reacts dynamically based on observations of the leader's previous actions. In the setting with observations, the leader may suffer from an inferability loss, i.e., a loss in performance relative to the setting where the follower has perfect information of the leader's strategy. We show that the inferability loss is upper bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. As a converse result, we also provide a game where the required number of interactions is lower bounded by a function of the desired inferability loss.
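To make the inferability loss concrete, the sketch below simulates a repeated bimatrix game in which the follower estimates the leader's mixed strategy `x` from the empirical frequencies of the leader's past actions and best-responds to that estimate. The follower model (add-one smoothed frequency counts, pure best response with ties broken toward the lowest index), the payoff matrices `A` (leader) and `B` (follower), and the function names are illustrative assumptions, not the construction used in the paper; the returned gap is an empirical average for one sampled run, not the paper's bound.

```python
import numpy as np


def best_response(B, x):
    """Follower's pure best response to a (possibly estimated) leader strategy x.

    B[i, j] is the follower's payoff when the leader plays i and the follower plays j.
    np.argmax breaks ties toward the lowest index.
    """
    return int(np.argmax(x @ B))


def empirical_inferability_loss(A, B, x, num_rounds, seed=0):
    """Average leader payoff gap between a perfectly informed follower and one that
    learns x from observations (hypothetical follower model: smoothed frequency counts)."""
    rng = np.random.default_rng(seed)
    num_leader_actions = A.shape[0]
    counts = np.ones(num_leader_actions)  # assumed add-one prior so round 1 has an estimate

    # Leader's expected payoff when the follower knows x exactly.
    full_info_value = x @ A[:, best_response(B, x)]

    total = 0.0
    for _ in range(num_rounds):
        x_hat = counts / counts.sum()              # follower's current estimate of x
        j = best_response(B, x_hat)                # follower reacts to the estimate
        total += x @ A[:, j]                       # leader's expected payoff this round
        i = rng.choice(num_leader_actions, p=x)    # leader's realized action...
        counts[i] += 1                             # ...observed by the follower afterward
    return full_info_value - total / num_rounds


# Usage: a deterministic leader strategy; the follower misreads it for the first two rounds.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 2.5]])
print(empirical_inferability_loss(A, B, np.array([1.0, 0.0]), num_rounds=100))  # approximately 0.02
```

Re-running with a mixed `x` (for example `[0.6, 0.4]`) and several seeds is one way to explore how a more stochastic strategy affects how quickly this hypothetical follower settles on the full-information best response.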
Simulator-Driven Deceptive Control via Path Integral Approach

https://doi.org/10.1109/CDC49753.2023.10383936

Patil, Apurva; Karabag, Mustafa O; Tanaka, Takashi; Topcu, Ufuk (December 2023, IEEE)

On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

https://doi.org/10.1609/aaai.v37i7.25989

Karabag, Mustafa O; Topcu, Ufuk (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples; it is helpful in settings where collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed from the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and, consequently, may be interdependent. Without assuming independent samples, we provide a high-probability, polynomial sample complexity bound for vanilla model-based off-policy evaluation under partial or uniform coverage. We extend this result to off-policy optimization under uniform coverage. For comparison with the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for the almost deterministic dynamics that are prevalent in reinforcement learning.
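The "vanilla" model-based approach amounts to counting: estimate each transition probability by its empirical frequency, then evaluate the target policy in the estimated model. The sketch below assumes a small tabular MDP with integer-indexed states and actions, a stochastic target policy given as a matrix, and observed rewards averaged per state-action pair. The function names and the zero-row convention for unvisited pairs are hypothetical, the code is not taken from the paper, and it does not compute the paper's sample complexity bound.

```python
import numpy as np


def empirical_model(transitions, num_states, num_actions):
    """Count-based estimates of P(s' | s, a) and the mean reward r(s, a).

    `transitions` holds (s, a, r, s_next) tuples; consecutive tuples may come from a
    single trajectory, so they need not be independent. Unvisited (s, a) pairs keep
    all-zero rows (a hypothetical convention; a coverage assumption rules them out).
    """
    counts = np.zeros((num_states, num_actions, num_states))
    reward_sums = np.zeros((num_states, num_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        reward_sums[s, a] += r
    visits = counts.sum(axis=2)
    seen = visits > 0
    P_hat = np.zeros_like(counts)
    R_hat = np.zeros_like(reward_sums)
    P_hat[seen] = counts[seen] / visits[seen][:, None]
    R_hat[seen] = reward_sums[seen] / visits[seen]
    return P_hat, R_hat


def evaluate_policy(P_hat, R_hat, policy, gamma):
    """Discounted value of `policy` (policy[s, a] = probability of a in s) in the
    empirical MDP, from the linear system V = r_pi + gamma * P_pi V."""
    P_pi = np.einsum("sa,sat->st", policy, P_hat)
    r_pi = np.sum(policy * R_hat, axis=1)
    num_states = P_hat.shape[0]
    return np.linalg.solve(np.eye(num_states) - gamma * P_pi, r_pi)


# Usage: one trajectory of a deterministic two-state loop, 0 -> 1 (reward 1), 1 -> 0 (reward 0).
trajectory = [(0, 0, 1.0, 1), (1, 0, 0.0, 0)] * 3
P_hat, R_hat = empirical_model(trajectory, num_states=2, num_actions=1)
print(evaluate_policy(P_hat, R_hat, np.ones((2, 1)), gamma=0.5))  # approximately [1.333, 0.667]
```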
Formal Methods for Autonomous Systems

https://doi.org/10.1561/2600000029

Wongpiromsarn, Tichakorn; Ghasemi, Mahsa; Cubuktepe, Murat; Bakirtzis, Georgios; Carr, Steven; Karabag, Mustafa O.; Neary, Cyrus; Gohari, Parham; Topcu, Ufuk (September 2023, Foundations and Trends® in Systems and Control)

Alternating Direction Method of Multipliers for Decomposable Saddle-Point Problems

https://doi.org/10.1109/Allerton49937.2022.9929349

Karabag, Mustafa O.; Fridovich-Keil, David; Topcu, Ufuk (September 2022, 2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton))

Saddle-point problems appear in various settings, including machine learning, zero-sum stochastic games, and regression problems. We consider decomposable saddle-point problems and study an extension of the alternating direction method of multipliers (ADMM) to such problems. Instead of solving the original saddle-point problem directly, this algorithm solves smaller saddle-point problems by exploiting the decomposable structure. We show that this algorithm converges for convex-concave saddle-point problems under a mild assumption, and we provide a sufficient condition under which the assumption holds. We demonstrate the convergence properties of the saddle-point alternating direction method of multipliers with numerical examples on a power allocation problem in communication channels and a network routing problem with adversarial costs.
Smooth Convex Optimization Using Sub-Zeroth-Order Oracles

https://doi.org/10.1609/aaai.v35i5.16499

Karabag, Mustafa O.; Neary, Cyrus; Topcu, Ufuk (May 2021, Proceedings of the AAAI Conference on Artificial Intelligence)

We consider the problem of minimizing a smooth, Lipschitz, convex function over a compact, convex set using sub-zeroth-order oracles: an oracle that outputs the sign of the directional derivative for a given point and a given direction, an oracle that compares the function values for a given pair of points, and an oracle that outputs a noisy function value for a given point. We show that the sample complexity of optimization using these oracles is polynomial in the relevant parameters. The optimization algorithm that we provide for the comparator oracle is the first algorithm with a known rate of convergence that is polynomial in the number of dimensions. We also give an algorithm for the noisy-value oracle that incurs sublinear regret in the number of queries and polynomial regret in the number of dimensions.
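Of the three oracles, the comparator is the easiest to picture: it answers only whether one point has a smaller function value than another. As a loose illustration of optimizing with such an oracle, the sketch below uses a generic comparison-based random direct search over a box (a simple compact, convex set). This is a swapped-in technique, not the paper's algorithm, and it carries none of the paper's polynomial-in-dimension guarantees; `make_comparator`, `comparison_random_search`, the step-shrink factor 0.9, and the box constraint are all assumptions for illustration.

```python
import numpy as np


def make_comparator(f):
    """Comparator oracle (hypothetical interface): True iff f(y) < f(x).
    Only this Boolean is exposed to the optimizer, never a function value."""
    return lambda y, x: f(y) < f(x)


def comparison_random_search(compare, x0, lower, upper, step=0.5, iters=2000, seed=0):
    """Minimize over the box [lower, upper] using only comparator queries.

    Generic random direct search, NOT the algorithm from the paper: try +/- a random
    unit direction, accept a strictly better point, otherwise shrink the step.
    """
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(iters):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        for candidate in (x + step * d, x - step * d):
            y = np.clip(candidate, lower, upper)  # project back onto the box
            if compare(y, x):
                x = y
                break
        else:
            step *= 0.9  # neither direction improved: take smaller steps
    return x


# Usage: a smooth convex quadratic whose minimizer c lies inside the box [-1, 1]^2.
c = np.array([0.3, -0.2])
f = lambda x: float(np.sum((x - c) ** 2))
x_min = comparison_random_search(make_comparator(f), x0=[1.0, 1.0], lower=-1.0, upper=1.0)
```

Each iteration spends at most two comparator queries. The `np.clip` call is the Euclidean projection onto a box; for a general compact, convex set it would be replaced by the projection onto that set.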
