skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On Robust Control of Partially Observed Uncertain Systems with Additive Costs
In this paper, we consider the problem of optimizing the worst-case behavior of a partially observed system. All uncontrolled disturbances are modeled as finite-valued uncertain variables. Using the theory of cost distributions, we present a dynamic programming (DP) approach to compute a control strategy that minimizes the maximum possible total cost over a given time horizon. To improve the computational efficiency of the optimal DP, we introduce a general definition for information states and show that many information states constructed in previous research efforts are special cases of ours. Additionally, we define approximate information states and an approximate DP that can further improve computational tractability by conceding a bounded performance loss. We illustrate the utility of these results using a numerical example.  more » « less
Award ID(s):
2149520 2219761
PAR ID:
10421259
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2023 American Control Conference
Page Range / eLocation ID:
4639-4644
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Weller, Adrian (Ed.)
    Differential privacy (DP) offers strong theoretical privacy guarantees, though implementations of DP mechanisms may be vulnerable to side-channel attacks, such as timing attacks. When sampling methods such as MCMC or rejection sampling are used to implement a mechanism, the runtime can leak private information. We characterize the additional privacy cost due to the runtime of a rejection sampler in terms of both (epsilon,delta)-DP as well as f-DP. We also show that unless the acceptance probability is constant across databases, the runtime of a rejection sampler does not satisfy epsilon-DP for any epsilon. We show that there is a similar breakdown in privacy with adaptive rejection samplers. We propose three modifications to the rejection sampling algorithm, with varying assumptions, to protect against timing attacks by making the runtime independent of the data. The modification with the weakest assumptions is an approximate sampler, introducing a small increase in the privacy cost, whereas the other modifications give perfect samplers. We also use our techniques to develop an adaptive rejection sampler for log-Holder densities, which also has data-independent runtime. We give several examples of DP mechanisms that fit the assumptions of our methods and can thus be implemented using our samplers. 
    more » « less
  2. Weller, Adrian (Ed.)
    While differential privacy (DP) offers strong theoretical privacy guarantees, implementations of DP mechanisms may be vulnerable to side-channel attacks, such as timing attacks. When sampling methods such as MCMC or rejection sampling are used to implement a privacy mechanism, the runtime can leak private information. We characterize the additional privacy cost due to the runtime of a rejection sampler in terms of both (, δ)-DP as well as f -DP. We also show that unless the acceptance probability is constant across databases, the runtime of a rejection sampler does not satisfy -DP for any . We show that there is a similar breakdown in privacy with adaptive rejection samplers. We propose three modifications to the rejection sampling algorithm, with varying assumptions, to protect against timing attacks by making the runtime independent of the data. The modification with the weakest assumptions is an approximate sampler, introducing a small increase in the privacy cost, whereas the other modifications give perfect samplers. We also use our techniques to develop an adaptive rejection sampler for log-H ̈older densities, which also has data-independent runtime. We give several examples of DP mechanisms that fit the assumptions of our methods and can thus be implemented using our samplers. 
    more » « less
  3. Differentially Private Stochastic Gradient Descent (DP-SGD) has become a widely used technique for safeguarding sensitive information in deep learning applications. Unfortunately, DP-SGD’s per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility and fairness. We observe that the latest DP-SGD-Global-Adapt’s average gradient norm is the same throughout the training. Even when it is integrated with the existing linear decay noise multiplier, it has little or no advantage. Moreover, we notice that its upper clipping threshold increases exponentially towards the end of training, potentially impacting the model’s convergence. Other algorithms, DP-PSAC, Auto-S, DP-SGD-Global, and DP-F, have utility and fairness that are similar to or worse than DP-SGD, as demonstrated in experiments. To overcome these problems and improve utility and fairness, we developed the DP-SGD-Global-Adapt-V2-S. It has a step-decay noise multiplier and an upper clipping threshold that is also decayed step-wise. DP-SGD-Global-Adapt-V2-S with a privacy budget of 1 improves accuracy by 0.9795%, 0.6786%, and 4.0130% in MNIST, CIFAR10, and CIFAR100, respectively. It also reduces the privacy cost gap by 89.8332% and 60.5541% in unbalanced MNIST and Thinwall datasets, respectively. Finally, we develop mathematical expressions to compute the privacy budget using truncated concentrated differential privacy (tCDP) for DP-SGD-Global-Adapt-V2-T and DP-SGD-Global-Adapt-V2-S. 
    more » « less
  4. Due to information asymmetry, finding optimal policies for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) is hard with the complexity growing doubly exponentially in the horizon length. The challenge increases greatly in the multi-agent reinforcement learning (MARL) setting where the transition probabilities, observation kernel, and reward function are unknown. Here, we develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed. We derive the optimality gap of executing dynamic programming (DP) with the approximate states in terms of the approximation error parameters and the remaining time steps. When the compression is exact (no error), the resulting DP is equivalent to the one in existing work. Our general framework generalizes a number of methods proposed in the literature. The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme. 
    more » « less
  5. Marc'Aurelio Ranzato, Alina Beygelzimer (Ed.)
    Implementations of the exponential mechanism in differential privacy often require sampling from intractable distributions. When approximate procedures like Markov chain Monte Carlo (MCMC) are used, the end result incurs costs to both privacy and accuracy. Existing work has examined these effects asymptotically, but implementable finite sample results are needed in practice so that users can specify privacy budgets in advance and implement samplers with exact privacy guarantees. In this paper, we use tools from ergodic theory and perfect simulation to design exact finite runtime sampling algorithms for the exponential mechanism by introducing an intermediate modified target distribution using artificial atoms. We propose an additional modification of this sampling algorithm that maintains its ǫ-DP guarantee and has improved runtime at the cost of some utility. We then compare these methods in scenarios where we can explicitly calculate a δ cost (as in (ǫ, δ)-DP) incurred when using standard MCMC techniques. Much as there is a well known trade-off between privacy and utility, we demonstrate that there is also a trade-off between privacy guarantees and runtime. 
    more » « less