Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algo- rithm. This algorithm is designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing their estimates in real-time. The PSGLD algorithm is considered passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (forward learner), of which it has no control. The PSGLD algorithm acts as a randomized sampler to achieve adaptive IRL by reconstructing the forward learner’s cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance; we provide explicit bounds on the 2-Wasserstein distance between PSGLD algorithm sample measure and the stationary measure encoding the cost function, and provide guarantees for a kernel density estimation scheme which reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance. They provide finite-time guarantees for an adaptive IRL mechanism, and substantially generalize the analytical framework of a line of research in passive stochastic gradient algorithms.more » « lessFree, publicly-accessible full text available June 1, 2026
-
How to design a Markov decision process (MDP)-based radar controller that makes small sacrifices in performance to mask its sensing plan from an adversary? The radar controller purposefully minimizes the Fisher information of its emissions so that an adversary cannot identify the controller’s model parameters accurately. Unlike classical open-loop statistical inference, where the Fisher information serves as a lower bound for the achievable covariance, this article employs the Fisher information as a design constraint for a closed-loop radar controller to mask its sensing plan. We analytically derive a closed-form expression for the determinant of the Fisher informa- tion matrix (FIM) pertaining to the parameters of the MDP-based controller. Subsequently, we formulate an MDP, which is regularized by the determinant of the adversary’s FIM. This results in the per- turbations to the total cost of operation, state–action costs, and the transition matrix. In addition, we propose a maximum-entropy-based convex lower bound for the FIM-constrained MDP, whose solution can serve as an initialization point for the proposed nonlinear optimization problem. This convex lower bound aligns with existing work that employs the same maximum entropy criteria to mask the sensing plan. Numerical results show that the introduction of minor perturbations to the MDP’s state–action costs, transition kernel, and the total operation cost can reduce the Fisher information of the emissions. Consequently, this reduction amplifies the variability in policy and transition kernel estimation errors, thwarting the adversary’s accuracy in estimating the controller’s sensing plan. We demonstrate this by comparing the error in the estimate of the transition kernel under different criteria: the FIM criteria, the maximum entropy criteria, a sensing plan where actions are chosen uniformly, and the unmasked sensing plan.more » « lessFree, publicly-accessible full text available February 1, 2026
-
How can non-communicating agents learn to share congested resources efficiently? This is a challeng- ing task when the agents can access the same resource simultaneously (in contrast to multi-agent multi-armed bandit problems) and the resource valuations differ among agents. We present a fully distributed algorithm for learning to share in congested environments and prove that the agents’ regret with respect to the optimal allocation is poly-logarithmic in the time horizon. Performance in the non-asymptotic regime is illustrated in numerical simulations. The distributed algorithm has applications in cloud computing and spectrum sharing.more » « less
-
A learner aims to minimize a function f by repeatedly querying a distributed oracle that provides noisy gradient evaluations. At the same time, the learner seeks to hide arg min f from a malicious eavesdropper that observes the learner’s queries. This paper considers the problem of covert or learner-private optimization, where the learner has to dynamically choose between learning and obfuscation by exploiting the stochasticity. The problem of controlling the stochastic gradient algorithm for covert optimization is modeled as a Markov decision process, and we show that the dynamic programming operator has a supermodular structure implying that the optimal policy has a monotone threshold structure. A computationally efficient policy gradient algorithm is proposed to search for the optimal querying policy without knowledge of the transition probabilities. As a practical application, our methods are demonstrated on a hate speech classification task in a federated setting where an eavesdropper can use the optimal weights to generate toxic content, which is more easily misclassified. Numerical results show that when the learner uses the optimal policy, an eavesdropper can only achieve a validation accuracy of 52% with no information and 69% when it has a public dataset with 10% positive samples compared to 83% when the learner employs a greedy policy.more » « less
-
This letter studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such prob- lems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and per- form SG. The oracle probabilistically returns either a noisy gradient of the function or a non-informative measure- ment, depending on the oracle state and incentive. The learner’s query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This letter formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using conditions for interval-dominance on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi– armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.more » « less
-
Can deep convolutional neural networks (CNNs) for image classification be interpreted as utility maximizers with information costs? By performing set-valued system identifica- tion for Bayesian decision systems, we demonstrate that deep CNNs behave equivalently (in terms of necessary and sufficient conditions) to rationally inattentive Bayesian utility maximizers, a generative model used extensively in economics for human decision-making. Our claim is based on approximately 500 numerical experiments on 5 widely used neural network archi- tectures. The parameters of the resulting interpretable model are computed efficiently via convex feasibility algorithms. As a practical application, we also illustrate how the reconstructed interpretable model can predict the classification performance of deep CNNs with high accuracy. The theoretical foundation of our approach lies in Bayesian revealed preference studied in micro-economics. All our results are on GitHub and completely reproducible.more » « less
-
A metacognitive radar switches between two modes of cognition— one mode to achieve a high-quality estimate of targets, and the other mode to hide its utility function (plan). To achieve high-quality es- timates of targets, a cognitive radar performs a constrained utility maximization to adapt its sensing mode in response to a changing target environment. If an adversary can estimate the utility function of a cognitive radar, it can determine the radar’s sensing strategy and mitigate the radar performance via electronic countermeasures (ECM). This article discusses a metacognitive radar that switches between two modes of cognition: achieving satisfactory estimates of a target while hiding its strategy from an adversary that detects cognition. The radar does so by transmitting purposefully designed suboptimal responses to spoof the adversary’s Neyman–Pearson de- tector. We provide theoretical guarantees by ensuring that the Type-I error probability of the adversary’s detector exceeds a predefined level for a specified tolerance on the radar’s performance loss. We illustrate our cognition-masking scheme via numerical examples in- volving waveform adaptation and beam allocation. We show that small purposeful deviations from the optimal emission confuse the adversary by significant amounts, thereby masking the radar’s cognition. Our approach uses ideas from revealed preference in microeconomics and adversarial inverse reinforcement learning. Our proposed algorithms provide a principled approach for system-level electronic counter- countermeasures to hide the radar’s strategy from an adversary. We also provide performance bounds for our cognition-masking scheme when the adversary has misspecified measurements of the radar’s response.more » « less
-
In this article, we exploit the spiked covariance structure of the clutter plus noise covariance matrix for radar signal processing. Using state-of-the-art techniques high dimensional statistics, we propose a nonlinear shrinkage-based rotation invariant spiked covariance ma- trix estimator. We state the convergence of the estimated spiked eigen- values. We use a dataset generated from the high-fidelity, site-specific physics-based radar simulation software RFView to compare the proposed algorithm against the existing rank constrained maximum likelihood (RCML)-expected likelihood (EL) covariance estimationmore » « less
-
Suppose L simultaneous independent stochastic sys- tems generate observations, where the observations from each system depend on the underlying model parameter of that system. The observations are unlabeled (anonymized), in the sense that an analyst does not know which observation came from which stochastic system. How can the analyst estimate the underlying model parameters of the L systems? Since the anonymized observations at each time are an unordered set of L measurements (rather than a vector), classical stochastic gradient algorithms cannot be directly used. By using symmetric polynomials, we formulate a symmetric measurement equation that maps the observation set to a unique vector. By exploiting the fact that the algebraic ring of multi-variable polynomials is a unique factorization domain over the ring of one-variable polynomials, we construct an adaptive filtering algorithm that yields a statistically consistent estimate of the underlying param- eters. We analyze the asymptotic covariance of these estimates to quantify the effect of anonymization. Finally, we characterize the anonymity of the observations in terms of the error probability of the maximum aposteriori Bayesian estimator. Specifically using Blackwell dominance of mean preserving spreads, we construct a partial ordering of the noise densities which relates the anonymity of the observations to the asymptotic covariance of the adaptive filtering algorithm.more » « less
An official website of the United States government

Full Text Available