Motivated by engineering applications such as resource allocation in networks and inventory systems, we consider average-reward Reinforcement Learning with unbounded state space and reward function. Recent work by Murthy et al. (2024) studied this problem in the actor-critic framework and established finite-sample bounds assuming access to a critic with certain error guarantees. We complement their work by studying Temporal Difference (TD) learning with linear function approximation and establishing finite-time bounds with the optimal sample complexity. These results are obtained using the following general-purpose theorem for non-linear Stochastic Approximation (SA). Suppose that one constructs a Lyapunov function for a non-linear SA satisfying a certain drift condition. Then our theorem establishes finite-time bounds when this SA is driven by unbounded Markovian noise under suitable conditions. It serves as a black-box tool to generalize sample guarantees on SA from the i.i.d. or martingale-difference case to potentially unbounded Markovian noise. The generality and the mild assumptions of the setup enable broad applicability of our theorem. We illustrate its power by studying two more systems: (i) We improve upon the finite-time bounds of Q-learning in Chen et al. (2024) by tightening the error bounds and also allowing for a larger class of behavior policies. (ii) We establish the first finite-time bounds for distributed stochastic optimization of high-dimensional smooth strongly convex functions using cyclic block coordinate descent.
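For intuition about the recursion analyzed in this abstract, the following is a minimal sketch of average-reward (differential) TD(0) with linear function approximation. The environment interface (`env.reset`, `env.sample_action`, `env.step`), the feature map `phi`, and the two-time-scale step sizes are illustrative placeholders, not the specific setup or constants from the paper.

```python
import numpy as np

def differential_td0(env, phi, d, alpha=0.01, beta=0.001, num_steps=100_000):
    """Sketch of average-reward TD(0) with linear value approximation.

    Maintains a weight vector w for the differential value function and a
    scalar estimate eta of the long-run average reward; the TD error uses
    r - eta in place of discounting.
    """
    w = np.zeros(d)      # weights of the linear value approximation
    eta = 0.0            # running estimate of the average reward
    s = env.reset()
    for _ in range(num_steps):
        a = env.sample_action(s)               # fixed behavior policy (assumed interface)
        s_next, r = env.step(s, a)
        # Differential TD error: no discount factor in the average-reward setting.
        delta = r - eta + phi(s_next) @ w - phi(s) @ w
        w = w + alpha * delta * phi(s)         # fast time scale
        eta = eta + beta * delta               # slower time scale for the average reward
        s = s_next
    return w, eta
```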
This content will become publicly available on March 1, 2026
Learning Linear Polytree Structural Equation Model
We study learning the directed acyclic graph (DAG) for linear structural equation models (SEMs) when the causal structure is a polytree. Under Gaussian polytree models, we derive sufficient sample-size conditions under which the Chow–Liu algorithm exactly recovers both the skeleton and the equivalence class (CPDAG). Matching information-theoretic lower bounds provide necessary conditions, yielding sharp characterizations of problem difficulty. We further analyze inverse correlation matrix estimation, with error bounds depending on the dimension and the number of v-structures, and extend the results to group linear polytrees. Comprehensive simulations and benchmark experiments demonstrate robustness when the true graphs are only approximately polytrees.
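As a rough illustration of the skeleton-recovery step, here is a minimal Chow–Liu sketch for Gaussian data: build a maximum-weight spanning tree using pairwise Gaussian mutual informations as edge weights. The function name is illustrative, and the CPDAG orientation step (identifying v-structures) is omitted.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_skeleton(X):
    """Sketch of Chow-Liu skeleton recovery for a Gaussian polytree.

    X: (n_samples, p) data matrix. Returns a sorted list of undirected edges.
    """
    R = np.corrcoef(X, rowvar=False)                 # sample correlation matrix
    # Gaussian mutual information: I(i;j) = -0.5 * log(1 - rho_ij^2).
    MI = -0.5 * np.log(np.clip(1.0 - R**2, 1e-12, None))
    np.fill_diagonal(MI, 0.0)
    # minimum_spanning_tree minimizes, so negate to obtain a max-weight tree.
    mst = minimum_spanning_tree(-MI)
    rows, cols = mst.nonzero()
    return sorted({tuple(sorted((i, j))) for i, j in zip(rows, cols)})
```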
- Award ID(s): 1848575
- PAR ID: 10644311
- Publisher / Repository: Transactions on Machine Learning Research (via OpenReview)
- Date Published:
- Journal Name: Transactions on Machine Learning Research (TMLR)
- ISSN: 2835-8856
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We study convex empirical risk minimization for high-dimensional inference in binary linear classification under both discriminative binary linear models and generative Gaussian-mixture models. Our first result sharply predicts the statistical performance of such estimators in the proportional asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit to prove bounds on the best achievable performance. Notably, we show that the proposed bounds are tight for popular binary models (such as signed and logistic) and for the Gaussian-mixture model by constructing appropriate loss functions that achieve them. Our numerical simulations suggest that the theory is accurate even for relatively small problem dimensions and that it enjoys a certain universality property.
-
We study the problem of observation selection in a resource-constrained networked sensing system, where the objective is to select a small subset of observations from a large network to perform a state estimation task. When the measurements are gathered using nonlinear systems, the majority of prior work resorts to approximation techniques, such as linearization of the measurement model, in order to utilize methods developed for linear models, e.g., (weak) submodular objectives and greedy selection schemes. In contrast, when the measurement model is quadratic, e.g., the range measurements in a radar system, we exploit a connection to the classical Van Trees inequality to derive new optimality criteria without distorting the relational structure of the measurement model. We further show that under certain conditions these optimality criteria are monotone and (weakly) submodular set functions. These results enable us to develop an efficient greedy observation selection algorithm uniquely tailored to constrained networked sensing systems following quadratic models, and we provide theoretical bounds on its achievable utility. Extensive numerical experiments demonstrate the efficacy of the proposed framework. (A generic greedy-selection sketch appears after this list.)
-
Cassio de Campos; Marloes H. Maathuis (Ed.) When data contains measurement errors, it is necessary to make modeling assumptions relating the error-prone measurements to the unobserved true values. Work on measurement error has largely focused on models that fully identify the parameter of interest. As a result, many practically useful models that result in bounds on the target parameter -- known as partial identification -- have been neglected. In this work, we present a method for partial identification in a class of measurement error models involving discrete variables. We focus on models that impose linear constraints on the target parameter, allowing us to compute partial identification bounds using off-the-shelf LP solvers. We show how several common measurement error assumptions can be composed with an extended class of instrumental variable-type models to create such linear constraint sets. We further show how this approach can be used to bound causal parameters, such as the average treatment effect, when treatment or outcome variables are measured with error. Using data from the Oregon Health Insurance Experiment, we apply this method to estimate bounds on the effect Medicaid enrollment has on depression when depression is measured with error. (An LP-based sketch of this computation appears after this list.)
-
Yiming Ying (Ed.) Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine learning models. Leveraging the Uniform-LGI, we first derive convergence rates for the gradient flow algorithm, and then we give generalization bounds for a large class of machine learning models. We further apply our framework to three distinct machine learning models: linear regression, kernel regression, and two-layer neural networks. Through our approach, we obtain generalization estimates that match or extend previous results.
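The greedy scheme described in the networked-sensing abstract above follows the standard pattern sketched below. This is a generic cardinality-constrained greedy maximizer, not the paper's Van Trees-based criterion; the log-det utility in the usage example is a common D-optimal-style stand-in chosen purely for illustration.

```python
import numpy as np

def greedy_select(utility, ground_set, k):
    """Greedily pick k elements maximizing a monotone (weakly) submodular
    set function `utility`, which maps a set of indices to a scalar score."""
    selected = set()
    for _ in range(k):
        # Marginal gain of each remaining candidate given the current selection.
        gains = {e: utility(selected | {e}) - utility(selected)
                 for e in ground_set - selected}
        selected.add(max(gains, key=gains.get))
    return selected

# Illustrative usage with random measurement vectors; the paper's
# quadratic-model criteria are not reproduced here.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))           # 50 candidate observations, 5 states

def logdet_utility(S):
    """D-optimal-style score: log-determinant of a regularized information matrix."""
    M = np.eye(5) + sum(np.outer(A[i], A[i]) for i in S)
    return np.linalg.slogdet(M)[1]

print(greedy_select(logdet_utility, set(range(50)), k=5))
```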
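Similarly, the LP-based partial identification in the measurement-error abstract reduces to a pair of linear programs: minimize and maximize the target functional over distributions consistent with the observed constraints. The sketch below uses scipy's `linprog`; the toy constraint set (including the assumed 10% misclassification cap) is invented for illustration and is not from the paper or the Oregon data.

```python
import numpy as np
from scipy.optimize import linprog

def partial_id_bounds(c, A_eq, b_eq, A_ub=None, b_ub=None):
    """Lower/upper bounds on a linear target c @ q over distributions q >= 0
    satisfying linear evidence constraints (equalities and inequalities)."""
    kw = dict(A_eq=A_eq, b_eq=b_eq, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * len(c))
    lo = linprog(np.asarray(c), **kw)    # minimize the target
    hi = linprog(-np.asarray(c), **kw)   # maximize the target
    return lo.fun, -hi.fun

# Toy example: binary exposure X observed as error-prone X*.
# Cells of q: (X=0,X*=0), (X=0,X*=1), (X=1,X*=0), (X=1,X*=1).
c = np.array([0.0, 0.0, 1.0, 1.0])       # target: P(X = 1)
A_eq = np.array([[1.0, 1, 1, 1],          # q sums to one
                 [0.0, 1, 0, 1]])         # matches observed P(X* = 1)
b_eq = np.array([1.0, 0.3])
A_ub = np.array([[0.0, 1, 1, 0]])         # total misclassification mass
b_ub = np.array([0.1])                    # assumed error cap (illustrative)
print(partial_id_bounds(c, A_eq, b_eq, A_ub, b_ub))   # about (0.2, 0.4)
```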