Title: Learning from many trajectories
We initiate a study of supervised learning from many independent sequences ("trajectories") of non-independent covariates, reflecting tasks in sequence modeling, control, and reinforcement learning. Conceptually, our multi-trajectory setup sits between two traditional settings in statistical learning theory: learning from independent examples and learning from a single auto-correlated sequence. Our conditions for efficient learning generalize the former setting: trajectories must be non-degenerate in ways that extend standard requirements for independent examples. Notably, we do not require that trajectories be ergodic, long, or strictly stable. For linear least-squares regression, given n-dimensional examples produced by m trajectories, each of length T, we observe a notable change in statistical efficiency as the number of trajectories increases from a few (namely m <= n) to many (namely m >= n). Specifically, we establish that the worst-case error rate of this problem is n/(mT) whenever m >= n. Meanwhile, when m <= n, we establish a (sharp) lower bound of n^2/(m^2 T) on the worst-case error rate, realized by a simple, marginally unstable linear dynamical system. A key upshot is that, in domains where trajectories regularly reset, the error rate eventually behaves as if all of the examples were independent, drawn from their marginals. As a corollary of our analysis, we also improve guarantees for the linear system identification problem.
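Below is a minimal simulation sketch of the regression setting described in the abstract, assuming (purely for illustration) that covariates are states of a stable linear dynamical system and that responses are noisy linear measurements of the state; the system, noise levels, dimensions, and trajectory counts are arbitrary choices, not taken from the paper.

```python
import numpy as np

def simulate_trajectory(A, theta, T, rng, noise_std=0.1):
    """Roll out one trajectory of x_{t+1} = A x_t + w_t and emit covariate/response
    pairs (x_t, y_t) with y_t = <theta, x_t> + measurement noise."""
    n = A.shape[0]
    x = rng.standard_normal(n)                      # fresh initial state per trajectory
    X, y = [], []
    for _ in range(T):
        X.append(x)
        y.append(theta @ x + noise_std * rng.standard_normal())
        x = A @ x + rng.standard_normal(n)          # process noise drives the dynamics
    return np.array(X), np.array(y)

def pooled_least_squares_error(A, theta, m, T, seed=0):
    """Pool the m*T examples from m independent trajectories, fit ordinary
    least squares, and return the squared parameter error of the estimate."""
    rng = np.random.default_rng(seed)
    Xs, ys = zip(*(simulate_trajectory(A, theta, T, rng) for _ in range(m)))
    X, y = np.vstack(Xs), np.concatenate(ys)        # design matrix of shape (m*T, n)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((theta_hat - theta) ** 2))

if __name__ == "__main__":
    n = 5
    A = 0.9 * np.eye(n)                             # a stable toy system
    theta = np.ones(n) / np.sqrt(n)
    for m in (2, n, 10 * n):                        # few vs. many trajectories
        err = pooled_least_squares_error(A, theta, m, T=50)
        print(f"m={m:3d}  squared error={err:.4f}")
```

The stable toy system here is not the marginally unstable construction used for the paper's lower bound; the sketch only illustrates how pooled least squares is run on multi-trajectory data.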
Award ID(s): 1846369
PAR ID: 10581119
Publisher / Repository: JMLR
Journal Name: Journal of Machine Learning Research
ISSN: 1533-7928
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We focus on robust estimation of the unobserved state of a discrete-time stochastic system with linear dynamics. A standard analysis of this estimation problem assumes a baseline innovation model; with Gaussian innovations we recover the Kalman filter. However, in many settings there is insufficient or corrupted data to validate the baseline model. To cope with this problem, we minimize the worst-case mean-squared estimation error over adversarial models chosen within a Wasserstein neighborhood around the baseline. We also constrain the adversarial innovations to form a martingale difference sequence. The martingale constraint relaxes the i.i.d. assumptions that are often imposed on the baseline model. Moreover, we show that the martingale constraint guarantees that the adversarial dynamics remain adapted to the information generated naturally over time. Adding the martingale constraint therefore allows us to improve upon overly conservative policies that also protect against unrealistic omniscient adversaries. We establish a strong duality result, which we use to develop an efficient subgradient method for computing the distributionally robust estimation policy. If the baseline innovations are Gaussian, we show that the worst-case adversary remains Gaussian. Our numerical experiments indicate that the martingale constraint may also add a layer of robustness to the choice of the adversarial power.
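For reference, the Gaussian baseline mentioned in this entry is the textbook Kalman filter; a minimal sketch of that non-robust baseline follows. It does not implement the distributionally robust policy or the subgradient method from the entry, and all model matrices and noise levels below are illustrative.

```python
import numpy as np

def kalman_filter(A, C, Q, R, y_seq, x0, P0):
    """Standard Kalman filter for x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t,
    with w_t ~ N(0, Q) and v_t ~ N(0, R); returns the filtered state means."""
    x, P = x0, P0
    estimates = []
    for y in y_seq:
        x, P = A @ x, A @ P @ A.T + Q              # predict
        S = C @ P @ C.T + R                        # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)             # Kalman gain
        x = x + K @ (y - C @ x)                    # update with measurement y
        P = (np.eye(len(x)) - K @ C) @ P
        estimates.append(x)
    return np.array(estimates)

if __name__ == "__main__":
    A = np.array([[1.0, 1.0], [0.0, 1.0]])         # constant-velocity toy model
    C = np.array([[1.0, 0.0]])                      # observe position only
    Q, R = 0.01 * np.eye(2), np.array([[0.1]])
    rng = np.random.default_rng(0)
    ys = [np.array([t + rng.normal(scale=0.3)]) for t in range(20)]
    print(kalman_filter(A, C, Q, R, ys, x0=np.zeros(2), P0=np.eye(2))[-1])
```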
  2. The object of study of this paper is the multi-determinantal algebraic variety SING_{n,m}, which captures the symbolic determinant identity testing (SDIT) problem (a canonical version of the polynomial identity testing (PIT) problem) and plays a central role in algebra, algebraic geometry, and computational complexity theory. SING_{n,m} is the set of all m-tuples of n×n complex matrices that span only singular matrices; in other words, the determinant of every linear combination of the matrices in such a tuple vanishes. The algorithmic complexity of testing membership in SING_{n,m} is a central question in computational complexity: while an almost trivial probabilistic algorithm exists, finding an efficient deterministic algorithm is a holy grail of derandomization and, on top of that, would imply super-polynomial circuit lower bounds. A sequence of recent works suggests efficient deterministic "geodesic descent" algorithms for membership in a general class of algebraic varieties, namely the null cones of (reductive) linear group actions. Can such algorithms be used for the problem above? Our main result is negative: SING_{n,m} is not the null cone of any such group action! This stands in stark contrast to a non-commutative analog of this variety (for which such algorithms work) and points to an inherent structural difficulty of SING_{n,m}. In other words, we provide a barrier to attempts at derandomizing SDIT via these algorithms. To prove this result we identify precisely the group of symmetries of SING_{n,m}. We find this characterization, and the tools we introduce to prove it, of independent interest. Our characterization significantly generalizes a result of Frobenius for the special case m=1 (namely, computing the symmetries of the determinant). Our proof suggests a general method for determining the symmetries of general algebraic varieties, an algorithmic problem that has hardly been studied and that we believe is central to algebraic complexity.
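The "almost trivial probabilistic algorithm" mentioned in this entry evaluates the determinant of a random linear combination of the tuple: if the determinant is nonzero, the tuple is certainly not in SING_{n,m}. A minimal floating-point sketch is below; an exact algorithm would work over the rationals or a large finite field, and the coefficient distribution, trial count, and tolerance here are arbitrary choices.

```python
import numpy as np

def probably_in_SING(matrices, trials=20, tol=1e-8, seed=0):
    """Randomized membership test: a tuple of n x n matrices lies in SING_{n,m}
    iff every linear combination of them is singular.  If some random combination
    has a clearly nonzero determinant, the tuple is certainly NOT in SING_{n,m};
    otherwise we report (probable) membership."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        coeffs = rng.uniform(-1.0, 1.0, size=len(matrices))
        combo = sum(c * M for c, M in zip(coeffs, matrices))
        if abs(np.linalg.det(combo)) > tol:         # witness of non-membership
            return False
    return True                                      # every sampled combination was singular

# Example: two 2x2 matrices whose span contains only singular matrices
E11 = np.array([[1.0, 0.0], [0.0, 0.0]])
E12 = np.array([[0.0, 1.0], [0.0, 0.0]])
print(probably_in_SING([E11, E12]))                  # True: every combination has a zero row
print(probably_in_SING([np.eye(2), E11]))            # False: generic combinations are nonsingular
```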
  3. Kumar, Amit; Ron-Zewi, Noga (Ed.)
    The goal of trace reconstruction is to reconstruct an unknown n-bit string x given only independent random traces of x, where a random trace of x is obtained by passing x through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of x rather than individual traces themselves. Such an algorithm is said to be 𝓁-local if each of its statistical queries corresponds to an 𝓁-junta function over some block of 𝓁 consecutive bits in the trace. Since several, but not all, known algorithms for trace reconstruction fall under the local statistical query paradigm, it is interesting to understand the abilities and limitations of local SQ algorithms for trace reconstruction. In this paper we establish nearly-matching upper and lower bounds on local Statistical Query algorithms for both worst-case and average-case trace reconstruction. For the worst-case problem, we show that there is an Õ(n^{1/5})-local SQ algorithm that makes all its queries with tolerance τ ≥ 2^{-Õ(n^{1/5})}, and also that any Õ(n^{1/5})-local SQ algorithm must make some query with tolerance τ ≤ 2^{-Ω̃(n^{1/5})}. For the average-case problem, we show that there is an O(log n)-local SQ algorithm that makes all its queries with tolerance τ ≥ 1/poly(n), and also that any O(log n)-local SQ algorithm must make some query with tolerance τ ≤ 1/poly(n).
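For concreteness, a random trace as described in this entry is obtained by deleting each bit of x independently and keeping the surviving bits in order; a minimal sketch follows (the deletion probability is an illustrative parameter, not taken from the entry).

```python
import random

def random_trace(x: str, deletion_prob: float = 0.5, rng=random) -> str:
    """Pass the bit string x through a deletion channel: each bit is deleted
    independently with probability deletion_prob; survivors keep their order."""
    return "".join(bit for bit in x if rng.random() >= deletion_prob)

def sample_traces(x: str, num_traces: int, deletion_prob: float = 0.5):
    """Draw independent random traces of x, the only data from which a
    trace-reconstruction algorithm (or its statistical queries) is computed."""
    return [random_trace(x, deletion_prob) for _ in range(num_traces)]

print(sample_traces("1011001110", num_traces=3))
```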
  4. We consider a networked linear dynamical system with p agents/nodes. We study the problem of learning the underlying graph of interactions/dependencies from observations of the nodal trajectories over a time interval T. We present a regularized non-causal consistent estimator for this problem and analyze its sample complexity over two regimes: (a) where the interval T consists of n i.i.d. observation windows of length T/n (restart and record), and (b) where T is one continuous observation window (consecutive). Using the theory of M-estimators, we show that the estimator recovers the underlying interactions, in either regime, in a time interval that is logarithmic in the system size p. To the best of our knowledge, this is the first work to analyze the sample complexity of learning linear dynamical systems driven by unobserved, non-white, wide-sense stationary (WSS) inputs.
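As a generic baseline (not the regularized non-causal estimator analyzed in the entry above), one can try to recover the interaction graph by regressing each node's next value on all nodes' current values with an ℓ1 penalty and thresholding the coefficients. The sketch below assumes fully observed states and white process noise, both simplifications relative to the unobserved non-white WSS inputs handled in the entry; the ring system and all parameters are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def recover_graph(X, alpha=0.1, tol=1e-2):
    """X has shape (T, p): one row per time step, one column per node.
    Regress x_i(t+1) on x(t) with an l1 penalty and read the support of the
    coefficients as the estimated in-neighbours of node i."""
    past, future = X[:-1], X[1:]
    p = X.shape[1]
    adjacency = np.zeros((p, p), dtype=bool)
    for i in range(p):
        model = Lasso(alpha=alpha, fit_intercept=False).fit(past, future[:, i])
        adjacency[i] = np.abs(model.coef_) > tol
    return adjacency

# Toy example: a ring of p nodes, each node driven by one neighbour on the ring
rng = np.random.default_rng(0)
p, T = 6, 2000
A_true = 0.5 * np.roll(np.eye(p), 1, axis=1)         # true interaction matrix
X = np.zeros((T, p))
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(p)
print(recover_graph(X).astype(int))                   # support should match A_true
```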