Deep generative models parametrized up to a normalizing constant (e.g. energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or gradients thereof cannot be explicitly or efficiently written down. Score matching is a training method, whereby instead of fitting the likelihood for the training data, we instead fit the score function, obviating the need to evaluate the partition function. Though this estimator is known to be consistent, it's unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood---which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper and show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated---i.e. the Poincar\'e, log-Sobolev and isoperimetric constant---quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to the maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant---even for simple families of distributions like exponential families with rich enough sufficient statistics---score matching will be substantially less efficient than maximum likelihood. We suitably formalize these results both in the finite sample regime, and in the asymptotic regime. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudolikelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
more »
« less
Estimation Rates for Sparse Linear Cyclic Causal Models
Causal models are fundamental tools to understand complex systems and predict the effect of interventions on such systems. However, despite an extensive literature in the population (or infinite-sample) case, where distributions are assumed to be known, little is known about the statistical rates of convergence of various methods, even for the simplest models. In this work, allowing for cycles, we study linear structural equations models with homoscedastic Gaussian noise and in the presence of interventions that make the model identifiable. More specifically, we present statistical rates of estimation for both the LLC estimator introduced by Hyttinen, Eberhardt and Hoyer and a novel two-step penalized maximum likelihood estimator. We establish asymptotic near minimax optimality for the maximum likelihood estimator over a class of sparse causal graphs in the case of near-optimally chosen interventions. Moreover, we find evidence for practical advantages of this estimator compared to LLC in synthetic numerical experiments.
more »
« less
- Award ID(s):
- 1712596
- PAR ID:
- 10219147
- Editor(s):
- Peters, Joans; Sontag, David
- Date Published:
- Journal Name:
- Proceedings of Machine Learning Research
- Volume:
- 124
- ISSN:
- 2640-3498
- Page Range / eLocation ID:
- 1169--1178
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)We consider the parameter estimation problem of a probabilistic generative model prescribed using a natural exponential family of distributions. For this problem, the typical maximum likelihood estimator usually overfits under limited training sample size, is sensitive to noise and may perform poorly on downstream predictive tasks. To mitigate these issues, we propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric Kullback-Leibler ball around a parametric nominal distribution. Leveraging the analytical expression of the Kullback-Leibler divergence between two distributions in the same natural exponential family, we show that the min-max estimation problem is tractable in a broad setting, including the robust training of generalized linear models. Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.more » « less
-
Modeling unknown systems from data is a precursor of system optimization and sequential decision making. In this paper, we focus on learning a Markov model from a single trajectory of states. Suppose that the transition model has a small rank despite having a large state space, meaning that the system admits a low-dimensional latent structure. We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states. We propose two maximum-likelihood estimation methods: a convex approach with nuclear norm regularization and a nonconvex approach with rank constraint. We explicitly derive the statistical rates of both estimators in terms of the Kullback-Leiber divergence and the [Formula: see text] error and also establish a minimax lower bound to assess the tightness of these rates. For computing the nonconvex estimator, we develop a novel DC (difference of convex function) programming algorithm that starts with the convex M-estimator and then successively refines the solution till convergence. Empirical experiments demonstrate consistent superiority of the nonconvex estimator over the convex one.more » « less
-
Summary For decades, $ N $-of-1 experiments, where a unit serves as its own control and treatment in different time windows, have been used in certain medical contexts. However, due to effects that accumulate over long time windows and interventions that have complex evolution, a lack of robust inference tools has limited the widespread applicability of such $ N $-of-1 designs. This work combines techniques from experimental design in causal inference and system identification from control theory to provide such an inference framework. We derive a model of the dynamic interference effect that arises in linear time-invariant dynamical systems. We show that a family of causal estimands analogous to those studied in potential outcomes are estimable via a standard estimator derived from the method of moments. We derive formulae for higher moments of this estimator and describe conditions under which $ N $-of-1 designs may provide faster ways to estimate the effects of interventions in dynamical systems. We also provide conditions under which our estimator is asymptotically normal and derive valid confidence intervals for this setting.more » « less
-
Prior work applying semiparametric theory to causal inference has primarily focused on deriving estimators that exhibit statistical robustness under a prespecified causal model that permits identification of a desired causal parameter. However, a fundamental challenge is correct specification of such a model, which usually involves making untestable assumptions. Evidence factors is an approach to combining hypothesis tests of a common causal null hypothesis under two or more candidate causal models. Under certain conditions, this yields a test that is valid if at least one of the underlying models is correct, which is a form of causal robustness. We propose a method of combining semiparametric theory with evidence factors. We develop a causal null hypothesis test based on joint asymptotic normality of asymptotically linear semiparametric estimators, where each estimator is based on a distinct identifying functional derived from each of candidate causal models. We show that this test provides both statistical and causal robustness in the sense that it is valid if at least one of the proposed causal models is correct, while also allowing for slower than parametric rates of convergence in estimating nuisance functions. We demonstrate the effectiveness of our method via simulations and applications to the Framingham Heart Study and Wisconsin Longitudinal Study.more » « less
An official website of the United States government

