 Award ID(s):
 2238523
 NSFPAR ID:
 10489598
 Publisher / Repository:
 International Conference on Learning Representations (ICLR), 2024
 Date Published:
 Subject(s) / Keyword(s):
 ["score matching","logSobolev inequality","isoperimetry","relative efficiency","sample complexity"]
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Score matching is an alternative to maximum likelihood (ML) for estimating a probability distribution parametrized up to a constant of proportionality. By fitting the "score" of the distribution, it sidesteps the need to compute this constant of proportionality (which is often intractable). While score matching and variants thereof are popular in practice, a precise theoretical understanding of the benefits and tradeoffs with maximum likelihood, both computational and statistical, has been lacking. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, and has comparable statistical efficiency to ML, while the ML loss is intractable to optimize using a gradient-based method. The family consists of exponentials of polynomials of fixed degree, and our result can be viewed as a continuous analogue of recent developments in the discrete setting. Precisely, we show: (1) Designing a zeroth-order or first-order oracle for optimizing the maximum likelihood loss is NP-hard. (2) Maximum likelihood has statistical efficiency polynomial in the ambient dimension and the radius of the parameters of the family. (3) Minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension.
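To make the objective concrete, here is a minimal sketch of Hyvarinen score matching on the simplest exponential-of-a-polynomial family, a zero-mean Gaussian with precision a (all names are illustrative, not from the paper). The score is s_a(x) = -a*x, so the empirical objective is a quadratic in a with a closed-form minimizer, and the normalizing constant is never computed:

```python
import numpy as np

# Minimal sketch, assuming the family p_a(x) proportional to exp(-a*x^2/2)
# (a zero-mean Gaussian with precision a > 0).  The score is
# s_a(x) = d/dx log p_a(x) = -a*x, so the empirical Hyvarinen objective
#   J(a) = mean( 0.5*s_a(x)^2 + s_a'(x) ) = 0.5*a^2*mean(x^2) - a
# is quadratic in a with closed-form minimizer a* = 1/mean(x^2).
# Note that the normalizing constant never appears -- the point of score matching.

def score_matching_objective(a, x):
    """Empirical Hyvarinen objective J(a) for the Gaussian family."""
    return 0.5 * a**2 * np.mean(x**2) - a

def fit_precision(x):
    """Closed-form score-matching estimate of the precision a."""
    return 1.0 / np.mean(x**2)

rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(2.0), size=50_000)  # true precision is 0.5

a_hat = fit_precision(x)
# a_hat should sit near the true precision 1/sigma^2 = 0.5, and J should
# be lower there than at mis-specified precisions.
```

For higher-degree polynomials the same recipe applies coordinate-wise, and the objective remains a polynomial in the parameters, which is what makes it tractable to optimize.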

Energy-based models (EBMs) assign an unnormalized log probability to data samples. This functionality has a variety of applications, such as sample synthesis, data denoising, sample restoration, outlier detection, Bayesian reasoning and many more. But the training of EBMs using standard maximum likelihood is extremely slow because it requires sampling from the model distribution. Score matching potentially alleviates this problem. In particular, denoising score matching has been successfully used to train EBMs. Using noisy data samples with one fixed noise level, these models learn fast and yield good results in data denoising. However, demonstrations of such models in the high-quality sample synthesis of high-dimensional data were lacking. Recently, a paper showed that a generative model trained by denoising score matching accomplishes excellent sample synthesis when trained with data samples corrupted with multiple levels of noise. Here we provide an analysis and empirical evidence showing that training with multiple noise levels is necessary when the data dimension is high. Leveraging this insight, we propose a novel EBM trained with multiscale denoising score matching. Our model exhibits a data-generation performance comparable to state-of-the-art techniques such as GANs and sets a new baseline for EBMs. The proposed model also provides density information and performs well on an image-inpainting task.
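The denoising score matching objective used above can be sketched in a few lines for a single noise level (the setup and names below are illustrative, not the paper's model). The score model is regressed onto the target (x - y)/sigma^2 for noised samples y = x + sigma*eps; its minimizer is the score of the noised density:

```python
import numpy as np

# Minimal sketch, assuming 1-D Gaussian data with variance v and a linear
# score model s_w(y) = -w*y.  Denoising score matching (DSM) regresses the
# model onto the target (x - y)/sigma^2 for y = x + sigma*eps.  The noised
# density is N(0, v + sigma^2), whose true score is -y/(v + sigma^2), so
# the least-squares solution should recover w close to 1/(v + sigma^2).

rng = np.random.default_rng(1)
v, sigma = 1.0, 0.5
x = rng.normal(0.0, np.sqrt(v), size=100_000)
eps = rng.normal(size=x.shape)
y = x + sigma * eps                # noised samples
target = (x - y) / sigma**2        # DSM regression target, equals -eps/sigma

# Minimize mean((-w*y - target)^2) over w in closed form.
w_hat = -np.mean(y * target) / np.mean(y**2)
# Expect w_hat near 1/(v + sigma^2) = 0.8.
```

In high dimension the analysis in the abstract suggests repeating this at multiple sigma values, since a single noise level leaves most of the space without usable training signal.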

Abstract The $p$-tensor Ising model is a one-parameter discrete exponential family for modeling dependent binary data, where the sufficient statistic is a multilinear form of degree $p \geqslant 2$. This is a natural generalization of the matrix Ising model that provides a convenient mathematical framework for capturing, not just pairwise, but higher-order dependencies in complex relational data. In this paper, we consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes. Our estimate is based on the maximum pseudolikelihood (MPL) method, which provides a computationally efficient algorithm for estimating the parameter that avoids computing the intractable partition function. We derive general conditions under which the MPL estimate is $\sqrt N$-consistent, that is, it converges to the true parameter at rate $1/\sqrt N$. Our conditions are robust enough to handle a variety of commonly used tensor Ising models, including spin glass models with random interactions and models where the rate of estimation undergoes a phase transition. In particular, this includes results on $\sqrt N$-consistency of the MPL estimate in the well-known $p$-spin Sherrington–Kirkpatrick model, spin systems on general $p$-uniform hypergraphs and Ising models on the hypergraph stochastic block model (HSBM). In fact, for the HSBM we pin down the exact location of the phase transition threshold, which is determined by the positivity of a certain mean-field variational problem, such that above this threshold the MPL estimate is $\sqrt N$-consistent, whereas below the threshold no estimator is consistent. Finally, we derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie–Weiss model, which is the Ising model on the complete $p$-uniform hypergraph.
An interesting consequence of our results is that the MPL estimate in the Curie–Weiss model saturates the Cramér–Rao lower bound at all points above the estimation threshold, that is, the MPL estimate incurs no loss in asymptotic statistical efficiency in the estimability regime, even though it is obtained by minimizing only an approximation of the true likelihood function for computational tractability.
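The pseudolikelihood idea is easy to see on a much simpler member of this model class than the tensor models above: a one-dimensional Ising chain, where the sample can be drawn exactly and the MPL estimate has a closed form (the setup below is an illustrative sketch, not the paper's construction):

```python
import numpy as np

# Minimal sketch, assuming a 1-D Ising chain with Hamiltonian
# H(x) = beta * sum_i x_i x_{i+1} over spins x_i in {-1, +1}.
# MPL maximizes the product of conditionals P(x_i | x_{-i}), which never
# touches the partition function.  For an interior spin,
#   P(x_i | rest) = exp(beta * x_i * h_i) / (2*cosh(beta * h_i)),
# with neighbour field h_i = x_{i-1} + x_{i+1} in {-2, 0, 2}, and the
# stationarity condition of the log-pseudolikelihood reduces to
#   tanh(2*beta_hat) = sum_i x_i*h_i / (2 * #{i : |h_i| = 2}).

rng = np.random.default_rng(3)
N, beta = 20_000, 0.4

# Exact sampling of the chain: each spin copies its left neighbour
# with probability exp(beta) / (2*cosh(beta)).
p_same = np.exp(beta) / (2.0 * np.cosh(beta))
steps = np.where(rng.random(N - 1) < p_same, 1, -1)
x = rng.choice([-1, 1]) * np.concatenate(([1], np.cumprod(steps)))

h = x[:-2] + x[2:]                       # neighbour fields of interior spins
xi = x[1:-1]                             # interior spins
A = np.sum(xi * h)
n2 = np.count_nonzero(np.abs(h) == 2)
beta_hat = 0.5 * np.arctanh(A / (2.0 * n2))
# beta_hat should recover beta = 0.4 up to O(1/sqrt(N)) error from one sample.
```

The same one-parameter concave maximization goes through for the tensor models, with the neighbour field replaced by a degree-$(p-1)$ multilinear form; only the consistency analysis becomes delicate.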

Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum likelihood does not directly optimize these criteria. Directly optimizing criteria like BS requires inverse-weighting by the censoring distribution. However, estimating the censoring model under these metrics requires inverse-weighting by the failure distribution. The objective for each model requires the other, but neither are known. To resolve this dilemma, we introduce Inverse-Weighted Survival Games. In these games, objectives for each model are built from re-weighted estimates featuring the other model, where the latter is held fixed during training. When the loss is proper, we show that the games always have the true failure and censoring distributions as a stationary point. This means models in the game do not leave the correct distributions once reached. We construct one case where this stationary point is unique. We show that these games optimize BS on simulations and then apply these principles on real-world cancer and critically-ill patient data.
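The inverse-weighted criterion at the heart of this dilemma can be sketched directly: the censoring-weighted Brier score at one horizon, evaluated here with a known censoring distribution on synthetic data (all names and the setup are illustrative assumptions, not the paper's code):

```python
import numpy as np

# Minimal sketch of the inverse-censoring-weighted Brier score at one
# horizon t.  We observe T = min(failure, censoring) and the event
# indicator d = 1{failure <= censoring}; with censoring-survival G, the
# weighted Brier score of a survival prediction S(t) is
#   BS(t) = mean( S(t)^2       * 1{T <= t, d = 1} / G(T)
#               + (1 - S(t))^2 * 1{T > t}         / G(t) ).
# In the paper's games G itself must be estimated; here it is known.

def ipcw_brier(s_pred, T, d, G, t):
    """IPCW Brier score for a scalar survival prediction s_pred = S(t)."""
    early = (T <= t) & d                 # failures observed before t
    late = T > t                         # still surviving at t
    return np.mean(s_pred**2 * early / np.clip(G(T), 1e-12, None)
                   + (1.0 - s_pred)**2 * late / G(t))

rng = np.random.default_rng(4)
n = 50_000
fail = rng.exponential(1.0, n)           # failure times, rate 1
cens = rng.exponential(1.0 / 0.3, n)     # censoring times, rate 0.3
T = np.minimum(fail, cens)
d = fail <= cens
G = lambda u: np.exp(-0.3 * u)           # true censoring survival (known here)

t = 1.0
bs_true = ipcw_brier(np.exp(-1.0 * t), T, d, G, t)   # true survival exp(-1)
bs_wrong = ipcw_brier(np.exp(-3.0 * t), T, d, G, t)  # mis-calibrated model
# Because the weighted loss is proper, the true survival probability
# should score lower than the mis-calibrated one.
```

Swapping the roles of `fail` and `cens` gives the mirror-image objective for the censoring model, which is exactly the circular dependence the games are built to resolve.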
