Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating these divergences parametrizes an empirical variational form by a neural network (NN) and optimizes over the parameter space. Such neural estimators are widely used in practice, but the corresponding performance guarantees are partial and call for further exploration. We establish non-asymptotic absolute error bounds for a neural estimator realized by a shallow NN, focusing on four popular f-divergences: Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory to bound the two sources of error involved: function approximation and empirical estimation. The bounds characterize the effective error in terms of the NN size and the number of samples, and reveal scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above, with an appropriate NN growth rate, are minimax rate-optimal, achieving the parametric convergence rate.
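To make the estimator concrete, here is a minimal PyTorch sketch of the kind of neural estimator the abstract describes, for the KL divergence via the Donsker-Varadhan variational form KL(P||Q) = sup_f E_P[f] - log E_Q[e^f], with f realized by a shallow (one hidden layer) network. The architecture, optimizer, and training loop are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: neural KL estimator via the Donsker-Varadhan variational form,
# with the variational function realized by a shallow (one hidden layer) NN.
import math
import torch
import torch.nn as nn

def neural_kl_estimate(xp, xq, width=64, steps=500, lr=1e-3):
    """Estimate KL(P||Q) from samples xp ~ P and xq ~ Q, each of shape (n, d)."""
    critic = nn.Sequential(nn.Linear(xp.shape[1], width), nn.ReLU(),
                           nn.Linear(width, 1))
    opt = torch.optim.Adam(critic.parameters(), lr=lr)
    def dv_bound():
        fp = critic(xp).squeeze(-1)   # critic on P-samples
        fq = critic(xq).squeeze(-1)   # critic on Q-samples
        # E_P[f] - log E_Q[exp f], using log-mean-exp for numerical stability
        return fp.mean() - (torch.logsumexp(fq, dim=0) - math.log(len(xq)))
    for _ in range(steps):
        loss = -dv_bound()            # gradient ascent on the DV lower bound
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return dv_bound().item()

# Sanity check: KL(N(0,1) || N(1,1)) = 0.5, so the estimate should be near 0.5.
torch.manual_seed(0)
xp = torch.randn(5000, 1)
xq = torch.randn(5000, 1) + 1.0
print(neural_kl_estimate(xp, xq))
```

Analogous estimators for the other three divergences swap in their respective variational forms; per the abstract, the error bounds trade off the NN size (here, `width`) against the number of samples.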
Entry-Wise Eigenvector Analysis and Improved Rates for Topic Modeling on Short Documents
Topic modeling is a widely used tool in text analysis. We investigate the optimal rate for estimating a topic model. Specifically, we consider a scenario with n documents, a vocabulary of size p, and document lengths on the order of N. When N ≥ c·p, referred to as the long-document case, the optimal rate is established in the literature at p/(Nn). However, when N = o(p), referred to as the short-document case, the optimal rate remains unknown. In this paper, we first provide new entry-wise large-deviation bounds for the empirical singular vectors of a topic model. We then apply these bounds to improve the error rate of a spectral algorithm, Topic-SCORE. Finally, by comparing the improved error rate with the minimax lower bound, we conclude that the optimal rate is still p/(Nn) in the short-document case.
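For intuition, here is a hedged numpy sketch of the spectral step that the entry-wise bounds concern: take the top-K empirical left singular vectors of the p x n word-frequency matrix and form their entry-wise SCORE ratios. The pre-SVD normalization and vertex-hunting steps of the full Topic-SCORE algorithm are omitted, and the synthetic corpus below is purely illustrative.

```python
# Hedged sketch of the spectral core of Topic-SCORE (simplified; the full
# algorithm normalizes before the SVD and runs a vertex-hunting step after).
import numpy as np

def score_embedding(D, K):
    """D: p x n matrix of empirical word frequencies (columns sum to 1).
    Returns the p x (K-1) matrix of entry-wise SCORE ratios xi_k / xi_1."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    xi = U[:, :K]                        # top-K empirical left singular vectors
    if xi[:, 0].sum() < 0:               # fix SVD sign so xi_1 >= 0 (Perron)
        xi[:, 0] = -xi[:, 0]
    return xi[:, 1:] / xi[:, :1]         # entry-wise ratios studied by SCORE

# Illustrative synthetic corpus: n short documents of length N over p words.
rng = np.random.default_rng(0)
p, n, N, K = 50, 200, 20, 3
A = rng.dirichlet(np.ones(p), size=K).T            # p x K topic matrix
W = rng.dirichlet(np.ones(K), size=n).T            # K x n topic weights
counts = np.stack([rng.multinomial(N, A @ W[:, i]) for i in range(n)], axis=1)
R = score_embedding(counts / N, K)                 # p x (K-1) embedding
```

The entry-wise large-deviation bounds of the paper control the error of `xi` coordinate by coordinate, which is exactly what is needed to control these ratios in the short-document regime N = o(p).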
- Award ID(s): 1943902
- PAR ID: 10531769
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: Mathematics
- Volume: 12
- Issue: 11
- ISSN: 2227-7390
- Page Range / eLocation ID: 1682
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and reinforcement learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that a Hoeffding bound on the error sequence is not possible in general, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This motivates the focus on mean square error bounds for parameter estimates. It is shown that the mean square error achieves the optimal rate of O(1/n), subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.
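As a toy illustration of the rate in question (not the paper's general setting), the sketch below runs a scalar SA recursion driven by an AR(1) Markovian disturbance with step sizes g/(n+1) and checks empirically that the mean square error decays like 1/n; all constants are assumptions chosen for the demo.

```python
# Toy check that SA with step size g/(n+1) attains MSE = O(1/n) under a
# Markovian (AR(1)) disturbance; theta tracks the chain's stationary mean.
import numpy as np

rng = np.random.default_rng(1)
runs, n_steps, g, rho = 100, 5000, 1.0, 0.8

mse = np.zeros(n_steps)
for _ in range(runs):
    x, theta = 0.0, 0.0
    for n in range(n_steps):
        # Markovian disturbance: AR(1) chain with mean 0 and unit variance
        x = rho * x + rng.normal(scale=np.sqrt(1.0 - rho**2))
        # SA update theta_{n+1} = theta_n + alpha_n (X_{n+1} - theta_n)
        theta += (g / (n + 1)) * (x - theta)
        mse[n] += theta ** 2             # squared error about the mean 0
mse /= runs

# If MSE = O(1/n), then n * MSE flattens to a constant; here that constant is
# the chain's CLT variance (1 + rho)/(1 - rho) = 9, up to Monte Carlo error.
print(n_steps * mse[-1])
```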
Abstract: Specification and forecast of ionospheric parameters, such as the ionospheric electron density (Ne), have been an important topic in space weather and ionospheric research. Neural networks (NNs) have emerged as a powerful modeling tool for Ne prediction. However, determining the optimal NN structure requires heavy, time-consuming manual adjustment. In this work, we propose to use neural architecture search (NAS), an automatic machine learning method, to mitigate this problem. NAS aims to find the optimal network structure through alternate optimization of the hyperparameters and the corresponding network parameters within a pre-defined hyperparameter search space. A total of 16 years of data from the Millstone Hill incoherent scatter radar (ISR) are used for the NN models. One single-layer NN (SLNN) model and one deep NN (DNN) model are both trained with NAS, namely SLNN-NAS and DNN-NAS, for Ne prediction and compared with their manually tuned counterparts (SLNN and DNN) based on previous studies. Our results show that SLNN-NAS and DNN-NAS outperform SLNN and DNN, respectively. The NN predictions of daily Ne variation patterns reveal a 27-day mid-latitude topside Ne variation that cannot be reasonably represented by traditional empirical models developed using monthly averages. DNN-NAS yields the best prediction accuracy as measured by quantitative metrics and rankings of daily pattern prediction, with an improvement in mean absolute error of more than 10% over the SLNN model. The limited improvement from NAS is likely due to network complexity and the limitation of fully connected NNs that do not use the time histories of the input parameters.
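NAS implementations differ widely in their search strategies; the hedged sketch below illustrates only the alternate-optimization idea described in the abstract, using a hypothetical grid search over width and depth with synthetic stand-in data (the actual study used 16 years of Millstone Hill ISR data and a richer search space).

```python
# Hedged NAS sketch: outer loop searches a predefined structure space, inner
# loop fits each candidate's weights; the best validation error wins.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 8)                        # stand-in for ISR-derived inputs
y = torch.sin(X.sum(dim=1, keepdim=True))      # stand-in target (e.g., log Ne)
Xtr, ytr, Xva, yva = X[:400], y[:400], X[400:], y[400:]

def build(width, depth, d_in=8):
    layers, dim = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(dim, width), nn.ReLU()]
        dim = width
    return nn.Sequential(*layers, nn.Linear(dim, 1))

best = None
for width in (16, 64, 256):                    # outer loop: structure search
    for depth in (1, 2, 3):
        model = build(width, depth)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(200):                   # inner loop: fit the weights
            loss = nn.functional.mse_loss(model(Xtr), ytr)
            opt.zero_grad()
            loss.backward()
            opt.step()
        val = nn.functional.mse_loss(model(Xva), yva).item()
        if best is None or val < best[0]:
            best = (val, width, depth)
print("selected width, depth:", best[1], best[2])
```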
The concept of antidistinguishability of quantum states has been studied to investigate foundational questions in quantum mechanics. It is also called quantum state elimination, because the goal of such a protocol is to guess which state, among finitely many chosen at random, the system is not prepared in (that is, it can be thought of as the first step in a process of elimination). Antidistinguishability has been used to investigate the reality of quantum states, ruling out ψ-epistemic ontological models of quantum mechanics (Pusey et al. in Nat Phys 8(6):475–478, 2012). Thus, due to the established importance of antidistinguishability in quantum mechanics, exploring it further is warranted. In this paper, we provide a comprehensive study of the optimal error exponent—the rate at which the optimal error probability vanishes to zero asymptotically—for classical and quantum antidistinguishability. We derive an exact expression for the optimal error exponent in the classical case and show that it is given by the multivariate classical Chernoff divergence. Our work thus provides this divergence with a meaningful operational interpretation as the optimal error exponent for antidistinguishing a set of probability measures. For the quantum case, we provide several bounds on the optimal error exponent: a lower bound given by the best pairwise Chernoff divergence of the states, a single-letter semi-definite programming upper bound, and lower and upper bounds in terms of minimal and maximal multivariate quantum Chernoff divergences. It remains an open problem to obtain an explicit expression for the optimal error exponent for quantum antidistinguishability.
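For concreteness, here is a hedged numerical sketch of the divergence named above. The pairwise classical Chernoff divergence is C(p,q) = -log min_{s in [0,1]} sum_x p(x)^s q(x)^(1-s); the multivariate form below, which minimizes over the probability simplex, is one natural generalization written down here as an assumption for illustration (see the paper for the exact definition it uses).

```python
# Hedged sketch: pairwise classical Chernoff divergence, plus an assumed
# multivariate extension minimizing over the probability simplex.
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def chernoff_pair(p, q):
    f = lambda s: np.sum(p**s * q**(1.0 - s))
    res = minimize_scalar(f, bounds=(0.0, 1.0), method="bounded")
    return -np.log(res.fun)

def chernoff_multi(ps):
    """ps: list of r distributions over a common finite alphabet."""
    ps = np.asarray(ps)
    def f(z):                                # softmax keeps s on the simplex
        s = np.exp(z) / np.exp(z).sum()
        return np.sum(np.prod(ps ** s[:, None], axis=0))
    res = minimize(f, x0=np.zeros(len(ps)), method="Nelder-Mead")
    return -np.log(res.fun)

# Sanity check: for r = 2 the multivariate form reduces to the pairwise one.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print(chernoff_pair(p, q), chernoff_multi([p, q]))
```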