Publication Date:
NSF-PAR ID: 10352363
Journal Name: Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Sponsoring Org: National Science Foundation

While there are convergence guarantees for temporal difference (TD) learning when using linear function approximators, the situation for nonlinear models is far less understood, and divergent examples are known. Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. More precisely, we consider the expected learning dynamics of the TD(0) algorithm for value estimation. As the stepsize converges to zero, these dynamics are defined by a nonlinear ODE which depends on the geometry of the space of function approximators, the structure of the underlying Markov chain, and their interaction. We find a set of function approximators that includes ReLU networks and has geometry amenable to TD learning regardless of environment, so that the solution performs about as well as linear TD in the worst case. Then, we show how environments that are more reversible induce dynamics that are better for TD learning and prove global convergence to the true value function for well-conditioned function approximators. Finally, we generalize a divergent counterexample to a family of divergent problems to demonstrate how the interaction between approximator and environment can go wrong and to motivate the assumptions needed to prove convergence.
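The expected TD(0) dynamics can be illustrated in the simplest tabular case, where convergence to the true value function is known. The Markov chain (P, r) and discount factor below are illustrative choices, not taken from the paper; the Euler iteration approximates the small-stepsize ODE the abstract refers to.

```python
import numpy as np

# Hypothetical 3-state Markov reward process (P, r, gamma are
# illustrative choices, not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Stationary distribution of P (left eigenvector for eigenvalue 1).
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmax(np.real(evals))])
mu = mu / mu.sum()
D = np.diag(mu)

# Expected TD(0) dynamics for a tabular value estimate V:
#   dV/dt = D (r + gamma * P V - V)
# discretized with a small Euler step.
V = np.zeros(3)
alpha = 0.1
for _ in range(20000):
    V = V + alpha * D @ (r + gamma * P @ V - V)

# Analytic fixed point: V* = (I - gamma P)^{-1} r
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

In the tabular (well-conditioned) case the dynamics contract to `V_true`; the nonlinear settings studied in the abstract replace the tabular parametrization with a nonlinear function approximator, where this convergence can fail.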

We develop a simple Quantile Spacing (QS) method for accurate probabilistic estimation of one-dimensional entropy from equiprobable random samples, and compare it with the popular Bin-Counting (BC) and Kernel Density (KD) methods. In contrast to BC, which uses equal-width bins with varying probability mass, the QS method uses estimates of the quantiles that divide the support of the data-generating probability density function (pdf) into equal-probability-mass intervals. Whereas BC and KD each require optimal tuning of a hyperparameter whose value varies with sample size and shape of the pdf, QS only requires specification of the number of quantiles to be used. Results indicate, for the class of distributions tested, that the optimal number of quantiles is a fixed fraction of the sample size (empirically determined to be ~0.25–0.35), and that this value is relatively insensitive to distributional form or sample size. This provides a clear advantage over BC and KD since hyperparameter tuning is not required. Further, unlike KD, there is no need to select an appropriate kernel type, and so QS is applicable to pdfs of arbitrary shape, including those with discontinuous slope and/or magnitude. Bootstrapping is used to approximate the sampling variability distribution of the resulting entropy estimate.
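A minimal sketch of the quantile-spacing idea, reconstructed from the description above (the exact estimator details and bias corrections in the paper may differ): with quantiles dividing the support into m equal-mass intervals, the density inside interval i is roughly (1/m)/Δq_i, so entropy is approximated by the average of log(m·Δq_i). The fraction 0.3 follows the ~0.25–0.35 range quoted above.

```python
import numpy as np

def qs_entropy(x, frac=0.3):
    """Quantile Spacing entropy sketch (illustrative reconstruction).
    `frac` = number of quantile intervals as a fraction of sample size;
    no small-sample bias correction is applied here."""
    x = np.sort(np.asarray(x, dtype=float))
    m = max(2, int(frac * x.size))              # equal-mass intervals
    q = np.quantile(x, np.linspace(0.0, 1.0, m + 1))
    dq = np.diff(q)
    dq = dq[dq > 0]                             # guard against tied quantiles
    # Each interval carries mass 1/m, so density there is ~(1/m)/dq;
    # entropy = -E[log p] ~ mean of log(m * dq).
    return np.mean(np.log(m * dq))

rng = np.random.default_rng(0)
h = qs_entropy(rng.standard_normal(20000))
# True entropy of N(0,1) is 0.5*log(2*pi*e) ~ 1.419; the uncorrected
# sketch lands in that vicinity for this sample size.
```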

Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating those divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over parameter space. Such neural estimators are abundantly used in practice, but corresponding performance guarantees are partial and call for further exploration. We establish non-asymptotic absolute error bounds for a neural estimator realized by a shallow NN, focusing on four popular f-divergences: Kullback–Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory to bound the two sources of error involved: function approximation and empirical estimation. The bounds characterize the effective error in terms of NN size and the number of samples, and reveal scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above with appropriate NN growth rate are minimax rate-optimal, achieving the parametric convergence rate.
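As a concrete instance of the variational form such estimators parametrize (standard material, not specific to this paper's proofs), the Donsker–Varadhan representation of the KL divergence is

```latex
\mathrm{KL}(P \,\|\, Q) \;=\; \sup_{f}\; \mathbb{E}_{P}\!\left[f(X)\right] \;-\; \log \mathbb{E}_{Q}\!\left[e^{f(X)}\right],
```

and the corresponding neural estimator restricts the supremum to a shallow NN class $\mathcal{F}_{\mathrm{NN}}$ and replaces expectations with sample means over $X_1,\dots,X_n \sim P$ and $Y_1,\dots,Y_n \sim Q$:

```latex
\widehat{\mathrm{KL}}_n \;=\; \max_{f \in \mathcal{F}_{\mathrm{NN}}} \; \frac{1}{n}\sum_{i=1}^{n} f(X_i) \;-\; \log \frac{1}{n}\sum_{j=1}^{n} e^{f(Y_j)}.
```

The two error sources named in the abstract map directly onto this construction: restricting the supremum to $\mathcal{F}_{\mathrm{NN}}$ incurs the function-approximation error, and replacing expectations with sample means incurs the empirical-estimation error.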

The advent of pervasive autonomous systems such as self-driving cars and drones has raised questions about their safety and trustworthiness. This is particularly relevant in the event of onboard subsystem errors or failures. In this research, we show how an encoded Extended Kalman Filter (EKF) can be used to detect anomalous behaviors of critical components of nonlinear autonomous systems: sensors, actuators, state estimation algorithms and control software. As opposed to prior work that is limited to linear systems or requires the use of cumbersome machine-learned checks with fixed detection thresholds, the proposed approach employs time-varying checks with dynamically adaptive thresholds. The method is lightweight in comparison to existing methods (it does not rely on machine learning paradigms) and achieves high coverage as well as low detection latency of errors. A quadcopter and an automotive steer-by-wire system are used as test vehicles for the research, and simulation and hardware results indicate the overhead, coverage and error detection latency benefits of the proposed approach.
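The idea of a time-varying check with a dynamically adaptive threshold can be sketched with a minimal 1-D linear Kalman filter (this is not the paper's encoded-EKF construction; the model parameters and fault injection below are illustrative). The gate scales with the innovation covariance S, which the filter updates at every step, so the threshold adapts automatically as estimation uncertainty changes.

```python
import numpy as np

# Minimal 1-D Kalman filter with an innovation-based anomaly check.
# A, C, Q, R are illustrative model parameters, not from the paper.
rng = np.random.default_rng(1)
A, C, Q, R = 1.0, 1.0, 0.01, 0.1
x_hat, P = 0.0, 1.0
x_true = 0.0
alarms = []
for k in range(200):
    x_true = A * x_true + rng.normal(0.0, np.sqrt(Q))
    z = C * x_true + rng.normal(0.0, np.sqrt(R))
    if k == 150:
        z += 5.0                        # injected sensor fault
    # Predict
    x_hat = A * x_hat
    P = A * P * A + Q
    # Innovation and its time-varying covariance
    nu = z - C * x_hat
    S = C * P * C + R
    # Adaptive check: flag if the normalized innovation exceeds 3 sigma
    if nu * nu / S > 9.0:
        alarms.append(k)
    # Update
    K = P * C / S
    x_hat = x_hat + K * nu
    P = (1.0 - K * C) * P
```

The fault at step 150 produces an innovation far outside the 3-sigma gate, while nominal steps stay inside it because the gate tracks S rather than a fixed constant.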

Estimating the evolutionary power spectral density (EPSD) of nonstationary winds (e.g., tropical storms and downbursts) is necessary to predict the response of structures under such extreme winds. Following the review of the existing direct estimation methods of EPSD, this paper offers a two-step unified formulation, i.e., raw estimation and associated error reduction. The raw estimation is expressed in terms of a time-frequency transform with a general kernel function. It is shown that if the kernel function is described by a time-frequency analysis tool such as the short-time Fourier transform, the wavelet transform, and the S-transform, the generalized raw EPSD estimation becomes a particular case of the existing methods. The unified estimation method presented here can be viewed as a filter bank with adjustable time and frequency resolution. The analysis of error in the raw estimation is carried out on the bias and random errors accounting for the approximation in both the time and frequency domains. Various techniques for reducing such errors are then summarized and recast in the unified formulation, including series expansion, short-time window smoothing, and multitapering. Based on the unified perspective, a discussion and some prospects of EPSD estimation are provided.
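A raw EPSD estimate with the short-time Fourier transform as the kernel can be sketched as follows. Normalization conventions vary across the literature; the window-energy normalization and the chirp test signal here are illustrative choices, not the paper's.

```python
import numpy as np

def stft_raw_epsd(x, fs, win_len=256, hop=64):
    """Raw EPSD estimate via a sliding-window (short-time) Fourier
    transform -- one kernel choice among those discussed above."""
    w = np.hanning(win_len)
    norm = fs * np.sum(w ** 2)               # window-energy normalization
    n_frames = 1 + (len(x) - win_len) // hop
    S = np.empty((n_frames, win_len // 2 + 1))
    for i in range(n_frames):
        seg = x[i * hop: i * hop + win_len] * w
        X = np.fft.rfft(seg)
        S[i] = (np.abs(X) ** 2) / norm       # one-sided raw EPSD slice
    t = (np.arange(n_frames) * hop + win_len / 2) / fs
    f = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return t, f, S

# Nonstationary test signal: linear chirp from 10 Hz to 100 Hz over 4 s.
fs = 1024.0
tt = np.arange(0, 4.0, 1.0 / fs)
x = np.sin(2 * np.pi * (10.0 * tt + 0.5 * (90.0 / 4.0) * tt ** 2))
t, f, S = stft_raw_epsd(x, fs)
peak_f = f[np.argmax(S, axis=1)]             # spectral ridge per time frame
```

For the chirp, the ridge `peak_f` sweeps upward in time, illustrating the adjustable time-frequency resolution of the filter-bank view: longer windows sharpen frequency resolution at the cost of time resolution, which is exactly the bias/random-error trade-off the error analysis addresses.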