Abstract We propose a sparse deep ReLU network (SDRN) estimator of the regression function obtained from regularized empirical risk minimization with a Lipschitz loss function. Our framework can be applied to a variety of regression and classification problems. We establish novel nonasymptotic excess risk bounds for our SDRN estimator when the regression function belongs to a Sobolev space with mixed derivatives. We obtain a new, nearly optimal, risk rate in the sense that the SDRN estimator can achieve nearly the same optimal minimax convergence rate as one-dimensional nonparametric regression with the dimension involved in a logarithm term only when the feature dimension is fixed. The estimator has a slightly slower rate when the dimension grows with the sample size. We show that the depth of the SDRN estimator grows with the sample size in logarithmic order, and the total number of nodes and weights grows in polynomial order of the sample size to have the nearly optimal risk rate. The proposed SDRN can go deeper with fewer parameters to well estimate the regression and overcome the overfitting problem encountered by conventional feedforward neural networks.
more »
« less
Deep Neural Networks for Estimation and Inference
We study deep neural networks and their use in semiparametric inference. We establish novel nonasymptotic high probability bounds for deep feedforward neural nets. These deliver rates of convergence that are sufficiently fast (in some cases minimax optimal) to allow us to establish valid second‐step inference after first‐step estimation with deep learning, a result also new to the literature. Our nonasymptotic high probability bounds, and the subsequent semiparametric inference, treat the current standard architecture: fully connected feedforward neural networks (multilayer perceptrons), with the now‐common rectified linear unit activation function, unbounded weights, and a depth explicitly diverging with the sample size. We discuss other architectures as well, including fixed‐width, very deep networks. We establish the nonasymptotic bounds for these deep nets for a general class of nonparametric regression‐type loss functions, which includes as special cases least squares, logistic regression, and other generalized linear models. We then apply our theory to develop semiparametric inference, focusing on causal parameters for concreteness, and demonstrate the effectiveness of deep learning with an empirical application to direct mail marketing.
more »
« less
- Award ID(s):
- 2042473
- PAR ID:
- 10328957
- Date Published:
- Journal Name:
- Econometrica
- Volume:
- 89
- Issue:
- 1
- ISSN:
- 0012-9682
- Page Range / eLocation ID:
- 181 to 213
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Drăgoi, C.; Mukherjee, S.; Namjoshi, K. (Ed.)This paper studies the problem of range analysis for feedforward neural networks, which is a basic primitive for applications such as robustness of neural networks, compliance to specifications and reachability analysis of neural-network feedback systems. Our approach focuses on ReLU (rectified linear unit) feedforward neural nets that present specific difficulties: approaches that exploit derivatives do not apply in general, the number of patterns of neuron activations can be quite large even for small networks, and convex approximations are generally too coarse. In this paper, we employ set-based methods and abstract interpretation that have been very successful in coping with similar difficulties in classical program verification. We present an approach that abstracts ReLU feedforward neural networks using tropical polyhedra. We show that tropical polyhedra can efficiently abstract ReLU activation function, while being able to control the loss of precision due to linear computations. We show how the connection between ReLU networks and tropical rational functions can provide approaches for range analysis of ReLU neural networks. We report on a preliminary evaluation of our approach using a prototype implementation.more » « less
-
For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection that leads to a smaller set of variables (a “submodel”) where classical linear regression applies. We analyze linear regression estimators resulting from model selection by proving estimation error and linear representation bounds uniformly over sets of submodels. Based on deterministic inequalities, our results provide “good” rates when applied to both independent and dependent data. These results are useful in meaningfully interpreting the linear regression estimator obtained after exploring and reducing the variables and also in justifying post-model-selection inference. All results are derived under no model assumptions and are nonasymptotic in nature.more » « less
-
We expose the statistical foundations of deep learning with the goal of facilitating conversation between the deep learning and statistics communities. We highlight core themes at the intersection; summarize key neural models, such as feedforward neural networks, sequential neural networks, and neural latent variable models; and link these ideas to their roots in probability and statistics. We also highlight research directions in deep learning where there are opportunities for statistical contributions.more » « less
-
Summary We provide adaptive inference methods, based on $$\ell _1$$ regularization, for regular (semiparametric) and nonregular (nonparametric) linear functionals of the conditional expectation function. Examples of regular functionals include average treatment effects, policy effects, and derivatives. Examples of nonregular functionals include average treatment effects, policy effects, and derivatives conditional on a covariate subvector fixed at a point. We construct a Neyman orthogonal equation for the target parameter that is approximately invariant to small perturbations of the nuisance parameters. To achieve this property, we include the Riesz representer for the functional as an additional nuisance parameter. Our analysis yields weak ‘double sparsity robustness’: either the approximation to the regression or the approximation to the representer can be ‘completely dense’ as long as the other is sufficiently ‘sparse’. Our main results are nonasymptotic and imply asymptotic uniform validity over large classes of models, translating into honest confidence bands for both global and local parameters.more » « less
An official website of the United States government

