We study the theory of neural networks (NNs) through the lens of classical nonparametric regression, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture to the function space and sample size. We consider a "Parallel NN" variant of deep ReLU networks and show that standard weight decay is equivalent to promoting ℓp-sparsity (0<p<1) of the coefficient vector over a set of end-to-end learned basis functions, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
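As a rough illustration of the architecture described above, the sketch below builds a parallel ensemble of narrow deep ReLU subnetworks whose scalar outputs are summed, trained with standard weight decay. The widths, depths, and hyperparameter values are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ParallelNN(nn.Module):
    """Sum of many narrow deep ReLU subnetworks; each subnetwork plays
    the role of one end-to-end learned dictionary atom."""
    def __init__(self, in_dim, n_subnets=64, width=4, depth=6):
        super().__init__()
        def subnet():
            layers = [nn.Linear(in_dim, width), nn.ReLU()]
            for _ in range(depth - 1):
                layers += [nn.Linear(width, width), nn.ReLU()]
            layers.append(nn.Linear(width, 1))
            return nn.Sequential(*layers)
        self.subnets = nn.ModuleList(subnet() for _ in range(n_subnets))

    def forward(self, x):
        # Output is the sum of the subnetworks' scalar outputs.
        return torch.stack([s(x) for s in self.subnets]).sum(dim=0).squeeze(-1)

# Weight decay is the only tuned regularizer; per the abstract, it acts
# like an l_p (0 < p < 1) sparsity penalty on the dictionary coefficients.
model = ParallelNN(in_dim=1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
```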
Expectile regression via deep residual networks
Expectile is a generalization of the expected value in probability and statistics. In finance and risk management, the expectile is considered an important risk measure due to its connection with the gain–loss ratio and its coherent and elicitable properties. Linear multiple expectile regression was proposed in 1987 for estimating the conditional expectiles of a response given a set of covariates. Recently, more flexible nonparametric expectile regression models have been proposed based on gradient boosting and kernel learning. In this paper, we propose a new nonparametric expectile regression model that adopts the deep residual network learning framework, and we name it Expectile NN. Extensive numerical studies on simulated and real datasets demonstrate that Expectile NN has very competitive performance compared with existing methods. We explicitly specify the architecture of Expectile NN so that it can easily be reproduced and used by others. Expectile NN is the first deep learning model for nonparametric expectile regression.
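For readers unfamiliar with expectile regression, a minimal sketch follows: the asymmetric squared loss that defines it, plus a generic fully connected residual block. The paper specifies an exact architecture, which this sketch does not reproduce; the names and sizes here are assumptions.

```python
import torch
import torch.nn as nn

def expectile_loss(pred, target, tau=0.9):
    # Asymmetrically weighted squared error: residuals above the fit get
    # weight tau, those below get 1 - tau; tau = 0.5 recovers ordinary MSE,
    # i.e., mean regression.
    u = target - pred
    weight = torch.where(u >= 0, torch.full_like(u, tau), torch.full_like(u, 1 - tau))
    return (weight * u ** 2).mean()

class ResidualBlock(nn.Module):
    # Fully connected residual block: relu(x + f(x)), in the spirit of
    # deep residual learning.
    def __init__(self, width):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                               nn.Linear(width, width))

    def forward(self, x):
        return torch.relu(x + self.f(x))
```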
- PAR ID: 10446037
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Stat
- Volume: 10
- Issue: 1
- ISSN: 2049-1573
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract We propose a sparse deep ReLU network (SDRN) estimator of the regression function, obtained from regularized empirical risk minimization with a Lipschitz loss function. Our framework can be applied to a variety of regression and classification problems. We establish novel nonasymptotic excess risk bounds for the SDRN estimator when the regression function belongs to a Sobolev space with mixed derivatives. We obtain a new, nearly optimal risk rate: when the feature dimension is fixed, the SDRN estimator achieves nearly the same minimax convergence rate as one-dimensional nonparametric regression, with the dimension entering only through a logarithmic term. The estimator has a slightly slower rate when the dimension grows with the sample size. We show that the depth of the SDRN estimator grows with the sample size in logarithmic order, and the total number of nodes and weights grows in polynomial order of the sample size, to attain the nearly optimal risk rate. The proposed SDRN can go deeper with fewer parameters to estimate the regression function well and overcome the overfitting problem encountered by conventional feedforward neural networks.
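A minimal sketch of the training objective described above: regularized empirical risk under a Lipschitz loss (Huber is used here as one example), with an l1 penalty as a stand-in sparsity regularizer. The paper's precise sparsity construction and rate-matching architecture are not reproduced; everything below is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def regularized_risk(model, x, y, lam=1e-3):
    # Empirical risk under a Lipschitz loss (Huber), plus an l1 penalty
    # on all weights to encourage a sparse ReLU network.
    risk = F.huber_loss(model(x).squeeze(-1), y)
    penalty = sum(p.abs().sum() for p in model.parameters())
    return risk + lam * penalty

# Example deep ReLU network; in the theory, depth grows logarithmically
# and the parameter count polynomially with the sample size.
net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(),
                    nn.Linear(32, 32), nn.ReLU(),
                    nn.Linear(32, 1))
```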
-
Abstract Causal inference practitioners have increasingly adopted machine learning techniques with the aim of producing principled uncertainty quantification for causal effects while minimizing the risk of model misspecification. Bayesian nonparametric approaches have attracted attention as well, both for their flexibility and their promise of providing natural uncertainty quantification. Priors on high‐dimensional or nonparametric spaces, however, can often unintentionally encode prior information that is at odds with substantive knowledge in causal inference—specifically, the regularization required for high‐dimensional Bayesian models to work can indirectly imply that the magnitude of the confounding is negligible. In this paper, we explain this problem and provide tools for (i) verifying that the prior distribution does not encode an inductive bias away from confounded models and (ii) verifying that the posterior distribution contains sufficient information to overcome this issue if it exists. We provide a proof‐of‐concept on simulated data from a high‐dimensional probit‐ridge regression model, and illustrate on a Bayesian nonparametric decision tree ensemble applied to a large medical expenditure survey.
-
ABSTRACT Individualized modeling has become increasingly popular in recent years with its growing application in fields such as personalized medicine and mobile health studies. With rich longitudinal measurements, it is of great interest to model certain subject‐specific time‐varying covariate effects. In this paper, we propose an individualized time‐varying nonparametric model by leveraging the subgroup information from the population. The proposed method approximates the time‐varying covariate effect using nonparametric B‐splines and aggregates the estimated nonparametric coefficients that share common patterns, as sketched below. Moreover, the proposed method can effectively handle various missing data patterns that frequently arise in mobile health data. Specifically, our method achieves subgrouping by flexibly accommodating varying dimensions of B‐spline coefficients due to missingness. This capability sets it apart from other fusion‐type approaches for subgrouping. The subgroup information can also potentially provide meaningful insight into the characteristics of subjects and assist in recommending an effective treatment or intervention. An efficient ADMM algorithm is developed for implementation. Our numerical studies and application to mobile health data on monitoring pregnant women's deep sleep and physical activities demonstrate that the proposed method achieves better performance compared to other existing methods.
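To make the B-spline step concrete, the sketch below (hypothetical helper, SciPy-based; not the authors' code) builds a design matrix of B-spline basis functions so that a time-varying coefficient beta(t) can be represented as design @ gamma, with one gamma per subject before fusion.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(t, knots, degree=3):
    # Columns are B-spline basis functions evaluated at observation times
    # t; a time-varying coefficient beta(t) is then modeled as design @ gamma.
    n_basis = len(knots) - degree - 1
    design = np.empty((len(t), n_basis))
    for j in range(n_basis):
        coef = np.zeros(n_basis)
        coef[j] = 1.0
        design[:, j] = BSpline(knots, coef, degree)(t)
    return design
```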
-
Summary We study quantile trend filtering, a recently proposed method for nonparametric quantile regression, with the goal of generalizing existing risk bounds for the usual trend-filtering estimators that perform mean regression. We study both the penalized and the constrained versions, of order $r \geqslant 1$, of univariate quantile trend filtering. Our results show that both the constrained and the penalized versions of order $r \geqslant 1$ attain the minimax rate up to logarithmic factors, when the $(r-1)$th discrete derivative of the true vector of quantiles belongs to the class of bounded-variation signals. Moreover, we show that if the true vector of quantiles is a discrete spline with a few polynomial pieces, then both versions attain a near-parametric rate of convergence. Corresponding results for the usual trend-filtering estimators are known to hold only when the errors are sub-Gaussian. In contrast, our risk bounds are shown to hold under minimal assumptions on the error variables. In particular, no moment assumptions are needed and our results hold under heavy-tailed errors. Our proof techniques are general, and thus can potentially be used to study other nonparametric quantile regression methods. To illustrate this generality, we employ our proof techniques to obtain new results for multivariate quantile total-variation denoising and high-dimensional quantile linear regression.
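As an illustration of the penalized version, the sketch below minimizes the pinball (check) loss plus an l1 penalty on discrete differences. The use of cvxpy is an assumption (the abstract does not describe an implementation), and the difference order follows the convention in which $r = 1$ gives the quantile fused lasso.

```python
import cvxpy as cp
import numpy as np

def quantile_trend_filter(y, tau=0.5, r=1, lam=10.0):
    # Penalized quantile trend filtering: pinball loss at level tau plus
    # an l1 penalty on the rth discrete difference of the fitted vector.
    theta = cp.Variable(len(y))
    u = y - theta
    pinball = cp.sum(cp.maximum(tau * u, (tau - 1) * u))
    penalty = cp.norm1(cp.diff(theta, k=r))
    cp.Problem(cp.Minimize(pinball + lam * penalty)).solve()
    return theta.value

# Example: median trend with piecewise-constant structure, heavy-tailed noise.
y = np.concatenate([np.zeros(50), np.ones(50)]) + np.random.standard_t(2, 100)
fit = quantile_trend_filter(y, tau=0.5, r=1, lam=5.0)
```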