Abstract We consider the nonparametric estimation of an S-shaped regression function. The least squares estimator provides a very natural, tuning-free approach, but results in a non-convex optimization problem, since the inflection point is unknown. We show that the estimator may nevertheless be regarded as a projection onto a finite union of convex cones, which allows us to propose a mixed primal-dual bases algorithm for its efficient, sequential computation. After developing a projection framework that demonstrates the consistency and robustness to misspecification of the estimator, our main theoretical results provide sharp oracle inequalities that yield worst-case and adaptive risk bounds for the estimation of the regression function, as well as a rate of convergence for the estimation of the inflection point. These results reveal not only that the estimator achieves the minimax optimal rate of convergence for both the estimation of the regression function and its inflection point (up to a logarithmic factor in the latter case), but also that it is able to achieve an almost-parametric rate when the true regression function is piecewise affine with not too many affine pieces. Simulations and a real data application to air pollution modelling also confirm the desirable finite-sample properties of the estimator, and our algorithm is implemented in the R package Sshaped. 
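To make the structure of the estimator concrete: because an S-shaped (increasing, convex-then-concave) fit with a known inflection point is a convex least squares problem, a brute-force variant simply scans over candidate inflection points and keeps the best fit. The Python sketch below, using cvxpy, is only a hypothetical illustration of this "union of convex cones" idea; it is not the paper's mixed primal-dual bases algorithm, and in practice the R package Sshaped should be used. The constraint encoding and solver choice are assumptions of this toy version.

```python
import numpy as np
import cvxpy as cp

def sshaped_lse_bruteforce(x, y):
    """Toy S-shaped least squares fit (assumes distinct design points x).

    For each candidate inflection index k, fit values theta that are
    increasing, convex (increasing slopes) to the left of x[k] and concave
    (decreasing slopes) to the right, and keep the fit with the smallest
    residual sum of squares. This brute-force scan only illustrates the
    'projection onto a union of convex cones' structure; it is not the
    paper's efficient sequential algorithm.
    """
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    n = len(x)
    slopes = lambda t: [(t[i + 1] - t[i]) / (x[i + 1] - x[i]) for i in range(n - 1)]
    best_rss, best_fit = np.inf, None
    for k in range(1, n - 1):                      # candidate inflection index
        theta = cp.Variable(n)
        s = slopes(theta)
        cons = [theta[1:] >= theta[:-1]]                        # increasing
        cons += [s[i] <= s[i + 1] for i in range(k - 1)]        # convex on the left
        cons += [s[i] >= s[i + 1] for i in range(k, n - 2)]     # concave on the right
        prob = cp.Problem(cp.Minimize(cp.sum_squares(y - theta)), cons)
        prob.solve()
        if prob.value is not None and prob.value < best_rss:
            best_rss, best_fit = prob.value, theta.value
    return best_fit, best_rss
```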
This content will become publicly available on March 18, 2026.

Nearly Optimal Learning Using Sparse Deep ReLU Networks in Regularized Empirical Risk Minimization With Lipschitz Loss
Abstract We propose a sparse deep ReLU network (SDRN) estimator of the regression function, obtained from regularized empirical risk minimization with a Lipschitz loss function. Our framework can be applied to a variety of regression and classification problems. We establish novel nonasymptotic excess risk bounds for the SDRN estimator when the regression function belongs to a Sobolev space with mixed derivatives. We obtain a new, nearly optimal risk rate, in the sense that the SDRN estimator can achieve nearly the same optimal minimax convergence rate as one-dimensional nonparametric regression, with the dimension entering only through a logarithmic term, when the feature dimension is fixed. The estimator has a slightly slower rate when the dimension grows with the sample size. We show that the depth of the SDRN estimator grows with the sample size in logarithmic order, and the total number of nodes and weights grows in polynomial order of the sample size, in order to achieve the nearly optimal risk rate. The proposed SDRN can go deeper with fewer parameters, estimating the regression function well while overcoming the overfitting problem encountered by conventional feedforward neural networks.
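As a rough illustration of regularized empirical risk minimization with a Lipschitz loss, the PyTorch sketch below trains a small fully connected ReLU network with the (Lipschitz) Huber loss plus an l1 penalty on the weights to encourage sparsity. The architecture, penalty form, and hyperparameters are illustrative assumptions only; they do not reproduce the paper's SDRN construction, whose depth and width are tied to the sample size and the Sobolev smoothness.

```python
import torch
import torch.nn as nn

def fit_relu_erm(X, y, width=32, depth=4, lam=1e-4, lr=1e-3, epochs=500):
    """Toy regularized ERM with a Lipschitz loss and a deep ReLU network.

    X: (n, p) float tensor of features; y: (n,) float tensor of responses.
    The Huber loss is Lipschitz in its first argument, and the l1 penalty
    is a crude stand-in for the sparsity constraints analyzed in the paper.
    """
    layers, d = [], X.shape[1]
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, 1))
    net = nn.Sequential(*layers)

    loss_fn = nn.HuberLoss()                      # Lipschitz loss
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        fit = net(X).squeeze(-1)
        penalty = sum(p.abs().sum() for p in net.parameters())
        loss = loss_fn(fit, y) + lam * penalty    # empirical risk + l1 penalty
        loss.backward()
        opt.step()
    return net
```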
- Award ID(s): 2310288
- PAR ID: 10607949
- Publisher / Repository: Massachusetts Institute of Technology
- Date Published:
- Journal Name: Neural Computation
- Volume: 37
- Issue: 4
- ISSN: 0899-7667
- Page Range / eLocation ID: 815 to 870
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The weighted nearest neighbors (WNN) estimator has been widely used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically generated for the nearest neighbors (Steele, 2009; Biau et al., 2010); we refer to the resulting estimator as the distributional nearest neighbors (DNN) estimator for easy reference. Yet, there is a lack of distributional results for such an estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias-reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator (a toy sketch of this combination appears after this list). The two-scale DNN estimator has an equivalent representation as a WNN estimator with weights admitting explicit forms, some of which are negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under a fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the two-scale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application.
- In this article, we introduce a functional structural equation model for estimating directional relations from multivariate functional data. We decouple the estimation into two major steps: directional order determination and selection through sparse functional regression. We first propose a score function at the linear operator level, and show that its minimization can recover the true directional order when the relation between each function and its parental functions is nonlinear. We then develop a sparse functional additive regression, where both the response and the multivariate predictors are functions and the regression relation is additive and nonlinear. We also propose strategies to speed up the computation and scale up our method. In theory, we establish the consistency of order determination, sparse functional additive regression, and directed acyclic graph estimation, while allowing both the dimension of the Karhunen–Loève expansion coefficients and the number of random functions to diverge with the sample size. We illustrate the efficacy of our method through simulations and an application to brain effective connectivity analysis.
- We consider the regression problem of estimating functions on $\mathbb{R}^D$ that are supported on a $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^D$ with $d \ll D$. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite-sample bounds that hold with high probability for rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function were defined on a known Euclidean domain of dimension $d$, rather than on an unknown manifold embedded in $\mathbb{R}^D$. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.
- We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number $d$ of predictors is two or more. Such a sizeable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on $q$ randomly selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the same rate when $q$ is of order $O\{n^{2/(pr+1)}\}$, where $p \in [1,2]$ depends on the true function and $r > 1$ depends on the type of spline; such a $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity (a toy sketch of this spacing-based selection appears after this list). The asymptotic analysis shows that the proposed smoothing spline estimator can decrease $q$ to around $O\{n^{1/(pr+1)}\}$ when $d \leq pr+1$. Applications to synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
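For the distributional nearest neighbors item above, the bagged 1-NN estimator has a closed-form weighted nearest neighbors representation: under subsampling without replacement with scale s, the i-th nearest neighbor of the query point receives weight C(n-i, s-1)/C(n, s), the probability that it is the nearest neighbor within a uniformly drawn subsample of size s. The Python sketch below illustrates this representation and the two-scale combination; the bias-cancelling choice of the combination weight, which the paper derives, is treated here as a user-supplied assumption.

```python
import numpy as np
from scipy.special import comb

def dnn_estimate(X, y, x0, s):
    """Distributional (bagged 1-NN) nearest neighbors estimate at x0.

    The i-th nearest neighbor gets weight C(n - i, s - 1) / C(n, s), the
    probability that it is the nearest neighbor within a uniformly drawn
    subsample of size s (Steele, 2009; Biau et al., 2010).
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    order = np.argsort(np.linalg.norm(X - x0, axis=1))   # ranks of neighbors of x0
    ranks = np.arange(1, n + 1)
    w = comb(n - ranks, s - 1) / comb(n, s)              # WNN weights, sum to 1
    return float(w @ y[order])

def tdnn_estimate(X, y, x0, s1, s2, w1):
    """Two-scale DNN: combine two subsampling scales s1 < s2 with weights
    w1 and 1 - w1. The paper's bias-cancelling choice of w1 is not
    reproduced here; w1 is simply an input to this toy version."""
    return w1 * dnn_estimate(X, y, x0, s1) + (1 - w1) * dnn_estimate(X, y, x0, s2)
```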
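For the smoothing spline item above, the core selection idea, choosing basis functions at approximately equally spaced observations, is simple to sketch in one dimension: return the indices of q observations taken at roughly equal ranks of the sorted sample. This is an assumption-laden toy version; the actual method handles multivariate designs and the subsequent reduced-basis spline fit, neither of which is shown.

```python
import numpy as np

def equally_spaced_basis_indices(x, q):
    """Pick q observations that are approximately equally spaced in rank
    along a one-dimensional covariate, as candidate basis points.

    Only a toy version of the space-filling selection idea; the paper's
    method applies to multivariate predictors and uses the selected
    observations to build the reduced set of spline basis functions.
    """
    x = np.asarray(x, float)
    order = np.argsort(x)
    positions = np.linspace(0, len(x) - 1, num=q).round().astype(int)
    return order[positions]

# Usage: select 10 basis points from 1000 observations.
# x = np.random.rand(1000); idx = equally_spaced_basis_indices(x, 10)
```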