

Title: Efficient Algorithms for Least Square Piecewise Polynomial Regression
We present approximation and exact algorithms for piecewise regression of univariate and bivariate data using fixed-degree polynomials. Specifically, given a set $S$ of $n$ data points $(x_1, y_1), \ldots, (x_n, y_n) \in \mathbb{R}^d \times \mathbb{R}$, where $d \in \{1, 2\}$, the goal is to segment the $x_i$'s into some (arbitrary) number of disjoint pieces $P_1, \ldots, P_k$, where each piece $P_j$ is associated with a fixed-degree polynomial $f_j : \mathbb{R}^d \to \mathbb{R}$, so as to minimize the total loss $\lambda k + \sum_{i=1}^{n} (y_i - f(x_i))^2$, where $\lambda \geq 0$ is a regularization term that penalizes model complexity (the number of pieces) and $f : \bigcup_{j=1}^{k} P_j \to \mathbb{R}$ is the piecewise polynomial function defined by $f|_{P_j} = f_j$. The pieces $P_1, \ldots, P_k$ are disjoint intervals of $\mathbb{R}$ in the case of univariate data and disjoint axis-aligned rectangles in the case of bivariate data. Our error approximation allows the use of any fixed-degree polynomial, not just linear functions. Our main results are the following. For univariate data, we present a $(1 + \varepsilon)$-approximation algorithm with time complexity $O(\frac{n}{\varepsilon} \log \frac{1}{\varepsilon})$, assuming that the data is presented in sorted order of the $x_i$'s. For bivariate data, we present three results: a sub-exponential exact algorithm with running time $n^{O(\sqrt{n})}$; a polynomial-time constant-factor approximation algorithm; and a quasi-polynomial-time approximation scheme (QPTAS). The bivariate case is believed to be NP-hard as folklore, but we could not find a published record in the literature, so in this paper we also present a hardness proof for completeness.
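To make the objective concrete: for univariate data the exact optimum can be found by a classical dynamic program over prefixes of the sorted points, and the paper's $(1+\varepsilon)$-approximation is best understood as a fast surrogate for this quadratic-size recurrence. Below is a minimal Python sketch of that exact baseline only, not the paper's algorithm; the function name, interface, and use of numpy's least-squares fit are illustrative assumptions.

```python
import numpy as np

def piecewise_fit(x, y, degree=1, lam=1.0):
    """Exact dynamic program for lam * k + sum_i (y_i - f(x_i))^2
    over piecewise degree-`degree` polynomials, x sorted ascending."""
    n = len(x)

    def sse(i, j):
        # Least-squares error of one polynomial on points i..j-1.
        if j - i <= degree:           # underdetermined: fits exactly
            return 0.0
        c = np.polyfit(x[i:j], y[i:j], degree)
        r = y[i:j] - np.polyval(c, x[i:j])
        return float(r @ r)

    dp = [float("inf")] * (n + 1)     # dp[j]: optimal cost of first j points
    dp[0] = 0.0
    back = [0] * (n + 1)              # back-pointers to recover the pieces
    for j in range(1, n + 1):
        for i in range(j):
            cost = dp[i] + lam + sse(i, j)
            if cost < dp[j]:
                dp[j], back[j] = cost, i

    pieces, j = [], n                 # walk back-pointers right to left
    while j > 0:
        pieces.append((back[j], j))
        j = back[j]
    return dp[n], pieces[::-1]

x = np.linspace(0.0, 2.0, 40)
y = np.where(x < 1.0, x, 2.0 - x)     # a "tent": two linear pieces
cost, pieces = piecewise_fit(x, y, degree=1, lam=0.1)
print(cost, pieces)                   # expect two pieces, cost close to 2*lam
```

The baseline spends a least-squares fit on each of the $O(n^2)$ candidate pieces, which is exactly the cost the near-linear-time approximation in the abstract is designed to avoid.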
Award ID(s):
1814172
NSF-PAR ID:
10284631
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ESA21: Proceedings of European Symposium on Algorithms
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Oh, A ; Naumann, T ; Globerson, A ; Saenko, K ; Hardt, M ; Levine, S (Ed.)
    We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $f_*(x) = \sigma_*\left(\frac{1}{\sqrt{1+\theta}}\langle x, \mu\rangle\right)$, $x \sim \mathcal{N}(0, I_d + \theta\mu\mu^\top)$, $\theta \asymp d^{\beta}$ for $\beta \in [0, 1)$, where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and it depends on the projection of the input $x$ onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$ with $n/d \to \psi \in (0, \infty)$, we ask the following question: how large should the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \geq 1 - \frac{1}{p}$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - \frac{1}{k}$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structures in the data. Further, since $k \leq p$ by definition, neural networks can adapt to such structures more effectively.
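The information exponent $k$ in item 1 is the lowest degree carrying a nonzero coefficient in the Hermite expansion of the link function, and for a polynomial link it can be estimated numerically with Gauss-Hermite quadrature. A minimal sketch under assumed conventions (probabilists' Hermite basis, an arbitrary tolerance and node count); nothing here is from the paper itself:

```python
import numpy as np
from numpy.polynomial import hermite_e as He  # probabilists' Hermite basis

def information_exponent(sigma, max_degree=10, tol=1e-8):
    """Smallest j >= 1 whose Hermite coefficient E[sigma(Z) He_j(Z)],
    Z ~ N(0,1), is nonzero, estimated by Gauss-Hermite quadrature."""
    z, w = He.hermegauss(64)          # nodes/weights for weight exp(-z^2/2)
    norm = np.sqrt(2.0 * np.pi)       # the weights sum to sqrt(2*pi)
    for j in range(1, max_degree + 1):
        basis = [0] * j + [1]         # coefficient vector selecting He_j
        c_j = np.sum(w * sigma(z) * He.hermeval(z, basis)) / norm
        if abs(c_j) > tol:
            return j
    return None

# He_3(z) = z^3 - 3z has information exponent 3, while z^3 contains
# a He_1 component (z^3 = He_3(z) + 3 He_1(z)) and so has exponent 1.
print(information_exponent(lambda z: z**3 - 3.0 * z))  # -> 3
print(information_exponent(lambda z: z**3))            # -> 1
```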
  2. We give a tight characterization of the (vectorized Euclidean) norm of weights required to realize a function $f : \mathbb{R}^d \to \mathbb{R}$ as a single hidden-layer ReLU network with an unbounded number of units (infinite width), extending the univariate characterization of Savarese et al. (2019) to the multivariate case.
  3. Amir Hashemi (Ed.)
    We present Hermite polynomial interpolation algorithms that, for a sparse univariate polynomial f with coefficients from a field, compute the polynomial from fewer points than the classical algorithms. If the interpolating polynomial f has t terms, our algorithms require argument/value triples (w^i, f(w^i), f'(w^i)) for i = 0, ..., t + ceiling( (t+1)/2 ) - 1, where w is randomly sampled and the probability of a correct output is determined from a degree bound for f. With f' we denote the derivative of f. Our algorithms generalize to multivariate polynomials, higher derivatives and sparsity with respect to Chebyshev polynomial bases. We have algorithms that can correct errors in the points by oversampling at a limited number of good values. If an upper bound B >= t for the number of terms is given, our algorithms use a randomly selected w and, with high probability, ceiling( t/2 ) + B triples, but then never return an incorrect output. The algorithms are based on Prony's sparse interpolation algorithm. While Prony's algorithm and its variants use fewer values, namely 2t+1 and t+B values f(w^i), respectively, they need more arguments w^i. The situation mirrors that in algebraic error-correcting codes, where the Reed-Solomon code requires fewer values than the multiplicity code, which is based on Hermite interpolation, but the Reed-Solomon code requires more distinct arguments. Our sparse Hermite interpolation algorithms can interpolate polynomials over finite fields and over the complex numbers, and from floating point data. Our Prony-based approach does not encounter the Birkhoff phenomenon of Hermite interpolation, when a gap in the derivative values causes multiple interpolants. We can interpolate from t+1 values of f and 2t-1 values of f'.
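Item 3 builds on Prony's sparse interpolation algorithm: from the $2t$ values $f(w^0), \ldots, f(w^{2t-1})$ of a $t$-term polynomial, a Hankel solve produces a polynomial whose roots are $w^{e_j}$ for the term exponents $e_j$, and a transposed Vandermonde solve then recovers the coefficients. Below is a floating-point toy version of that classical algorithm, not the paper's Hermite variant; the base $w = 1.1$ and the rounding step for exponents are assumptions that only make sense over the reals:

```python
import numpy as np

def prony_sparse_interpolate(f, t, w=1.1):
    """Recover the terms of a t-sparse polynomial f(x) = sum_j c_j x^(e_j)
    from the 2t evaluations f(w^0), ..., f(w^(2t-1))."""
    a = np.array([f(w**i) for i in range(2 * t)])
    # Hankel system: sum_k lam[k] * a[i+k] = -a[i+t] for i = 0..t-1.
    H = np.array([[a[i + k] for k in range(t)] for i in range(t)])
    lam = np.linalg.solve(H, -a[t:2 * t])
    # Prony polynomial z^t + lam[t-1] z^(t-1) + ... + lam[0];
    # its roots are b_j = w^(e_j).
    roots = np.roots(np.concatenate(([1.0], lam[::-1])))
    exps = np.rint(np.log(roots.real) / np.log(w)).astype(int)
    # Transposed Vandermonde solve for the coefficients c_j.
    V = np.array([[float(w**e) ** i for e in exps] for i in range(t)])
    coeffs = np.linalg.solve(V, a[:t])
    return sorted(zip(exps, coeffs))

# f(x) = 3x^7 - 2x^2 has t = 2 terms.
print(prony_sparse_interpolate(lambda x: 3.0 * x**7 - 2.0 * x**2, t=2))
# -> [(2, -2.0), (7, 3.0)] up to floating-point error
```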
  4. We present a finite element technique for approximating the surface Hessian of a discrete scalar function on triangulated surfaces embedded in $\mathbb{R}^{3}$, with or without boundary. We then extend the method to compute approximations of the full shape operator of the underlying surface using only the known discrete surface. The method is based on the Hellan--Herrmann--Johnson (HHJ) element and does not require any ad hoc modifications. Convergence is established provided the discrete surface satisfies a Lagrange interpolation property related to the exact surface. The convergence rate, in $L^2$, for the shape operator approximation is $O(h^m)$, where $m \geq 1$ is the polynomial degree of the surface, i.e. the method converges even for piecewise linear surface triangulations. For surfaces with boundary, some additional boundary data is needed to establish optimal convergence, e.g. boundary information about the surface normal vector or the curvature in the co-normal direction. Numerical examples are given on non-trivial surfaces that demonstrate our error estimates and the efficacy of the method.
  5. This paper concerns first-order approximation of the piecewise-differentiable flow generated by a class of nonsmooth vector fields. Specifically, we represent and compute the Bouligand (or B-)derivative of the piecewise-differentiable flow generated by a vector field with event-selected discontinuities. Our results are remarkably efficient: although there are factorially many “pieces” of the derivative, we provide an algorithm that evaluates its action on a tangent vector using polynomial time and space, and verify the algorithm's correctness by deriving a representation for the B-derivative that requires “only” exponential time and space to construct. We apply our methods in two classes of illustrative examples: piecewise-constant vector fields and mechanical systems subject to unilateral constraints.
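For a single transversal crossing of a switching surface, the first-order object that item 5 generalizes is the classical saltation matrix $S = I + \frac{(F^{+} - F^{-})\,\nabla h^{\top}}{\nabla h \cdot F^{-}}$, which transports tangent vectors across the event; the B-derivative of an event-selected flow composes one such factor per event encountered. A minimal sketch for piecewise-constant fields, with all names illustrative; the genuinely nonsmooth case of simultaneous events, which is the paper's actual subject, is out of scope here:

```python
import numpy as np

def saltation_matrix(F_minus, F_plus, grad_h):
    """S = I + (F+ - F-) grad_h^T / (grad_h . F-), the first-order
    correction at a transversal crossing of the guard {h(x) = 0}."""
    F_minus, F_plus, grad_h = map(np.asarray, (F_minus, F_plus, grad_h))
    denom = grad_h @ F_minus          # transversality: must be nonzero
    return np.eye(len(grad_h)) + np.outer(F_plus - F_minus, grad_h) / denom

# Planar piecewise-constant field crossing the line {x_1 = 1},
# guard h(x) = x_1 - 1 with gradient (1, 0).
S = saltation_matrix(F_minus=[1.0, 0.0], F_plus=[1.0, 1.0], grad_h=[1.0, 0.0])
v = np.array([1.0, 0.0])              # tangent vector before the event
print(S @ v)                          # sheared to [1., 1.] after the event
```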