Title: Efficient Algorithms for Least Square Piecewise Polynomial Regression
We present approximation and exact algorithms for piecewise regression of univariate and bivariate data using fixed-degree polynomials. Specifically, given a set S of n data points (x1, y1), . . . , (xn, yn) ∈ R^d × R where d ∈ {1, 2}, the goal is to segment the xi's into some (arbitrary) number of disjoint pieces P1, . . . , Pk, where each piece Pj is associated with a fixed-degree polynomial fj : R^d → R, so as to minimize the total loss function λk + Σ_{i=1}^{n} (yi − f(xi))², where λ ≥ 0 is a regularization term that penalizes model complexity (the number of pieces) and f : P1 ∪ · · · ∪ Pk → R is the piecewise polynomial function defined by f|Pj = fj. The pieces P1, . . . , Pk are disjoint intervals of R in the case of univariate data and disjoint axis-aligned rectangles in the case of bivariate data. Our error approximation allows the use of any fixed-degree polynomial, not just linear functions.
Our main results are the following. For univariate data, we present a (1 + ε)-approximation algorithm with time complexity O((n/ε) log(1/ε)), assuming that the data is presented in sorted order of the xi's. For bivariate data, we present three results: a sub-exponential exact algorithm with running time n^{O(√n)}; a polynomial-time constant-factor approximation algorithm; and a quasi-polynomial-time approximation scheme (QPTAS). The bivariate case is believed to be NP-hard as folklore, but we could not find a published proof in the literature, so in this paper we also present a hardness proof for completeness.
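For context on the objective above: with linear pieces, the exact optimum of λk + Σ (yi − f(xi))² can be computed by the classical segmented least-squares dynamic program in roughly O(n²) time, which is the baseline that a faster (1 + ε)-approximation improves on. A minimal sketch of that baseline (hypothetical helper names, points assumed sorted by x):

```python
def fit_cost(xs, ys, i, j):
    """Least-squares error of the best line through points i..j (inclusive).
    Uses the closed form for simple linear regression; if all x's coincide,
    falls back to fitting the mean y-value."""
    n = j - i + 1
    sx, sy = sum(xs[i:j + 1]), sum(ys[i:j + 1])
    sxx = sum(x * x for x in xs[i:j + 1])
    sxy = sum(x * y for x, y in zip(xs[i:j + 1], ys[i:j + 1]))
    denom = n * sxx - sx * sx
    if denom == 0:  # degenerate span: best constant fit is the mean
        mean = sy / n
        return sum((y - mean) ** 2 for y in ys[i:j + 1])
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs[i:j + 1], ys[i:j + 1]))

def segmented_least_squares(xs, ys, lam):
    """Exact DP: opt[j] = min cost lam*k + squared error over all
    segmentations of the first j points; O(n^2) subproblems."""
    n = len(xs)
    opt = [0.0] * (n + 1)
    for j in range(1, n + 1):
        opt[j] = min(fit_cost(xs, ys, i, j - 1) + lam + opt[i]
                     for i in range(j))
    return opt[n]
```

With λ = 0 every point can be its own piece at zero cost, and a large λ forces a single piece; intermediate values trade pieces against fit error.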
Vyas, Nikhil; Williams, R. Ryan (Theory of Computing Systems)
Abstract
We continue the program of proving circuit lower bounds via circuit satisfiability algorithms. So far, this program has yielded several concrete results, proving that functions in Quasi-NP = NTIME[n^{(log n)^{O(1)}}] and other complexity classes do not have small circuits (in the worst case and/or on average) from various circuit classes C, by showing that C admits non-trivial satisfiability and/or #SAT algorithms which beat exhaustive search by a minor amount. In this paper, we present a new strong lower-bound consequence of having a non-trivial #SAT algorithm for a circuit class C. Say that a symmetric Boolean function f(x1, …, xn) is sparse if it outputs 1 on O(1) values of Σi xi. We show that for every sparse f, and for all "typical" C, faster #SAT algorithms for C circuits imply lower bounds against the circuit class f ∘ C, which may be stronger than C itself. In particular:
#SAT algorithms for n^k-size C-circuits running in 2^n/n^k time (for all k) imply that NEXP does not have (f ∘ C)-circuits of polynomial size.
Applying #SAT algorithms from the literature, one immediate corollary of our results is that Quasi-NP does not have EMAJ ∘ ACC⁰ ∘ THR circuits of polynomial size, where EMAJ is the "exact majority" function, improving previous lower bounds against ACC⁰ [Williams JACM'14] and ACC⁰ ∘ THR [Williams STOC'14], [Murray–Williams STOC'18]. This is the first non-trivial lower bound against such a circuit class.
Guo, Xiangyu; Kortsarz, Guy (Leibniz International Proceedings in Informatics)
Directed Steiner Tree (DST) is a central problem in combinatorial optimization and theoretical computer science: given a directed graph G = (V, E) with edge costs c ∈ ℝ_{≥0}^E, a root r ∈ V and k terminals K ⊆ V, we need to output a minimum-cost arborescence in G that contains an r → t path for every t ∈ K. Recently, Grandoni, Laekhanukit and Li, and independently Ghuge and Nagarajan, gave quasi-polynomial time O(log²k / log log k)-approximation algorithms for the problem, which are tight under popular complexity assumptions. In this paper, we consider the more general Degree-Bounded Directed Steiner Tree (DB-DST) problem, where we are additionally given a degree bound d_v on each vertex v ∈ V, and we require that every vertex v in the output tree has at most d_v children. We give a quasi-polynomial time (O(log n log k), O(log² n))-bicriteria approximation: the algorithm produces a solution with cost at most O(log n log k) times the cost of the optimum solution that violates the degree constraints by at most a factor of O(log² n). This is the first non-trivial result for the problem. While our cost guarantee is nearly optimal, the degree violation factor of O(log² n) is an O(log n) factor away from the approximation lower bound of Ω(log n) from the Set Cover hardness. The hardness result holds even in the special case of the Degree-Bounded Group Steiner Tree problem on trees (DB-GST-T). With the hope of closing the gap, we study the question of whether the degree violation factor can be made tight for this special case. We answer the question in the affirmative by giving an (O(log n log k), O(log n))-bicriteria approximation algorithm for DB-GST-T.
Kaltofen, Erich L. (ISSAC '22 Proc. 2022 ACM Internat. Symp. Symbolic Algebraic Comput.), Amir Hashemi (Ed.)
We present Hermite polynomial interpolation algorithms that, for a sparse univariate polynomial f with coefficients from a field, compute the polynomial from fewer points than the classical algorithms. If the interpolating polynomial f has t terms, our algorithms require argument/value triples (w^i, f(w^i), f'(w^i)) for i = 0, . . . , t + ⌈(t+1)/2⌉ − 1, where w is randomly sampled and the probability of a correct output is determined from a degree bound for f. By f' we denote the derivative of f. Our algorithms generalize to multivariate polynomials, higher derivatives, and sparsity with respect to Chebyshev polynomial bases. We have algorithms that can correct errors in the points by oversampling at a limited number of good values. If an upper bound B ≥ t on the number of terms is given, our algorithms use a randomly selected w and, with high probability, ⌈t/2⌉ + B triples, but then never return an incorrect output.
The algorithms are based on Prony's sparse interpolation algorithm. While Prony's algorithm and its variants use fewer values, namely, 2t+1 and t+B values f(w^i), respectively, they need more arguments w^i. The situation mirrors that in algebraic error correcting codes, where the Reed-Solomon code requires fewer values than the multiplicity code, which is based on Hermite interpolation, but the Reed-Solomon code requires more distinct arguments. Our sparse Hermite interpolation algorithms can interpolate polynomials over finite fields and over the complex numbers, and from floating point data. Our Prony-based approach does not encounter the Birkhoff phenomenon of Hermite interpolation, when a gap in the derivative values causes multiple interpolants. We can interpolate from t+1 values of f and 2t-1 values of f'.
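To illustrate the Prony-style pipeline these algorithms build on (a Hankel solve for the term-locator polynomial, root finding to read off the exponents, then a transposed Vandermonde solve for the coefficients), here is a small numerical sketch over the complex numbers using NumPy. The function name and the choice w = exp(2πi/N) are illustrative assumptions; this is the textbook method, not the paper's Hermite variant:

```python
import numpy as np

def prony_sparse_interpolate(a, t, w, N):
    """Recover the (exponent, coefficient) terms of a t-sparse polynomial f
    from the 2t values a[i] = f(w**i), i = 0..2t-1, where w = exp(2j*pi/N)
    and all exponents of f are below N."""
    a = np.asarray(a, dtype=complex)
    # 1) Hankel solve: sum_k c[k] * a[i+k] = -a[i+t] for i = 0..t-1
    H = np.array([[a[i + k] for k in range(t)] for i in range(t)])
    c = np.linalg.solve(H, -a[t:2 * t])
    # 2) roots of z^t + c[t-1] z^(t-1) + ... + c[0] are the w**e_j
    roots = np.roots(np.concatenate(([1.0], c[::-1])))
    # 3) read exponents off the roots' angles (roots lie on the unit circle)
    step = 2 * np.pi / N
    exps = [int(round((np.angle(r) % (2 * np.pi)) / step)) % N for r in roots]
    # 4) transposed Vandermonde solve for the coefficients
    V = np.array([[r ** i for r in roots] for i in range(t)])
    coeffs = np.linalg.solve(V, a[:t])
    return sorted(zip(exps, coeffs), key=lambda term: term[0])
```

For example, with N = 16 and f(x) = 3x⁵ + 2x², the four values f(w⁰), …, f(w³) suffice to recover both terms.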
Chekuri, Chandra; Quanrud, Kent; Torres, Manuel (Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms)
The densest subgraph problem (DSG), in its simplest form, is the following: given an undirected graph G = (V, E), find a subset S ⊆ V of vertices that maximizes the ratio |E(S)|/|S|, where E(S) is the set of edges with both endpoints in S. DSG and several of its variants are well studied in theory and practice and have many applications in data mining and network analysis. In this paper we study fast algorithms and structural aspects of DSG via the lens of supermodularity. For this we consider the densest supermodular subset problem (DSSP): given a non-negative supermodular function f : 2^V → R₊, maximize f(S)/|S|.
For DSG we describe a simple flow-based algorithm that outputs a (1 − ε)-approximation in deterministic Õ(m/ε) time, where m is the number of edges. Our algorithm is the first to have a near-linear dependence on m and 1/ε and improves previous methods based on an LP relaxation. It generalizes to hypergraphs, and also yields a faster algorithm for directed DSG.
Greedy peeling algorithms have been very popular for DSG and several variants due to their efficiency, empirical performance, and worst-case approximation guarantees. We describe a simple peeling algorithm for DSSP and analyze its approximation guarantee in a fashion that unifies several existing results. Boob et al. [BGPSTWW-20] developed an iterative peeling algorithm for DSG which appears to work very well in practice, and made a conjecture about its convergence to optimality. We affirmatively answer their conjecture, and in fact prove that a natural generalization of their algorithm converges to a (1 − ε)-approximation for any supermodular function f; the key to our proof is to consider an LP formulation that is derived via the Lovász extension of a supermodular function. For DSG the bound on the number of iterations we prove is O((Δ ln |V| / λ*) · (1/ε²)), where Δ is the maximum degree and λ* is the optimum value. Our work suggests that iterative peeling can be an effective heuristic for several objectives considered in the literature.
Finally, we show that the 2-approximation for densest-at-least-k subgraph [KS-09] extends to the supermodular setting. We also give a unified analysis of the peeling algorithm for this problem, and via this analysis derive an approximation guarantee for a generalization of DSSP to maximize f(S)/g(|S|) for a concave function g.
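As a point of reference, the classical greedy peeling baseline for DSG (repeatedly delete a minimum-degree vertex and return the best density seen among the suffixes), long known to give a 2-approximation, fits in a few lines. The iterative scheme of Boob et al. re-runs peeling with accumulated loads, which this sketch does not attempt:

```python
import heapq

def peel_densest(n, edges):
    """Greedy peeling for densest subgraph on vertices 0..n-1:
    repeatedly remove a minimum-degree vertex (lazy-deletion heap)
    and return the best density |E(S)|/|S| over the n suffixes."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    heap = [(deg[v], v) for v in range(n)]
    heapq.heapify(heap)
    removed = [False] * n
    m, alive, best = len(edges), n, 0.0
    while alive > 0:
        best = max(best, m / alive)
        d, v = heapq.heappop(heap)
        if removed[v] or d != deg[v]:
            continue  # stale heap entry; skip
        removed[v] = True
        alive -= 1
        m -= deg[v]  # deg[v] counts only edges to still-live neighbors
        for u in adj[v]:
            if not removed[u]:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return best
```

On a K4 with one pendant vertex attached, the whole graph has density 7/5, but peeling the pendant first exposes the K4 with density 6/4 = 1.5, which the suffix scan reports.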
Chudnovsky, Maria; Pilipczuk, Marcin; Pilipczuk, Michał; Thomassé, Stéphan (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms)
In the Maximum Independent Set problem we are asked to find a set of pairwise nonadjacent vertices in a given graph with the maximum possible cardinality. In general graphs, this classical problem is known to be NP-hard and hard to approximate within a factor of n^{1−ε} for any ε > 0. Due to this, investigating the complexity of Maximum Independent Set in various graph classes in hope of finding better tractability results is an active research direction.
In H-free graphs, that is, graphs not containing a fixed graph H as an induced subgraph, the problem is known to remain NP-hard and APX-hard whenever H contains a cycle, a vertex of degree at least four, or two vertices of degree at least three in one connected component. For the remaining cases, where every component of H is a path or a subdivided claw, the complexity of Maximum Independent Set remains widely open, with only a handful of polynomial-time solvability results for small graphs H such as P5, P6, the claw, or the fork.
We prove that for every such “possibly tractable” graph H there exists an algorithm that, given an H-free graph G and an accuracy parameter ε > 0, finds an independent set in G of cardinality within a factor of (1 – ε) of the optimum in time exponential in a polynomial of log | V(G) | and ε−1. That is, we show that for every graph H for which Maximum Independent Set is not known to be APX-hard in H-free graphs, the problem admits a quasi-polynomial time approximation scheme in this graph class. Our algorithm works also in the more general weighted setting, where the input graph is supplied with a weight function on vertices and we are maximizing the total weight of an independent set.
Lokshtanov, Daniel; Suri, Subhash; Xue, Jie. Efficient Algorithms for Least Square Piecewise Polynomial Regression. ESA 2021: Proceedings of the European Symposium on Algorithms. Retrieved from https://par.nsf.gov/biblio/10284631.