Given a matrix A ∈ ℝn\texttimes{}d and a vector b ∈ ℝn, we consider the regression problem with ℓ∞ guarantees: finding a vector x′ ∈ ℝd such that $$||x'-x^* ||_infty leq frac{epsilon}{sqrt{d}}cdot ||Ax^*-b||_2cdot ||A^dagger||$$, where x* = arg minx∈Rd ||Ax – b||2. One popular approach for solving such ℓ2 regression problem is via sketching: picking a structured random matrix S ∈ ℝm\texttimes{}n with m < n and S A can be quickly computed, solve the "sketched" regression problem arg minx∈ℝd ||S Ax – Sb||2. In this paper, we show that in order to obtain such ℓ∞ guarantee for ℓ2 regression, one has to use sketching matrices that are dense. To the best of our knowledge, this is the first user case in which dense sketching matrices are necessary. On the algorithmic side, we prove that there exists a distribution of dense sketching matrices with m = ε-2d log3(n/δ) such that solving the sketched regression problem gives the ℓ∞ guarantee, with probability at least 1 – δ. Moreover, the matrix S A can be computed in time O(nd log n). Our row count is nearly-optimal up to logarithmic factors, and significantly improves the result in (Price et al., 2017), in which a superlinear in d rows, m = Ω(ε-2d1+γ) for γ ∈ (0, 1) is required. Moreover, we develop a novel analytical framework for ℓ∞ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property introduced in (Song \& Yu, 2021). Our analysis is much simpler and more general than that of (Price et al., 2017). Leveraging this framework, we extend the ℓ∞ guarantee regression result to dense sketching matrices for computing the fast tensor product of vectors.
more »
« less
L1 Regression with Lewis Weights Subsampling
We consider the problem of finding an approximate solution to ℓ1 regression while only observing a small number of labels. Given an n×d unlabeled data matrix X, we must choose a small set of m≪n rows to observe the labels of, then output an estimate βˆ whose error on the original problem is within a 1+ε factor of optimal. We show that sampling from X according to its Lewis weights and outputting the empirical minimizer succeeds with probability 1−δ for m>O(1ε2dlogdεδ). This is analogous to the performance of sampling according to leverage scores for ℓ2 regression, but with exponentially better dependence on δ. We also give a corresponding lower bound of Ω(dε2+(d+1ε2)log1δ).
more »
« less
- Award ID(s):
- 1751040
- PAR ID:
- 10336428
- Date Published:
- Journal Name:
- RANDOM
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract May the triforce be the 3-uniform hypergraph on six vertices with edges {123′, 12′3, 1′23}. We show that the minimum triforce density in a 3-uniform hypergraph of edge density δ is δ 4– o (1) but not O ( δ 4 ). Let M ( δ ) be the maximum number such that the following holds: for every ∊ > 0 and $$G = {\mathbb{F}}_2^n$$ with n sufficiently large, if A ⊆ G × G with A ≥ δ | G | 2 , then there exists a nonzero “popular difference” d ∈ G such that the number of “corners” ( x , y ), ( x + d , y ), ( x , y + d ) ∈ A is at least ( M ( δ )–∊)| G | 2 . As a corollary via a recent result of Mandache, we conclude that M ( δ ) = δ 4– o (1) and M ( δ ) = ω ( δ 4 ). On the other hand, for 0 < δ < 1/2 and sufficiently large N , there exists A ⊆ [ N ] 3 with | A | ≥ δN 3 such that for every d ≠ 0, the number of corners ( x , y , z ), ( x + d , y , z ), ( x , y + d , z ), ( x , y , z + d ) ∈ A is at most δ c log(1/ δ ) N 3 . A similar bound holds in higher dimensions, or for any configuration with at least 5 points or affine dimension at least 3.more » « less
-
null (Ed.)Abstract Let $$u_{k}$$ u k be a solution of the Helmholtz equation with the wave number k , $$\varDelta u_{k}+k^{2} u_{k}=0$$ Δ u k + k 2 u k = 0 , on (a small ball in) either $${\mathbb {R}}^{n}$$ R n , $${\mathbb {S}}^{n}$$ S n , or $${\mathbb {H}}^{n}$$ H n . For a fixed point p , we define $$M_{u_{k}}(r)=\max _{d(x,p)\le r}|u_{k}(x)|.$$ M u k ( r ) = max d ( x , p ) ≤ r | u k ( x ) | . The following three ball inequality $$M_{u_{k}}(2r)\le C(k,r,\alpha )M_{u_{k}}(r)^{\alpha }M_{u_{k}}(4r)^{1-\alpha }$$ M u k ( 2 r ) ≤ C ( k , r , α ) M u k ( r ) α M u k ( 4 r ) 1 - α is well known, it holds for some $$\alpha \in (0,1)$$ α ∈ ( 0 , 1 ) and $$C(k,r,\alpha )>0$$ C ( k , r , α ) > 0 independent of $$u_{k}$$ u k . We show that the constant $$C(k,r,\alpha )$$ C ( k , r , α ) grows exponentially in k (when r is fixed and small). We also compare our result with the increased stability for solutions of the Cauchy problem for the Helmholtz equation on Riemannian manifolds.more » « less
-
Consider a system of m polynomial equations {pi(x)=bi}i≤m of degree D≥2 in n-dimensional variable x∈ℝn such that each coefficient of every pi and bis are chosen at random and independently from some continuous distribution. We study the basic question of determining the smallest m -- the algorithmic threshold -- for which efficient algorithms can find refutations (i.e. certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing and certifying pseudo-randomness of Goldreich's candidate generators and generalizations. We show that for every d∈ℕ, the (n+m)O(d)-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever m≥O(n)⋅(nd)D−1. We prove a lower bound in the restricted low-degree polynomial model of computation which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all d. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-4 sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at m≳O˜(n)⋅n(1−δ)(D−1) for 2nδ-time algorithms for all δ.more » « less
-
A Chor–Goldreich (CG) source is a sequence of random variables X = X1 ∘ … ∘ Xt, where each Xi ∼ {0,1}d and Xi has δ d min-entropy conditioned on any fixing of X1 ∘ … ∘ Xi−1. The parameter 0<δ≤ 1 is the entropy rate of the source. We typically think of d as constant and t as growing. We extend this notion in several ways, defining almost CG sources. Most notably, we allow each Xi to only have conditional Shannon entropy δ d. We achieve pseudorandomness results for almost CG sources which were not known to hold even for standard CG sources, and even for the weaker model of Santha–Vazirani sources: We construct a deterministic condenser that on input X, outputs a distribution which is close to having constant entropy gap, namely a distribution Z ∼ {0,1}m for m ≈ δ dt with min-entropy m−O(1). Therefore, we can simulate any randomized algorithm with small failure probability using almost CG sources with no multiplicative slowdown. This result extends to randomized protocols as well, and any setting in which we cannot simply cycle over all seeds, and a “one-shot” simulation is needed. Moreover, our construction works in an online manner, since it is based on random walks on expanders. Our main technical contribution is a novel analysis of random walks, which should be of independent interest. We analyze walks with adversarially correlated steps, each step being entropy-deficient, on good enough lossless expanders. We prove that such walks (or certain interleaved walks on two expanders), starting from a fixed vertex and walking according to X1∘ … ∘ Xt, accumulate most of the entropy in X.more » « less