Abstract We provide high-probability bounds on the condition number of random feature matrices. In particular, we show that if the complexity ratio $N/m$, where $N$ is the number of neurons and $m$ is the number of data samples, scales like $\log^{-1}(N)$ or $\log(m)$, then the random feature matrix is well-conditioned. This result holds without the need for regularization and relies on establishing various concentration bounds between dependent components of the random feature matrix. Additionally, we derive bounds on the restricted isometry constant of the random feature matrix. We also derive an upper bound for the risk associated with regression problems using a random feature matrix. This upper bound exhibits the double descent phenomenon and indicates that it is an effect of the double descent behaviour of the condition number. The risk bounds cover the underparameterized setting, via the least squares problem, and the overparameterized setting, via either the minimum norm interpolation problem or a sparse regression problem. For the noiseless least squares or sparse regression cases, we show that the risk decreases as $m$ and $N$ increase. The risk bound matches the optimal scaling in the literature, and the constants in our results are explicit and independent of the dimension of the data.
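A minimal numerical sketch of the conditioning statement above (assuming Gaussian data, Gaussian weights, and a cosine activation, none of which are specified by the abstract): it estimates the condition number of a random feature matrix as the complexity ratio $N/m$ varies. The spike near $N \approx m$ followed by improvement at larger ratios is the double descent behaviour of the condition number referred to above.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_feature_matrix(m, N, d=10):
    """A[j, k] = phi(<w_k, x_j>) for random weights w_k and data x_j."""
    X = rng.standard_normal((m, d)) / np.sqrt(d)  # m data samples
    W = rng.standard_normal((d, N))               # N random weight vectors
    return np.cos(X @ W)                          # cosine activation (illustrative)

m = 200
for N in [50, 100, 190, 200, 210, 400, 3200]:
    A = random_feature_matrix(m, N)
    s = np.linalg.svd(A, compute_uv=False)        # singular values, descending
    print(f"N/m = {N/m:6.2f}   cond(A) = {s[0] / s[-1]:.3e}")
```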
Quantile-based Random Kaczmarz for corrupted linear systems of equations
Abstract We consider linear systems $Ax = b$ where $A \in \mathbb{R}^{m \times n}$ consists of normalized rows, $\|a_i\|_{\ell^2} = 1$, and where up to $\beta m$ entries of $b$ have been corrupted (possibly by arbitrarily large numbers). Haddock, Needell, Rebrova & Swartworth propose a quantile-based Random Kaczmarz method and show that for certain random matrices $A$ it converges with high likelihood to the true solution. We prove a deterministic version by constructing, for any matrix $A$, a number $\beta_A$ such that there is convergence for all perturbations with $\beta < \beta_A$. Assuming a random matrix heuristic, this proves convergence for tall Gaussian matrices with up to $\sim 0.5\%$ corruption (a number that can likely be improved).
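A small sketch of one plausible reading of the quantile-based Random Kaczmarz idea (the exact sampling scheme and the parameters `q`, `sample`, and `iters` are illustrative, not those of the paper): each step projects onto a randomly chosen equation only if its residual lies below an empirical quantile of residuals on a random row sample, so heavily corrupted equations tend to be skipped.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantile_rk(A, b, q=0.7, iters=5000, sample=100):
    """Quantile-based Random Kaczmarz sketch; rows of A assumed unit-norm."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        S = rng.choice(m, size=sample, replace=False)
        thresh = np.quantile(np.abs(A[S] @ x - b[S]), q)
        i = rng.integers(m)
        r = A[i] @ x - b[i]
        if abs(r) <= thresh:          # skip equations with suspiciously large residual
            x = x - r * A[i]          # standard Kaczmarz projection
    return x

# Tall Gaussian system with ~0.5% of the entries of b corrupted.
m, n = 2000, 50
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # normalize rows
x_true = rng.standard_normal(n)
b = A @ x_true
bad = rng.choice(m, size=int(0.005 * m), replace=False)
b[bad] += 100.0 * rng.standard_normal(bad.size)  # arbitrarily large corruptions
x_hat = quantile_rk(A, b)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```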
- Award ID(s): 2123224
- PAR ID: 10362563
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Information and Inference: A Journal of the IMA
- Volume: 12
- Issue: 1
- ISSN: 2049-8772
- Page Range / eLocation ID: p. 448-465
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
In this paper we revisit a detector first derived by Reed and Yu [1], generalized by Bliss and Parker [2], and recently studied by Hiltunen, Loubaton, and Chevalier [3], [4]. The problem is to detect a known signal transmitted over an unknown MIMO channel with unknown complex gains and unknown additive noise covariance. The probability distribution of a CFAR detector for this problem was first derived for the SIMO channel in [1]. We generalize this distribution to the case of a MIMO channel and show that, under the null, the CFAR detector statistic is distributed as a product of independent scalar beta random variables. Our results, based on the theory of beta-distributed random matrices, hold for M symbols transmitted from p transmitters and received at L receivers. The asymptotic results of [3], [4] are based on large random matrix theory, which assumes L and M to be unbounded.
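Since the null distribution is a product of independent scalar beta random variables, a CFAR threshold can be obtained by Monte Carlo as in the sketch below. The shape parameters here are hypothetical placeholders; the actual ones depend on $M$, $p$, and $L$ and are derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# If the detector statistic T is distributed under the null as a product
# of independent Beta variables, a CFAR threshold is just a quantile of
# that product. The (a_k, b_k) pairs below are placeholders, not the
# parameters derived in the paper; small T is taken to indicate a signal.
params = [(40, 2), (39, 2), (38, 2)]
draws = np.prod([rng.beta(a, b, size=200_000) for a, b in params], axis=0)

pfa = 1e-2                           # desired false-alarm rate
threshold = np.quantile(draws, pfa)  # reject H0 when T < threshold
print(f"CFAR threshold for Pfa = {pfa}: {threshold:.4f}")
```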
Given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees: finding a vector $x' \in \mathbb{R}^d$ such that $\|x' - x^*\|_\infty \leq \frac{\epsilon}{\sqrt{d}} \cdot \|Ax^* - b\|_2 \cdot \|A^\dagger\|$, where $x^* = \arg\min_{x \in \mathbb{R}^d} \|Ax - b\|_2$. One popular approach for solving such $\ell_2$ regression problems is sketching: pick a structured random matrix $S \in \mathbb{R}^{m \times n}$ with $m < n$ for which $SA$ can be computed quickly, then solve the "sketched" regression problem $\arg\min_{x \in \mathbb{R}^d} \|SAx - Sb\|_2$. In this paper, we show that in order to obtain such an $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are dense. To the best of our knowledge, this is the first use case in which dense sketching matrices are necessary. On the algorithmic side, we prove that there exists a distribution of dense sketching matrices with $m = \epsilon^{-2} d \log^3(n/\delta)$ such that solving the sketched regression problem gives the $\ell_\infty$ guarantee with probability at least $1 - \delta$. Moreover, the matrix $SA$ can be computed in time $O(nd \log n)$. Our row count is nearly optimal up to logarithmic factors and significantly improves the result in (Price et al., 2017), which requires a superlinear-in-$d$ number of rows, $m = \Omega(\epsilon^{-2} d^{1+\gamma})$ for $\gamma \in (0, 1)$. Moreover, we develop a novel analytical framework for $\ell_\infty$ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property introduced in (Song & Yu, 2021). Our analysis is much simpler and more general than that of (Price et al., 2017). Leveraging this framework, we extend the $\ell_\infty$ guarantee regression result to dense sketching matrices for computing the fast tensor product of vectors.
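A toy version of the sketch-and-solve pipeline, using a plain Gaussian matrix as a stand-in for the dense sketching distributions constructed in the paper (sizes are illustrative): it solves the sketched problem and compares the resulting $\ell_\infty$ error against the target scale $\frac{\epsilon}{\sqrt{d}} \|Ax^* - b\|_2 \|A^\dagger\|$, here with $\epsilon$ treated as 1.

```python
import numpy as np

rng = np.random.default_rng(3)

n, d, m = 5000, 20, 600            # m sketch rows, m << n
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# x* : exact least-squares solution of the full problem.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Dense Gaussian sketch, scaled so that E[S^T S] = I.
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Reference scale (eps/sqrt(d)) * ||A x* - b||_2 * ||A^dagger||, eps = 1.
res = np.linalg.norm(A @ x_ls - b)
pinv_norm = 1.0 / np.linalg.svd(A, compute_uv=False)[-1]  # ||A^+|| = 1/sigma_min
print("l_inf error     :", np.abs(x_sk - x_ls).max())
print("reference scale :", res * pinv_norm / np.sqrt(d))
```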
Abernethy, Jacob; Agarwal, Shivani (Ed.) Motivated by problems in controlled experiments, we study the discrepancy of random matrices with continuous entries where the number of columns $n$ is much larger than the number of rows $m$. Our first result shows that if $\omega(1) = m = o(n)$, a matrix with i.i.d. standard Gaussian entries has discrepancy $\Theta(\sqrt{n} \, 2^{-n/m})$ with high probability. This provides sharp guarantees for Gaussian discrepancy in a regime that had not been considered before in the existing literature. Our results also apply to a more general family of random matrices with continuous i.i.d. entries, assuming that $m = O(n/\log{n})$. The proof is non-constructive and is an application of the second moment method. Our second result is algorithmic and applies to random matrices whose entries are i.i.d. and have a Lipschitz density. We present a randomized polynomial-time algorithm that achieves discrepancy $e^{-\Omega(\log^2(n)/m)}$ with high probability, provided that $m = O(\sqrt{\log{n}})$. In the one-dimensional case, this matches the best known algorithmic guarantees due to Karmarkar–Karp. For higher dimensions $2 \leq m = O(\sqrt{\log{n}})$, this establishes the first efficient algorithm achieving discrepancy smaller than $O(\sqrt{m})$.
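For intuition, the discrepancy here is $\mathrm{disc}(A) = \min_{x \in \{-1,1\}^n} \|Ax\|_\infty$. The brute-force sketch below (feasible only for tiny $n$, far from the asymptotic regime of the theorem) computes it exactly and compares it against the $\Theta(\sqrt{n}\,2^{-n/m})$ scale.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)

def discrepancy(A):
    """Exhaustive disc(A) = min over x in {-1,+1}^n of ||A x||_inf."""
    _, n = A.shape
    best = np.inf
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        best = min(best, np.abs(A @ np.array(signs)).max())
    return best

# Tiny instance: the search is over 2^n sign vectors, so n must stay small.
m, n = 3, 16
A = rng.standard_normal((m, n))
print(f"disc(A)                     = {discrepancy(A):.4f}")
print(f"sqrt(n) * 2^(-n/m) scale    = {np.sqrt(n) * 2**(-n / m):.4f}")
```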
This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, an operation frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation (on-the-fly random number generation) to accelerate this operation. The techniques we propose decrease memory movement, thereby increasing the algorithm's parallel scalability on shared memory architectures. On the Intel Frontera architecture, our algorithm can achieve 2x speedups over libraries such as Eigen and Intel MKL on some examples. In addition, with 32 threads, we can obtain a parallel efficiency of up to approximately 45%. We also present a theoretical analysis of the memory movement lower bound of our algorithm, showing that under mild assumptions, it is possible to beat the data movement lower bound of general matrix-matrix multiply (GEMM) by a factor of $\sqrt{M}$, where $M$ is the cache size. Finally, we incorporate our sketching method into a randomized algorithm for overdetermined least squares with sparse data matrices. Our results are competitive with SuiteSparse for highly overdetermined problems; in some cases, we obtain a speedup of 10x over SuiteSparse.
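A plain NumPy/SciPy illustration of where the dense-random-times-sparse product arises in sketch-and-solve least squares (the blocked, on-the-fly-generation kernel developed in this work is not part of SciPy, so the line forming `SA` below is exactly the operation being accelerated; sizes and density are illustrative):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(5)

n, d, s = 10_000, 50, 400                       # s sketch rows, d << s << n
A = sp.random(n, d, density=0.01, format="csr", random_state=5)
b = rng.standard_normal(n)

S = rng.standard_normal((s, n)) / np.sqrt(s)    # dense random sketch
SA = (A.T @ S.T).T                              # sparse @ dense -> dense (s x d)
x_sketch, *_ = np.linalg.lstsq(SA, S @ b, rcond=None)
x_exact = spla.lsqr(A, b)[0]                    # reference sparse LS solve
print("relative difference:",
      np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```

The agreement between the sketched and exact solutions improves as the sketch size `s` grows relative to `d`.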