NSF PAR Search | NSF Public Access Repository

Optimal Embedding Dimension for Sparse Subspace Embeddings

Chenakkod, Shabarish; Derezinski, Michal; Dong, Xiaoyu; Rudelson, Mark (June 2024, Association for Computing Machinery (ACM), New York)

Full Text Available
Unbiased estimators for random design regression

Derezinski, Michal; Warmuth, Manfred; Hsu, Daniel (January 2022, Journal of machine learning research)

Full Text Available
Domain Sparsification of Discrete Distributions Using Entropic Independence

Anari, Nima; Derezinski, Michal; Vuong, Thuy-Duong; Yang, Elizabeth (January 2022, Leibniz international proceedings in informatics)
Braverman, Mark (Ed.)
We present a framework for speeding up the time it takes to sample from discrete distributions $$\mu$$ defined over subsets of size $$k$$ of a ground set of $$n$$ elements, in the regime where $$k$$ is much smaller than $$n$$. We show that if one has access to estimates of marginals $$\mathbb{P}_{S\sim \mu}[i\in S]$$, then the task of sampling from $$\mu$$ can be reduced to sampling from related distributions $$\nu$$ supported on size $$k$$ subsets of a ground set of only $$n^{1-\alpha}\cdot \operatorname{poly}(k)$$ elements. Here, $$1/\alpha\in [1, k]$$ is the parameter of entropic independence for $$\mu$$. Further, our algorithm only requires sparsified distributions $$\nu$$ that are obtained by applying a sparse (mostly $$0$$) external field to $$\mu$$, an operation that, for many distributions $$\mu$$ of interest, retains algorithmic tractability of sampling from $$\nu$$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $$\mu$$, and in return reduce the amortized cost needed to produce many samples from the distribution $$\mu$$, as is often needed in upstream tasks such as counting and inference. For a wide range of distributions where $$\alpha=\Omega(1)$$, our result reduces the domain size, and as a corollary, the cost-per-sample, by a $$\operatorname{poly}(n)$$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $$\alpha=1$$).
As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that constant-factor approximation is enough for domain sparsification, improving over $$O(1/k)$$ relative error established in prior work.
Full Text Available
Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

Derezinski, Michal; Lacotte, Jonathan; Pilanci, Mert; Mahoney, Michael W. (January 2021, Advances in neural information processing systems)

In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver. Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.
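As a rough illustration of the sketching step this abstract describes, the following Python sketch builds a uniformly sparsified random sign matrix S and uses it to estimate the Gram matrix AᵀA, which is the Hessian of a least squares objective. The function names, the per-row sparsity parameter `s`, and the scaling are assumptions made for this example; the construction analyzed in the paper may differ in its details.

```python
import random


def sparse_sign_sketch(m, n, s, rng):
    """Build an m x n sketch with s nonzeros per row (hypothetical helper).

    Each row is stored as a list of (column, value) pairs. Nonzeros are
    random signs scaled by sqrt(n / (s * m)) so that E[S^T S] = I.
    """
    scale = (n / (s * m)) ** 0.5
    rows = []
    for _ in range(m):
        cols = rng.sample(range(n), s)  # s distinct columns, uniformly at random
        rows.append([(j, scale * rng.choice((-1.0, 1.0))) for j in cols])
    return rows


def apply_sketch(sketch, A):
    """Compute S @ A, touching only s rows of A per sketch row."""
    d = len(A[0])
    SA = []
    for row in sketch:
        out = [0.0] * d
        for j, v in row:
            for c in range(d):
                out[c] += v * A[j][c]
        SA.append(out)
    return SA


def gram(M):
    """Return M^T M as a d x d list of lists."""
    d = len(M[0])
    return [[sum(r[i] * r[j] for r in M) for j in range(d)] for i in range(d)]


def sketched_hessian(A, m, s, rng):
    """Estimate A^T A by (S A)^T (S A) for a fresh sparse sign sketch S."""
    sketch = sparse_sign_sketch(m, len(A), s, rng)
    return gram(apply_sketch(sketch, A))
```

Applying S costs on the order of m·s·d operations here, compared with m·n·d for a dense sketch of the same size, which is the kind of saving that sparsification aims for.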
Full Text Available
Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization

Derezinski, Michal; Bartan, Burak; Pilanci, Mert; Mahoney, Michael W. (January 2020, Conference on Neural Information Processing Systems)
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data. However, the local estimates on each machine are typically biased, relative to the full solution on all of the data, and this can limit the effectiveness of averaging. Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods. Our technique has two novel components: (1) modifying standard sketching techniques to obtain what we call a surrogate sketch; and (2) carefully scaling the global regularization parameter for local computations. Our surrogate sketches are based on determinantal point processes, a family of distributions for which the bias of an estimate of the inverse Hessian can be computed exactly. Based on this computation, we show that when the objective being minimized is $$\ell_2$$-regularized with parameter $$\lambda$$ and individual machines are each given a sketch of size $$m$$, then to eliminate the bias, local estimates should be computed using a shrunk regularization parameter given by (see PDF), where $$d_\lambda$$ is the $$\lambda$$-effective dimension of the Hessian (or, for quadratic problems, the data matrix).
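The effective dimension mentioned in this abstract can be computed from the eigenvalues of the Hessian. The snippet below is a minimal sketch assuming the commonly used definition, the trace of H(H + λI)⁻¹, which equals the sum of σ/(σ + λ) over the eigenvalues σ of H. The function name and the choice to pass eigenvalues directly are assumptions made for illustration, and the formula for the shrunk regularization parameter is not reproduced here.

```python
def effective_dimension(eigenvalues, lam):
    """Effective dimension of a PSD matrix for regularization lam (hypothetical helper).

    Assumes the standard definition trace(H (H + lam*I)^{-1}), evaluated
    from the eigenvalues of H as sum(sigma / (sigma + lam)).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if any(e < 0 for e in eigenvalues):
        raise ValueError("eigenvalues of a PSD matrix must be non-negative")
    return sum(e / (e + lam) for e in eigenvalues)
```

For example, `effective_dimension([1.0, 1.0, 1.0], 1.0)` sums three terms of 1/2 each. The value never exceeds the number of eigenvalues and shrinks as `lam` grows.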
Full Text Available