Search for: All records

Creators/Authors contains: "Mahoney, Michael W."

Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels

Long, Da; Xing, Wei; Krishnapriyan, Aditi S; Kirby, Robert M; Zhe, Shandian; Mahoney, Michael W (March 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS))

Free, publicly-accessible full text available March 8, 2025

Bootstrapping the operator norm in high dimensions: Error estimation for covariance matrices and sketching

https://doi.org/10.3150/22-BEJ1463

Lopes, Miles E.; Erichson, N. Benjamin; Mahoney, Michael W. (February 2023, Bernoulli)

Full Text Available

Flow-Based Algorithms for Improving Clusters: A Unifying Framework, Software, and Performance

https://doi.org/10.1137/20m1333055

Fountoulakis, Kimon; Liu, Meng; Gleich, David F.; Mahoney, Michael W. (February 2023, SIAM Review)

Full Text Available

Inexact Newton-CG algorithms with complexity guarantees

https://doi.org/10.1093/imanum/drac043

Yao, Zhewei; Xu, Peng; Roosta, Fred; Wright, Stephen J; Mahoney, Michael W (August 2022, IMA Journal of Numerical Analysis)

Abstract
We consider variants of a recently developed Newton-CG algorithm for nonconvex problems (Royer, C. W. & Wright, S. J. (2018) Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim., 28, 1448–1477) in which inexact estimates of the gradient and the Hessian information are used for various steps. Under certain conditions on the inexactness measures, we derive iteration complexity bounds for achieving $\epsilon$-approximate second-order optimality that match best-known lower bounds. Our inexactness condition on the gradient is adaptive, allowing for crude accuracy in regions with large gradients. We describe two variants of our approach, one in which the step size along the computed search direction is chosen adaptively, and another in which the step size is pre-defined. To obtain second-order optimality, our algorithms will make use of a negative curvature direction on some steps. These directions can be obtained, with high probability, using the randomized Lanczos algorithm. In this sense, all of our results hold with high probability over the run of the algorithm. We evaluate the performance of our proposed algorithms empirically on several machine learning models. Our approach is a first attempt to introduce inexact Hessian and/or gradient information into the Newton-CG algorithm of Royer & Wright (2018, Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim., 28, 1448–1477).
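As a rough illustration of the kind of inner solve a Newton-CG method relies on, the sketch below runs conjugate gradient on H d = -g using only Hessian-vector products and stops early when it meets a direction of non-positive curvature. It is a simplified stand-in, not the algorithm analyzed in the paper: the function name `cg_with_curvature_check`, the tolerance, and the return convention are all assumptions made for this example, and the paper obtains negative curvature directions with randomized Lanczos rather than with this check. The `hess_vec` callable may wrap an inexact (for example, subsampled) Hessian.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def cg_with_curvature_check(hess_vec, grad, tol=1e-10, max_iter=None):
    """Approximately solve H d = -grad with conjugate gradient.

    hess_vec: callable returning H @ v for a list v (possibly inexact).
    Returns ("newton", d) with an approximate Newton direction, or
    ("negative_curvature", p) if a direction with p^T H p <= 0 is found.
    Hypothetical helper for illustration only.
    """
    n = len(grad)
    if max_iter is None:
        max_iter = n
    d = [0.0] * n
    r = list(grad)              # residual H d + grad at d = 0
    p = [-ri for ri in r]       # first search direction: steepest descent
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol * tol:
            break
        Hp = hess_vec(p)
        curvature = dot(p, Hp)
        if curvature <= 0.0:
            # H is not positive definite along p; hand p back to the caller.
            return "negative_curvature", p
        alpha = rr / curvature
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri + alpha * hi for ri, hi in zip(r, Hp)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [-ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return "newton", d
```

A caller would take a (possibly damped) step along `d` in the first case and a step along the returned direction in the second; how the step size is chosen is exactly where the two variants described in the abstract differ.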

Full Text Available

Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms

Ma, Ping; Chen, Yongkai; Zhang, Xinlian; Xing, Xin; Ma, Jingyi; Mahoney, Michael W (June 2022, Journal of Machine Learning Research)

Full Text Available

Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism

https://doi.org/10.1145/3447548.3467080

Gupta, Vipul; Choudhary, Dhruv; Tang, Peter; Wei, Xiaohan; Wang, Xing; Huang, Yuzhen; Kejariwal, Arun; Ramchandran, Kannan; Mahoney, Michael W. (August 2021, KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining)

In this paper, we consider hybrid parallelism, a paradigm that employs both Data Parallelism (DP) and Model Parallelism (MP), to scale distributed training of large recommendation models. We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training. DCT filters the entities to be communicated across the network through a simple hard-thresholding function, allowing only the most relevant information to pass through. For communication-efficient DP, DCT compresses the parameter gradients sent to the parameter server during model synchronization. The threshold is updated only once every few thousand iterations to reduce the computational overhead of compression. For communication-efficient MP, DCT incorporates a novel technique to compress the activations and gradients sent across the network during the forward and backward propagation, respectively. This is done by identifying and updating only the most relevant neurons of the neural network for each training sample in the data. We evaluate DCT on publicly available natural language processing and recommender models and datasets, as well as recommendation systems used in production at Facebook. DCT reduces communication by at least 100× and 20× during DP and MP, respectively. The algorithm has been deployed in production, and it improves end-to-end training time for a state-of-the-art industrial recommender model by 37%, without any loss in performance.
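The hard-thresholding idea with an infrequently refreshed threshold can be pictured with a minimal sketch like the one below. Everything named here is hypothetical: the class `ThresholdCompressor`, the `keep_fraction` rule for picking the threshold, and the sparse `(index, value)` output format are assumptions for illustration and are not taken from the paper or from any production system.

```python
class ThresholdCompressor:
    """Keep only entries whose magnitude reaches a cached threshold.

    The threshold is recomputed once every `update_every` calls, so most
    calls pay only for a single pass over the values. Illustrative only.
    """

    def __init__(self, keep_fraction, update_every):
        self.keep_fraction = keep_fraction   # assumed rule: keep top fraction
        self.update_every = update_every     # refresh period in calls
        self.threshold = None
        self.step = 0

    def _recompute_threshold(self, values):
        # Threshold is the magnitude of the k-th largest entry.
        magnitudes = sorted((abs(v) for v in values), reverse=True)
        k = max(1, int(len(magnitudes) * self.keep_fraction))
        self.threshold = magnitudes[k - 1]

    def compress(self, values):
        """Return the surviving entries as (index, value) pairs."""
        if self.step % self.update_every == 0:
            self._recompute_threshold(values)
        self.step += 1
        return [(i, v) for i, v in enumerate(values)
                if abs(v) >= self.threshold]
```

In a data-parallel setting the `values` would be a worker's gradient vector and only the returned pairs would be sent over the network; between refreshes the number of surviving entries can drift, which is the price paid for skipping the sort on most iterations.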
Full Text Available

Good Classifiers are Abundant in the Interpolating Regime

Theisen, Ryan; Klusowski, Jason M; Mahoney, Michael W (January 2021, International Conference on Artificial Intelligence and Statistics)
Full Text Available

Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

Derezinski, Michal; Lacotte, Jonathan; Pilanci, Mert; Mahoney, Michael W. (January 2021, Advances in Neural Information Processing Systems)

In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver. Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.
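To make the sketched Newton update concrete, here is a small pure-Python sketch for the least squares objective f(x) = ½‖Ax − b‖², where the exact Hessian AᵀA is replaced by (SA)ᵀ(SA) for a sparse random sign matrix S. It is only an illustration under stated assumptions: the sketch uses uniform sparsity with a fixed number of nonzeros per row (not leverage score sparsification), the scaling is chosen so that E[SᵀS] = I, and all function names are hypothetical rather than taken from the paper.

```python
import random


def sparse_sign_sketch(m, n, nnz_per_row, rng):
    """Build an m x n sketch as a list of sparse rows {column: value}.

    Each row has nnz_per_row random +/- entries at uniformly chosen
    columns, scaled so that E[S^T S] = I. Uniform sparsity is an
    assumption of this example.
    """
    scale = (n / (m * nnz_per_row)) ** 0.5
    rows = []
    for _ in range(m):
        cols = rng.sample(range(n), nnz_per_row)
        rows.append({c: scale * rng.choice((-1.0, 1.0)) for c in cols})
    return rows


def apply_sketch(sketch, A):
    """Compute S @ A, touching only the nonzeros of each sketch row."""
    d = len(A[0])
    SA = []
    for row in sketch:
        out = [0.0] * d
        for col, val in row.items():
            for j in range(d):
                out[j] += val * A[col][j]
        SA.append(out)
    return SA


def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def sketched_newton_step(A, b, x, sketch):
    """One step x - H_s^{-1} g with exact gradient and sketched Hessian."""
    n, d = len(A), len(x)
    residual = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in range(n)]
    grad = [sum(A[i][j] * residual[i] for i in range(n)) for j in range(d)]
    SA = apply_sketch(sketch, A)
    H = [[sum(r[j] * r[k] for r in SA) for k in range(d)] for j in range(d)]
    step = solve_linear(H, grad)
    return [x[j] - step[j] for j in range(d)]
```

Passing an identity sketch recovers the exact Newton step, which for least squares lands on the minimizer in one iteration; with a random sketch from `sparse_sign_sketch`, the step would typically be repeated with a fresh sketch each time, and the cost of forming `SA` scales with the number of nonzeros in S rather than with m × n.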
Full Text Available

OverSketched Newton: Fast Convex Optimization for Serverless Systems

https://doi.org/10.1109/BigData50022.2020.9378289

Gupta, Vipul; Kadhe, Swanand; Courtade, Thomas; Mahoney, Michael W.; Ramchandran, Kannan (December 2020, 2020 IEEE International Conference on Big Data (Big Data))
Full Text Available

Error Estimation for Sketched SVD via the Bootstrap

Lopes, Miles E; Erichson, Benjamin; Mahoney, Michael W (July 2020, International Conference on Machine Learning)

Full Text Available
