NSF PAR Search | NSF Public Access Repository

Attention-Only Transformers via Unrolled Subspace Denoising

Wang, Peng; Lu, Yifu; Yu, Yaodong; Pai, Druv; Qu, Qing; Ma, Yi (May 2025, International Conference on Machine Learning)

Free, publicly-accessible full text available May 31, 2026
Does Worst-Performing Agent Lead the Pack? Analyzing Agent Dynamics in Unified Distributed SGD

Hu, Jie; Ma, Yi-Ting; Eun, Do Young (December 2024, Neural Information Processing Systems (NeurIPS))

Free, publicly-accessible full text available December 10, 2025
Convergence Acceleration in Wireless Federated Learning: A Stackelberg Game Approach

https://doi.org/10.1109/TVT.2024.3452933

Wang, Kaidi; Ma, Yi; Mashhadi, Mahdi Boloursaz; Foh, Chuan Heng; Tafazolli, Rahim; Ding, Zhi (January 2025, IEEE Transactions on Vehicular Technology)

Free, publicly-accessible full text available January 1, 2026
Discovering Mixtures of Structural Causal Models from Time Series Data

Varambally, Sumanth; Ma, Yi-An; Yu, Rose (July 2024, Proceedings of Machine Learning Research)

Full Text Available
Demystifying SGD with Doubly Stochastic Gradients

Kim, Kyurae; Ko, Joohwan; Ma, Yi-An; Gardner, Jacob R (July 2024, International Conference on Machine Learning (ICML 2024))

Full Text Available
Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

Kim, Kyurae; Ma, Yi-An; Gardner, Jacob R (May 2024, Conference on Artificial Intelligence and Statistics (AISTATS 2024))

We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called “linear”) rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in O(𝑑) time, where 𝑑 is the dimensionality of the target posterior. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.
Full Text Available
Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo

Huang, Xunpeng; Zou, Difan; Dong, Hanze; Ma, Yi-An; Zhang, Tong (June 2024, Proceedings of Thirty Seventh Conference on Learning Theory)

Full Text Available
A Global Geometric Analysis of Maximal Coding Rate Reduction

Wang, Peng; Liu, Huikang; Pai, Druv; Yu, Yaodong; Zhu, Zhihui; Qu, Qing; Ma, Yi (June 2024, International Conference on Machine Learning)

The maximal coding rate reduction (MCR²) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape has not been studied. In this work, we give a complete characterization of the properties of all its local and global optima, as well as other types of critical points. Specifically, we show that each (local or global) maximizer of the MCR² problem corresponds to a low-dimensional, discriminative, and diverse representation, and furthermore, each critical point of the objective is either a local maximizer or a strict saddle point. Such a favorable landscape makes MCR² a natural choice of objective for learning diverse and discriminative representations via first-order optimization methods. To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets.
Full Text Available
A Global Geometric Analysis of Maximal Coding Rate Reduction

Wang, Peng; Liu, Huikang; Pai, Druv; Yu, Yaodong; Zhu, Zhihui; Qu, Qing; Ma, Yi (June 2024, International Conference on Machine Learning (ICML))

Full Text Available
