NSF PAR Search | NSF Public Access Repository


Posterior Computation with the Gibbs Zig-Zag Sampler

https://doi.org/10.1214/22-BA1319

Sachs, Matthias; Sen, Deborshee; Lu, Jianfeng; Dunson, David (September 2023, Bayesian Analysis)

Full Text Available
Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

Chidambaram, Muthu; Wang, Xiang; Wu, Chenwei; Ge, Rong (January 2023, International Conference on Machine Learning)

Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regard to generalization and robustness. In this work, we try to explain some of this success from a feature learning perspective. We focus our attention on classification problems in which each class may have multiple associated features (or views) that can be used to predict the class correctly. Our main theoretical results demonstrate that, for a non-trivial class of data distributions with two features per class, training a 2-layer convolutional network using empirical risk minimization can lead to learning only one feature for almost all classes, while training with a specific instantiation of Mixup succeeds in learning both features for every class. We also show empirically that these theoretical insights extend to the practical settings of image benchmarks modified to have additional synthetic features.
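As an illustration of the "random convex combinations of data points and their labels" described in this abstract, here is a minimal NumPy sketch, not the authors' implementation. The function name `mixup_batch`, the pairing of examples by a random permutation of the batch, and the Beta(1, 1) draw for the mixing weight are assumptions made for the example. Fixing the weight at 1/2 for the "midpoint" variant is likewise an assumption inferred from the paper's title.

```python
import numpy as np

def mixup_batch(x, y_onehot, lam, rng):
    """Mix each example with a randomly chosen partner from the same batch.

    Hypothetical helper: returns convex combinations of inputs and labels
    with mixing weight `lam` in [0, 1].
    """
    perm = rng.permutation(len(x))  # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))      # 4 toy examples with 3 features each
y = np.eye(2)[[0, 1, 0, 1]]      # one-hot labels for 2 classes

# Random mixing weight (assumed here: Beta(1, 1), i.e. uniform on [0, 1]).
lam = rng.beta(1.0, 1.0)
x_rand, y_rand = mixup_batch(x, y, lam, rng)

# Midpoint variant (assumption from the title): weight fixed at 1/2.
x_mid, y_mid = mixup_batch(x, y, 0.5, rng)
```

With the weight fixed at 1/2, every mixed label is either a pure one-hot vector (partner of the same class) or an even split between two classes.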
Full Text Available
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Zhou, Mo; Ge, Rong (January 2023, International Conference on Machine Learning)

In deep learning, the training process often finds an interpolator (a solution with 0 training loss) whose test loss is nevertheless low. This phenomenon, known as benign overfitting, is a major mystery that has received a lot of recent attention. One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties for the interpolator, often characterized by minimizing certain norms. However, even for a simple sparse linear regression problem y = Ax + noise with sparse x, neither the minimum l_1 nor the minimum l_2 norm interpolator gives the optimal test loss. In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefits of the l_1 and l_2 interpolators. We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss. Our result is based on a careful analysis of the training dynamics and provides another example of an implicit regularization effect that goes beyond norm minimization.
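To make the notion of a minimum-norm interpolator in this abstract concrete, here is a small NumPy sketch under assumed toy settings (the dimensions, sparsity level, and noise scale are illustrative choices, not values from the paper). It builds an overparameterized problem y = Ax + noise with sparse x, computes the minimum l_2 norm interpolator through the pseudoinverse, and compares it with another interpolator obtained by adding a null-space component. It does not implement the paper's new parametrization, which the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 100, 3                 # samples, dimension, sparsity (illustrative)

A = rng.normal(size=(n, d))          # more unknowns than equations
x_true = np.zeros(d)
x_true[:k] = 1.0                     # sparse ground-truth signal
y = A @ x_true + 0.1 * rng.normal(size=n)

# Minimum l_2 norm interpolator: x = A^+ y, with A^+ the pseudoinverse.
A_pinv = np.linalg.pinv(A)
x_l2 = A_pinv @ y
train_residual = np.linalg.norm(A @ x_l2 - y)   # zero up to rounding

# Any vector in the null space of A can be added without changing the fit.
v = rng.normal(size=d)
v_null = v - A_pinv @ (A @ v)        # projection of v onto null(A)
x_other = x_l2 + v_null              # also interpolates, but has a larger norm

print(train_residual, np.linalg.norm(x_l2), np.linalg.norm(x_other))
```

Both vectors fit the training data exactly, which is why the choice among interpolators, made implicitly by the training procedure, determines the test loss.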
Full Text Available
Hiding Data Helps: On the Benefits of Masking for Sparse Coding

Chidambaram, Muthu; Wu, Chenwei; Cheng, Yu; Ge, Rong (January 2023, International Conference on Machine Learning)

Sparse coding refers to modeling a signal as sparse linear combinations of the elements of a learned dictionary. Sparse coding has proven to be a successful and interpretable approach in many applications, such as signal processing, computer vision, and medical imaging. While this success has spurred much work on sparse coding with provable guarantees, work on the setting where the learned dictionary is larger (or over-realized) with respect to the ground truth is comparatively nascent. Existing theoretical results in the over-realized regime are limited to the case of noise-less data. In this paper, we show that for over-realized sparse coding in the presence of noise, minimizing the standard dictionary learning objective can fail to recover the ground-truth dictionary, regardless of the magnitude of the signal in the data-generating process. Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective and we prove that minimizing this new objective can recover the ground-truth dictionary. We corroborate our theoretical results with experiments across several parameter regimes, showing that our proposed objective enjoys better empirical performance than the standard reconstruction objective.
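The data model behind this abstract, a signal formed as a sparse linear combination of dictionary elements plus noise, can be sketched as follows. This is only an illustration with assumed sizes, noise scale, and unit-norm dictionary columns. It shows the generative model and a plain squared reconstruction error, and does not implement the masking objective proposed in the paper, whose form is not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, k = 16, 32, 3                  # signal dim, dictionary size, sparsity (illustrative)

# Dictionary with unit-norm columns (a common normalization, assumed here).
D = rng.normal(size=(d, p))
D /= np.linalg.norm(D, axis=0, keepdims=True)

# k-sparse code: only k of the p coefficients are nonzero.
z = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
z[support] = rng.normal(size=k)

# Noisy observation: sparse combination of dictionary columns plus noise.
y = D @ z + 0.05 * rng.normal(size=d)

# Squared reconstruction error of the true code under the true dictionary;
# with noise present it is positive even for the ground truth.
recon_error = np.sum((y - D @ z) ** 2)
```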
Full Text Available
FasterRisk: Fast and Accurate Interpretable Risk Scores

Liu, Jiachang; Zhong, Chudi; Li, Boxuan; Seltzer, Margo; Rudin, Cynthia (December 2022, Advances in Neural Information Processing Systems)

Full Text Available
Gibbs posterior convergence and the thermodynamic formalism

https://doi.org/10.1214/21-AAP1685

McGoff, Kevin; Mukherjee, Sayan; Nobel, Andrew B. (February 2022, The Annals of Applied Probability)

Full Text Available
Subspace Clustering through Sub-Clusters

Li, Weiwei; Hannig, Jan (January 2021, Journal of Machine Learning Research)
Full Text Available
Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Wang, Xiang and (January 2021, Proceedings of the 38th International Conference on Machine Learning)
Full Text Available
Efficient sampling from the Bingham distribution

Ge, Rong; Lee, Holden (January 2021, Algorithmic Learning Theory)
Full Text Available
Random Coordinate Underdamped Langevin Monte Carlo

Ding, Zhiyan; Li, Qin; Lu, Jianfeng; Wright, Stephen J. (January 2021, Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR)
Full Text Available
