NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Connecting Pre-trained Language Model and Downstream Task via Properties of Representation

Wu, C; Lee, H; Ge, R (December 2023, NeurIPS 2023)

Full Text Available
Do Transformers Parse while Predicting the Masked Word?

https://doi.org/10.18653/v1/2023.emnlp-main.1029

Zhao, H; Panigrahi, A; Ge, R; Arora, S (November 2023, EMNLP 23)

Full Text Available
Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

Li, S; Cheng, Y; Diakonikolas, I; Diakonikolas, J; Ge, R; Wright, S (December 2023, NeurIPS 2023)

Full Text Available
Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

Li, S; Chen, Y; DIakonikolas, I; Diakonikolas, J; Ge, R; Wright, SJ (December 2023, NeurIPS)

Full Text Available
Depth-Separation with Multilayer Mean-Field Networks

Ren, Y.; Zhou, M.; Ge, R. (January 2023, International Conference on Learning Representations)

Mean-field limit has been successfully applied to neural networks, leading to many results in optimizing overparametrized networks. However, existing works often focus on two-layer networks and/or require large number of neurons. We give a new framework for extending the mean-field limit to multilayer network, and show that a polynomial-size three-layer network in our framework can learn the function constructed by Safran et al. (2019) – which is known to be not approximable by any two-layer networks
more » « less
Full Text Available
Outlier-Robust Sparse Estimation via Non-Convex Optimization

Cheng, Y.; Diakonikolas, I.; Ge, R.; Gupta, S.; Kane, D.; Soltanolkotabi, M. (December 2022, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available
4. Online Algorithms with Multiple Predictions

Anand, K.; Ge, R.; Kumar, A.; Panigrahi, D. (July 2022, The Thirty-ninth International Conference on Machine Learning (ICML 2022))

A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances. These papers treat the ML algorithm as a blackbox, and redesign online algorithms to take advantage of ML predictions. In this paper, we ask the complementary question: can we redesign ML algorithms to provide better predictions for online algorithms? We explore this question in the context of the classic rent-or-buy problem, and show that incorporating optimization benchmarks directly in ML loss functions leads to significantly better performance, while maintaining a worst-case adversarial result when the advice is completely wrong. We support this finding both through theoretical bounds and numerical simulations, and posit that “learning for optimization” is a fertile area for future research.
more » « less
Full Text Available
Towards Understanding the Data Dependency of Mixup-style Training.

Chidambaram, M.; Wang, X.; Hu, Y.; Wu, C.; Ge, R. (July 2022, The Tenth International Conference on Learning Representations)

In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these benefits of Mixup training rely on properties of the data in the context of classification. For minimizing the original empirical risk, we compute a closed form for the Mixup-optimal classification, which allows us to construct a simple dataset on which minimizing the Mixup loss can provably lead to learning a classifier that does not minimize the empirical loss on the data. On the other hand, we also give sufficient conditions for Mixup training to also minimize the original empirical risk. For generalization, we characterize the margin of a Mixup classifier, and use this to understand why the decision boundary of a Mixup classifier can adapt better to the full structure of the training data when compared to standard training. In contrast, we also show that, for a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training.
more » « less
Full Text Available
Understanding Deflation Process in Over-parametrized Tensor Decomposition

Ge, R; Ren, Y.; Wang, X.; Zhou, M. (January 2021, Thirty-fifth Conference on Neural Information Processing Systems)

In this paper we study the training dynamics for gradient flow on overparametrized tensor decomposition problems. Empirically, such training process often first fits larger components and then discovers smaller components, which is similar to a tensor deflation process that is commonly used in tensor decomposition algorithms. We prove that for orthogonally decomposable tensor, a slightly modified version of gradient flow would follow a tensor deflation process and recover all the tensor components. Our proof suggests that for orthogonal tensors, gradient flow dynamics works similarly as greedy low-rank learning in the matrix setting, which is a first step towards understanding the implicit regularization effect of over-parametrized models for low-rank tensors.
more » « less
Full Text Available

Search for: All records