NSF PAR Search | NSF Public Access Repository

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

Zhang, Xuechen; Li, Mingchen; Chen, Jiasi; Thrampoulidis, Christos; Oymak, Samet (February 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation

https://doi.org/10.1109/TIT.2023.3320098

Wang, Ke; Muthukumar, Vidya; Thrampoulidis, Christos (December 2023, IEEE Transactions on Information Theory)

On the Role of Attention in Prompt-tuning

Oymak, Samet; Rawat, Ankit Singh; Soltanolkotabi, Mahdi; Thrampoulidis, Christos (July 2023, Proceedings of the 40th International Conference on Machine Learning)

Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture-models where each input token belongs to a context-relevant or -irrelevant set. We isolate the role of prompt-tuning through a self-contained prompt-attention model. Our contributions are as follows: (1) We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention under our contextual data model. (2) We analyze the initial trajectory of gradient descent and show that it learns the prompt and prediction head with near-optimal sample complexity and demonstrate how the prompt can provably attend to sparse context-relevant tokens. (3) Assuming a known prompt but an unknown prediction head, we characterize the exact finite-sample performance of prompt-attention, which reveals the fundamental performance limits and the precise benefit of the context information. We also provide experiments that verify our theoretical insights on real datasets and demonstrate how prompt-tuning enables the model to attend to context-relevant information.
Sharp global convergence guarantees for iterative nonconvex optimization with random data

https://doi.org/10.1214/22-AOS2246

Chandrasekher, Kabir Aladin; Pananjady, Ashwin; Thrampoulidis, Christos (February 2023, The Annals of Statistics)

Asymptotic Behavior of Adversarial Training in Binary Linear Classification

https://doi.org/10.1109/TNNLS.2023.3290592

Taheri, Hossein; Pedarsani, Ramtin; Thrampoulidis, Christos (January 2023, IEEE Transactions on Neural Networks and Learning Systems)

Asymptotic Behavior of Adversarial Training in Binary Linear Classification

https://doi.org/10.1109/ISIT50566.2022.9834717

Taheri, Hossein; Pedarsani, Ramtin; Thrampoulidis, Christos (June 2022, 2022 IEEE International Symposium on Information Theory (ISIT))

FedNest: Federated bilevel, minimax, and compositional optimization

Tarzanagh, Davoud Ataee; Li, Mingchen; Thrampoulidis, Christos; Oymak, Samet (July 2022, International Conference on Machine Learning)

FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization

Tarzanagh, Davoud Ataee; Li, Mingchen; Thrampoulidis, Christos; Oymak, Samet (January 2022, Proceedings of the 39th International Conference on Machine Learning)

Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems – including adversarial robustness, hyperparameter tuning, and actor-critic – fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose FEDNEST: a federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for FEDNEST in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. FEDNEST introduces multiple innovations including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter and hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice.
Multi-Environment Meta-Learning in Stochastic Linear Bandits

https://doi.org/10.1109/ISIT50566.2022.9834636

Moradipari, Ahmadreza; Ghavamzadeh, Mohammad; Rajabzadeh, Taha; Thrampoulidis, Christos; Alizadeh, Mahnoosh (January 2022, 2022 IEEE International Symposium on Information Theory (ISIT))

Sharp Guarantees and Optimal Performance for Inference in Binary and Gaussian-Mixture Models

https://doi.org/10.3390/e23020178

Taheri, Hossein; Pedarsani, Ramtin; Thrampoulidis, Christos (February 2021, Entropy)
We study convex empirical risk minimization for high-dimensional inference in binary linear classification under both discriminative binary linear models and generative Gaussian-mixture models. Our first result sharply predicts the statistical performance of such estimators in the proportional asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit to prove bounds on the best achievable performance. Notably, we show that the proposed bounds are tight for popular binary models (such as signed and logistic) and for the Gaussian-mixture model by constructing appropriate loss functions that achieve them. Our numerical simulations suggest that the theory is accurate even for relatively small problem dimensions and that it enjoys a certain universality property.
