NSF PAR Search | NSF Public Access Repository

Improved Sample Complexity Bounds for Diffusion Model Training

Gupta, S; Parulekar, A; Price, E; Xun, Z (December 2024, https://doi.org/10.48550/arXiv.2311.13745)

Diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. From a theoretical standpoint, a number of recent works [chen2022, chen2022improved, benton2023linear] have studied the iteration complexity of sampling, assuming access to an accurate diffusion model. In this work, we focus on understanding the sample complexity of training such a model: how many samples are needed to learn an accurate diffusion model using a sufficiently expressive neural network? Prior work [BMR20] showed bounds polynomial in the dimension, desired Total Variation error, and Wasserstein error. We show an exponential improvement in the dependence on Wasserstein error and depth, along with improved dependencies on other relevant parameters.
Free, publicly-accessible full text available December 9, 2025
In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness

Collins, L; Parulekar, A; Mokhtari, A; Sanghavi, S; Shakkottai, S (May 2024, https://doi.org/10.48550/arXiv.2402.11639)

A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during inference implicitly through some data, and tasked with making a prediction in that context. As such, that learner must adapt to the context without additional training. We explore the role of softmax attention in an ICL setting where each context encodes a regression task. We show that an attention unit learns a window that it uses to implement a nearest-neighbors predictor adapted to the landscape of the pretraining tasks. Specifically, we show that this window widens with decreasing Lipschitzness and increasing label noise in the pretraining tasks. We also show that on low-rank, linear problems, the attention unit learns to project onto the appropriate subspace before inference. Further, we show that this adaptivity relies crucially on the softmax activation and thus cannot be replicated by the linear activation often studied in prior theoretical analyses.
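
The window behavior described in this abstract can be illustrated with a toy sketch. This is not the authors' construction: it assumes scalar inputs, scores equal to the negative squared distance between the query and each context point, and a hypothetical `scale` parameter standing in for the learned attention weights. Under those assumptions, a large `scale` gives a narrow window (close to a nearest-neighbor prediction) and a small `scale` gives a wide window (close to averaging all context labels).

```python
import math


def softmax_attention_predict(query, xs, ys, scale):
    """Predict a label for `query` as a softmax-weighted average of context labels.

    Assumption for illustration: the score of context point x is
    -scale * (query - x)**2, so a larger `scale` means a narrower window.
    """
    scores = [-scale * (query - x) ** 2 for x in xs]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical in-context examples (x_i, y_i) for a single regression task.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]

# Narrow window: the prediction is dominated by the closest context point (x = 1).
narrow = softmax_attention_predict(1.1, xs, ys, scale=50.0)

# Widest window: scale = 0 weights every context point equally.
wide = softmax_attention_predict(1.1, xs, ys, scale=0.0)
```

With `scale=0.0` the prediction is simply the mean of the context labels, while with `scale=50.0` it is essentially the label of the nearest context point. Intermediate values trade off between the two, which is the kind of trade-off the abstract ties to Lipschitzness and label noise.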
Full Text Available
Diffusion Posterior Sampling is Computationally Intractable

Gupta, S; Jalal, A; Parulekar, A; Price, E; Xun, Z (February 2024, https://doi.org/10.48550/arXiv.2402.12727)

Full Text Available
Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Vial, D; Parulekar, A; Shakkottai, S; Srikant, R (January 2022, ICML)

Full Text Available
