Search results: All records where Creators/Authors contains "Patel, Vivak"


  1. Gradient descent (GD) is a collection of continuous optimization methods that have achieved immeasurable success in practice. Owing to data science applications, GD with diminishing step sizes has become a prominent variant. While this variant of GD has been well studied in the literature for objectives with globally Lipschitz continuous gradients, or under the requirement of bounded iterates, objectives arising from data science problems do not satisfy such assumptions. Thus, in this work, we provide a novel global convergence analysis of GD with diminishing step sizes for differentiable nonconvex functions whose gradients are only locally Lipschitz continuous. Through our analysis, we generalize what is known about gradient descent with diminishing step sizes, including interesting topological facts, and we elucidate the varied behaviors that can occur in the previously overlooked divergence regime. In sum, we provide a general global convergence analysis of GD with diminishing step sizes under realistic conditions for data science problems. (A minimal, illustrative sketch of this iteration appears after this results list.)
    Free, publicly-accessible full text available September 30, 2025
  2. In large-scale applications, including medical imaging, collocation-based differential equation solvers, and estimation with differential privacy, the underlying linear inverse problem can be reformulated as a streaming problem. In theory, the streaming problem can be solved effectively using memory-efficient, exponentially converging streaming solvers. In special cases, when the underlying linear inverse problem is finite-dimensional, streaming solvers can periodically evaluate the residual norm at a substantial computational cost. When the underlying system is infinite-dimensional, streaming solvers can only access noisy estimates of the residual. While such noisy estimates are computationally efficient, they are useful only when their accuracy is known. In this work, we rigorously develop a general family of computationally practical residual estimators and their uncertainty sets for streaming solvers, and we demonstrate the accuracy of our methods on a number of large-scale linear problems. Thus, we further enable the practical use of streaming solvers for important classes of linear inverse problems. (A generic streaming-solver sketch with a noisy residual estimate appears after this results list.)
  3. Stochastic gradient descent (SGD) is a widely used, foundational algorithm in data science and machine learning. As a result, analyses of SGD abound, making use of a variety of assumptions, especially about the noise behavior of the stochastic gradients. While recent works have achieved a high degree of generality in these noise assumptions, it is unclear whether such generality is necessary. In this work, we construct a simple example showing that less general assumptions are violated while the most general assumptions hold.
  4. In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly nonconvex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the nonconvexity of real problems and almost entirely ignore the complex noise models that exist in practice. In this work, we demonstrate the restrictiveness of these assumptions using three canonical models in machine learning. Then, we develop novel theory to address this shortcoming in two ways. First, we establish that SGD's iterates will either globally converge to a stationary point or diverge under nearly arbitrary nonconvexity and noise models. Second, under a slightly more restrictive assumption on the joint behavior of the nonconvexity and noise model, one that generalizes current assumptions in the literature, we show that the objective function cannot diverge, even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability. (A minimal SGD sketch appears after this results list.)
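The first record above concerns gradient descent with diminishing step sizes. The following Python sketch illustrates only the generic iteration; the double-well objective, the step-size schedule eta_k = eta0 / (k + 1), and all parameter values are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def gradient_descent_diminishing(grad, x0, num_iters=1000, eta0=0.1):
    """Plain gradient descent with diminishing step sizes eta_k = eta0 / (k + 1).

    `grad` is the gradient of a differentiable, possibly nonconvex objective;
    no global Lipschitz constant or bounded-iterate assumption is enforced here.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        eta_k = eta0 / (k + 1)      # diminishing step size
        x = x - eta_k * grad(x)     # standard GD update
    return x

# Illustrative double-well objective f(x) = sum(x**4 / 4 - x**2 / 2),
# whose gradient x**3 - x is locally but not globally Lipschitz continuous.
grad = lambda x: x**3 - x
x_final = gradient_descent_diminishing(grad, x0=np.array([1.7, -0.4]))
print(x_final)  # the iterates drift toward a stationary point of f (+/- 1 or 0)
```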
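The second record concerns streaming solvers for linear inverse problems and noisy residual estimates. The sketch below shows a generic row-action (Kaczmarz-type) streaming update together with a simple windowed running proxy for the residual norm; it is a stand-in under stated assumptions, not the calibrated residual estimators or uncertainty sets developed in the paper, and the test system is synthetic.

```python
import numpy as np

def streaming_kaczmarz(row_stream, x0, window=100):
    """Row-action (Kaczmarz-type) streaming solver for a linear system.

    `row_stream` yields (a, b) pairs, one equation a @ x = b at a time, so the
    full system is never stored.  Alongside the iterate, a windowed average of
    per-row squared residuals gives a cheap, noisy proxy for the residual norm.
    """
    x = np.asarray(x0, dtype=float)
    recent_sq_residuals = []
    for a, b in row_stream:
        r = a @ x - b                      # scalar residual for this row (pre-update)
        x = x - (r / (a @ a)) * a          # orthogonal projection onto a @ x = b
        recent_sq_residuals.append(r * r)
        if len(recent_sq_residuals) > window:
            recent_sq_residuals.pop(0)
    residual_estimate = float(np.sqrt(np.mean(recent_sq_residuals)))
    return x, residual_estimate

# Illustrative use: stream the rows of a consistent 500 x 20 synthetic system.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
x_hat, res_est = streaming_kaczmarz(zip(A, b), x0=np.zeros(20))
```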
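The fourth record analyzes SGD under general nonconvexity and noise models. Below is a minimal SGD loop with a diminishing step size and additive Gaussian gradient noise, reusing the double-well objective from the earlier sketch; the objective, noise model, and schedule are assumptions made only for this sketch and do not reflect the paper's assumptions or results.

```python
import numpy as np

def sgd(stochastic_grad, x0, num_iters=5000, eta0=0.1):
    """Vanilla SGD with diminishing step sizes eta_k = eta0 / sqrt(k + 1).

    `stochastic_grad(x, rng)` returns a noisy estimate of the gradient of a
    possibly nonconvex objective; nothing here enforces bounded iterates or
    bounded noise.
    """
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        eta_k = eta0 / np.sqrt(k + 1)          # diminishing step size
        x = x - eta_k * stochastic_grad(x, rng)
    return x

# Double-well objective f(x) = sum(x**4 / 4 - x**2 / 2) with additive Gaussian
# gradient noise (a noise model assumed only for this illustration).
noisy_grad = lambda x, rng: (x**3 - x) + 0.1 * rng.standard_normal(x.shape)
x_final = sgd(noisy_grad, x0=np.array([1.5, -0.5]))
print(x_final)  # iterates hover near the minimizers at +/- 1
```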