NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Lion secretly solves constrained optimization: As lyapunov predicts

Chen, Lizhang; Liu, Bo; Liang, Kaizhao; Liu, Qiang (January 2024, International Conference of Learning Representations)

Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $$\norm{x}_\infty \leq 1/\lambda$$. Lion achieves this through the incorporation of decoupled weight decay, where $$\lambda$$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$$\phi$$ algorithms, where the $$\text{sign}(\cdot)$$ operator in Lion is replaced by the subgradient of a convex function $$\phi$$, leading to the solution of a general composite optimization problem of $$\min_x f(x) + \phi^*(x)$$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
more » « less
Full Text Available
LEARNING DIFFUSION BRIDGES ON CONSTRAINED DOMAINS

Xingchao Liu, Lemeng Wu (January 2023, international conference on learning representations (ICLR))

Diffusion models have achieved promising results on generative learning recently. However, because diffusion processes are most naturally applied on the uncon- strained Euclidean space Rd, key challenges arise for developing diffusion based models for learning data on constrained and structured domains. We present a simple and unified framework to achieve this that can be easily adopted to various types of domains, including product spaces of any type (be it bounded/unbounded, continuous/discrete, categorical/ordinal, or their mix). In our model, the diffu- sion process is driven by a drift force that is a sum of two terms: one singular force designed by Doob’s h-transform that ensures all outcomes of the process to belong to the desirable domain, and one non-singular neural force field that is trained to make sure the outcome follows the data distribution statistically. Ex- periments show that our methods perform superbly on generating tabular data, images, semantic segments and 3D point clouds. Code is available at https: //github.com/gnobitab/ConstrainedDiffusionBridge.
more » « less
Full Text Available
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu; Chengyue Gong; Qiang Liu (January 2023, International conference on learning representations (ICLR))

We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π0 and π1, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from π0 and π1 as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of π0 and π1 to a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.
more » « less
Full Text Available
CONTINUAL LEARNING AND PRIVATE UNLEARNING

Bo Liu; Qiang Liu; Peter Stone (January 2022, Advances in neural information processing systems)

As intelligent agents become autonomous over longer periods of time, they may eventually be- come lifelong counterparts to specific people. If so, it may be common for a user to want the agent to master a task temporarily but later on to forget the task due to privacy concerns. How- ever enabling an agent to forget privately what the user specified without degrading the rest of the learned knowledge is a challenging problem. With the aim of addressing this challenge, this paper formalizes this continual learning and private unlearning (CLPU) problem. The pa- per further introduces a straightforward but exactly private solution, CLPU-DER++, as the first step towards solving the CLPU problem, along with a set of carefully designed benchmark prob- lems to evaluate the effectiveness of the proposed solution.
more » « less
Full Text Available
BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

Bo Liu; Mao Ye; Stephen Wright; Peter Stone; Qiang Liu (January 2022, Advances in neural information processing systems)

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta- learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the low-level optimization process with implicit dif- ferentiation, which requires expensive calculations related to the Hessian matrix. There has been a recent quest for first-order methods for BO, but the methods pro- posed to date tend to be complicated and impractical for large-scale deep learning applications. In this work, we propose a simple first-order BO algorithm that de- pends only on first-order gradient information, requires no implicit differentiation, and is practical and efficient for large-scale non-convex functions in deep learning. We provide a non-asymptotic convergence analysis of the proposed method to stationary points for non-convex objectives and present empirical results that show its superior practical performance.
more » « less
Full Text Available
First Hitting Diffusion Models for Generating Manifold, Graph and Categorical Data

Mao Ye; Lemeng Wu; Qiang Liu (January 2022, Advances in neural information processing systems)

We propose a family of First Hitting Diffusion Models (FHDM), deep generative models that generate data with a diffusion process that terminates at a random first hitting time. This yields an extension of the standard fixed-time diffusion models that terminate at a pre-specified deterministic time. Although standard diffusion models are designed for continuous unconstrained data, FHDM is natu- rally designed to learn distributions on continuous as well as a range of discrete and structure domains. Moreover, FHDM enables instance-dependent terminate time and accelerates the diffusion process to sample higher quality data with fewer diffusion steps. Technically, we train FHDM by maximum likelihood estimation on diffusion trajectories augmented from observed data with conditional first hitting processes (i.e., bridge) derived based on Doob’s h-transform, deviating from the commonly used time-reversal mechanism. We apply FHDM to generate data in various domains such as point cloud (general continuous distribution), climate and geographical events on earth (continuous distribution on the sphere), unweighted graphs (distribution of binary matrices), and segmentation maps of 2D images (high-dimensional categorical distribution). We observe considerable improvement compared with the state-of-the-art approaches in both quality and speed.
more » « less
Full Text Available
Stein’s method meets computational statistics: a review of some recent developments

https://doi.org/10.1214/22-STS863

Andreas Anastasiou; Alessandro Barp; François-Xavier Briol; Bruno Ebner; Robert E Gaunt; Fatemeh Ghaderinezhad; Jackson Gorham; Arthur Gretton; Christophe Ley; Qiang Liu; et al (January 2022, Statistical science)

Stein’s method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein’s method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments, and in doing so, to stimulate further research into the successful field of Stein’s method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.
more » « less
Full Text Available
KeepAugment: A Simple Information-Preserving Data Augmentation Approach

Gong, Chengyue; Wang, Dilin; Li, Meng; Chandra, Vikas; Liu, Qiang (January 2021, Conference on Computer Vision and Pattern Recognition (CVPR))

Data augmentation (DA) is an essential technique for training state-of-the-art deep learning systems. In this paper, we empirically show that the standard data augmentation methods may introduce distribution shift and consequently hurt the performance on unaugmented data during inference. To alleviate this issue, we propose a simple yet effective approach, dubbed KeepAugment, to increase the fidelity of augmented images. The idea is to use the saliency map to detect important regions on the original images and preserve these informative regions during augmentation. This information-preserving strategy allows us to generate more faithful training examples. Empirically, we demonstrate that our method significantly improves upon a number of prior art data augmentation schemes, e.g. AutoAugment, Cutout, random erasing, achieving promising results on image classification, semi-supervised image classification, multi-view multi-camera tracking and object detection.
more » « less
Full Text Available
MaxUp: Lightweight Adversarial Training with Data Augmentation Improves Neural Network Training

Gong, Chengyue; Ren, Tongzheng; Ye, Mao; Liu, Qiang (January 2021, Advances in computer vision and pattern recognition)

We propose MaxUp, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst case loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generation performance. For example, in the case of Gaussian perturbation, MaxUp is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test MaxUp on a range of tasks, including image classification, language modeling, and adversarial certification, on which MaxUp consistently outperforms the existing best baseline methods, without introducing substantial computational overhead. In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy 85.5% without extra data to 85.8%. Code will be released soon.
more » « less
Full Text Available
Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

Liu, Xingchao; Ye, Mao; Zhou, Dengyong; Liu, Qiang (January 2021, AAAI Conference on Artificial Intelligence (AAAI))

We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure, and adaptively decides the number of low precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an 'effect of mixed precision' but without physical mixed precision implementations (which requires specialized hardware accelerators). Empirically, our method can be implemented by common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and it can be generalized to more challenging tasks like PASCAL VOC object detection.
more » « less
Full Text Available

« Prev Next »

Search for: All records