Title: Inverting Deep Generative Models, One Layer at a Time
We study the problem of inverting a deep generative model with ReLU activations. Inversion corresponds to finding a latent code vector that explains the observed measurements as well as possible. Most prior work performs this by attempting to solve a non-convex optimization problem involving the generator. In this paper we obtain several novel theoretical results for the inversion problem. We show that for the realizable case, single-layer inversion can be performed exactly in polynomial time by solving a linear program. Further, we show that for multiple layers, inversion is NP-hard and the pre-image set can be non-convex. For generative models of arbitrary depth, we show that exact recovery is possible in polynomial time with high probability if the layers are expanding and the weights are randomly selected. Very recent work analyzed the same problem for gradient descent inversion; their analysis requires significantly higher expansion (logarithmic in the latent dimension), while our proposed algorithm can provably reconstruct even with constant-factor expansion. We also provide provable error bounds in different norms for reconstructing noisy observations. Our empirical validation demonstrates that we obtain better reconstructions when the latent dimension is large.
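The single-layer result above amounts to a linear feasibility problem: on coordinates where the observation is positive the pre-activation must match it exactly, and on the remaining coordinates it must be non-positive. Below is a minimal sketch of this idea using SciPy's LP solver; the function and variable names (invert_relu_layer, W, b, y) are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def invert_relu_layer(W, b, y):
    """Recover z with ReLU(W @ z + b) == y, if such a z exists.

    On the support {i : y_i > 0} we need W_i z + b_i = y_i;
    off the support we need W_i z + b_i <= 0. Both constraint sets are
    linear in z, so a single LP (feasibility problem) suffices.
    """
    k = W.shape[1]
    on = y > 0
    A_eq, b_eq = W[on], y[on] - b[on]      # equalities on the support
    A_ub, b_ub = W[~on], -b[~on]           # inequalities off the support
    res = linprog(
        c=np.zeros(k),                     # any feasible point will do
        A_ub=A_ub if A_ub.size else None,
        b_ub=b_ub if b_ub.size else None,
        A_eq=A_eq if A_eq.size else None,
        b_eq=b_eq if b_eq.size else None,
        bounds=[(None, None)] * k,
    )
    return res.x if res.success else None

# Quick check on a random expanding layer with a realizable observation.
rng = np.random.default_rng(0)
k, n = 20, 100
W, b = rng.standard_normal((n, k)), rng.standard_normal(n)
z_true = rng.standard_normal(k)
y = np.maximum(W @ z_true + b, 0.0)
z_hat = invert_relu_layer(W, b, y)
print(np.allclose(np.maximum(W @ z_hat + b, 0.0), y, atol=1e-6))
```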
Award ID(s):
1901281
NSF-PAR ID:
10176162
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in Neural Information Processing Systems
Volume:
32
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Hazay, Carmit; Stam, Martijn (Eds.)
    We study the computational problem of finding a shortest non-zero vector in a rotation of ℤⁿ, which we call ℤSVP. It has been a long-standing open problem to determine if a polynomial-time algorithm for ℤSVP exists, and there is by now a beautiful line of work showing how to solve it efficiently in certain very special cases. However, despite all of this work, the fastest known algorithm that is proven to solve ℤSVP is still simply the fastest known algorithm for solving SVP (i.e., the problem of finding shortest non-zero vectors in arbitrary lattices), which runs in 2^{n+o(n)} time. We therefore set aside the (perhaps impossible) goal of finding an efficient algorithm for ℤSVP and instead ask what else we can say about the problem. E.g., can we find any non-trivial speedup over the best known SVP algorithm? And, if ℤSVP actually is hard, then what consequences would follow? Our results are as follows. We show that ℤSVP is in a certain sense strictly easier than SVP on arbitrary lattices. In particular, we show how to reduce ℤSVP to an approximate version of SVP in the same dimension (in fact, even to approximate unique SVP, for any constant approximation factor). Such a reduction seems very unlikely to work for SVP itself, so we view this as a qualitative separation of ℤSVP from SVP. As a consequence of this reduction, we obtain a 2^{n/2+o(n)}-time algorithm for ℤSVP, i.e., the first non-trivial speedup over the best known algorithm for SVP on general lattices. (In fact, this reduction works for a more general class of lattices: semi-stable lattices with not-too-large λ₁.) We show a simple public-key encryption scheme that is secure if (an appropriate variant of) ℤSVP is actually hard. Specifically, our scheme is secure if it is difficult to distinguish (in the worst case) a rotation of ℤⁿ from either a lattice with all non-zero vectors longer than √(n/log n) or a lattice with smoothing parameter significantly smaller than the smoothing parameter of ℤⁿ. The latter result has an interesting qualitative connection with reverse Minkowski theorems, which in some sense say that "ℤⁿ has the largest smoothing parameter." We show a distribution of bases B for rotations of ℤⁿ such that, if ℤSVP is hard for any input basis, then ℤSVP is hard on input B. This gives a satisfying theoretical resolution to the problem of sampling hard bases for ℤⁿ, which was studied by Blanks and Miller [9]. This worst-case to average-case reduction is also crucially used in the analysis of our encryption scheme. (In recent independent work that appeared as a preprint before this work, Ducas and van Woerden showed essentially the same thing for general lattices [15], and they also used this to analyze the security of a public-key encryption scheme. Similar ideas also appeared in [5, 11, 20] in different contexts.) We perform experiments to determine how practical basis reduction performs on bases of ℤⁿ that are generated in different ways and how heuristic sieving algorithms perform on ℤⁿ. Our basis reduction experiments complement and add to those performed by Blanks and Miller, as we work with a larger class of algorithms (i.e., larger block sizes) and study the "provably hard" distribution of bases described above. Our sieving experiments confirm that heuristic sieving algorithms perform as expected on ℤⁿ.
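For intuition about the experimental objects in the abstract above, the sketch below samples a basis of a rotation of ℤⁿ by applying a random rotation to a random unimodular matrix. This is only an illustrative sampler, not the "provably hard" distribution of bases studied in the paper; all names are ours.

```python
import numpy as np

def random_unimodular(n, steps=60, rng=None):
    """Product of random integer column shears: det(U) = +/-1, so the
    columns of U form a basis of the integer lattice Z^n."""
    rng = np.random.default_rng(rng)
    U = np.eye(n, dtype=np.int64)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        U[:, i] += int(rng.choice([-1, 1])) * U[:, j]   # unimodular column shear
    return U

def rotated_Zn_basis(n, seed=None):
    """A basis B = Q @ U of a rotation of Z^n (Q orthogonal, U unimodular)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random rotation
    return Q @ random_unimodular(n, rng=rng)

B = rotated_Zn_basis(8, seed=0)
print(abs(np.linalg.det(B)))   # ~1.0: the lattice has covolume 1
# The shortest non-zero vector of this lattice has length exactly 1 (a rotated
# standard basis vector), but recovering it from the scrambled basis B is ZSVP.
```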
    We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π₀ and π₁, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from π₀ and π₁ as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of π₀ and π₁ into a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.
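The training step described in the abstract above reduces to a least-squares fit of a velocity field to the straight-line displacement X1 - X0. A minimal PyTorch sketch is below; the toy 2-D Gaussians, small MLP, and hyperparameters are illustrative choices, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

# v_theta(x, t): takes a 2-D point plus the time t and predicts a 2-D velocity.
velocity = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, 2))
opt = torch.optim.Adam(velocity.parameters(), lr=1e-3)

for step in range(2000):
    x0 = torch.randn(256, 2)                           # samples from pi_0
    x1 = torch.randn(256, 2) + torch.tensor([4., 0.])  # samples from pi_1
    t = torch.rand(256, 1)
    xt = t * x1 + (1 - t) * x0                         # point on the straight path
    target = x1 - x0                                   # constant velocity of that path
    pred = velocity(torch.cat([xt, t], dim=1))
    loss = ((pred - target) ** 2).mean()               # nonlinear least squares
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v_theta(x, t) from t=0 to t=1 with Euler steps.
x = torch.randn(256, 2)
with torch.no_grad():
    for k in range(100):
        t = torch.full((256, 1), k / 100)
        x = x + velocity(torch.cat([x, t], dim=1)) / 100
```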
  3. We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers. A recent line of work has provided sophisticated polynomial-time algorithms for this problem with dimension-independent error guarantees for a range of natural distribution families. In this work, we show that a natural non-convex formulation of the problem can be solved directly by gradient descent. Our approach leverages a novel structural lemma, roughly showing that any approximate stationary point of our non-convex objective gives a near-optimal solution to the underlying robust estimation task. Our work establishes an intriguing connection between algorithmic high-dimensional robust statistics and non-convex optimization, which may have broader applications to other robust estimation tasks. 
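In the spirit of the abstract above, here is a hedged sketch of projected gradient descent on sample weights that shrinks the top eigenvalue of the weighted covariance, so outliers are down-weighted and the weighted mean approaches the true mean. The objective, step size, and the crude clip-and-renormalize projection are illustrative simplifications, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 500, 20, 0.1
X = rng.standard_normal((n, d))              # inliers with true mean 0
X[: int(eps * n)] += 8.0                     # a constant fraction of gross outliers

w = np.full(n, 1.0 / n)                      # sample weights, start uniform
cap = 1.0 / ((1 - eps) * n)                  # cap on any individual weight
for _ in range(300):
    mu = w @ X                                          # weighted mean
    C = (X - mu).T @ ((X - mu) * w[:, None])            # weighted covariance
    v = np.linalg.eigh(C)[1][:, -1]                     # top eigenvector
    grad = ((X - mu) @ v) ** 2                          # d lambda_max / d w_i (Danskin)
    w = np.clip(w - grad / (n * grad.max()), 0.0, cap)  # small gradient step
    w /= w.sum()                                        # crude capped-simplex projection

print(np.linalg.norm(w @ X))       # robust weighted mean: close to the true mean 0
print(np.linalg.norm(X.mean(0)))   # naive sample mean: dragged away by the outliers
```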
    We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons, and data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an ℓ₂-norm regularized convex program. We then show that multi-layer circular CNN training problems with a single ReLU layer are equivalent to an ℓ₁-regularized convex program that encourages sparsity in the spectral domain. We also extend these results to three-layer CNNs with two ReLU layers. Furthermore, we present extensions of our approach to different pooling methods, which elucidates the implicit architectural bias as convex regularizers.
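As a rough illustration of the kind of convex reformulation the abstract above refers to, the sketch below sets up a fully connected two-layer analogue of the group-ℓ₂ regularized convex program over a randomly sampled subset of ReLU sign patterns; the paper's CNN versions replace the data matrix with patch matrices and enumerate all patterns, which is not reproduced here. All names and the pattern-sampling shortcut are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, P = 50, 5, 30                      # samples, input dim, sampled ReLU patterns
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1])           # a simple nonlinear target
beta = 1e-2                              # regularization strength

# Diagonal "arrangement" matrices D_i = diag(1[X u_i >= 0]) for random u_i.
D = [np.diag((X @ rng.standard_normal(d) >= 0).astype(float)) for _ in range(P)]

V = [cp.Variable(d) for _ in range(P)]   # positive-part neurons
U = [cp.Variable(d) for _ in range(P)]   # negative-part neurons
pred = sum(D[i] @ X @ (V[i] - U[i]) for i in range(P))
cons = []
for i in range(P):
    # Keep each neuron consistent with its assigned ReLU sign pattern.
    cons += [(2 * D[i] - np.eye(n)) @ X @ V[i] >= 0,
             (2 * D[i] - np.eye(n)) @ X @ U[i] >= 0]
obj = 0.5 * cp.sum_squares(pred - y) + beta * sum(cp.norm(V[i]) + cp.norm(U[i]) for i in range(P))
prob = cp.Problem(cp.Minimize(obj), cons)
prob.solve()
print(prob.status, prob.value)
```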
    The Random Dot Product Graph (RDPG) is a popular generative graph model for relational data. RDPGs postulate that there exists a latent position for each node, and specify the edge formation probabilities via the inner product of the corresponding latent vectors. The embedding task of estimating these latent positions from observed graphs is usually posed as a non-convex matrix factorization problem. The workhorse Adjacency Spectral Embedding offers an approximate solution obtained via the eigendecomposition of the adjacency matrix, which enjoys solid statistical guarantees but can be computationally intensive and formally solves a surrogate problem. In this paper, we bring to bear recent advances in non-convex optimization and demonstrate their impact on RDPG inference. We develop first-order gradient descent methods to better solve the original optimization problem, and to accommodate broader network embedding applications in an organic way. The effectiveness of the resulting graph representation learning framework is demonstrated on both synthetic and real data. We show the algorithms are scalable, robust to missing network data, and can track the latent positions over time when the graphs are acquired in a streaming fashion.
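In the spirit of the first-order approach described in the abstract above, here is a minimal sketch: gradient descent on the squared Frobenius error between the adjacency matrix and X Xᵀ over the latent positions X, initialized at the adjacency spectral embedding. The data generation, step size, and diagonal masking are illustrative choices, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
Z = rng.uniform(0.2, 0.8, size=(n, d)) / np.sqrt(d)    # ground-truth latent positions
Pmat = Z @ Z.T                                          # edge probabilities, in [0, 1]
A = (rng.uniform(size=(n, n)) < Pmat).astype(float)
A = np.triu(A, 1); A = A + A.T                          # symmetric, hollow adjacency

# Adjacency Spectral Embedding (ASE): scaled top-d eigenvectors of A.
vals, vecs = np.linalg.eigh(A)
X = vecs[:, -d:] * np.sqrt(np.abs(vals[-d:]))

mask = 1.0 - np.eye(n)                                  # ignore the unmodeled diagonal
step = 0.05 / abs(vals[-1])                             # conservative step size
obj = lambda Xc: np.linalg.norm(mask * (Xc @ Xc.T - A)) ** 2
print("objective at ASE:", obj(X))
for _ in range(200):
    R = mask * (X @ X.T - A)                            # masked residual (symmetric)
    X = X - step * 4 * R @ X                            # gradient of ||R||_F^2 wrt X
print("objective after GD:", obj(X))
```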