

Title: Global Guarantees for Blind Demodulation with Generative Priors
We study a deep-learning-inspired formulation of the blind demodulation problem, which is the task of recovering two unknown vectors from their entrywise multiplication. We consider the case where the unknown vectors lie in the ranges of known deep generative models, G^(1): R^n → R^l and G^(2): R^p → R^l. When the networks corresponding to the generative models are expansive, the weight matrices are random, and the dimension of the unknown vectors satisfies l = Ω(n^2 + p^2), up to log factors, we show that the empirical risk objective has a favorable landscape for optimization: the objective function has a descent direction at every point outside of small neighborhoods of four hyperbolic curves. We also characterize the local maximizers of the empirical risk objective and hence show that no other stationary points exist outside of these neighborhoods and the set of local maximizers. We also implement a gradient descent scheme inspired by the geometry of this landscape. To converge to a global minimizer, the scheme exploits the fact that exactly one of the four hyperbolic curves corresponds to the global minimizer, so points near this curve have lower objective values than points near the other, spurious curves. We show that this gradient descent scheme can effectively remove distortions synthetically introduced into the MNIST dataset.
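As a concrete illustration of the empirical risk formulation (a minimal sketch, not the authors' implementation), the snippet below minimizes 0.5*||G^(1)(h) ⊙ G^(2)(m) − y||^2 over the latent codes of two fixed random expansive ReLU networks with plain gradient descent. The network sizes, weight scaling, step size, and iteration count are illustrative assumptions, and the sketch runs from a single random initialization rather than using the objective-value comparison near the candidate hyperbolic curves described above.

```python
# Minimal sketch of empirical risk minimization over latent codes for blind
# demodulation with generative priors (illustrative; not the paper's code).
import torch

torch.manual_seed(0)
n, p, l = 10, 10, 400  # latent dimensions and (expansive) signal dimension

def random_relu_net(d_in, d_out, width=200):
    # Two-layer expansive ReLU network with fixed Gaussian weights and no bias.
    W1 = torch.randn(width, d_in) * (2.0 / d_in) ** 0.5
    W2 = torch.randn(d_out, width) * (2.0 / width) ** 0.5
    return lambda z: torch.relu(W2 @ torch.relu(W1 @ z))

G1, G2 = random_relu_net(n, l), random_relu_net(p, l)

# Ground-truth latent codes and the observed entrywise product.
h_star, m_star = torch.randn(n), torch.randn(p)
y = G1(h_star) * G2(m_star)

# Plain gradient descent on the empirical risk over (h, m).
h = torch.randn(n, requires_grad=True)
m = torch.randn(p, requires_grad=True)
step = 1e-3
for _ in range(3000):
    risk = 0.5 * torch.sum((G1(h) * G2(m) - y) ** 2)
    risk.backward()
    with torch.no_grad():
        h -= step * h.grad
        m -= step * m.grad
        h.grad.zero_()
        m.grad.zero_()

print(f"final empirical risk: {risk.item():.3e}")
```

Note that the global minimizers themselves form a hyperbolic curve: scaling h up and m down by the same positive factor leaves the product unchanged for these bias-free ReLU networks. A plain run like this can also settle near one of the spurious curves; the descent scheme described in the abstract resolves this by preferring the curve with the lower objective value.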
Award ID(s):
1848087
PAR ID:
10157457
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in Neural Information Processing Systems
Volume:
32
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a mathematical analysis of a non-convex energy landscape for robust subspace recovery. We prove that an underlying subspace is the only stationary point and local minimizer in a specified neighborhood under a deterministic condition on the dataset. If the deterministic condition is satisfied, we further show that a geodesic gradient descent method over the Grassmannian manifold can exactly recover the underlying subspace when the method is properly initialized. Proper initialization by principal component analysis is guaranteed under a simple deterministic condition. Under slightly stronger assumptions, the gradient descent method with a piecewise constant step-size scheme achieves linear convergence. The practicality of the deterministic condition is demonstrated on some statistical models of data, and the method achieves nearly state-of-the-art recovery guarantees on the Haystack Model across different regimes of sample size and ambient dimension. In particular, when the ambient dimension is fixed and the sample size is large enough, we show that our gradient method can exactly recover the underlying subspace for any fixed fraction of outliers (less than 1).
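The sketch below is my own toy illustration (not the paper's implementation) of geodesic gradient steps over the Grassmannian for the least-absolute-deviations energy F(V) = sum_i ||(I − V V^T) x_i||, initialized by PCA. The synthetic data model, the cap on the per-step rotation angle, and the halving schedule are simplifications in the spirit of the piecewise-constant step-size scheme, not the paper's exact choices.

```python
# Toy sketch: geodesic gradient steps on the Grassmannian for robust subspace
# recovery with PCA initialization (illustrative, not the paper's algorithm).
import numpy as np

rng = np.random.default_rng(0)
D, d, n_in, n_out = 20, 3, 200, 50

# Inliers on a random d-dimensional subspace plus Gaussian outliers.
basis, _ = np.linalg.qr(rng.standard_normal((D, d)))
X = np.vstack([rng.standard_normal((n_in, d)) @ basis.T,
               rng.standard_normal((n_out, D))])

def grassmann_step(V, angle):
    R = X - (X @ V) @ V.T                      # residuals (I - V V^T) x_i
    norms = np.maximum(np.linalg.norm(R, axis=1), 1e-9)
    grad = -(R / norms[:, None]).T @ (X @ V)   # Riemannian gradient of sum_i ||r_i||
    U, S, Wt = np.linalg.svd(-grad, full_matrices=False)
    t = angle / max(S[0], 1e-12)               # cap the largest rotation per step
    # Exact Grassmannian geodesic in the descent direction (Edelman et al. formula).
    return V @ Wt.T @ np.diag(np.cos(t * S)) @ Wt + U @ np.diag(np.sin(t * S)) @ Wt

# PCA initialization: top-d principal directions of the data.
_, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
V = Vt[:d].T

angle = 0.1
for k in range(300):
    V = grassmann_step(V, angle)
    if (k + 1) % 50 == 0:
        angle *= 0.5                           # piecewise-constant step-size schedule

print("recovery error:", np.linalg.norm(V @ V.T - basis @ basis.T))
```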
  2. Given $n$ general points $p_1, p_2, \ldots , p_n \in{\mathbb{P}}^r$, it is natural to ask whether there is a curve of given degree $d$ and genus $g$ passing through them; by counting dimensions, a natural conjecture is that such a curve exists if and only if $$\begin{equation*}n \leq \left\lfloor \frac{(r + 1)d - (r - 3)(g - 1)}{r - 1}\right\rfloor.\end{equation*}$$ The case of curves with nonspecial hyperplane section was recently studied in [2], where the above conjecture was shown to hold with exactly three exceptions. In this paper, we prove a “bounded-error analog” for special linear series on general curves; more precisely, we show that such a curve exists subject to the stronger inequality $$\begin{equation*}n \leq \left\lfloor \frac{(r + 1)d - (r - 3)(g - 1)}{r - 1}\right\rfloor - 3.\end{equation*}$$ Note that the $-3$ cannot be replaced with $-2$ without introducing exceptions (as a canonical curve in ${\mathbb{P}}^3$ can only pass through nine general points, while a naive dimension count predicts twelve). We also use the same technique to prove that the twist of the normal bundle $N_C(-1)$ satisfies interpolation for curves whose degree is sufficiently large relative to their genus, and deduce from this that the number of general points contained in the hyperplane section of a general curve is at least $$\begin{equation*}\min\left(d, \frac{(r - 1)^2 d - (r - 2)^2 g - (2r^2 - 5r + 12)}{(r - 2)^2}\right).\end{equation*}$$ As explained in [7], these results play a key role in the author’s proof of the maximal rank conjecture [9].
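As a quick check of the exception quoted above (my own arithmetic, only restating the abstract's example): a canonical curve in ${\mathbb{P}}^3$ has $r = 3$, genus $g = 4$, and degree $d = 2g - 2 = 6$, so the naive count gives $$\begin{equation*}\left\lfloor \frac{(r + 1)d - (r - 3)(g - 1)}{r - 1}\right\rfloor = \left\lfloor \frac{4 \cdot 6 - 0 \cdot 3}{2}\right\rfloor = 12,\end{equation*}$$ while such a curve passes through only nine general points, which is why the $-3$ in the theorem cannot be weakened to $-2$ without exceptions.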
  3. We experimentally examine how gradient descent navigates the landscape of matrix factorization to obtain a global minimum. First, we review the critical points of matrix factorization and introduce a balanced factorization. By focusing on the balanced critical point at the origin and a subspace of unbalanced critical points, we study the effect of balance on gradient descent, including starting from an initially unbalanced factorization and adding a balance regularizer to the matrix factorization objective. Simulations demonstrate that maintaining a balanced factorization enables faster escape from saddle points and overall faster convergence to a global minimum.
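For concreteness, here is a small illustrative sketch (not the paper's experiments) of gradient descent on the balance-regularized objective f(U, V) = 0.5*||U V^T - M||_F^2 + (lam/4)*||U^T U - V^T V||_F^2, started from a deliberately unbalanced factorization. The problem sizes, initialization scales, step size, and lam are assumptions for the demo.

```python
# Illustrative sketch: gradient descent on matrix factorization with an optional
# balance regularizer, started from an unbalanced factorization (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 30, 30, 5
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rank-r target

def run(lam, steps=3000, lr=1e-3):
    init = np.random.default_rng(1)
    # Unbalanced start: U is scaled up, V is scaled down.
    U = 2.0 * init.standard_normal((n, r))
    V = 0.5 * init.standard_normal((m, r))
    for _ in range(steps):
        E = U @ V.T - M                       # data-fit residual
        D = U.T @ U - V.T @ V                 # balance gap
        gU = E @ V + lam * U @ D
        gV = E.T @ U - lam * V @ D
        U, V = U - lr * gU, V - lr * gV
    return np.linalg.norm(U @ V.T - M)

print("no regularizer     :", run(lam=0.0))
print("balance regularizer:", run(lam=0.1))
```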
  4. We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally \emph{rank deficient}, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be \emph{overparameterized} with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$. 
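Below is a hedged sketch of this style of preconditioning (my own illustration; the paper's exact preconditioner and damping rule may differ): a right preconditioner $(X^{T}X + \eta I)^{-1}$ with damping $\eta$ tied to the current loss, applied to gradient descent on the stand-in cost $\phi(M) = 0.5\|M - M^{\star}\|_F^2$ with search rank $r > r^{\star}$. The choice of $\phi$, the damping rule, and the step size are assumptions.

```python
# Sketch of right-preconditioned gradient descent for f(X) = phi(X X^T) with an
# overparameterized factor X (search rank r > true rank r_star); illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, r_star, r = 50, 3, 8                       # overparameterized: r > r_star

Z = rng.standard_normal((n, r_star))
M_star = Z @ Z.T                              # rank-r_star PSD target

def loss(X):
    return 0.5 * np.linalg.norm(X @ X.T - M_star) ** 2

def grad(X):
    return 2.0 * (X @ X.T - M_star) @ X       # gradient of phi(X X^T) for this phi

X = 0.1 * rng.standard_normal((n, r))
lr = 0.2                                      # illustrative step size
for _ in range(500):
    eta = np.sqrt(loss(X)) + 1e-12            # damping shrinks as the loss shrinks
    P = np.linalg.inv(X.T @ X + eta * np.eye(r))
    X = X - lr * grad(X) @ P

print("final loss:", loss(X))
```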
  5. Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(\mathbb{R}^d)^{\otimes l}$ of rank $r$ (where $r \ll d$), can variants of gradient descent find a rank-$m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^{*}(r^{2.5l} \log d)$. Our results show that gradient descent on the over-parametrized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data.
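To make the over-parametrized setup concrete (a toy sketch only; the paper analyzes a modified variant of gradient descent, not this plain version), the snippet below fits an order-3 tensor of CP rank $r$ with a rank-$m$ model, $m > r$, by gradient descent from a small random initialization. The dimensions, initialization scale, and step size are assumptions.

```python
# Toy over-parametrized CP fitting for an order-3 tensor (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 10, 2, 8                             # ambient dim, true rank, search rank

A = rng.standard_normal((r, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm true components
T = np.einsum('ia,ib,ic->abc', A, A, A)        # T = sum_i a_i (x) a_i (x) a_i

X = 0.1 * rng.standard_normal((m, d))          # small (non-lazy scale) initialization
lr = 0.2
for _ in range(3000):
    E = np.einsum('ja,jb,jc->abc', X, X, X) - T
    G = 6.0 * np.einsum('abc,ja,jb->jc', E, X, X)   # gradient of ||E||_F^2 w.r.t. X
    X -= lr * G

residual = np.linalg.norm(np.einsum('ja,jb,jc->abc', X, X, X) - T)
print("fit residual:", residual)
```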