We present a mathematical analysis of a non-convex energy landscape for robust subspace recovery. We prove that, under a deterministic condition on the dataset, the underlying subspace is the only stationary point and local minimizer in a specified neighborhood. If this deterministic condition is satisfied, we further show that a geodesic gradient descent method over the Grassmannian manifold can exactly recover the underlying subspace when the method is properly initialized; proper initialization by principal component analysis is guaranteed under a simple deterministic condition. Under slightly stronger assumptions, the gradient descent method with a piecewise-constant step-size scheme achieves linear convergence. The practicality of the deterministic condition is demonstrated on some statistical models of data, and the method achieves almost state-of-the-art recovery guarantees on the Haystack Model for different regimes of sample size and ambient dimension. In particular, when the ambient dimension is fixed and the sample size is large enough, we show that our gradient method can exactly recover the underlying subspace for any fixed fraction of outliers (less than 1).
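The sketch below is a minimal illustration of this kind of scheme, not the paper's algorithm: it minimizes the least-absolute-deviations energy F(V) = Σ_i ||x_i − V Vᵀ x_i|| over orthonormal bases V, starts from a PCA initialization, and uses a QR retraction with a fixed step size in place of the exact geodesic step and the piecewise-constant step-size scheme analyzed above. All function names, the toy data, and the step size are assumptions.

```python
# Hypothetical sketch of Grassmannian gradient descent for robust subspace recovery.
import numpy as np

def pca_init(X, d):
    """Top-d right singular subspace of the data matrix X (N x D)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:d].T                                   # D x d orthonormal basis

def rsr_gradient_descent(X, d, step=0.01, iters=500, eps=1e-12):
    V = pca_init(X, d)
    for _ in range(iters):
        R = X - (X @ V) @ V.T                         # residuals (I - V V^T) x_i, row-wise
        norms = np.maximum(np.linalg.norm(R, axis=1), eps)
        # Euclidean gradient of sum_i ||r_i||; already tangent because V^T r_i = 0
        G = -(R / norms[:, None]).T @ (X @ V)
        V, _ = np.linalg.qr(V - step * G)             # retract back to orthonormal bases
    return V

# toy data: inliers near a 2-D subspace of R^10 plus arbitrary outliers
rng = np.random.default_rng(0)
B = np.linalg.qr(rng.standard_normal((10, 2)))[0]
inliers = rng.standard_normal((200, 2)) @ B.T + 0.01 * rng.standard_normal((200, 10))
outliers = rng.standard_normal((100, 10))
X = np.vstack([inliers, outliers])
V_hat = rsr_gradient_descent(X, d=2)
print("projector error:", np.linalg.norm(V_hat @ V_hat.T - B @ B.T))
```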
Global Guarantees for Blind Demodulation with Generative Priors
We study a deep-learning-inspired formulation of the blind demodulation problem, which is the task of recovering two unknown vectors from their entrywise multiplication. We consider the case where the unknown vectors lie in the ranges of known deep generative models $G^{(1)}:\mathbb{R}^n\to\mathbb{R}^l$ and $G^{(2)}:\mathbb{R}^p\to\mathbb{R}^l$. When the networks corresponding to the generative models are expansive, the weight matrices are random, and the dimension of the unknown vectors satisfies $l = \Omega(n^2 + p^2)$ up to log factors, we show that the empirical risk objective has a favorable landscape for optimization: the objective function has a descent direction at every point outside of a small neighborhood around four hyperbolic curves. We also characterize the local maximizers of the empirical risk objective and hence show that there are no other stationary points outside of these neighborhoods of the four hyperbolic curves and the set of local maximizers. We also implement a gradient descent scheme inspired by the geometry of the landscape of the objective function. In order to converge to a global minimizer, this gradient descent scheme exploits the fact that exactly one of the hyperbolic curves corresponds to the global minimizer, and thus points near this hyperbolic curve have a lower objective value than points close to the other, spurious hyperbolic curves. We show that this gradient descent scheme can effectively remove distortions synthetically introduced to the MNIST dataset.
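As a rough illustration only: the snippet below builds two small random expansive ReLU generators (stand-ins for the paper's generative priors), forms an observation y = G1(h0) ⊙ G2(m0), and runs plain gradient descent on the empirical risk over the latent codes. The paper's branch-selection rule near the four hyperbolic curves is not implemented; it is only crudely mimicked by restarting from several initializations and keeping the lowest objective. Network sizes, learning rate, and iteration counts are assumptions.

```python
# Hypothetical sketch: empirical-risk descent for blind demodulation with generative priors.
import torch

torch.manual_seed(0)
n, p, l = 10, 10, 500                      # latent dims and (expansive) output dim

def make_generator(in_dim, out_dim):
    W1 = torch.randn(4 * in_dim, in_dim) / in_dim ** 0.5
    W2 = torch.randn(out_dim, 4 * in_dim) / (4 * in_dim) ** 0.5
    return lambda z: torch.relu(W2 @ torch.relu(W1 @ z))

G1, G2 = make_generator(n, l), make_generator(p, l)
h0, m0 = torch.randn(n), torch.randn(p)
y = G1(h0) * G2(m0)                        # entrywise multiplication

def risk(h, m):
    return ((G1(h) * G2(m) - y) ** 2).sum()

best = None
for sh in (1.0, -1.0):                     # crude stand-in for picking the right branch
    for sm in (1.0, -1.0):
        h = (sh * torch.randn(n)).requires_grad_(True)
        m = (sm * torch.randn(p)).requires_grad_(True)
        opt = torch.optim.Adam([h, m], lr=1e-2)
        for _ in range(2000):
            opt.zero_grad()
            loss = risk(h, m)
            loss.backward()
            opt.step()
        if best is None or loss.item() < best[0]:
            best = (loss.item(), h.detach(), m.detach())
print("final empirical risk:", best[0])
```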
- Award ID(s):
- 1848087
- NSF-PAR ID:
- 10157457
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- Volume:
- 32
- ISSN:
- 1049-5258
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Curves play a fundamental role across computer graphics, physical simulation, and mathematical visualization, yet most tools for curve design do nothing to prevent crossings or self-intersections. This paper develops efficient algorithms for (self-)repulsion of plane and space curves that are well-suited to problems in computational design. Our starting point is the so-called tangent-point energy, which provides an infinite barrier to self-intersection. In contrast to local collision detection strategies used in, e.g., physical simulation, this energy considers interactions between all pairs of points, and is hence useful for global shape optimization: local minima tend to be aesthetically pleasing, physically valid, and nicely distributed in space. A reformulation of gradient descent, based on a Sobolev-Slobodeckij inner product, enables us to make rapid progress toward local minima, independent of curve resolution. We also develop a hierarchical multigrid scheme that significantly reduces the per-step cost of optimization. The energy is easily integrated with a variety of constraints and penalties (e.g., inextensibility, or obstacle avoidance), which we use for applications including curve packing, knot untangling, graph embedding, non-crossing spline interpolation, flow visualization, and robotic path planning.
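A toy sketch of the idea, not the authors' implementation: discretize a closed plane curve as a polygon, evaluate a discrete tangent-point kernel over non-adjacent vertex pairs, and take plain gradient-descent steps via automatic differentiation. The Sobolev-Slobodeckij preconditioning and the multigrid acceleration described above are omitted, so progress is resolution-dependent; the exponent, quadrature weights, and step size are assumptions.

```python
# Hypothetical discrete tangent-point energy (alpha = 2) with plain gradient descent.
import math
import torch

def tangent_point_energy(x):
    """x: (N, 2) vertices of a closed polygon."""
    N = x.shape[0]
    edges = torch.roll(x, -1, dims=0) - x              # edge i runs from vertex i to i+1
    lengths = edges.norm(dim=1)
    T = edges / lengths.unsqueeze(1)                   # unit tangent assigned to vertex i
    diff = x.unsqueeze(0) - x.unsqueeze(1)             # diff[i, j] = x_j - x_i
    dist_sq = (diff ** 2).sum(dim=2)
    proj = (diff * T.unsqueeze(1)).sum(dim=2)          # <T_i, x_j - x_i>
    orth_sq = (dist_sq - proj ** 2).clamp(min=0.0)     # squared distance to the tangent line
    mask = torch.ones(N, N, dtype=torch.bool)
    idx = torch.arange(N)
    for k in (-1, 0, 1):                               # skip self and adjacent vertices
        mask[idx, (idx + k) % N] = False
    kernel = orth_sq[mask] / dist_sq[mask] ** 2        # |P_perp(x_j - x_i)|^2 / |x_j - x_i|^4
    weights = (lengths.unsqueeze(0) * lengths.unsqueeze(1))[mask]   # crude quadrature
    return (kernel * weights).sum()

# plain gradient descent on a squashed circle (no Sobolev preconditioning, no multigrid)
theta = torch.linspace(0.0, 2.0 * math.pi, 65)[:-1]
x = torch.stack([theta.cos(), 0.3 * theta.sin()], dim=1).requires_grad_(True)
opt = torch.optim.SGD([x], lr=1e-4)
for _ in range(200):
    opt.zero_grad()
    energy = tangent_point_energy(x)
    energy.backward()
    opt.step()
print("final tangent-point energy:", energy.item())
```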
-
We develop a framework for reconstructing images that are sparse in an appropriate transform domain from polychromatic computed tomography (CT) measurements under the blind scenario where the material of the inspected object and incident-energy spectrum are unknown. Assuming that the object that we wish to reconstruct consists of a single material, we obtain a parsimonious measurement-model parameterization by changing the integral variable from photon energy to mass attenuation, which allows us to combine the variations brought by the unknown incident spectrum and mass attenuation into a single unknown mass-attenuation spectrum function; the resulting measurement equation has the Laplace-integral form. The mass-attenuation spectrum is then expanded into basis functions using B splines of order one. We consider a Poisson noise model and establish conditions for biconvexity of the corresponding negative log-likelihood (NLL) function with respect to the density-map and mass-attenuation spectrum parameters. We derive a block-coordinate descent algorithm for constrained minimization of a penalized NLL objective function, where penalty terms ensure nonnegativity of the mass-attenuation spline coefficients and nonnegativity and gradient-map sparsity of the density-map image, imposed using a convex total-variation (TV) norm; the resulting objective function is biconvex. This algorithm alternates between a Nesterov’s proximal-gradient (NPG) step and a limited-memory Broyden-Fletcher-Goldfarb-Shanno with box constraints (L-BFGS-B) iteration for updating the image and mass-attenuation spectrum parameters, respectively. We prove the Kurdyka-Łojasiewicz property of the objective function, which is important for establishing local convergence of block-coordinate descent schemes in biconvex optimization problems. Our framework applies to other NLLs and signal-sparsity penalties, such as lognormal NLL and ℓ₁ norm of 2D discrete wavelet transform (DWT) image coefficients. Numerical experiments with simulated and real X-ray CT data demonstrate the performance of the proposed scheme.
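A compact sketch of the measurement model and the alternating structure only, not the authors' NPG-plus-L-BFGS-B algorithm: the mass-attenuation spectrum is expanded in a small basis (hat functions stand in for the order-one B-splines), the projector is a random matrix rather than a CT system matrix, the TV penalty is omitted, and both blocks are updated with bound-constrained L-BFGS-B using finite-difference gradients. All sizes and names here are assumptions.

```python
# Hypothetical toy of the Laplace-integral Poisson model with block-coordinate descent.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_meas, n_pix, n_kappa, n_basis = 60, 40, 50, 6
Phi = np.abs(rng.standard_normal((n_meas, n_pix))) / n_pix   # stand-in projection operator
kappa = np.linspace(0.1, 2.0, n_kappa)                       # mass-attenuation grid
centers = np.linspace(0.1, 2.0, n_basis)
B = np.maximum(1 - np.abs(kappa[:, None] - centers[None, :]) / 0.4, 0.0)  # hat basis

def means(alpha, c):
    """Poisson means: integral of iota(kappa) * exp(-kappa * [Phi alpha]) d kappa."""
    s = Phi @ alpha                                   # line integrals of the density map
    att = np.exp(-np.outer(s, kappa))                 # (n_meas, n_kappa)
    return att @ (B @ c) * (kappa[1] - kappa[0])

def nll(alpha, c, y):
    mu = np.maximum(means(alpha, c), 1e-9)
    return np.sum(mu - y * np.log(mu))

# simulate data from a ground-truth density map and spectrum coefficients
alpha_true = np.abs(rng.standard_normal(n_pix))
c_true = 1e4 * np.abs(rng.standard_normal(n_basis))
y = rng.poisson(means(alpha_true, c_true))

# block-coordinate descent: alternately fit the spectrum and the density map,
# each with nonnegativity enforced through L-BFGS-B bounds
alpha, c = np.ones(n_pix), np.ones(n_basis)
for _ in range(5):
    c = minimize(lambda v: nll(alpha, v, y), c, method="L-BFGS-B",
                 bounds=[(0, None)] * n_basis).x
    alpha = minimize(lambda v: nll(v, c, y), alpha, method="L-BFGS-B",
                     bounds=[(0, None)] * n_pix).x
print("final NLL:", nll(alpha, c, y))
```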
-
Abstract: Given $n$ general points $p_1, p_2, \ldots, p_n \in \mathbb{P}^r$, it is natural to ask whether there is a curve of given degree $d$ and genus $g$ passing through them; by counting dimensions, a natural conjecture is that such a curve exists if and only if $$n \leq \left\lfloor \frac{(r + 1)d - (r - 3)(g - 1)}{r - 1}\right\rfloor.$$ The case of curves with nonspecial hyperplane section was recently studied in [2], where the above conjecture was shown to hold with exactly three exceptions. In this paper, we prove a “bounded-error analog” for special linear series on general curves; more precisely, we show that such a curve exists subject to the stronger inequality $$n \leq \left\lfloor \frac{(r + 1)d - (r - 3)(g - 1)}{r - 1}\right\rfloor - 3.$$ Note that the $-3$ cannot be replaced with $-2$ without introducing exceptions (a canonical curve in $\mathbb{P}^3$ can only pass through nine general points, while a naive dimension count predicts twelve). We also use the same technique to prove that the twist of the normal bundle $N_C(-1)$ satisfies interpolation for curves whose degree is sufficiently large relative to their genus, and deduce from this that the number of general points contained in the hyperplane section of a general curve is at least $$\min\left(d, \frac{(r - 1)^2 d - (r - 2)^2 g - (2r^2 - 5r + 12)}{(r - 2)^2}\right).$$ As explained in [7], these results play a key role in the author’s proof of the maximal rank conjecture [9].
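A small numeric check of the counting bounds quoted above: `max_points_naive` is the dimension-count prediction and `max_points_special` subtracts the 3 from the stronger inequality. The genus-4 canonical curve in P^3 (degree 2g - 2 = 6) reproduces the "twelve predicted versus nine actual" discrepancy mentioned in the abstract; the function names are illustrative only.

```python
def max_points_naive(r, d, g):
    # floor(((r + 1) d - (r - 3)(g - 1)) / (r - 1)): the naive dimension-count bound
    return ((r + 1) * d - (r - 3) * (g - 1)) // (r - 1)

def max_points_special(r, d, g):
    # the bound from the stronger inequality in the abstract
    return max_points_naive(r, d, g) - 3

print(max_points_naive(3, 6, 4))    # 12: the naive prediction for the canonical curve in P^3
print(max_points_special(3, 6, 4))  # 9: matches the nine general points it actually passes through
```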
-
Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can narrow networks have as strong expressivity as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than n (sample size) neurons when the activation is smooth. First, we prove that as long as the width satisfies m >= 2n/d (where d is the input dimension), the network's expressivity is strong, i.e., there exists at least one global minimizer with zero training loss. Second, we identify a nice local region with no local minima or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice region. Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer. It is expected that projected gradient methods converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work. Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.
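A minimal sketch of the constrained-training idea, assuming PyTorch: a 1-hidden-layer network with a smooth activation and width m < n, trained by projected gradient descent. The projection onto a Frobenius-norm ball is a placeholder; the paper's actual feasible region (the "nice local region") is different and is not reproduced here, and the width, radius, and learning rate are assumptions.

```python
# Hypothetical projected gradient descent for a narrow 1-hidden-layer network.
import torch

torch.manual_seed(0)
n, d, m = 200, 10, 50                       # samples, input dim, width (m < n)
X = torch.randn(n, d)
y = torch.randn(n)

W = torch.randn(m, d, requires_grad=True)   # hidden-layer weights
v = torch.randn(m, requires_grad=True)      # output weights
radius = 10.0                               # assumed radius of the placeholder feasible ball

def model(X):
    return torch.tanh(X @ W.t()) @ v        # smooth activation, as in the abstract

opt = torch.optim.SGD([W, v], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
    with torch.no_grad():                   # project back onto the constraint set
        for p in (W, v):
            norm = p.norm()
            if norm > radius:
                p.mul_(radius / norm)
print("final training loss:", loss.item())
```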