We present a mathematical analysis of a nonconvex energy landscape for robust subspace recovery. We prove that, under a deterministic condition on the dataset, the underlying subspace is the only stationary point and local minimizer in a specified neighborhood. When this deterministic condition is satisfied, we further show that a geodesic gradient descent method over the Grassmannian manifold can exactly recover the underlying subspace if properly initialized. Proper initialization by principal component analysis is guaranteed by a simple deterministic condition. Under slightly stronger assumptions, the gradient descent method with a piecewise-constant step-size scheme achieves linear convergence. The practicality of the deterministic condition is demonstrated on some statistical models of data, and the method achieves almost state-of-the-art recovery guarantees on the Haystack Model for different regimes of sample size and ambient dimension. In particular, when the ambient dimension is fixed and the sample size is large enough, we show that our gradient method can exactly recover the underlying subspace for any fixed fraction of outliers (less than 1).
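To make the geometry above concrete, here is a minimal sketch of geodesic-style gradient descent for robust subspace recovery. It is not the paper's exact algorithm: the least-absolute-deviations energy, the QR retraction, and the step size are all illustrative simplifications.

```python
import numpy as np

def subspace_energy(V, X):
    """Least-absolute-deviations energy: sum of distances of the data
    points (rows of X) to the subspace spanned by the columns of V."""
    residuals = X - (X @ V) @ V.T          # components orthogonal to span(V)
    return np.linalg.norm(residuals, axis=1).sum()

def grassmannian_gradient_step(V, X, step, eps=1e-12):
    """One gradient step on the energy, retracted back to an orthonormal
    basis (a point on the Grassmannian) via a reduced QR factorization."""
    residuals = X - (X @ V) @ V.T
    norms = np.linalg.norm(residuals, axis=1, keepdims=True)
    # Euclidean gradient of sum_i ||r_i||; points on the subspace contribute 0.
    G = -(residuals / np.maximum(norms, eps)).T @ (X @ V)
    # Project onto the horizontal (tangent) space, then retract.
    G = G - V @ (V.T @ G)
    Q, _ = np.linalg.qr(V - step * G)
    return Q
```

In the spirit of the abstract, one would initialize `V` with the top principal components of the data and iterate until the energy stagnates.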
Global Guarantees for Blind Demodulation with Generative Priors
We study a deep-learning-inspired formulation of the blind demodulation problem, which is the task of recovering two unknown vectors from their entrywise multiplication. We consider the case where the unknown vectors lie in the ranges of known deep generative models, G(1): R^n → R^l and G(2): R^p → R^l. When the networks corresponding to the generative models are expansive, the weight matrices are random, and the dimensions of the unknown vectors satisfy l = Ω(n^2 + p^2), up to log factors, we show that the empirical risk objective has a favorable landscape for optimization. That is, the objective function has a descent direction at every point outside of a small neighborhood around four hyperbolic curves. We also characterize the local maximizers of the empirical risk objective and hence show that no other stationary points exist outside of these neighborhoods around the four hyperbolic curves and the set of local maximizers. We also implement a gradient descent scheme inspired by the geometry of the landscape of the objective function. In order to converge to a global minimizer, this scheme exploits the fact that exactly one of the hyperbolic curves corresponds to the global minimizer, so points near this curve have a lower objective value than points close to the other, spurious hyperbolic curves. We show that this gradient descent scheme can effectively remove distortions synthetically introduced to the MNIST dataset.
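The hyperbolic curves of minimizers are easy to see even without the generative priors, which is what makes the problem "blind": any rescaling (c·w, x/c) of a solution is another solution. A toy sketch of plain gradient descent on the entrywise-product risk (the priors, networks, and the paper's tailored scheme are omitted; step size and data are illustrative):

```python
import numpy as np

def empirical_risk(w, x, y):
    """Squared loss for entrywise-product measurements y = w_true * x_true."""
    return np.sum((w * x - y) ** 2)

def gradient_step(w, x, y, step):
    """One joint gradient step on the (nonconvex) empirical risk.
    Without priors, every rescaling (c * w_true, x_true / c) is a global
    minimizer -- the 'hyperbolic curve' of solutions."""
    r = w * x - y
    return w - 2 * step * r * x, x - 2 * step * r * w
```

The paper's priors collapse this continuum of solutions down to neighborhoods of four curves, exactly one of which attains the global minimum.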
Award ID(s): 1848087
NSF-PAR ID: 10157457
Journal Name: Advances in Neural Information Processing Systems
Volume: 32
ISSN: 1049-5258
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


Curves play a fundamental role across computer graphics, physical simulation, and mathematical visualization, yet most tools for curve design do nothing to prevent crossings or self-intersections. This paper develops efficient algorithms for (self-)repulsion of plane and space curves that are well suited to problems in computational design. Our starting point is the so-called tangent-point energy, which provides an infinite barrier to self-intersection. In contrast to the local collision detection strategies used in, e.g., physical simulation, this energy considers interactions between all pairs of points and is hence useful for global shape optimization: local minima tend to be aesthetically pleasing, physically valid, and nicely distributed in space. A reformulation of gradient descent based on a Sobolev–Slobodeckij inner product enables us to make rapid progress toward local minima, independent of curve resolution. We also develop a hierarchical multigrid scheme that significantly reduces the per-step cost of optimization. The energy is easily integrated with a variety of constraints and penalties (e.g., inextensibility or obstacle avoidance), which we use for applications including curve packing, knot untangling, graph embedding, non-crossing spline interpolation, flow visualization, and robotic path planning.
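A naive all-pairs discretization of the tangent-point energy for a closed polyline gives a feel for why it repels self-intersection. This is a sketch only: the exponents (2, 4) are the classical choice, adjacent samples are simply skipped rather than handled with the care the paper's quadrature requires, and the paper accelerates the O(n²) sum hierarchically.

```python
import numpy as np

def tangent_point_energy(points, alpha=2.0, beta=4.0):
    """Discrete all-pairs tangent-point energy of a closed polyline in R^3.

    The kernel is small for nearly collinear neighbors (chord parallel to
    the tangent) and blows up as distant parts of the curve approach each
    other in space, providing a barrier to self-intersection."""
    n = len(points)
    edges = np.roll(points, -1, axis=0) - points
    tangents = edges / np.linalg.norm(edges, axis=1, keepdims=True)
    energy = 0.0
    for i in range(n):
        for j in range(n):
            if j in (i, (i + 1) % n, (i - 1) % n):
                continue  # skip the singular self/adjacent interactions
            d = points[j] - points[i]
            dist = np.linalg.norm(d)
            # component of the chord orthogonal to the tangent at point i
            perp = d - np.dot(d, tangents[i]) * tangents[i]
            energy += np.linalg.norm(perp) ** alpha / dist ** beta
    return energy
```

A round circle has low energy, while a flattened loop whose opposite sides nearly touch has much higher energy, which is exactly the behavior a repulsive shape optimizer exploits.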

We develop a framework for reconstructing images that are sparse in an appropriate transform domain from polychromatic computed tomography (CT) measurements under the blind scenario where the material of the inspected object and the incident-energy spectrum are unknown. Assuming that the object we wish to reconstruct consists of a single material, we obtain a parsimonious measurement-model parameterization by changing the integration variable from photon energy to mass attenuation, which allows us to combine the variations brought by the unknown incident spectrum and mass attenuation into a single unknown mass-attenuation spectrum function; the resulting measurement equation has the Laplace-integral form. The mass-attenuation spectrum is then expanded into basis functions using B-splines of order one. We consider a Poisson noise model and establish conditions for biconvexity of the corresponding negative log-likelihood (NLL) function with respect to the density-map and mass-attenuation spectrum parameters. We derive a block-coordinate descent algorithm for constrained minimization of a penalized NLL objective function, where penalty terms ensure nonnegativity of the mass-attenuation spline coefficients and nonnegativity and gradient-map sparsity of the density-map image, imposed using a convex total-variation (TV) norm; the resulting objective function is biconvex. This algorithm alternates between a Nesterov proximal-gradient (NPG) step and a limited-memory Broyden–Fletcher–Goldfarb–Shanno with box constraints (L-BFGS-B) iteration for updating the image and mass-attenuation spectrum parameters, respectively. We prove the Kurdyka–Łojasiewicz property of the objective function, which is important for establishing local convergence of block-coordinate descent schemes in biconvex optimization problems. Our framework applies to other NLLs and signal-sparsity penalties, such as the lognormal NLL and the ℓ₁ norm of 2D discrete wavelet transform (DWT) image coefficients. Numerical experiments with simulated and real X-ray CT data demonstrate the performance of the proposed scheme.
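The alternating structure can be sketched on a toy biconvex problem. Here a squared-error loss replaces the Poisson NLL, an ℓ₁ penalty stands in for the TV term, and a plain ISTA step replaces the Nesterov proximal-gradient step; only the L-BFGS-B block with box constraints matches the paper's ingredient directly. All variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x, s, y, A, M, lam):
    """Biconvex toy objective: 0.5*||diag(M s)(A x) - y||^2 + lam*||x||_1."""
    return 0.5 * np.sum(((M @ s) * (A @ x) - y) ** 2) + lam * np.sum(np.abs(x))

def block_coordinate_descent(y, A, M, lam=0.01, iters=30):
    """Alternate a proximal-gradient step on the 'image' block x with an
    L-BFGS-B solve (nonnegativity box constraints) on the 'spectrum' block s."""
    x = np.zeros(A.shape[1])
    s = np.ones(M.shape[1])
    for _ in range(iters):
        # image block: one ISTA step with a per-iteration Lipschitz constant
        D = M @ s
        r = D * (A @ x) - y
        grad_x = A.T @ (D * r)
        L = np.linalg.norm(D[:, None] * A, 2) ** 2 + 1e-12
        x = soft_threshold(x - grad_x / L, lam / L)
        # spectrum block: L-BFGS-B with box constraints s >= 0
        u = A @ x
        def f_s(s_var):
            r_s = (M @ s_var) * u - y
            return 0.5 * np.dot(r_s, r_s), M.T @ (r_s * u)
        s = minimize(f_s, s, jac=True, method="L-BFGS-B",
                     bounds=[(0.0, None)] * len(s)).x
    return x, s
```

Each block update cannot increase the objective, which is the monotonicity that the Kurdyka–Łojasiewicz argument upgrades to local convergence in the biconvex setting.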

Abstract Given $n$ general points $p_1, p_2, \ldots , p_n \in{\mathbb{P}}^r$ it is natural to ask whether there is a curve of given degree $d$ and genus $g$ passing through them; by counting dimensions a natural conjecture is that such a curve exists if and only if $$n \leq \left\lfloor \frac{(r + 1)d - (r - 3)(g - 1)}{r - 1}\right\rfloor.$$ The case of curves with non-special hyperplane section was recently studied in [2], where the above conjecture was shown to hold with exactly three exceptions. In this paper, we prove a "bounded-error analog" for special linear series on general curves; more precisely, we show the existence of such a curve subject to the stronger inequality $$n \leq \left\lfloor \frac{(r + 1)d - (r - 3)(g - 1)}{r - 1}\right\rfloor - 3.$$ Note that the $3$ cannot be replaced with $2$ without introducing exceptions (as a canonical curve in ${\mathbb{P}}^3$ can only pass through nine general points, while a naive dimension count predicts twelve). We also use the same technique to prove that the twist of the normal bundle $N_C(-1)$ satisfies interpolation for curves whose degree is sufficiently large relative to their genus, and we deduce from this that the number of general points contained in the hyperplane section of a general curve is at least $$\min\left(d, \frac{(r - 1)^2 d - (r - 2)^2 g - (2r^2 - 5r + 12)}{(r - 2)^2}\right).$$ As explained in [7], these results play a key role in the author's proof of the maximal rank conjecture [9].
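The canonical-curve example in the abstract is a direct arithmetic check of the conjectured bound; the following uses only numbers stated in the text (a canonical curve in P^3 has genus 4 and degree 2g - 2 = 6):

```python
from math import floor

def predicted_points(r, d, g):
    """Naive dimension-count bound from the conjecture:
    floor(((r + 1)d - (r - 3)(g - 1)) / (r - 1))."""
    return floor(((r + 1) * d - (r - 3) * (g - 1)) / (r - 1))

# Canonical curve in P^3 (r = 3, d = 6, g = 4): the naive count gives twelve
# general points, but such a curve passes through only nine -- consistent
# with the "-3" correction in the theorem.
```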

Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: can narrow networks have as strong expressivity as wide ones, and, if so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for one-hidden-layer networks with fewer than n (sample size) neurons when the activation is smooth. First, we prove that as long as the width m >= 2n/d (where d is the input dimension), the expressivity is strong, i.e., there exists at least one global minimizer with zero training loss. Second, we identify a nice local region with no local minima or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice region. Third, we consider a constrained optimization formulation in which the feasible region is this nice local region, and prove that every KKT point is a nearly global minimizer. Projected gradient methods are expected to converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work. Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.
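The projected-gradient template the abstract refers to can be sketched generically. In the paper the feasible region is the nice local region in weight space; here the projection oracle is a hypothetical stand-in, illustrated on a norm-ball-constrained least-squares toy rather than neural-network training.

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, step, iters):
    """Generic projected gradient descent: take a gradient step, then
    project back onto the feasible region. Under mild conditions such
    iterations approach KKT points of the constrained problem."""
    x = x0
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x
```

Because every KKT point of the paper's constrained formulation is a nearly global minimizer, landing at any such point is already enough for small training loss.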