<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Robust Low-Rank Matrix Completion via an Alternating Manifold Proximal Gradient Continuation Method</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10231772</idno>
					<idno type="doi">10.1109/TSP.2021.3073544</idno>
					<title level='j'>IEEE Transactions on Signal Processing</title>
<idno>1053-587X</idno>
<biblScope unit="volume">69</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Minhui Huang</author><author>Shiqian Ma</author><author>Lifeng Lai</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Robust low-rank matrix completion (RMC), or robust principal component analysis with partially observed data, has been studied extensively for computer vision, signal processing and machine learning applications. This problem aims at decomposing a partially observed matrix into the superposition of a low-rank matrix and a sparse matrix, where the sparse matrix captures the grossly corrupted entries of the matrix. A widely used approach to tackle RMC is to consider a convex formulation, which minimizes the nuclear norm of the low-rank matrix (to promote low-rankness) plus the 1 norm of the sparse matrix (to promote sparsity). In this paper, motivated by some recent works on low-rank matrix completion and Riemannian optimization, we formulate this problem as a nonsmooth Riemannian optimization problem over Grassmann manifold. This new formulation is scalable as the low-rank matrix is factorized to the multiplication of two much smaller matrices. We then propose an alternating manifold proximal gradient continuation method to solve the proposed new formulation. Convergence rate of the proposed algorithm is rigorously analyzed. Numerical results on both synthetic data and real data on background extraction from surveillance videos are reported to demonstrate the advantages of the proposed new formulation and algorithm over several popular existing approaches.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>(PCA) problem that has been extensively studied in the literature <ref type="bibr">[5]</ref>- <ref type="bibr">[7]</ref>.</p><p>A critical assumption of RMC is that the underlying matrix is of low rank. Therefore, RMC can be naturally formulated as a rank constrained optimization problem. However, the rank function is usually difficult to handle because of its combinatorial nature. To resolve this issue, there is a line of research focusing on convex relaxations of RMC formulations <ref type="bibr">[5]</ref>- <ref type="bibr">[7]</ref>. For example, a widely used technique in these works is to replace the rank function by the nuclear norm of the matrix under consideration. However, algorithms for solving convex relaxations involving nuclear norm usually require to compute a singular value decomposition (SVD) in each iteration, which can be very time consuming for large-scale problems. Recently, nonconvex RMC formulations based on low-rank matrix factorization were proposed in the literature <ref type="bibr">[8]</ref>- <ref type="bibr">[11]</ref>. In these formulations, the target low-rank matrix is factorized as the product of two much smaller matrices so that the low-rank property is automatically satisfied. The idea of factorization greatly reduces the price of promoting the low-rankness. Another critical assumption of RMC is that the matrix corresponding to the noise is sparse. Usually the 1 norm of the noise matrix is penalized to promote its sparsity.</p><p>In this paper, motivated by the recent developments on Riemannian optimization, we propose a new nonconvex formulation for RMC. Specifically, our new formulation is a nonsmooth optimization problem over the Grassmann manifold. The idea of modeling matrix completion as Riemannian optimization has been explored in the literature <ref type="bibr">[12]</ref>- <ref type="bibr">[16]</ref>. Modeling RMC as a Riemannian optimization is more challenging, because of the existence of the function promoting sparsity of the noise. Recently, Zhang and Yang <ref type="bibr">[17]</ref> proposed to model RMC as a Riemannian optimization over the fixed-rank manifold. This formulation requires a priori knowledge of the level of sparsity of the sparse noise matrix, which might be hard to estimate in practice. Cambier and Absil <ref type="bibr">[18]</ref> considered minimizing a nonsmooth objective over the fixed-rank manifold. However, their algorithm needs to smooth the objective using certain smoothing technique, and thus does not solve the original nonsmooth problem. The GRASTA algorithm <ref type="bibr">[19]</ref>, <ref type="bibr">[20]</ref> proposed by He et al. minimizes a nonsmooth objective over the Grassmann manifold. The authors suggested to use the alternating direction method of multipliers (ADMM) to solve it, but it is known that ADMM lacks convergence guarantees in this case.</p><p>We propose an alternating manifold proximal gradient (AManPG) algorithm for solving our new RMC model. The AManPG algorithm alternatingly updates the low-rank matrix using a Riemannian gradient step and the sparse matrix using a proximal gradient step. This method is motivated by the ManPG algorithm recently proposed by Chen et al. <ref type="bibr">[21]</ref>. We show that the iteration complexity of AManPG is O( -2 ) for obtaining an -stationary point. Note that our AManPG solves the nonsmooth problem directly, and it does not need any smoothing technique as used in <ref type="bibr">[18]</ref>. Moreover, we adopt a continuation technique <ref type="bibr">[22]</ref> to further improve the computational efficiency of AManPG. Our numerical experiments show that the proposed AManPG with continuation compares favorably to some existing methods for solving the RMC problem.</p><p>The rest of this paper is organized as follows. In Section II, we briefly review some existing related work on the RMC problem and give our new RMC formulation. In Section III, we review some basics of manifold optimization and propose the ManPG algorithm for our RMC formulation. In Section IV and Section V, we propose our AManPG with continuation algorithm and provide its convergence analysis. In Section VI, numerical results are presented to demonstrate the advantages of the proposed new formulation and algorithm. Finally, we draw some conclusions in Section VII.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. PROBLEM FORMULATION</head><p>In this section, we discuss some related work in the literature, and then present our RMC formulation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Robust PCA and Robust Matrix Completion</head><p>Robust PCA <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref> is an important tool in data analysis and has found many interesting applications in computer vision, signal processing, machine learning, and statistics etc. The goal is to decompose a given matrix M &#8712; R m&#215;n into the superposition of a low-rank matrix L and a sparse matrix S, i.e., M = L + S. The works by Cand&#232;s et al. <ref type="bibr">[5]</ref> and Chandrasekaran et al. <ref type="bibr">[6]</ref> formulate the problem as the following convex optimization problem:</p><p>where &#947; &gt; 0 is a weighting parameter, the nuclear norm L * sums the singular values of L, and the 1 norm S 1 sums the absolute values of all entries of S, i.e.,</p><p>When only a subset of the entries of M is observed, robust PCA becomes the robust low-rank matrix completion problem <ref type="bibr">[18]</ref>. Similar to (1), a convex formulation for RMC can be cast as follows:</p><p>min</p><p>where &#937; denotes the set of the indices of the observed entries and P &#937; : R m&#215;n &#8594; R m&#215;n denotes a projection defined as</p><p>The convex formulations (1) and ( <ref type="formula">2</ref>) have been studied extensively in the literature, and we refer to the recent survey paper <ref type="bibr">[7]</ref> for algorithms for solving them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Our Formulation and Contributions</head><p>By assuming that the rank of L is known (denoted by r), the idea of many nonconvex formulations is based on the fact that L can be factorized to L = UV , where U &#8712; R m&#215;r , V &#8712; R r&#215;n . In this paper, we also utilize this idea and propose to solve the following formulation of RMC:</p><p>where 0 &lt; &#955; &lt; 1, &#947; &gt; 0, and U denotes an orthonormal basis of the subspace U. We show that the manifold proximal gradient method (ManPG) proposed by Chen et al. <ref type="bibr">[21]</ref> can be applied to solve (3) and the corresponding convergence analysis applies naturally. We then propose a variant of ManPG, named alternating ManPG (AManPG) that can significantly improve the efficiency of ManPG for solving <ref type="bibr">(3)</ref>. We further rigorously analyze the convergence rate of AManPG. Compared with GRASTA <ref type="bibr">[19]</ref>, <ref type="bibr">[20]</ref>, our proposed algorithms for solving (3) have rigorous convergence guarantees and convergence rate analysis. Compared with <ref type="bibr">(9)</ref> and the algorithm proposed by Cambier and Absil <ref type="bibr">[18]</ref>, our algorithms solve the nonsmooth problem (3) directly without any smoothing technique. Finally, to further accelerate the convergence of AManPG, we incorporate the so-called continuation technique on the weighting parameter &#947; in (3). Our numerical results on both synthetic data and real data on background extraction from surveillance video demonstrate that our AManPG with Continuation (AManPGC) algorithm, compares favorably with existing methods for RMC.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Related Work</head><p>Replacing L by UV in (1) and (2) leads to various nonconvex formulations for robust PCA and RMC. In particular, Li et al. <ref type="bibr">[23]</ref> suggested using subgradient method to solve the following nonsmooth robust matrix recovery model</p><p>where vector y denotes the linear measurements to L via the known linear operator A : R m&#215;n &#8594; R |&#937;| . Shen et al. <ref type="bibr">[11]</ref> proposed the LMaFit algorithm that implements an ADMM for solving the following nonconvex formulation of robust PCA:</p><p>Note that if ( &#219;, V ) solves (5), then ( &#219;Q, Q -1 V ) also solves <ref type="bibr">(5)</ref> for any invertible Q &#8712; R r&#215;r . Since all matrices &#219;Q share the same column space, Dai et al. <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref> exploited this fact and formulated the matrix completion problem as the following optimization problem over a Grassmann manifold:</p><p>Authorized licensed use limited to: Univ of Calif Davis. Downloaded on May 20,2021 at 00:39:57 UTC from IEEE Xplore. Restrictions apply.</p><p>where Gr(m, r) denotes the Grassmann manifold, i.e., the set of r-dimensional vector subspaces of R m , and U &#8712; R m&#215;r is any matrix whose column space is U and is often chosen to be orthonormal. However, it is noticed that the outer problem might be discontinuous at points U for which the V problem does not have a unique solution. To address this issue, Keshavan et al. <ref type="bibr">[14]</ref>, <ref type="bibr">[15]</ref> proposed to optimize both the column space and row space at the same time, which results in the following so-called OptSpace formulation for matrix completion: min U&#8712;Gr(m,r),V&#8712;Gr(n,r)</p><p>where U and V are any orthonormal bases of U and V, &#955; &gt; 0 is a weighting parameter, and the regularizer U &#931;V 2 F is used so that the outer problem is continuous. Boumal and Absil <ref type="bibr">[16]</ref>, <ref type="bibr">[24]</ref> proposed to study the following variant of (6):</p><p>where &#937; is the complement of &#937;. They proposed to use the Riemannian trust-region method to solve this problem. Comparing with OptSpace <ref type="bibr">(7)</ref>, formulation (8) has a much smaller search space. Note that in <ref type="bibr">(8)</ref>, &#955; is usually chosen to be very close to zero, as it indicates that we have a small confidence that the entries (UV ) ij for (i, j) / &#8712; &#937; are equal to zero. For RMC, Cambier and Absil <ref type="bibr">[18]</ref> proposed the following Riemannian optimization formulation:</p><p>where M r denotes the fixed-rank manifold, i.e., M r := {X | rank(X) = r}. The algorithm proposed in <ref type="bibr">[18]</ref> needs to smooth the 1 norm first to change the problem to a smooth problem, and then apply the Riemannian conjugate gradient method to solve the smoothed problem. As a result, the algorithm in <ref type="bibr">[18]</ref> does not solve (9) exactly but only solves an approximation of it. Related to (6) and ( <ref type="formula">5</ref>), He et al. proposed the GRASTA algorithm <ref type="bibr">[19]</ref>, <ref type="bibr">[20]</ref> that can be used to solve the following formulation of RMC:</p><p>where U is again an orthonormal basis of U. GRASTA uses alternating minimization and ADMM to solve <ref type="bibr">(10)</ref>, which is efficient in practice but lacks convergence guarantees. Recently, Zhang and Yang <ref type="bibr">[17]</ref> studied the following formulation of robust PCA over the fixed-rank manifold:</p><p>where g : R m&#215;n &#8594; R m&#215;n is a hard thresholding procedure that depends on a parameter specifying the level of sparsity of matrix L -M . This parameter might be difficult to estimate in practice.</p><p>A similar model for RMC was also considered in <ref type="bibr">[17]</ref>.</p><p>In the following we list a few other works related to robust PCA and RMC. Zhao et al. <ref type="bibr">[25]</ref> proposed two smooth functions that can promote the robustness against two types of outliers. He et al. <ref type="bibr">[26]</ref> derived a correntropy-based cost function and applied the half-quadratic technique to solve the matrix completion problem. Zeng and So <ref type="bibr">[27]</ref> proposed the iterative p -regression algorithm and ADMM for a RMC model that uses p norm to promote the sparisty. Zhang and Wang <ref type="bibr">[28]</ref> considered the low-rank Hankel matrix recovery problem with sparse noise, and proposed an alternating projection type method to solve it. Dutta et al. <ref type="bibr">[29]</ref> formulated robust PCA as a nonconvex feasibility problem and proposed an alternating projection method to solve it. In another work, Dutta et al. <ref type="bibr">[30]</ref> proposed to formulate the robust PCA as the best pair problem that aims to find a pair of points minimizing the distance between two disjoint sets. An accelerated proximal gradient method was designed in <ref type="bibr">[30]</ref> to solve this problem. There is a vast literature on robust PCA and RMC and the ones that we discussed above are by no means exhaustive. We refer the reader to <ref type="bibr">[7]</ref>, <ref type="bibr">[31]</ref>- <ref type="bibr">[33]</ref> for more comprehensive surveys on these topics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. THE MANPG ALGORITHM FOR ROBUST MATRIX COMPLETION</head><p>In this section, we show that the ManPG algorithm recently proposed by Chen et al. <ref type="bibr">[21]</ref> can be naturally adopted to solve <ref type="bibr">(3)</ref>. Note that the ManPG algorithm was originally proposed for solving problems over the Stiefel manifold, but the manifold in ( <ref type="formula">3</ref>) is Grassmann manifold. Therefore, we need to further elaborate on the details how ManPG works on Grassmann manifold. To this end, we first introduce some backgrounds on the geometry of Grassmann manifold.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Geometry of the Grassmann Manifold</head><p>In this subsection we briefly introduce concepts and properties of the Grassmann manifold. Most of the materials here are from <ref type="bibr">[16]</ref>, and we include them here for the ease of discussion later. The Grassmann manifold Gr(m, r) is the set of r-dimensional linear subspaces of R m endowed with quotient manifold structure, whose dimension is dim(Gr(m, r)) = r(mr) <ref type="bibr">[34]</ref>. Each point U &#8712; Gr(m, r) is a linear subspace spanned by the column space of a full-column-rank matrix U :</p><p>where R m&#215;r * denotes the set of all m &#215; r matrices with full column rank, and col(U ) denotes the subspace spanned by the columns of U . For numerical reasons, we always represent U by its orthonormal basis, i.e., U &#8712; U(m, r) := {U &#8712; R m&#215;r : U U = I r }, where U (m, r) denotes the Stiefel manifold. From now on, by slightly abusing the notation, we use the matrix U &#8712; U(m, r) to represent U &#8712; Gr(m, r). Since multiplying by an r &#215; r orthonormal matrix does not change the column space of U , we can regard Gr(m, r) as a quotient of R m&#215;r * by the equivalent relation U = UQ, where Q is any r &#215; r orthonormal matrices. Throughout this paper, we use the Riemannian metric that is induced from the Euclidean inner product, i.e., U, V = Tr(U V ). Endowed with this Riemannian metric, the Grassmann manifold is also a Riemannian quotient manifold, and it admits a tangent space to Gr(m, r) at each point U:</p><p>Note that here we again slightly abuse the notation. More specifically, we use the matrix H &#8712; R m&#215;r to represent the tangent vector H &#8712; T U Gr(m, r). For the Riemannian manifold M, the Riemannian gradient of a smooth function f : M &#8594; R is defined as follows.</p><p>Definition 1: (Riemannian Gradient) Given a smooth function f : M &#8594; R, the Riemannian gradient of f at X &#8712; M, denoted by gradf (X), is the unique tangent vector in</p><p>where Df denotes the directional derivative of f . For the Grassmann manifold, the orthogonal projector from R m&#215;r onto the tangent space T U Gr(m, r) is given by (here we again slightly abuse the notation and use the orthonormal basis U to represent the subspace U):</p><p>The definition of retraction operation for manifold M is given below. Definition 2: (Retraction) Let Retr X (&#958;) : TM &#8594; M be a mapping from the tangent bundle TM to the manifold M. We call Retr X (&#8226;) a retraction at X if the following hold for any X &#8712; M and &#958; &#8712; T X M:</p><p>In our numerical experiments, we choose the QR decomposition as the retraction for Grassmann manifold:</p><p>where qf(X) denotes the Q-factor of the QR decomposition of X. Here we again use U to represent the subspace U and use H to represent the element in the tangent space. Note that because of ( <ref type="formula">12</ref>), we know that U + H is always full-rank, and therefore the output of the QR decomposition is unique.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. The ManPG Algorithm</head><p>Recently, Chen et al. <ref type="bibr">[21]</ref> proposed a novel ManPG algorithm for solving nonsmooth optimization problem over the Stiefel manifold St(n, r) in the following form:</p><p>in which F 1 is smooth with Lipschitz continuous gradient, F 2 is nonsmooth and convex. Here the smoothness, convexity and Lipschitz continuity are interpreted when the functions are considered in the ambient Euclidean space. A typical iteration of ManPG algorithm for solving <ref type="bibr">(17)</ref> is as follows:</p><p>where t &gt; 0 and &#945; k &gt; 0 are step sizes, and &#8711;F 1 denotes the Euclidean gradient of F 1 . Chen et al. <ref type="bibr">[21]</ref> proved that the iteration complexity of ManPG <ref type="bibr">(18)</ref> is O(1/ 2 ) for obtaining an -stationary solution of <ref type="bibr">(18)</ref>. They also demonstrated that ManPG is very efficient for solving sparse PCA and compressed modes problems. Now we discuss how to apply ManPG <ref type="bibr">[21]</ref> to solve <ref type="bibr">(3)</ref>. For ease of presentation, we denote the smooth part of F in (3) by f , i.e.,</p><p>) Note that for fixed U and S, the optimal V of F (U, V, S) (and f (U, V, S)) is uniquely determined. Therefore, by denoting</p><p>and</p><p>we know that our RMC formulation (3) reduces to</p><p>where U is an orthonormal basis of U. Note that although ManPG was originally designed for problems over the Stiefel manifold, we find that it can also be applied to problems over the Grassmann manifold. Also note that as long as the tangent space of the manifold can be represented by a linear operator, the subproblem of ManPG can be solved efficiently using a semi-smooth Newton method (see Section 4.2 of <ref type="bibr">[21]</ref>), and this happens to be the case for the Grassmann manifold (e.g., <ref type="bibr">(12)</ref>). Therefore, ManPG can be applied to solve <ref type="bibr">(22)</ref> and the subproblems can be solved relatively easily. It is easy to see that ManPG for solving <ref type="bibr">(22)</ref> reduces to the following updates in the k-th iteration:</p><p>where t S , t U , &#945; and &#946; are all step sizes, and h(S) := &#947; P &#937; (S) 1 . We now make some necessary remarks on this ManPG algorithm <ref type="bibr">(23)</ref>. First, ( <ref type="formula">23</ref>) is actually slightly different with a direct application of ManPG for solving <ref type="bibr">(22)</ref>. For a direct application of ManPG, we have t S = t U and &#945; = &#946;. Here in <ref type="bibr">(23)</ref> we allow these step sizes to be different so that we have more freedom to choose the best step sizes in practice. Also we note that (23c) and (23d) correspond to a Riemannian gradient step with respect to the U variable. The updates (23a) and (23b) correspond to a proximal gradient step for the S variable in the Euclidean space. This can be interpreted in the following way. First, in RMC <ref type="bibr">(22)</ref>, the U variable does not appear in the nonsmooth part of the objective, so it is natural to perform a Riemannian gradient step for U . Second, for fixed U , the S Algorithm 1: ManPG for Solving RMC <ref type="bibr">(22)</ref> problem is only an unconstrained problem in the Euclidean space, so we take a proximal gradient step. Moreover, the two subproblems (23a) and (23c) are both easy to solve. Specifically, (23c) can be reduced to</p><p>i.e., it is the negative Riemannian gradient of f multiplied by the step size t U . The &#916;S subproblem (23a) can be solved by a simple 1 norm shrinkage operation (note that we are only interested in the P &#937; (&#916;S k ) and P&#937;(&#916;S k ) can be simply set to 0):</p><p>Now, to implement the ManPG <ref type="bibr">(23)</ref>, the only remaining component is to calculate the Riemannian gradient grad U f (U, S) used in <ref type="bibr">(24)</ref>. The procedure for computing it is outlined in <ref type="bibr">[24]</ref>. By assuming that the subspace of the Grassmann manifold is represented by orthonormal bases, which means U is restricted to the Stiefel manifold, the Riemannian gradient of the smooth function f (U, S) with respect to U is given by:</p><p>where C &#8712; R m&#215;n is the mask operator whose components are given by: C ij = 1 if (i, j) &#8712; &#937;, and C ij = 0 otherwise. Here V U,S is defined in <ref type="bibr">(20)</ref>, and it can be computed as follows:</p><p>where vec denotes the vectorization operator, A is defined as</p><p>and &#8855; denotes the Kronecker product. For more details about these calculation, we refer the reader to <ref type="bibr">[24]</ref>.</p><p>With these preparations, we can finally summarize the ManPG algorithm <ref type="bibr">(23)</ref> for solving (3) (or, <ref type="bibr">(22)</ref>) as in Algorithm 1.</p><p>Remark  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. ALTERNATING MANPG WITH CONTINUATION</head><p>It should be noted that ManPG updates S and U in parallel. That is, ManPG ( <ref type="formula">23</ref>) is a Jacobi type iterative algorithm and for fixed (U k , S k ), (23a) and (23c) can be computed in parallel. One way that can possibly improve the speed of ManPG is to use a Gauss-Seidel type algorithm. This idea has also been adopted in <ref type="bibr">[35]</ref>, where the authors showed that the Gauss-Seidel type ManPG performs much better than the original Jacobi type ManPG. Motivated by this, here we also propose an alternating ManPG (AManPG), which updates S and U sequentially, instead of in parallel. However, one crucial thing to note here is that the V variable will need to be re-calculated, when we have a new S variable before we update the U variable. Our AManPG algorithm is summarized in Algorithm 2.</p><p>Remark 4: When we compute &#916;U k in AManPG, we used the latest S k+1 , which requires us to compute the latest V k U k ,S k+1 . While in ManPG, we used S k in the updates of &#916;U k , and this does not require us to compute another V k . This is the main difference between AManPG (Algorithm 2) and ManPG (Algorithm 1). In both algorithms, we always set &#945; = &#946; = 1. Noticing that the matrix A only depends on variable U , and there is no need to recalculate A when computing V k U k ,S k+1 . The continuation technique: There are two parameters in the model (3): &#955; and &#947;. Since &#955; indicates our confidence level of the entries of (UV ) being zero, it needs to be very small. In our numerical experiments, we choose &#955; = 10 -8 . The parameter &#947; in (3) controls the sparsity level of P &#937; (S). A larger &#947; yields sparser P &#937; (S). In practice, we usually do not know how sparse the matrix S should be, and it is therefore not easy to determine &#947;. A usual practice in the literature to deal with this issue is to conduct a continuation technique on &#947;. Roughly speaking, Algorithm 3: AManPG With Continuation (AManPGC) for Solving <ref type="bibr">(22)</ref>.</p><p>1: Input: Step sizes t S , t U , parameters &#947; 0 &#947; min , shrinking factors &#956; 1 &lt; 1, &#956; 2 &lt; 1, initial accuracy tolerance 0 2: Initialize: U 0 , S 0 . Set = 0 3: while &#947; &gt; &#947; min do 4:</p><p>Call AManPG to solve <ref type="bibr">(22)</ref> with &#947; = &#947; , and set the output of AManPG as (U +1 , S +1 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5:</head><p>&#947; +1 = &#956; 1 &#947; 6: +1 = &#956; 2 7:</p><p>= + 1 8: end while 9: Output: U , S the continuation starts with solving (3) using a relatively large &#947;. Then the parameter &#947; is decreased and ( <ref type="formula">3</ref>) is solved again. This process is repeated until &#947; is very small. This idea has been widely adopted in the literature, e.g., <ref type="bibr">[22]</ref>, <ref type="bibr">[36]</ref>- <ref type="bibr">[38]</ref>. Combining this continuation idea with our AManPG algorithm, we obtain the AManPGC algorithm which works greatly in practice as confirmed by our numerical results in Section VI. AManPGC is summarized in Algorithm 3. Note that in Algorithm 3, we also shrink the accuracy tolerance in each iteration, as we want to solve the problem more and more accurately.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONVERGENCE ANALYSIS FOR AMANPG</head><p>In this section, we analyze the convergence behavior and iteration complexity of AManPG (Algorithm 2). To simplify the notation, we denote M = Gr(m, r) and h(S) = &#947; P &#937; (S) 1 . We analyze the convergence of AManPG for solving the following problem:</p><p>where f (U, S) is smooth and h(S) is nonsmooth and convex.</p><p>Here the smoothness and convexity are interpreted when the functions are considered in the ambient Euclidean space. For simplicity, we rewrite the AManPG for solving (28) here. One typical iteration of AManPG for solving <ref type="bibr">(28)</ref> is the same as <ref type="bibr">(23)</ref> except that (23c) is replaced by</p><p>Note that we use S k+1 in (29) to replace S k in (23c).</p><p>We make the following assumptions to (28) throughout this section.</p><p>Assumption 5: (F is lower bounded) There exists a finite constant F * , such that</p><p>Note that for <ref type="bibr">(22)</ref>, it is easy to see that F * = 0.</p><p>The following assumption is about grad U f (U, S), which regards the regularity of the pullback function f (&#916;U, S) = f (Retr U (&#916;U ), S), and differs from the standard Lipschitz continuity assumption because of the retraction operator. This assumption was originally suggested in <ref type="bibr">[39]</ref>.</p><p>Assumption 7: (Restricted Lipschitz-type gradient for pullbacks) There exists L U &#8805; 0 such that, for sequence (U k , S k ) k&#8805;0 generated by AManPG (Algorithm 2), the pullback function</p><p>Note that since the Grassmann manifold is a compact submanifold of R m , for fixed S, by Lemma 2.7 of <ref type="bibr">[39]</ref>, Assumption 7 holds if f (U, S) has locally Lipschitz continuous gradient in the Euclidean space. Assumption 7 allows simple extensions of existing proofs in the Euclidean case. One limitation of this assumption is that it might be difficult to estimate L U in practice.</p><p>From the Theorem 4.1 in <ref type="bibr">[40]</ref>, we can define the stationary point of problem <ref type="bibr">(28)</ref> as follows.</p><p>Definition 8: (Stationary point). A pair of (U, S) &#8712; M &#215; R m&#215;n is called a stationary point of problem <ref type="bibr">(28)</ref> if it satisfies the first-order necessary conditions:</p><p>According to Theorem 4.1 in <ref type="bibr">[40]</ref>, the optimality conditions of the subproblems ( <ref type="formula">29</ref>) and (23a) are</p><p>If &#916;U k = 0 and &#916;S k = 0, then we know that S k+1 = S k , and (U k , S k ) satisfies ( <ref type="formula">30</ref>) and thus is a stationary point of <ref type="bibr">(28)</ref>. Therefore, we can use the norm of (U k , S k ) to measure the closeness to stationary point, and we define the -stationary point of (28) as follows. Definition 9: ( -stationary point). We say that</p><p>given by (23a) and ( <ref type="formula">29</ref>) satisfies</p><p>where L := min{L S , L U }. Now we are ready to analyze the iteration complexity of AManPG for obtaining an -stationary point of <ref type="bibr">(28)</ref>. The convergence analysis presented here mainly follows from <ref type="bibr">[21]</ref>, <ref type="bibr">[35]</ref>. The main difference lies in the Assumption 7, which was not used in the previous work <ref type="bibr">[21]</ref>, <ref type="bibr">[35]</ref>. This assumption helps us in simplifying and extending the analysis from <ref type="bibr">[21]</ref>, <ref type="bibr">[35]</ref> to the Grassmann manifold. First, we present two lemmas, which show that there is a sufficient reduction of the objective value after each update of S and U . These two lemmas essentially follow Lemma 4 of <ref type="bibr">[41]</ref>.</p><p>Lemma 10: Assume Assumption 6 holds, and the sequence (S k , U k , &#916;S k , &#916;U k ) is generated by AManPG. By choosing t S = 1/L S and &#945; = 1, the following inequality holds</p><p>Proof: See Appendix A for details. Lemma 11: Assume Assumption 7 holds, and the sequence (S k , U k , &#916;S k , &#916;U k ) is generated by AManPG. By choosing t U = 1/L U and &#946; = 1, the following inequality holds</p><p>) Proof: See Appendix B for details. Now we are ready to present our main convergence result of AManPG, which essentially follows Theorem 1 of <ref type="bibr">[35]</ref>.</p><p>Theorem 12: Assume Assumptions 5, 6 and 7 hold. By choosing t U = 1/L U , t S = 1/L S , &#945; = &#946; = 1 in AManPG (Algorithm 2), every limit point of the sequence {U k , S k } generated by AManPG is a stationary point of problem <ref type="bibr">(28)</ref>. Moreover, AManPG returns an -stationary point of problem <ref type="bibr">(28)</ref> in at most 2L(F (U 0 , S 0 ) -F * )/ 2 iterations, where L := min(L S , L U ).</p><p>Proof: See Appendix C for details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. NUMERICAL EXPERIMENTS</head><p>In this section, we provide numerical results for both synthetic and real datasets to illustrate the performance of the proposed algorithms. We focus on comparing our ManPG and AManPG algorithms with some baseline algorithms for solving nonconvex RMC formulations, in particular, the subgradient method (SubGM) <ref type="bibr">[23]</ref>, the LMaFit method <ref type="bibr">[11]</ref> and the Riemannian conjugate gradient method for the smoothed 1 -norm objective function (RMC) <ref type="bibr">[18]</ref>. We use the same continuation framework for ManPG and call it ManPGC. For the SubGM method, we assume that the linear operator A is a simple projection. For the LMaFit algorithm, we turn off the rank estimation since we assume that the rank is known for all cases. We use the original setting for the RMC algorithm. All algorithms use the same C-Mex code for accelerating the matrix multiplication between a sparse matrix and a full matrix and some other bottleneck computations. Moreover, to guarantee a fair comparison, each algorithm is carefully tuned to achieve its best performance. All experiments were run on Matlab R2018b with a 2.3 GHz Dual-Core Intel Core i5 CPU.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Synthetic Data</head><p>We first test our ManPGC and AManPGC algorithms on different synthetic data. After picking the values of m, n, r, we generate U * &#8712; R m&#215;r , V * &#8712; R r&#215;n with i.i.d. normal entries of zero mean and unit variance. The target low-rank matrix is X * = U * V * . We test two different ways for generating the sparse matrix S * in which the positions of the nonzero entries are chosen uniformly random: (i) Gaussian (Cases 1-6 in Table <ref type="table">I</ref>): the nonzero entries of S * follow the normal distribution N (0, 1);  <ref type="table">I</ref>): the nonzero entries of S * are uniform in [-500, 500]. Finally we set M = X * + S * and P &#937; (M ) is the observed matrix, where &#937; denotes the indices set of nonzero entries of S * . We set = 10 -8 L for the stopping criteria in Algorithms 1 and 2.</p><p>We test our algorithms for the cases in Table <ref type="table">I</ref>, where sr denotes the sampling ratio and sp denotes the sparsity level, i.e., the ratio of nonzero entries.</p><p>Parameters: For Cases 1-6, we choose t S = 1, and for Cases 7-12, we choose t S = 0.8. We choose t U = 2/|&#937;| for both AManPGC and ManPGC in all cases, except Case 4 in which we choose t U = 1.6/|&#937;|, 0 = 20 for ManPGC. Moreover, we set &#947; 0 = 10, &#955; = 10 -8 , &#956; 1 = &#956; 2 = 1/10, 0 = 30 for Cases 1-6. For Cases 7-8, we set &#956; 1 = 1/10, &#956; 2 = 1/20, 0 = 200 for AManPG and &#956; 1 = 1/3, &#956; 2 = 1/8, 0 = 400 for ManPG. For Cases 9-12, we set &#956; 1 = 1/10, &#956; 2 = 1/20, 0 = 200 for AManPG and &#956; 1 = 1/6, &#956; 2 = 1/16, 0 = 200 for ManPG. We use the SVD of the observed matrix P &#937; (M ) as the initial point for all algorithms. Note that the step sizes t S and t U can also be estimated using the line search technique, which should be used in practice. Here we chose to tune constant step sizes because we observed that they work better than the ones found by line search on the tested problems.</p><p>In the k-th iteration, we calculate the relative difference for each method as</p><p>and report the Relative Difference versus the CPU time (in seconds) in Fig. <ref type="figure">1</ref>.</p><p>From Fig. <ref type="figure">1</ref> we can see that our AManPGC algorithm usually outperforms other baseline algorithms on CPU time. From Fig. <ref type="figure">1</ref>, we see that AManPGC is the fastest algorithm in almost all 12 tested cases. Moreover, ManPGC also works well for Cases 1-6, while shows some disadvantage comparing with RMC for Cases 7-12. However, ManPGC is still better than SubGM and LMaFit in all cases. Here we discuss some possible reasons why ManPGC and AManPGC can be more efficient than RMC <ref type="bibr">[18]</ref>, SubGM <ref type="bibr">[23]</ref> and LMaFit <ref type="bibr">[11]</ref>. For RMC, it needs to solve a sequence of smooth Riemannian optimization problems using the Riemannian conjugate gradient method. As a contrast, our ManPGC and AManPGC only solve one (though nonsmooth) Riemannian optimization problem. Note that SubGM uses subgradient information, and requires a geometrically decreasing step size to guarantee the convergence. Therefore, it can be slow in practice. As a contrast, our ManPGC and AManPGC do not use the subgradient information, and allow constant step size, and therefore can be more efficient and achieve higher accuracy in practice. For LMaFit, note that it is essentially a nonconvex ADMM algorithm. Its convergence speed depends on the penalty parameter used in the augmented Lagrangian function, which is difficult to tune in practice. It is also worth pointing out that there still lacks a convincing convergence guarantee for LMaFit thus far, as the one given in <ref type="bibr">[11]</ref> requires a strong and unverifiable condition. Therefore, it is not clear under what conditions LMaFit converges. Fig. <ref type="figure">2</ref>. Recovery of full matrices from their entries. We declare X to be recovered if the relative difference satisfies UV -X * F / X * F &lt; 10 -2 . Top row: phase transition for synthetic datasets by varying rank and outlier sparsity. We fix the sampling ratio = 0.1. Botton row: phase transition for synthetic datasets by varying rank and the sampling ratio. We fix the outlier sparsity = 0.1. For each rank-sampling ratio or rank-sparsity pair, we run three algorithms for 20 times and report the empirical recovery rate. White denotes perfect recovery in all experiments, and deep blue denotes failure for all experiments. We further plot the phase transition for the proposed AManPGC, ManPGC algorithms and the RMC formulation, which performs the best among all baseline methods. We set m = n = 1000, t S = 1, t U = 2/|&#937;|, 0 = &#947; 0 = 1, &#956; 1 = &#956; 2 = 1/5, for both AManPGC and ManPGC. We set the maximum iteration number for all three algorithms to be 100. Fig. <ref type="figure">2</ref> shows the recovery rate for different ranks, outlier sparsities and sampling ratio. We see from Fig. <ref type="figure">2</ref> that the AManPGC and the RMC algorithms perform similarly and they are both better than the ManPGC algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Real Data: Background Extraction From Surveillance Video</head><p>We now evaluate the performance of ManPGC and AManPGC for video background estimation using three data sets suggested in <ref type="bibr">[42]</ref>. Each data set contains a sequence of grayscale frames each of which can be regarded a matrix. By stacking the columns of each frame of the video into a long vector, we obtain a matrix whose columns correspond to the frames. This matrix can be decomposed to the matrix corresponding  to the static background plus the matrix corresponding to the moving foreground. The former one is expected to have low rank, because ideally the columns corresponding to the background are all the same; and the latter one is expected to be sparse, because the moving foreground usually only occupies a small portion of the frame. In our experiment, we observe that we are able to extract the background of the video using only partial observation. We apply our algorithms for this task and compare with existing methods. The three data sets are: We set r = 1 and test two cases for the ratio of observed entries: 50% and 10%. We terminate the algorithms when the following inequality holds:</p><p>which indicates that the algorithm does not make much progress. We plot the recovered background frames in Figs. <ref type="figure">3, 4</ref> and<ref type="figure">5</ref>.   The CPU time (in and the number of iterations are reported in Tables II, III, and IV. From Tables II, III, and IV, we see that ManPGC and AManPGC are faster than the other three algorithms in all cases. Moreover, we see that even with </p><p>where X * denotes the ground truth, and</p><p>The d metric of the recovered backgrounds for different are shown in Fig. <ref type="figure">6</ref>. From Fig. <ref type="figure">6</ref> we see that all the tested algorithms yield good results for recovering the backgrounds and the foregrounds. More specifically, from Fig. <ref type="figure">6</ref> we see when 50% entries of M are observed, SubGM is slightly worse than others for the "Escalator" data, and when 10% entries of M are observed, LMaFit is slightly worse than others.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. CONCLUSION</head><p>In this paper, we have proposed a new formulation for RMC as a Riemannian optimization over the Grassmann Manifold. Inspired by the recent work ManPG, we have developed a new algorithm called AManPG for solving this nonconvex and nonsmooth manifold optimization problem. The iteration complexity of AManPG for obtaining an -stationary solution is rigorously analyzed, following a similar way as ManPG. In our numerical experiments, we have tested our proposed formulation and algorithms for both synthetic and real datasets, and the numerical results show that our proposed formulation and methods usually outperform some popular existing methods for RMC.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX A PROOF OF LEMMA 10</head><p>Proof: For fixed U &#8712; M and S &#8712; R m&#215;n , define g U,S (T ) := &#8711; S f (U, S), T + 1 2t S T 2 F + h(S + T ).</p><p>Note that g U,S is (1/t S )-strongly convex, therefore it holds that g U,S (T 1 ) &#8805; g U,S (T 2 ) + &#8706;g U,S (T 2 ), T 1 -T 2</p><p>By letting T 1 = 0, T 2 = &#916;S k in (39), we have</p><p>where the equality is from the optimality condition of (23a), i.e., 0 &#8712; &#8706;g U k ,S k (&#916;S k ). Using Assumption 6, we can get </p><p>, where the first inequality comes from <ref type="bibr">(41)</ref>, and the second inequality is from <ref type="bibr">(40)</ref>. This completes the proof.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX B PROOF OF LEMMA 11</head><p>Proof: From Assumption 7, we have</p><p>where the second inequality is due to <ref type="bibr">(31)</ref>. This completes the proof.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX C PROOF OF THEOREM 12</head><p>Proof: Combining (33) and <ref type="bibr">(34)</ref> yields</p><p>Since F is decreasing and bounded below, we have</p><p>It follows that every limit point of {(U k , S k )} is a stationary point of <ref type="bibr">(28)</ref>. Moreover, if AManPG does not terminate after K</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Authorized licensed use limited to: Univ of Calif Davis. Downloaded on May 20,2021 at 00:39:57 UTC from IEEE Xplore. Restrictions apply.</p></note>
		</body>
		</text>
</TEI>
