<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion *</title></titleStmt>
			<publicationStmt>
				<publisher>2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE</publisher>
				<date>06/16/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10562628</idno>
					<idno type="doi">10.1109/CVPR52733.2024.01381</idno>
					
					<author>Feng Yu</author><author>Teng Zhang</author><author>Gilad Lerman</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We present the subspace-constrained Tyler's estimator (STE) designed for recovering a low-dimensional subspace within a dataset that may be highly corrupted with outliers. STE is a fusion of the Tyler's M-estimator (TME) and a variant of the fast median subspace. Our theoretical analysis suggests that, under a common inlier-outlier model, STE can effectively recover the underlying subspace, even when it contains a smaller fraction of inliers relative to other methods in the field of robust subspace recovery. We apply STE in the context of Structure from Motion (SfM) in two ways: for robust estimation of the fundamental matrix and for the removal of outlying cameras, enhancing the robustness of the SfM pipeline. Numerical experiments confirm the state-of-the-art performance of our method in these applications. This research makes significant contributions to the field of robust subspace recovery, particularly in the context of computer vision and 3D reconstruction.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In many applications, data has been collected in large quantities and dimensions. It is a common practice to represent such data within a low-dimensional subspace that preserves its essential information. Principal Component Analysis (PCA) is frequently employed to identify this subspace. However, PCA faces challenges when dealing with data contaminated by outliers. Consequently, the field of Robust Subspace Recovery (RSR) aims to develop a framework for outlier-robust PCA. RSR is particularly relevant to problems in computer vision, such as fundamental matrix estimation, which involves recovering a hidden subspace associated with "good correspondence pairs" among highly corrupted measurements.</p><p>Various algorithms have been proposed to address RSR, employing methods such as projection pursuit <ref type="bibr">[1,</ref><ref type="bibr">5,</ref><ref type="bibr">14,</ref><ref type="bibr">23,</ref><ref type="bibr">29,</ref><ref type="bibr">35,</ref><ref type="bibr">38]</ref>, subspace energy minimization (in particular least absolute deviations and its relaxations) <ref type="bibr">[9,</ref><ref type="bibr">24,</ref><ref type="bibr">27,</ref><ref type="bibr">33,</ref><ref type="bibr">34,</ref><ref type="bibr">42,</ref><ref type="bibr">43,</ref><ref type="bibr">51]</ref>, robust covariance estimation <ref type="bibr">[50]</ref>, filtering-based methods <ref type="bibr">[4,</ref><ref type="bibr">8,</ref><ref type="bibr">49]</ref> and exhaustive subspace search methods <ref type="bibr">[10,</ref><ref type="bibr">16]</ref>. An in-depth exploration and comprehensive overview of robust subspace recovery and its diverse algorithms can be found in <ref type="bibr">[25]</ref>.</p><p>Methods based on robust covariance estimators, such as the Tyler's M-estimator (TME), offer additional useful information on the shape of the data within the subspace, similarly to PCA in the non-robust setting. They also offer maximum-likelihood interpretation, which is missing in many other methods. Application of the TME <ref type="bibr">[47]</ref> to RSR has been shown to be successful on basic benchmarks <ref type="bibr">[25,</ref><ref type="bibr">50]</ref>. Moreover, under a model of inliers in a general position on a subspace and outliers in general position in the complement of the subspace, TME was shown to recover the subspace within a desirable fraction of inliers <ref type="bibr">[50]</ref>. Below this fraction it was proved to be Small Set Expansion (SSE) hard to solve the RSR problem <ref type="bibr">[16]</ref>.</p><p>One may still succeed with solving the RSR problem with a computationally efficient algorithm when the fraction of inliers is lower than the one required by <ref type="bibr">[16]</ref>, considering a more restricted data model or violating other assumptions made in <ref type="bibr">[16]</ref>. For example, some special results in this direction are discussed in <ref type="bibr">[32]</ref>. Also, <ref type="bibr">[33]</ref> proposes the generalized haystack model of inliers and outliers to demonstrate the possibility of handling lower fractions of inliers by an RSR algorithm. This model extends the limited standard haystack model <ref type="bibr">[27]</ref>, where basic methods (such as PCA filtering) can easily work with low fractions of outliers. Nevertheless, it is unclear how practical the above theoretical ideas are for applied settings.</p><p>One practical setting that requires a fraction of inliers significantly lower than the one stated in <ref type="bibr">[16]</ref> arises in the problem of robust fundamental (or essential) matrix estimation. The fundamental matrix encompasses the epipolar geometry of two views in stereo vision systems. It is typically computed using point correspondences between the two projected images. This computation requires finding an 8-dimensional subspace within a 9-dimensional ambient space. In this setting, the theoretical framework of <ref type="bibr">[16]</ref> requires that the fraction of inliers be at least 8/9&#8673;88.9%, which is clearly unreasonable to require.</p><p>To date, the RANdom Sample Consensus (RANSAC) method <ref type="bibr">[10]</ref> is the only RSR method that has been highly successful in addressing this nontrivial scenario, gaining widespread popularity in computer vision. RANSAC is an iterative method that randomly selects minimal subsets of the data and fits models, in particular subspaces, to identify the best consensus set, that is, the set in most agreement with the hypothesized model. There are numerous approaches proposed to improve RANSAC, especially for this particular application, including locally optimized RANSAC (LO-RANSAC, <ref type="bibr">[6]</ref>), maximum likelihood estimator RANSAC (MLESAC) <ref type="bibr">[45]</ref>), degeneracy-check enabled RANSAC (DEGENSAC) <ref type="bibr">[7]</ref>) and M-estimator guided RANSAC (MAGSAC) <ref type="bibr">[2]</ref>). A near recovery theory for a variant of RANSAC under some assumptions on the outliers was suggested in <ref type="bibr">[32]</ref>. Nevertheless, in general, RANSAC is rather slow and its application to higher-dimensional problems is intractable. This work introduces a novel RSR algorithm that is guaranteed to robustly handle a lower fraction of outliers than the theoretical threshold proposed by <ref type="bibr">[16]</ref>, under special settings. Our basic idea is to adapt Tyler's M-Estimator to utilize the information of the underlying d-dimensional subspace, while avoiding estimation of the full covariance. By using less degrees of freedom we obtain a more accurate subspace estimator than the one obtained by TME with improved computational complexity. We show that STE is a fusion of the Tyler's M-estimator (TME) and a variant of the fast median subspace (FMS) <ref type="bibr">[24]</ref> that aims to minimize a subspace-based `0 energy.</p><p>Our theory shows that our proposed subspace-constrained Tyler's estimator (STE) algorithm can effectively recover the underlying subspace, even when it contains a smaller fraction of inliers relative to other methods. We obtain this nontrivial achievement first in a generic setting, where we establish when an initial estimator for STE is sufficiently well-conditioned to guarantee the desired robustness of STE. We then assume the asymptotic generalized haystack model and show that under this model, TME itself is a well-conditioned initial estimator for STE, and that unlike TME, STE with this initialization can deal with a lower fraction of inliers than the theoretical threshold specified in <ref type="bibr">[16]</ref>.</p><p>We demonstrate competitive performance in robust fundamental matrix estimation, relying solely on subspace information without additional methods for handling degenerate scenarios, in contrast to <ref type="bibr">[7,</ref><ref type="bibr">12,</ref><ref type="bibr">37]</ref>. We also propose a potential application of RSR for removing bad cameras in order to enhance the SfM pipeline and show competitive performance of STE. This is a completely new idea and it may require additional exploration to make it practical. Nevertheless, it offers a very different testbed where N = D is very large and RANSAC is generally intractable.</p><p>The rest of the paper is organized as follows: &#167;2 introduces the STE framework, &#167;3 establishes theoretical guarantees of STE, &#167;4 applies STE to two different problems in SfM, demonstrating its competitive performance relative to existing algorithms, and &#167;5 provides conclusions and future directions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The STE Algorithm</head><p>We present our proposed STE. We review basic notation in &#167;2.1 and Tyler's original estimator in &#167;2.2. We describe our method in &#167;2.3, its computational complexity in &#167;2.4, its algorithmic choices in &#167;2.5 and an interpretation for it as a fusion of TME and FMS with p=0 in &#167;2.6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Notation</head><p>We use bold upper and lower case letters for matrices and column vectors, respectively. Let I k denote the identity matrix in R k&#8677;k , where if k is obvious we just write I. For a matrix A, we denote by tr(A) and Im(A) the trace and image (i.e., column space) of A. We denote by S + (D) and S ++ (D) the sets of positive semidefinite and definite matrices in R D&#8677;D , respectively. We denote by O(D,d) the set of semi-orthogonal D&#8677;d matrices, i.e., U 2 O(D,d) if and only if U 2 R D&#8677;d and U &gt; U=I d . We refer to linear d-dimensional subspaces as d-subspaces. For a d-subspace L, we denote by P L the D&#8677;D matrix representing the orthogonal projector onto L. We also arbitrarily fix</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Tyler's Estimator and its Application to RSR</head><p>Tyler's M-estimator (TME) <ref type="bibr">[47]</ref> robustly estimates the covariance &#8963; &#8676; of the dataset X ={x i } N i=1 &#8674;R D by minimizing</p><p>over &#8963; 2 S ++ (D) such that tr(&#8963;) = 1. The cost function in (1) can be motivated by writing the maximum likelihood of the multivariate t-distribution and letting its degrees of freedom parameter, &#9003;, approach zero <ref type="bibr">[31]</ref>. This cost function is invariant to dilations of &#8963;, and the constraint on tr(&#8963;), whose value can be arbitrarily chosen, fixes a scale. TME also applies to scenarios where the covariance matrix does not exist. In such cases, TME estimates the shape (or scatter matrix) of the distribution, which is defined up to an arbitrary scale. More direct interpretations of TME as a maximum likelihood estimator can be found in <ref type="bibr">[11,</ref><ref type="bibr">46]</ref>. When D is fixed and N approaches infinity, TME is the "most robust" estimator of the shape matrix for data i.i.d. sampled from a continuous elliptical distribution <ref type="bibr">[47]</ref> in a minimax sense, that is, as a minimizer of the maximal variance.</p><p>Tyler <ref type="bibr">[47]</ref> proposed the following iterative formula for computing TME:</p><p>Kent and Tyler <ref type="bibr">[22]</ref> proved that if any d-subspace of R D , where 1 &#63743; d &#63743; D 1, contains fewer than Nd/D data points, then the above iterative procedure converges to TME. Linear rate of convergence was proved for the regularized TME in <ref type="bibr">[15]</ref> and for TME in <ref type="bibr">[13]</ref>.</p><p>One can apply the TME estimator to solve the RSR problem with a given dimension d by forming the subspace spanned by the top d eigenvectors of TME. Zhang <ref type="bibr">[50]</ref> proved that as long as there are more than Nd/D inliers lying on a subspace, and the projected coordinates of these inliers on the d-subspace and the projected coordinates of the outliers on the (D d)-dimensional orthogonal complement of the subspace are in general position, then TME recovers this subspace. Zhang <ref type="bibr">[50]</ref> also showed that in this setting the above iterative formula converges (note that the condition of convergence in <ref type="bibr">[22]</ref> does not apply in this case). The above lower bound of Nd/D on the number of inliers coincides with the general bound for the noiseless RSR problem, beyond which the problem becomes Small Set-Expansion (SSE) hard <ref type="bibr">[16]</ref>.</p><p>Numerical experiments in <ref type="bibr">[50]</ref> and <ref type="bibr">[25]</ref> indicated state-ofthe-art accuracy of TME compared to other RSR algorithms in various settings. The computational complexity of TME is of order O(K(ND 2 +D 3 )), where K is the number of iterations. On the other hand, the cost of faster RSR algorithms is of order O(KNDd) <ref type="bibr">[24,</ref><ref type="bibr">25,</ref><ref type="bibr">33]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Motivation and Formulation of STE</head><p>We aim to use more cleverly the d-subspace information within the TME framework to form an RSR algorithm, instead of first estimating the full covariance. By using less degrees of freedom we can obtain a more accurate subspace estimator, especially when the fraction of outliers can be large. Furthermore, our idea allows us to improve the computational cost to become state-of-the-art for high-dimensional settings.</p><p>Many RSR algorithms can be formulated as minimizing a best orthogonal projector onto a d-subspace <ref type="bibr">[24,</ref><ref type="bibr">25,</ref><ref type="bibr">27,</ref><ref type="bibr">33,</ref><ref type="bibr">51]</ref>. We are going to do something similar, but unlike using an orthogonal projector, we will still use information from TME to get the shape of the data on the projected subspace. We will make the rest of the eigenvalues (i.e., bottom D d ones) equal and shrink them by a parameter 0&lt; &lt;1. We thus use a regularized version of a reduced-dimension covariance matrix. This parameter plays a role in our theoretical estimates. Making smaller helps with better subspace recovery, whereas making bigger enhances the well-conditioning of the estimator. Following these basic ideas, we formulate our method, STE. For simplicity, we utilize covariance matrices and their inverses. Since these covariance matrices are essentially d-dimensional and include an additional simple regularizing component, our overall computations can be expressed in terms of the computation of the top d singular values of an N &#8677;D matrix (see &#167;2.4).</p><p>At iteration k we follow a similar step to that of TME:</p><p>We compute the eigenvalues { i } D i=1 of Z (k) and replace each of the bottom (D d) of them with &#8226; d+1,D , where</p><p>(</p><p>We also compute the eigenvectors of Z (k) and form the matrix &#8963; (k) with the same eigenvectors as those of Z (k) and the modified eigenvalues, scaled to have trace 1. We iteratively repeat this procedure until the two estimators are sufficiently close. Algorithm 1 summarizes this procedure. Note that it is invariant to scaling of the data, similarly to TME.</p><p>Algorithm 1 STE: Subspace-Constrained Tyler's Estimator</p><p>1: Input: X=[x 1 ,...,x N ]2R D&#8677;N : centered data matrix, d: subspace dimension, K: maximum number of iterations, &#8999;, : parameters. 2: Output:</p><p>, where e X = [(w</p><p>With some abuse of notation we denote by 1 , ... , D the eigenvalues of &#8963; (k 1) (and not &#8963; (k) ). Since they are scaled to have trace 1, d+1,D = (1 P d j=1 j )/(D d). We thus only need the top d eigenvectors and top d eigenvalues of &#8963; (k 1) to update e w (k)</p><p>i . Therefore, the complexity of STE can be of order O(KNDd) if a special fast algorithm is utilized for computing only the top d eigenvectors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Implementation Details</head><p>STE depends on the parameters K, &#8999; and and the initialization of &#8963; (0) . The first two parameters are rather standard in iterative procedures and do not raise any concern.</p><p>Our theory sheds some light on possible choices of and in particular it indicates that the algorithm can be more sensitive to choices of when the quantity defined later in (3) is relatively small. In this case, it may be beneficial to try several values of . We propose here a constructive way of doing it. We first form a sequence of 0&lt; &#63743;1, e.g., k =1/k, k =1,...,m. In order to determine the best choice of , we compute the distance of each data point x to each subspace L k , corresponding to the choice of k , where dist(x,L k )=kx P L k xk. We set set a threshold &#8675;, obtained by the median among all points and all subspaces and for each subspace, L k , we count the number of the inliers with distance below this threshold. The best k is determined according to the subspace yielding the largest number of inliers. We describe this procedure in Algorithm 2.</p><p>For simplicity, we initialize with &#8963; (0) =I D /D and note that with this choice &#8963; (1) reflects the trimmed covariance matrix and thus reflects the PCA subspace. One can also initialize with TME or other subspaces (see &#167;3 where the theory of STE is discussed). One can further try several initialization (with possible random components) and use a strategy similar to Algorithm 2 to choose the best one.</p><p>At last, we remark that when computing Z (k) we want to ensure that x &gt; i (&#8963; (k 1) ) 1 x i cannot be zero and we thus add the arbitrarily small number 10 15 to this value.</p><p>Algorithm 2 Estimating best for STE 1: Input: X=[x 1 ,...,x N ]2R D&#8677;N : centered data matrix, d: subspace dimension, { 1 ,..., m }: a set of pre-selected 's. 2: Output: &#8676; : optimal among { 1 ,..., m } 3: for j =1,2,...,m do 4:</p><p>6: end for 7: Set &#8675; =median({D (1) ,. . . ,D (m) }) 8: j &#8676; =argmax 1&#63743;j&#63743;m |D (j) &lt;&#8675;| 9: &#8676; = j &#8676; 2.6. STE fuses TME and a Variant of FMS STE is formally similar to both TME and FMS. Indeed, at each iteration these algorithms essentially compute k) . We summarize the formal weights for FMS (with any choice of p for minimizing an `p energy in <ref type="bibr">[24]</ref>), TME and STE. We ignore an additional scaling constant for TME and STE, obtained by dividing w i x i x &gt; i above by its trace, and a regularization parameter for FMS. We express these formulas using the eigenvalues 1 , ... , D and eigenvectors u 1 , ... , u D of the weighted sample covariance, P N i=1 w i x i x &gt; i for each method and := &#8226; d+1,D (see (2)) as follows:</p><p>These weights aim to mitigate the impact of outliers in different ways. For FMS, P D j=d+1 (x &gt; i u j ) 2 is the squared distance of a data point x i to the subspace L. Thus for p &lt; 2, w FMS i is smaller for "subspace-outliers", where the robustness to such outliers increases when p 0 decreases.</p><p>The weights of TME are inversely proportional to the squared Mahalanobis distance of x i to the empirical distribution. They mitigate the effect of "covariance-outliers". If the dataset is concentrated on a k-subspace where k &lt; d, then TME can provide smaller weights to points lying away from this subspace, unlike FMS that does not distinguish between points within the larger d-subspace.</p><p>We note that the weights of STE fuse the above two weights. Within a d-subspace, they use the shape of the data. They can thus avoid outliers within this d-subspace. Within the orthogonal component of this subspace, they use a term proportional to that of FMS with p=0. We remark that such `0 minimization has a clear interpretation for RSR, though is generally hard to guarantee. Indeed, <ref type="bibr">[24]</ref> has no guarantees for FMS with p=0. It can also yield unwanted spurious stationary points <ref type="bibr">[26]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Theory</head><p>We review a theoretical guarantee for STE, whose proof is given in <ref type="bibr">[28]</ref>. It requires some conditions and we verify they hold with high probability under the asymptotic generalized haystack model. We assume a noiseless inliers-outliers RSR model. Let L &#8676; denote the underlying d-subspace in R D , X in =X \L &#8676; and X out =X \X in be the set of inliers and outliers, respectively, and n 1 =|X in | and n 0 =|X out | be the number of inliers and outliers. Our first assumption is a mild one on how well-conditioned the inliers are in L &#8676; (compare e.g., other assumptions in <ref type="bibr">[25,</ref><ref type="bibr">32]</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Motivation for Assumption 2:</head><p>The ratio of inliers per outliers, n 1 /n 0 , in RSR is often referred to as the SNR (signal-to-noise ratio) <ref type="bibr">[25,</ref><ref type="bibr">32,</ref><ref type="bibr">33]</ref>. The smaller it is, the best the subspace recovery is. We define the dimension-scaled SNR (DS-SNR) as the SNR obtained when scaling n 1 and n 0 by their respective dimensions (of L &#8676; and L ? &#8676; ):</p><p>Zhang <ref type="bibr">[50]</ref> showed that exact recovery by TME is guaranteed whenever DS-SNR&gt;1 (assuming general position assumptions on the inliers and outliers) and Hardt and Moitra <ref type="bibr">[16]</ref> showed that when considering general datasets with general position assumptions on the inliers and outliers, the RSR problem is SSE hard if the DS-SNR is lower than 1. We aim to show that under the following weaker generic condition, STE can obtain exact recovery with DS-SNR, strictly lower than 1.</p><p>Assumption 2: DS-SNR&gt; , where &lt;1 is the STE parameter.</p><p>Our last assumption requires a sufficiently good initialization for STE, but also implicitly involves additional hidden assumptions on the inliers and outliers. This is expected, since Assumption 1 does not require anything from the outliers and also has a very weak requirement from the inliers. To formulate the new assumption we define below some some basic condition numbers for good initialization (which are more complicated than the one for initialization by PCA suggested by <ref type="bibr">[33]</ref> and <ref type="bibr">[32]</ref>) and also quantities similar to the ones used to guarantee landscape stability in the theory of RSR <ref type="bibr">[25,</ref><ref type="bibr">27,</ref><ref type="bibr">33,</ref><ref type="bibr">51]</ref>.</p><p>Definitions required for Assumption 3: Recall that &#8963; (0) denotes the initial value in Algorithm 1, and denote</p><p>We define the following condition number</p><p>To get a better intuition to this primary quantity of Assumption 3, we first express the initial estimator &#8963; (0) , using basis vectors for L &#8676; and L ? &#8676; , as a 2&#8677;2 block matrix</p><p>&#8676; ,L&#8676; , we decompose this block matrix as</p><p>We note that the numerator of &#63743; 1 is the d-th eigenvalue of the second matrix in the above sum. We show in <ref type="bibr">[28]</ref> that this eigenvalue is positive if &#8963; (0) is positive definite, which can be easily enforced. The condition number is thus the ratio between the smallest positive eigenvalue of the second matrix of the sum and the largest eigenvalue of the component of the first matrix associated with L ? &#8676; . Therefore, &#63743; 1 expresses a ratio between a quantifier of a d-dimensional component of &#8963; (0) , associated with L &#8676; , and a quantifier of the projection onto L ? &#8676; of a full rank component of &#8963; (0) .</p><p>We also define &#8963; in,&#8676; as the TME solution to the set of the projected inliers {U L &#8676; x|x2X in }&#8674;R d and the following two condition numbers</p><p>.</p><p>We note that &#63743; in is analogous to the condition number in (25) of <ref type="bibr">[32]</ref>, where we replace the sample covariance by the TME estimator. An analog to the alignment of outliers statistic <ref type="bibr">[27,</ref><ref type="bibr">33]</ref> for STE is</p><p>An analog to the stability statistic <ref type="bibr">[27,</ref><ref type="bibr">33]</ref> for STE is</p><p>where d+1,D (X) was defined in (2).</p><p>Assumption 3: There exists C =C( ,DS-SNR)&gt;0 such that</p><p>The exact technical requirement on C is specified in <ref type="bibr">[28]</ref>. In general, the larger the RHS of ( <ref type="formula">4</ref>), the more restricted the choice of &#8963; (0) is. In particular, when &#63743; 1 =1, the definition of &#63743; 1 implies that Im(&#8963; (0) )=L &#8676; , so the subspace is already recovered by the initial estimate. Therefore, reducing the lower bound of &#63743; 1 may allow some flexibility, so a marginally suboptimal initialization could still work out. In <ref type="bibr">[28]</ref>, we show that under the asymptotic generalized haystack model, Assumption 3 can be interpreted as an upper bound on the largest principal angle between the initial and ground truth subspaces.</p><p>Generic Theory: The next theorem suggests that under assumptions 1-3, STE nicely converges to an estimator that recovers L &#8676; . The main significance of this theory is that its assumptions can allow DS-SNR lower than 1 for special instances of datasets (for which the assumptions hold), unlike the general recovery theories of <ref type="bibr">[16]</ref> and <ref type="bibr">[50]</ref>.</p><p>Theorem 1. Under assumptions 1-3, the sequence &#8963; (k)  generated by STE converges to U L&#8676; &#8963; in,&#8676; U &gt; L&#8676; , the TME solution for the set of inliers X in . In addition, let L (k)  be the subspace spanned by the top d eigenvectors of &#8963; (k)  , then the angle between L (k)  and L &#8676; , \(L (k) , L &#8676; ) = cos 1 (kU &gt; L (k) U L&#8676; k), converges r-linearly to zero.</p><p>We discuss insights of this theory on choices of the algorithms and further verify the above stated advantage of STE over TME assuming a common probabilistic model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Choice of for subspace recovery:</head><p>In order to avoid too large lower bound for &#63743; 1 in (4), which we motivated above, it is good to find &#9999; 1 and &#9999; 2 &gt;0, such that lies in (&#9999; 1 ,DS-SNR &#9999; 2 ) (to notice this, observe the terms involving in the denominators of the last two additive terms in ( <ref type="formula">4</ref>)). We thus note that if the DS-SNR is expected to be sufficiently larger than 1, we can use, e.g., = 0.5, but when the DS-SNR can be close to 1 or lower (e.g., in fundamental matrix estimation), it is advisable to choose small values of according to Algorithm 2 and their sizes may depend on the expected value of the DS-SNR.</p><p>Possible ways of Initialization: If one expects an initial estimated subspace L to have a sufficiently small angle &#10003; with L &#8676; , where &#10003; = \( L,L &#8676; ), then for &#8963; (0) := &#8679; L +&#9999;I it can be shown that &#63743; 1 &gt;O(1/(&#9999;+&#10003;)) and &#63743; 2 &lt;O(1+ &#10003; &#9999; ). Thus one may use a trusted RSR method, e.g., FMS. As discussed in &#167;2.5, the choice &#8963; (0) =I (or a scaled version of it) corresponds to L being the PCA subspace (obtained at iteration 1). Also, using the TME solution for &#8963; (0) corresponds to using the TME subspace as L.</p><p>Theory under a probabilistic model: We show that under a common probabilistic model, the assumptions of Theorem 1, where &#8963; (0) is obtained by TME, hold. Moreover, we show that STE (initialized by TME) can recover the correct subspace in situations with DS-SNR&lt; 1, whereas TME cannot recover the underlying subspace in such cases. We follow <ref type="bibr">[33]</ref> and study the Generalized Haystack Model, though for simplicity, we assume Gaussian instead of sub-Gaussian distributions and an asymptotic setting. We assume n 1 inliers i.i.d. sampled from a Gaussian distribution N(0,&#8963; (in)  /d), where &#8963; (in) 2 S + (D) and L &#8676; = Im(&#8963; (in) ) (so &#8963; (in) has d nonzero eigenvalues), and n 0 outliers are i.i.d. sampled from a Gaussian distribution N(0,&#8963; (out)  /D), where &#8963; (out) /D 2 S ++ (D). We define the following condition numbers of inliers (in L &#8676; ) and outliers:</p><p>and</p><p>.</p><p>Clearly, Assumption 1 holds under this model, and Assumption 2 constrains some of its parameters. Our next theorem shows that Assumption 3 holds under this model when the initial estimate &#8963; (out) for STE is obtained by TME. It also shows that in this case STE can solve the RSR problem even when DS-SNR&lt;1, unlike TME. For simplicity, we formulate the theory for the asymptotic case, where N ! 1 and the theorem holds almost surely. It is possible to formulate it for a very large N with high probability, but it requires stating complicated constants depending on various parameters. Theorem 2. Assume data generated from the above generalized haystack model. Assume further that for 0 &lt; &#181; &lt; 1, which can be arbitrarily small, d &lt; (1 &#181;)D 2. Then, for any chosen 0 &lt; c 0 &lt; 1, which is a lower bound for , there exists &#8984; :=&#8984;(&#63743; in ,&#63743; out ,c 0 ,&#181;)&lt;1 such that if DS-SNR &#8984; and &#8963; (0) is obtained by TME, then Assumption 3 for &#8963; (0)  is satisfied with c 0 &lt; &lt; &#8984; c 0 almost surely as N ! 1. Consequently, the output of the STE algorithm, initialized by TME and with the choice of c 0 &lt; &lt; &#8984; c 0 , recovers L &#8676; . On the other hand, if &#8963; There are three different regimes that the theorem covers. When DS-SNR 1, both TME+STE (i.e., STE initialized by TME) and TME solve the RSR problem. When &#8984; &#63743;DS-SNR&lt; 1, TME+STE solves the RSR problem and TME generally fails. When &#63743;DS-SNR&lt;&#8984;, TME+STE might also fail, but STE with extremely good initialization (that satisfies Assumption 3) can still solve the problem.</p><p>To get a basic idea of the dependence of &#8984; on its parameters, we remark that &#8984; ! 1 if either c 0 ! 0, &#63743; in ! 1, &#63743; out ! 1 or &#181; ! 0, where the parameter &#181; is somewhat artificial and might be removed with a tighter proof. Therefore, successful performance of TME+STE requires a DS-SNR that is close to 1 when is close to either 0 or &#8984; (so that c 0 is very small) or when either the inlier or outlier distribution is highly non-symmetric, that is, either &#63743; in or &#63743; out is large.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Applications to Structure from Motion</head><p>We apply STE to problems relevant to SfM: robust estimation of fundamental matrices (see &#167;4.1), and initial screening of undesirable cameras (see &#167;4.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Robust Fundamental Matrix Estimation</head><p>Fundamental matrix estimation from noisy and inexact keypoint matches is a core computer vision problem. It provides a challenging setting for applying RSR methods.</p><p>We review this setting as follows. Let (x,x 0 ) 2 R 3 &#8677; R 3 be a correspondence pair of two points in different images that are projections of the same 3D point in the scene, where x and x 0 are expressed by homogeneous coordinates of planar points. The fundamental matrix F2R 3&#8677;3 relates these corresponding points and the epipolar lines they lie on as follows: x 0&gt; Fx=0 <ref type="bibr">[17]</ref>, or equivalently,</p><p>where vec(&#8226;) denotes the vectorized form of a matrix. Therefore, ideally, the set of all vectors in R 9 of the form vec(xx 0&gt; ), where (x,x 0 )2R 3 &#8677;R 3 is a correspondence pair, lies on an 8-subspace in R 9 and its orthogonal complement yields the fundamental matrix. In practice, the measurements of correspondence pairs can be highly corrupted due to poor matching. Moreover, some choices of correspondence pairs and the corruption mechanism may lead to concentration on low-dimensional subspaces within the desired 8-subspace. Furthermore, the corruption mechanism can lead to nontrivial settings of outliers. Lastly, since d=8 and D =9, the theoretical threshold of <ref type="bibr">[16]</ref> translates to having the fraction of inliers among all data points be at least 8/9&#8673;88.9%. Therefore, this application is often a very challenging setting for direct RSR methods. The best performing RSR methods to date for fundamental matrix estimation are variants of RANSAC <ref type="bibr">[10]</ref>. RANSAC avoids any subspace-modeling assumptions, but estimates the susbspace based on testing myriads of samples, each having 7 or 8 point correspondences <ref type="bibr">[17]</ref>.</p><p>We test the performance of STE in estimating the fundamental matrix on the Photo Tourism database <ref type="bibr">[41]</ref>, where the image correspondences are obtained by SIFT feature similarities <ref type="bibr">[30]</ref>. We compare STE with the following 3 top RSR performers according to <ref type="bibr">[25]</ref>: FMS <ref type="bibr">[24]</ref>, spherical FMS (SFMS) <ref type="bibr">[24]</ref> and TME <ref type="bibr">[47,</ref><ref type="bibr">50]</ref>. We also compared with vanilla RANSAC <ref type="bibr">[10]</ref> and two of its specialized extensions, which are state-of-the-art performers for estimating fundamental matrices: locally optimized RANSAC (LO-RANSAC) <ref type="bibr">[6]</ref> and degeneracy-check enabled RANSAC (DEGENSAC) <ref type="bibr">[7]</ref>. For the RSR methods we used codes from the supplementary material of <ref type="bibr">[25]</ref>  with their default options. We further used the Python package pydegensac for implementing LO-RANSAC and DEGENSAC with the inlier threshold &#8984; = 0.75. For STE, we used Algorithm 2 to estimate the best from {(2i) 1 } 5 i=1 . We measure the accuracy of the results according to the median and mean errors of relative rotation and direction vectors directly obtained by the fundamental matrices for each method. For computing these errors, we compared with ground-truth values provided by <ref type="bibr">[41,</ref><ref type="bibr">48]</ref>. Figure <ref type="figure">1</ref> describes the result of the mean errors for relative rotation per dataset of Photo Tourism, where the other three errors and mAA <ref type="bibr">(10 )</ref> are in the supplemental material. STE is significantly better than top RSR performers (TME, FMS and SFMS). Overall, it appears that STE performs better than vanilla RANSAC, except for the Ellis Island and Vienna Cathedral datasets, where RANSAC outperforms STE. STE is still competitive when compared with LO-RANSAC and DEGENSAC, except for Notre Dame and the latter two datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Initial Camera Removal for SfM</head><p>We propose a novel application of RSR for SfM and test STE for this application. Even though our framework is not sufficiently practical at this point, it allows testing STE in a different setting where N = D is very large and d =6. Our idea is to use RSR within the SfM pipeline right after estimating the fundamental matrices, in order to remove some cameras that result in inaccurate estimated fundamental matrices. The hope is that eventually such methods may reduce corruption and speed up the costly later computationally intensive stages of the global SfM pipeline.</p><p>There are two main reasons to question such a process. One may first question the gain in improving accuracy. Indeed, since the rest of the pipeline already identifies corrupted pairwise measurements, this process may not improve accuracy and may even harm it as it removes whole cameras and not pairs of cameras. That is, it is possible that a camera, which results in bad pairwise measurement, also contributes to some other accurate pairwise estimates that can improve the overall accuracy. The second concern is in terms of speed. In general, the removal of cameras may result in higher or comparable speed. Indeed, the LUD global pipeline <ref type="bibr">[36]</ref>, which we follow, examines the parallel rigidity of the viewing graph and extracts the maximal parallel rigid subgraph. Thus earlier removal of cameras may worsen the parallel rigidity of the graph and increase the computation due to the need of finding a maximal parallel rigid subgraph. For example, <ref type="bibr">[40]</ref> removes cameras in an earlier stage of the LUD pipeline, but results in higher computational cost than the LUD pipeline. Therefore, improvement of speed for the LUD pipeline by removing cameras is generally non-trivial. Moreover, currently we use scale factors obtained by first running LUD, so we do not get a real speed improvement. Nevertheless, the proposed method is insightful whenever it may indicate clear improvement in accuracy for a dataset, since one can then infer that the current pipeline is not effective enough in handling corrupted measurements, which can be easily recognized by a simple method. Furthermore, improvement in "speed" can be indicative of maintaining parallel rigidity.</p><p>Our RSR formulation is based on a fundamental observation by Sengupta et al. <ref type="bibr">[39]</ref> on the low-rank of the n-view essential (or fundamental) matrix. The n-view essential matrix E of size 3n&#8677;3n is formed by stacking all n 2 essential matrices, while being appropriately scaled. That is, the ij-th block of E is the essential matrix for the i-th and j-th cameras, where each E ij is scaled by a factor ij in accordance with the global coordinate system (see <ref type="bibr">[20,</ref><ref type="bibr">21,</ref><ref type="bibr">39]</ref>). It was noticed in <ref type="bibr">[39]</ref> that E has rank 6. Moreover, <ref type="bibr">[39]</ref> characterized the set of n-view essential matrices whose camera centers are not all collinear by the satisfaction of a few algebraic conditions, where the major one is rank(E)=6. Further explanation appears in <ref type="bibr">[20]</ref>.</p><p>We propose a straightforward application of RSR, utilizing these ideas to initially eliminate cameras that introduce significant corruption to the essential matrices. For this purpose, we compute the essential matrices (by computing first the fundamental matrices and then using the known camera calibration) and scale each matrix according to the factor obtained by the LUD pipline <ref type="bibr">[36]</ref> (note that this is the initial scaling applied in <ref type="bibr">[20,</ref><ref type="bibr">21,</ref><ref type="bibr">39]</ref> before applying a non-convex and nontrivial optimization procedure that refines such scales). Using these appropriately scaled essential matrices, we form the n-view essential matrix E of size 3n&#8677;3n. We denote the 3n&#8677;3 column blocks of E by E :,1 , ..., E :,n (since E is symmetric they are the same as the row blocks transposed). We treat E as a data matrix with D =N =3n, where the columns of E are the data points. We apply RSR with d=6, recover a d-dimensional robust subspace and identify the outlying columns whose distance is largest from this subspace. To avoid heuristic methods for the cutoff of outliers we assume a fixed percentage of 20% outlying columns. If a column block, E :,i contains an outlying column, we remove its corresponding camera i. Consequently, a smaller percentage of cameras (about 10 15%) will be eliminated.</p><p>We use the Photo Tourism database <ref type="bibr">[41]</ref> with precomputed pairwise image correspondences provided by <ref type="bibr">[39]</ref> (they were obtained by thresholding SIFT feature similarities). To compute scale factors for the essential matrices we use the output of the LUD pipeline <ref type="bibr">[36]</ref> as follows (following an idea proposed in <ref type="bibr">[39]</ref> for initializing these values): Given the essential matrix for cameras i and j computed at an early stage of our pipeline, E ij , and the one obtained by the full LUD pipeline, E LUD ij , the scaling factor is</p><p>. Since many values of E ij are missing, we also apply matrix completion.</p><p>We compare the LUD pipeline with the LUD pipeline combined with the filtering processes achieved by STE, FMS, SFMS, and TME. For STE we fix =1/3, though any other value we tried yielded the same result. We report both mean and median errors of rotations and translations and runtime of the standard LUD and the RSR+LUD methods with initial screening of cameras. Figure <ref type="figure">2</ref> shows the mean rotation and translation errors, where the rest of the figures and a summarizing table are in the supplementary material. In general, STE demonstrates slightly higher accuracy compared to other RSR methods. Improved accuracy is particularly notable when matrix completion is not utilized, as demonstrated in the supplementary material. We observe that LUD+STE generally improves the estimation of camera parameters (both rotations and translations) over LUD.</p><p>The improvement of LUD+STE is noticeable in Roman Forum and Gendarmenmarkt. In the supplementary material we show further improvement for Gendarmenmarkt with the removal of 45% outlying columns. While the resulting errors are still large, their improvement shows some potential in dealing with difficult SfM structure by initially removing cameras in a way that may help eliminate some scene ambiguities, which are prevalent in Gendarmenmarkt. In terms of runtime, both LUD+STE and LUD+SFMS demonstrate significant improvements, where LUD+SFMS is even faster than LUD+STE. While this does not yet imply faster handling of the datasets (as we use initial scaling factors obtained by LUD), it indicates the efficiency of the removal of outliers in maintaining parallel rigidity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>We introduce STE, a meticulously crafted adaptation of TME designed to address challenges within RSR. Theoretical guarantees demonstrate its ability to recover the true underlying subspace reliably, even with a smaller fraction of inliers compared to the well-known theoretical threshold. Under the generalized haystack model, we show that this initialization can be chosen as TME itself, leading to improved handling of a smaller fraction of inliers compared to TME. Our exploration extends to practical applications, where STE proves effective in two 3D vision tasks: robust fundamental matrix estimation and screening of bad cameras for improved SfM.</p><p>Several avenues for future research include: &#8226; Exploring adaptations of other robust covariance estimation methods to RSR. &#8226; Studying effective initialization for STE both in theory and in practice. &#8226; In-depth theoretical exploration of the optimal choice of the parameter . &#8226; Study of alternative ways of adapting TME to RSR problems. &#8226; Improving STE for fundamental matrix estimation following ideas similar to those in <ref type="bibr">[7,</ref><ref type="bibr">12,</ref><ref type="bibr">37]</ref> for addressing challenging degeneracies. &#8226; Enhancing our initial idea of initial removal of bad cameras, specifically attempting to use it to rectify challenging scene ambiguities. &#8226; Testing our methods for SfM using more recent feature matching algorithms.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Authorized licensed use limited to: University Of Minnesota Duluth. Downloaded on December 28,2024 at 20:10:48 UTC from IEEE Xplore. Restrictions apply.</p></note>
		</body>
		</text>
</TEI>
