<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Alternating minimization algorithm for unlabeled sensing and linked linear regression</title></titleStmt>
			<publicationStmt>
				<publisher>Elsevier</publisher>
				<date>02/04/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10638658</idno>
					<idno type="doi"></idno>
					<title level='j'>Signal Processing</title>
<idno>0165-1684</idno>
<biblScope unit="volume">232</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Ahmed Abbasi</author><author>Shuchin Aeron</author><author>Abiy Tasissa</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Unlabeled sensing is a linear inverse problem with permuted measurements. We propose an alternating minimization (AltMin) algorithm with a suitable initialization for two widely considered permutation models: partially shuffled/k-sparse permutations and r-local/block diagonal permutations. Key to the performance of the AltMin algorithm is the initialization. For the exact unlabeled sensing problem, assuming either a Gaussian measurement matrix or a sub-Gaussian signal, we bound the initialization error in terms of the number of blocks and the number of shuffles. Experimental results show that our algorithm is fast, applicable to both permutation models, and robust to the choice of measurement matrix. We also test our algorithm on several real datasets for the ‘linked linear regression’ problem and show superior performance compared to baseline methods.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The linear inverse problem is given by 𝐘 = 𝐁𝐗* + 𝐖, where 𝐁 ∈ R𝑛×𝑑 is the measurement matrix, 𝐘 ∈ R𝑛×𝑚 represents the linear measurements of the unknown 𝐗* ∈ R𝑑×𝑚, and 𝐖𝑖𝑗 ∼ 𝒩(0, 𝜎²) denotes i.i.d. Gaussian noise. In unlabeled sensing, the measurements 𝐘 are scrambled. Specifically,</p><p>𝐘 = 𝐏*𝐁𝐗* + 𝐖, <ref type="formula">(1)</ref></p><p>where 𝐏* ∈ R𝑛×𝑛 is the unknown permutation matrix. Given 𝐘 and 𝐁, the objective is to estimate 𝐗*. In this manuscript, 𝐗* and 𝐏* denote the underlying unknown parameters of the unlabeled sensing problem.</p><p>If there is no noise, i.e., 𝐖 = 𝟎, we refer to <ref type="bibr">(1)</ref> as the exact unlabeled sensing problem. Since estimating 𝐗* and 𝐏* for a generic permutation is challenging, several works <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref> assume a 𝑘-sparse or partially shuffled permutation model; see Fig. <ref type="figure">1</ref>. A permutation matrix 𝐏*𝑘 is 𝑘-sparse if it has 𝑘 off-diagonal elements, i.e., ⟨𝐈, 𝐏*𝑘⟩ = 𝑛 − 𝑘, where ⟨⋅, ⋅⟩ denotes the trace inner product. The 𝑟-local, or block diagonal, permutation model was first proposed in <ref type="bibr">[7]</ref> and later considered in <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>. 
A permutation matrix 𝐏*𝑟 is 𝑟-local with 𝑠 blocks if 𝐏*𝑟 = blkdiag(𝐏1, …, 𝐏𝑠), where blkdiag(⋅) denotes a block diagonal matrix. Fig. <ref type="figure">1</ref> illustrates the 𝑘-sparse and 𝑟-local permutation models. These two models are compared in terms of information-theoretic inachievability in <ref type="bibr">[9]</ref>.</p><p>In <ref type="bibr">[9]</ref>, an alternating minimization algorithm was proposed for the unlabeled sensing problem. The algorithm estimates 𝐏* and 𝐗* by minimizing the following optimization program:</p><p>(P, X) = argmin𝐏∈𝛱𝑛, 𝐗 𝐹(𝐏, 𝐗) = ‖𝐘 − 𝐏𝐁𝐗‖²𝐹, <ref type="bibr">(2)</ref> where 𝛱𝑛 denotes the set of 𝑛 × 𝑛 permutations and ‖⋅‖²𝐹 denotes the squared Frobenius norm of a matrix, which is the sum of the squares of its entries. Given that (2) is a non-convex optimization problem, a crucial part of the alternating minimization algorithm in <ref type="bibr">[9]</ref> is the initialization. When the unknown permutation 𝐏*𝑟 is 𝑟-local, <ref type="bibr">[9]</ref> proposed the collapsed initialization <ref type="bibr">(3)</ref>. Let 𝐏𝑖 denote each 𝑟𝑖 × 𝑟𝑖 block of 𝐏*𝑟 = blkdiag(𝐏1, …, 𝐏𝑠). Let 𝐁𝑖 ∈ R𝑟𝑖×𝑑 denote the blocks of the measurement matrix 𝐁 = [𝐁1; ⋯; 𝐁𝑠], where ; denotes vertical concatenation. Then, each block of measurements 𝐘𝑖 is expressed as 𝐘𝑖 = 𝐏𝑖𝐁𝑖𝐗*, 𝑖 ∈ [𝑠]. 
The shuffled measurements in each block 𝐘𝑖 are summed to extract 𝑠 unshuffled measurements: since 𝟏⊤𝑟𝑖𝐏𝑖 = 𝟏⊤𝑟𝑖, summing the rows of 𝐘𝑖 = 𝐏𝑖𝐁𝑖𝐗* gives</p><p>𝟏⊤𝑟𝑖𝐘𝑖 = 𝟏⊤𝑟𝑖𝐁𝑖𝐗*, 𝑖 ∈ [𝑠],</p><p>where 𝟏𝑟𝑖 ∈ R𝑟𝑖 denotes the vector whose entries are all 1. These 𝑠 measurements are then represented compactly in the collapsed linear system of equations B𝐗* = Ỹ, where B ∈ R𝑠×𝑑 and Ỹ ∈ R𝑠×𝑚. The initialization for 𝑟-local 𝐏*𝑟 is the minimum-norm solution to the collapsed system and is given by:</p><p>X(0)𝑟 = B†Ỹ. <ref type="formula">(3)</ref></p><p><ref type="url">https://doi.org/10.1016/j.sigpro.2025.109927</ref> Received 23 July 2024; Received in revised form 11 December 2024; Accepted 27 January 2025.</p><p>Fig. <ref type="figure">1</ref>. Left. Sparse (or partially shuffled) permutation considered in <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref>, with number of shuffles 𝑘 = 10. Right. The 𝑟-local permutation structure considered in <ref type="bibr">[7,</ref><ref type="bibr">9,</ref><ref type="bibr">10]</ref>, with block size 𝑟 = 10. In this paper, we propose a general algorithm for both permutation models.</p><p>For the 𝑘-sparse problem, in this manuscript, we propose the following initialization:</p><p>X(0)𝑘 = 𝐁†𝐘. <ref type="formula">(4)</ref></p><p>Note that the above initialization corresponds to the least-squares solution of (1) when 𝐏* is the identity matrix. The initialization is motivated by the observation that, since 𝐏*𝑘 has 𝑘 off-diagonal elements, the identity matrix serves as a simple approximation of 𝐏*𝑘. The main goal of this manuscript is to characterize the effectiveness of the initializations 𝐗(0)𝑟 and 𝐗(0)𝑘 by upper bounding the initialization error for the exact unlabeled sensing problem.</p></div>
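The two initializations can be sketched numerically. The snippet below is an illustrative implementation on synthetic data, not the authors' code; sizes and helper names are our choices, assuming a Gaussian 𝐁 and equal block sizes:

```python
# Sketch of the collapsed initialization (3) for an r-local permutation and
# the least-squares initialization (4) for the k-sparse model (synthetic data).
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
n, d, m = 60, 5, 3
block_sizes = [10] * 6                 # s = 6 blocks, so s >= d here
B = rng.standard_normal((n, d))
X_star = rng.standard_normal((d, m))

# r-local permutation P* = blkdiag(P_1, ..., P_s): shuffles stay within blocks.
P_star = block_diag(*[np.eye(r)[rng.permutation(r)] for r in block_sizes])
Y = P_star @ B @ X_star                # exact (noiseless) unlabeled sensing

# Collapse: sum the rows of each block; 1^T P_i = 1^T removes the shuffle.
edges = np.cumsum([0] + block_sizes)
B_bar = np.stack([B[a:b].sum(0) for a, b in zip(edges, edges[1:])])
Y_bar = np.stack([Y[a:b].sum(0) for a, b in zip(edges, edges[1:])])

X0_r = np.linalg.pinv(B_bar) @ Y_bar   # minimum-norm solution (3)
X0_k = np.linalg.pinv(B) @ Y           # least-squares initialization (4)

# With s >= d and a generic B, the collapsed system already determines X*.
print(np.allclose(X0_r, X_star))       # True
```

In the under-determined regime 𝑠 < 𝑑 analyzed in Section 4, `X0_r` is no longer exact, which is what the initialization-error bounds quantify.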
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Contributions and outline</head><p>This paper presents a theoretical analysis of the initialization for the exact unlabeled sensing problem under two permutation models. The key contributions of this manuscript are summarized as follows:</p><p>1. Assuming 𝐁 is Gaussian, we show that the relative error ‖X(0)𝑟 − 𝐗*‖𝐹∕‖𝐗*‖𝐹 critically depends on the parameter 𝑑 − 𝑠. In particular, Theorem 4.1 establishes that the probability that the relative error exceeds √(1 − 𝑠∕𝑑) decays at a sub-Gaussian rate.</p><p>2. Assuming 𝐱* is sub-Gaussian, we focus on the error ‖X(0)𝑟 − 𝐗*‖𝐹, which also depends critically on the parameter 𝑑 − 𝑠. In particular, Theorem 4.2 in this work shows that the probability that the error exceeds 𝑂(√(𝑑 − 𝑠)) also decays at a sub-Gaussian rate.</p><p>3. For the 𝑘-sparse problem, we analyze the relative error ‖X(0)𝑘 − 𝐗*‖𝐹∕‖𝐗*‖𝐹. Assuming 𝐁 is Gaussian and sufficiently tall (see Definition 4.4), we show that the relative error depends on the parameter 𝑘∕𝑛. Specifically, depending on the problem parameters, Theorem 4.5 shows that the probability of the error exceeding 𝑂(𝑘∕𝑛) decays at a rate ranging from inverse linear to exponential.</p><p>4. We apply the alternating minimization algorithm in <ref type="bibr">[9]</ref> with the initializations (3) and ( <ref type="formula">4</ref>). Experimental results on real datasets show superior performance compared to baseline methods.</p><p>Outline. Section 2 discusses related work. Section 3 presents the alternating minimization algorithms for the 𝑟-local and 𝑘-sparse permutation models. Section 4 covers the initialization analysis. Experimental results on real datasets, showing superior performance compared to baseline methods, are given in Section 5. We conclude in Section 6.</p><p>Table 1. Summary of the frequently used notation in this paper. See Fig. 1 for the definition and examples of 𝑟-local 𝐏*𝑟 and 𝑘-sparse 𝐏*𝑘 permutations. The superscript * in 𝐏*, 𝐏*𝑟, 𝐏*𝑘, 𝐗* denotes that these are unknown problem parameters, which are estimated by the proposed algorithm.
𝐁 ∈ R𝑛×𝑑: measurement matrix.
𝐗* ∈ R𝑑×𝑚: unknown matrix.
𝐱*𝑗 ∈ R𝑑: 𝑗-th column of 𝐗*, ∀𝑗 ∈ [𝑚].
𝐏* ∈ {0, 1}𝑛×𝑛: unknown permutation matrix.
𝐏*𝑘 ∈ {0, 1}𝑛×𝑛: unknown 𝑘-sparse permutation matrix.
𝑘: number of shuffles, i.e., ⟨𝐏*𝑘, 𝐈⟩ = 𝑛 − 𝑘.
𝐏*𝑟 ∈ {0, 1}𝑛×𝑛: unknown 𝑟-local permutation matrix.
𝑟𝑖, ∀𝑖 ∈ [𝑠]: block sizes of 𝐏*𝑟 s.t. 𝐏*𝑟 = blkdiag(𝐏*1, …, 𝐏*𝑠).
𝛱𝑟𝑖: the set of 𝑟𝑖 × 𝑟𝑖 permutation matrices.
𝑠: number of blocks in 𝑟-local 𝐏*𝑟.
[𝑠]: integers {1, …, 𝑠}.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Notation</head><p>Boldface lowercase letters denote column vectors and boldface uppercase letters denote matrices. The transpose of a vector 𝐱 is denoted by 𝐱⊤. For a matrix 𝐗, its transpose is denoted by 𝐗⊤. 𝐀† denotes the Moore-Penrose pseudoinverse of 𝐀. Tr(⋅) denotes the trace of a matrix. Given two matrices 𝐀 and 𝐁, ⟨𝐀, 𝐁⟩ denotes the trace inner product, defined as Tr(𝐀⊤𝐁). Given a vector 𝐱, the 𝑖-th entry of 𝐱 is denoted by 𝑥𝑖. [𝐀; 𝐁] denotes the vertical concatenation of the matrices 𝐀, 𝐁. ‖⋅‖𝑝 denotes the vector 𝑝-norm. ‖⋅‖𝐹 and ‖⋅‖ denote the Frobenius norm and the operator norm of a matrix, respectively. 𝟏𝑛 ∈ R𝑛 denotes the vector of all ones. 𝐈 denotes the identity matrix. R denotes the set of real numbers. 𝛱𝑛 = {𝐙 ∶ 𝐙 ∈ {0, 1}𝑛×𝑛, 𝐙𝟏𝑛 = 𝟏𝑛, 𝐙⊤𝟏𝑛 = 𝟏𝑛} denotes the set of 𝑛 × 𝑛 permutation matrices. E(⋅) denotes the expectation of the random variable under consideration. 𝐶, 𝐶′ denote absolute constants > 1, and 𝑐, 𝑐′ denote absolute constants ≤ 1. argmin denotes the set of minimizers of the objective function under consideration. exp(⋅) denotes the exponential function. We also summarize the frequently used notation in Table <ref type="table">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work</head><p>This section provides a concise overview of related work in unlabeled sensing theory and algorithms, inference problems involving unlabeled data, and applications of unlabeled sensing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Theory and algorithms</head><p>We note that the problem in (1) is referred to as the single-view (multi-view) unlabeled sensing problem for &#119898; = 1 (&#119898; &gt; 1). For the single-view problem, we represent the problem as &#119858; = &#119823; * &#119809;&#119857; * + &#119856; where &#119858;, &#119857; * , and &#119856; denote the variables analogous to &#119832;, &#119831; * and &#119830; respectively. The work in <ref type="bibr">[11]</ref> formulated the single-view unlabeled sensing problem and established that &#119899; = 2&#119889; measurements are both necessary and sufficient for the recovery of &#119857; * . Subsequent works <ref type="bibr">[12,</ref><ref type="bibr">13]</ref> generalized this result and developed an information-theoretic inachievability analysis.</p><p>For single-view unlabeled sensing, algorithms based on branch and bound and expectation maximization are proposed in <ref type="bibr">[14]</ref><ref type="bibr">[15]</ref><ref type="bibr">[16]</ref>, which are suitable for small problem sizes. A stochastic alternating minimization approach is introduced in <ref type="bibr">[17]</ref>. For multi-view unlabeled sensing, the Levsort subspace matching algorithm was proposed in <ref type="bibr">[18]</ref>, and a subspace clustering approach was presented in <ref type="bibr">[4]</ref>. The works <ref type="bibr">[1,</ref><ref type="bibr">2,</ref><ref type="bibr">19]</ref> propose methods based on bi-convex optimization, robust regression, and spectral initialization, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Inference on unlabeled data</head><p>The problem of unlabeled sensing has been studied in the context of estimation and detection from unlabeled samples. In <ref type="bibr">[20]</ref>, the authors consider the problem of signal amplitude estimation and detection from unlabeled quantized binary samples. In the setup they consider, the ordering of the time indexes is unknown. The work in <ref type="bibr">[20]</ref> proposes a maximum likelihood estimator to estimate the underlying signal amplitude and permutation, under certain structural assumptions on the quantization and signal profile. Furthermore, an alternating maximization algorithm is studied for the general estimation and detection problem.</p><p>In <ref type="bibr">[21]</ref>, the authors study signal recovery from unlabeled samples by considering a special case of unlabeled sensing, referred to as the unlabeled ordered sampling problem. In this problem, instead of estimating an unknown permutation matrix, the goal is to estimate an unknown selection matrix that preserves the order of the measurements. <ref type="bibr">[21]</ref> links this problem to compressive sensing and also proposes an alternating minimization algorithm for solving it. <ref type="bibr">[22]</ref> also addresses the unlabeled ordered sampling problem, but with a focus on signal detection, rather than signal recovery as in <ref type="bibr">[21]</ref>. A dynamic programming algorithm is proposed therein to estimate the selection matrix.</p><p>[23] considers the problem of signal detection, where the observations are independently drawn from a finite alphabet. Focusing on the inference problem, the work in <ref type="bibr">[23]</ref> characterizes the information available in the unordered samples by studying a binary hypothesis test. Additionally, it provides a computationally efficient detector to address the detection problem. 
<ref type="bibr">[24]</ref> further studies the signal detection problem in the case of binary unlabeled observations. In the low-detectability regime, <ref type="bibr">[24]</ref> gives an analytical characterization of various statistical inference quantities, such as Chernoff information. In <ref type="bibr">[25]</ref>, the signal detection framework of unlabeled sensing is applied to decision-making in sensor networks, where sensors report measurements to a central node, but the measurements are imprecise or lack labels. <ref type="bibr">[26]</ref> considers a related problem, where sensors send their measurements to a central fusion center. Due to an anomaly in one of the sensors, the measurements at the fusion center are assumed to be unordered, and <ref type="bibr">[26]</ref> explores the problem of detecting the anomaly with minimal detection delay.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Applications of unlabeled sensing</head><p>The linked linear regression problem <ref type="bibr">[27]</ref><ref type="bibr">[28]</ref><ref type="bibr">[29]</ref>, also called regression analysis with linked data, is to fit a linear model by minimizing over &#119857; and &#119823; the objective &#8214;&#119858; -&#119823;&#119809;&#119857;&#8214; 2 , where &#119823; is an &#119903;-local permutation and &#119857; is the regression vector. For example, consider the statistical model where the weight depends linearly on age and height. Let &#119858; &#8712; R &#119899; contain the weights of &#119899; = 10 individuals, 5 males, 5 females. Let &#119809; &#8712; R &#119899;&#215;&#119889; , with &#119889; = 2 contain the age and height values. Each record (weight, age, height) is known to belong to a male or a female individual. Letting the first (second) block of 5 rows of &#119858;, &#119809; contain the records of male (female) individuals, the unknown &#119903;-local permutation, &#119903; = 5, assigns each weight value (row of &#119858;) to its corresponding age, height (row of &#119809;). Detailed references describing record linkage with blocking are <ref type="bibr">[30]</ref>, Section 1.1 and <ref type="bibr">[31]</ref>, Section 2; also, see Fig. <ref type="figure">2</ref>. Experimental results on real datasets are given in Section 5. Other applications of unlabeled sensing include the pose and correspondence problem <ref type="bibr">[13]</ref> and 1-d unassigned distance geometry <ref type="bibr">[9]</ref>. In <ref type="bibr">[32]</ref>, the authors apply the unlabeled sensing framework to the sensor network localization problem. We also note a recent work that extends the unlabeled sensing theory to multi-channel signals, leading to highly structured unlabeled sensing problems <ref type="bibr">[33]</ref>. 
In <ref type="bibr">[33]</ref>, beyond theoretical analysis of the structured problem, the model is applied to a real-world application in calcium imaging.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Technical background</head><p>This section provides definitions of sub-Gaussian and sub-exponential random vectors, and also summarizes the key concentration inequalities used throughout this paper. Let 𝐗* ∈ R𝑑×𝑚 be a matrix such that the columns 𝐱*𝑗 are independent, zero-mean sub-Gaussian random vectors, with sub-Gaussian norm 𝐾. It follows then that</p><p>A random variable 𝑋 is sub-exponential with sub-exponential norm ‖𝑋‖𝜑1 = inf{𝐾 > 0 ∶ E exp(|𝑋|∕𝐾) ≤ 2},</p><p>where 𝐶 ≥ 1, 𝑡 ≥ 0 and 𝐾𝑑−𝑠 ≥ 0.</p><p>Theorem 2.1 (Hanson-Wright Inequality, Theorem 2.1 in <ref type="bibr">[35]</ref>). Let Σ = 𝐀⊤𝐀 be a positive semi-definite matrix. Let 𝐱 = (𝑥1, …, 𝑥𝑑) be a zero-mean sub-Gaussian random vector, i.e., for</p><p>Lemma 2.2 (Johnson-Lindenstrauss Lemma, Lemma 5.3.2 in <ref type="bibr">[36]</ref>). Let 𝑃 be a projection in R𝑝 onto a uniformly distributed random 𝑞-dimensional subspace. Let 𝐯 ∈ R𝑝 be a fixed point and 𝑡 > 0. Then, with probability at least 1 − 2 exp(−𝑐𝑡²𝑞),</p><p>Theorem 2.3 (Hoeffding's Inequality, Theorem 2.6.2 in <ref type="bibr">[36]</ref>). Let 𝑋1, …, 𝑋𝑚 be independent, sub-Gaussian random variables. Then, for every 𝑡 ≥ 0,</p><p>where ‖𝑋𝑖‖𝜑2 denotes the sub-Gaussian norm of 𝑋𝑖 and 𝑐′ is an absolute constant.</p><p>Lemma 2.4 (Tail Inequality for 𝜒²𝐷 Distributed Random Variables, Lemma 1 in <ref type="bibr">[37]</ref>). Let 𝑍𝐷 be a 𝜒² statistic with 𝐷 degrees of freedom. For any positive 𝑡,</p></div>
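A quick Monte Carlo sanity check of the 𝜒² tail inequality of Lemma 2.4 can be run with the standard library alone. We assume here the standard Laurent-Massart form Pr(𝑍𝐷 ≥ 𝐷 + 2√(𝐷𝑡) + 2𝑡) ≤ exp(−𝑡); the parameter choices below are arbitrary, and this is a sketch, not part of any proof:

```python
# Monte Carlo sanity check (not a proof) of the chi-square tail bound:
# Pr( Z_D >= D + 2*sqrt(D*t) + 2*t ) <= exp(-t), Z_D chi-square with D dof.
import math
import random

random.seed(0)
D, t, trials = 20, 1.5, 20000
threshold = D + 2 * math.sqrt(D * t) + 2 * t

def chi2_sample(dof):
    """A chi-square draw as a sum of squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof))

tail = sum(chi2_sample(D) >= threshold for _ in range(trials)) / trials
print(tail <= math.exp(-t))   # empirical tail frequency sits below the bound
```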
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Algorithm</head><p>In this section, we briefly summarize the alternating minimization algorithm first proposed in <ref type="bibr">[9]</ref>. To estimate 𝐏* and 𝐗* in (1), the alternating minimization (AltMin) updates for <ref type="bibr">(2)</ref> are</p><p>𝐏(𝑡+1) = argmin𝐏∈𝛱𝑛 ‖𝐘 − 𝐏𝐁𝐗(𝑡)‖²𝐹, (<ref type="formula">12</ref>)</p><p>𝐗(𝑡+1) = argmin𝐗 ‖𝐘 − 𝐏(𝑡+1)𝐁𝐗‖²𝐹, (<ref type="formula">13</ref>)</p><p>where 𝐏(𝑡) and 𝐗(𝑡) denote the estimates of 𝐏* and 𝐗* at the 𝑡-th iteration, respectively. The initialization to ( <ref type="formula">12</ref>), ( <ref type="formula">13</ref>), a linear assignment problem and a least squares problem respectively, has to be chosen carefully because ( <ref type="formula">2</ref>) is a non-convex optimization problem. After initialization, we alternate ( <ref type="formula">12</ref>), ( <ref type="formula">13</ref>) and use the relative change in the objective value 𝐹(𝐏, 𝐗) as the stopping criterion. For 𝑟-local 𝐏*𝑟, the 𝐏-update <ref type="bibr">(12)</ref> decouples along the blocks of 𝐏*𝑟; see Algorithm 1. For 𝑘-sparse 𝐏*𝑘, see Algorithm 2.</p><p>Fig. <ref type="figure">2</ref>. Record linkage with blocking <ref type="bibr">[27,</ref><ref type="bibr">34]</ref> assigns records in File A to records in File B, up to blocks. For example, records matching on identifiers 'city', 'occupation' etc. (left) are assigned to the same block (right). Linked linear regression <ref type="bibr">[28]</ref> fits a regression model on such block-permuted data. See Section 5 for results on real datasets. 
The figure on the left is adapted from Figure <ref type="figure">1</ref> in <ref type="bibr">[34]</ref> and the figure on the right is adapted from Figure <ref type="figure">1</ref> in <ref type="bibr">[27]</ref>.</p><p>Algorithm 1 AltMin for 𝑟-local 𝐏*𝑟
Input: convergence threshold 𝜖; 𝐘, 𝐁; and block sizes 𝑟1, 𝑟2, …, 𝑟𝑠 of 𝐏*1, 𝐏*2, …, 𝐏*𝑠, respectively.
1: X ← collapsed initialization in (3)
2: Ŷ ← 𝐁X
3: while relative change > 𝜖 do
4:   for 𝑖 ∈ 1 ⋯ 𝑠 do // 𝑠 is the number of blocks
5:     P𝑖 ← argmin𝐏𝑖∈𝛱𝑟𝑖 −⟨𝐘𝑖Ŷ⊤𝑖, 𝐏𝑖⟩
6:   P ← blkdiag(P1, ⋯, P𝑠)
7:   X ← 𝐁†P⊤𝐘
8:   Ŷ ← 𝐁X
9: Return P, X</p><p>Algorithm 2 AltMin for 𝑘-sparse 𝐏*𝑘
Input: convergence threshold 𝜖; 𝐘, 𝐁
1: Ŷ ← 𝐘
2: while relative change > 𝜖 do
3:   P ← argmin𝐏∈𝛱𝑛 −⟨𝐘Ŷ⊤, 𝐏⟩
4:   X ← 𝐁†P⊤𝐘
5:   Ŷ ← 𝐁X
6: Return P, X</p></div>
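Algorithm 1 can be sketched in ordinary code as follows. This is our illustrative implementation (helper names such as `collapsed_init` and `altmin_rlocal` are ours), assuming the exact problem; scipy's Hungarian solver plays the role of the linear assignment in step 5:

```python
# Sketch of Algorithm 1 (AltMin for r-local P*), not the authors' code.
# Step 5 is a linear assignment per block; step 7 is a least-squares solve.
import numpy as np
from scipy.linalg import block_diag
from scipy.optimize import linear_sum_assignment

def collapsed_init(Y, B, block_sizes):
    """Initialization (3): minimum-norm solution of the collapsed system."""
    edges = np.cumsum([0] + list(block_sizes))
    Bc = np.stack([B[a:b].sum(0) for a, b in zip(edges, edges[1:])])
    Yc = np.stack([Y[a:b].sum(0) for a, b in zip(edges, edges[1:])])
    return np.linalg.pinv(Bc) @ Yc

def altmin_rlocal(Y, B, block_sizes, eps=1e-9, max_iter=100):
    edges = np.cumsum([0] + list(block_sizes))
    X = collapsed_init(Y, B, block_sizes)
    prev = None
    for _ in range(max_iter):
        Yhat = B @ X
        blocks = []
        for a, b in zip(edges, edges[1:]):
            # step 5: maximize <Y_i Yhat_i^T, P_i> over block permutations
            row, col = linear_sum_assignment(-(Y[a:b] @ Yhat[a:b].T))
            Pi = np.zeros((b - a, b - a))
            Pi[row, col] = 1.0
            blocks.append(Pi)
        P = block_diag(*blocks)                  # step 6
        X = np.linalg.pinv(B) @ P.T @ Y          # step 7
        obj = np.linalg.norm(Y - P @ B @ X) ** 2
        if prev is not None and abs(prev - obj) <= eps * max(prev, 1.0):
            break                                # relative-change stopping rule
        prev = obj
    return P, X
```

On an exact instance with 𝑠 ≥ 𝑑 blocks, the collapsed initialization is already exact and a single assignment step recovers the block permutation.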
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Initialization analysis</head><p>In this section, we derive upper bounds for the initialization error of the alternating minimization algorithm applied to solve the exact unlabeled sensing problem ((1) with &#119830; = &#120782;), using the initializations in (3) and (4).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Analysis for 𝑟-local permutation</head><p>We consider the initialization in <ref type="bibr">(3)</ref>. Let B = Ũ𝐒Ṽ⊤ denote the compact singular value decomposition of B. Using this in (3), we obtain B† = Ṽ𝐒⁻¹Ũ⊤. With this substitution, it can be shown that the initialization error ‖𝐗* − X(0)𝑟‖𝐹 is the norm of the projection of 𝐗* onto the orthogonal complement of the row space of B, as follows:</p><p>‖𝐗* − X(0)𝑟‖𝐹 = ‖(𝐈 − ṼṼ⊤)𝐗*‖𝐹. (14)</p><p>Recall that the size of B is 𝑠 × 𝑑. If 𝑠 ≥ 𝑑, and assuming that B has rank 𝑑, it can be verified that X(0)𝑟 = 𝐗*. For instance, for 𝑠 ≥ 𝑑, if the entries of 𝐁 are drawn independently from a continuous distribution, the rank of B is 𝑑 with high probability. Accordingly, in this section we upper bound the initialization error for the under-determined case 𝑠 < 𝑑. Similar analyses of under-determined systems have been carried out in the sketching literature <ref type="bibr">[38]</ref>, but those results are not applicable here because our sub-sampling strategy via the collapsing initialization is deterministic.</p><p>The first key theoretical result is Theorem 4.1. This theorem considers a Gaussian matrix 𝐁 ∈ R𝑛×𝑑, with 𝐗* ∈ R𝑑×𝑚 assumed to be a fixed but unknown matrix. To upper bound the relative error ‖X(0)𝑟 − 𝐗*‖𝐹∕‖𝐗*‖𝐹, we apply the Johnson-Lindenstrauss lemma. The key insight is that the relative error depends on the parameter 𝑑 − 𝑠. Specifically, the probability that the relative error exceeds √(1 − 𝑠∕𝑑) decays at a sub-Gaussian rate. 
This implies that the initialization in (3) improves as the number of blocks 𝑠 in the 𝑟-local model increases.</p><p>Theorem 4.1. Let X(0)𝑟 be as defined in <ref type="bibr">(3)</ref>. Consider the exact unlabeled sensing problem. Assuming Gaussian 𝐁 ∈ R𝑛×𝑑 and block-diagonal 𝐏*𝑟 with 𝑠 blocks, for log 𝑚 ≤ 𝑐𝑡²(𝑑 − 𝑠), 𝑠 < 𝑑, and 𝑡 ≥ 0, Pr(‖X(0)𝑟 − 𝐗*‖𝐹∕‖𝐗*‖𝐹 ≥ (1 + 𝑡)√(1 − 𝑠∕𝑑)) ≤ 2 exp(−𝑐′𝑡²(𝑑 − 𝑠)).</p><p>Proof. The error <ref type="bibr">(14)</ref> is the column-wise projection of 𝐗* ∈ R𝑑×𝑚 onto a (𝑑 − 𝑠)-dimensional uniformly random subspace, which can be bounded by the JL Lemma <ref type="bibr">(8)</ref>. □</p><p>In Theorem 4.1, we considered a Gaussian matrix 𝐁 and a fixed, unknown matrix 𝐗*. In the following, we fix the measurement matrix 𝐁 and introduce randomness in 𝐗. Before discussing our main result for this setting, we provide motivation for considering this scenario. One motivating application is the unassigned distance geometry problem (uDGP) <ref type="bibr">[39]</ref><ref type="bibr">[40]</ref><ref type="bibr">[41]</ref>, which involves recovering the configuration of points from a list of distances. We focus on the 1-dimensional uDGP, which can be framed as a structured unlabeled sensing problem, as described in <ref type="bibr">[9]</ref>, where 𝐱* represents the sought-after positions of the points on the 1D line. In this case, the matrix 𝐁 is deterministic (see Eq. ( <ref type="formula">3</ref>) of <ref type="bibr">[9]</ref>). For this setting, without any assumptions on the underlying points 𝐱*, and assuming that the underlying permutation is 𝑟-local, the collapsed initialization may be suboptimal. 
To illustrate this, recall the error in the collapsed estimate: &#8214;(&#119816; -&#7804; &#7804;&#8868; )&#119857; * &#8214; &#119865; . For any &#119857; * orthogonal to the subspace spanned by &#7804;, the collapsed estimate will be the zero vector. This shows that the following upper bound on the initialization error &#8214; x(0) &#119903; -&#119857; * &#8214; &#8804; &#8214;&#119816; -&#7804; &#7804;&#8868; &#8214;&#8214;&#119857; * &#8214; = &#8214;&#119857; * &#8214;, holds with equality for &#119857; * orthogonal to the subspace spanned by the columns of &#7804;. The implication of the above discussion is that some structural assumption on the underlying &#119857; * is necessary to obtain suitable bounds on the initialization error.</p><p>We next consider the squared error in the initialization &#8214;&#119808;&#119857; * &#8214; 2 where &#119808; = &#119816; -&#7804; &#7804;&#8868; . Note that &#119808; is an orthogonal projection matrix onto the orthogonal complement of the subspace spanned by &#7804;. Given that &#7804; &#8712; R &#119889;&#215;&#119904; and we are considering the underdetermined regime &#119904; &lt; &#119889;, the rank of &#119808; is &#119889; -&#119904;. Therefore, the analysis of the error is equivalent to analyzing the quadratic form &#8214;&#119808;&#119857; * &#8214; 2 . One common approach to control the norm of a random vector is to assume that &#119857; * is a random vector with independent, sub-Gaussian coordinates (see, for example, Theorem 3.1.1 in <ref type="bibr">[36]</ref>). However, we note that this assumption does not hold after applying the projection operator &#119808;. A more suitable tool in this case is the Hanson-Wright inequality. In fact, the quadratic form in our case is special due to the fact that &#119808; is a projection operator, allowing us to leverage its linear algebraic properties. 
To apply the Hanson-Wright inequality, it suffices to assume that 𝐱* is sub-Gaussian with mean zero, without requiring the independence of its coordinates. With this assumption, we study the error ‖X(0)𝑟 − 𝐗*‖𝐹. The analysis in this section shows that this error depends critically on the parameter 𝑑 − 𝑠. In particular, Theorem 4.2 demonstrates that the probability of the error exceeding 𝑂(√(𝑑 − 𝑠)) decays at a sub-Gaussian rate.</p><p>Theorem 4.2. Let 𝐗* ∈ R𝑑×𝑚 be the unknown matrix such that the columns 𝐱*𝑗 are independent, zero-mean sub-Gaussian random vectors, with sub-Gaussian norm 𝐾. Let X(0)𝑟 = [x(0)1 | ⋯ | x(0)𝑚] be as defined in (3). Consider the exact unlabeled sensing problem. For any fixed measurement matrix 𝐁 and block-diagonal 𝐏*𝑟 with 𝑠 blocks, 𝑠 < 𝑑 and 𝑡 ≥ 0,</p><p>where</p><p>The strategy for the proof of Theorem 4.2 is to first use the Hanson-Wright inequality to control the quadratic form ‖𝐀𝐱*‖². Once that is established, we use Lemma 4.3 to define a modified random variable that is sub-exponential. The rest of the technical proof uses the Hoeffding bound on the modified random variable and standard techniques to relate the concentration of ‖𝐯‖² to that of ‖𝐯‖ <ref type="bibr">[36]</ref>. For the complete proof, see Appendix A.</p></div>
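The message of this subsection, that the collapsed initialization error tracks the parameter 𝑑 − 𝑠, can be illustrated numerically. The following is our synthetic sketch (equal block sizes, Gaussian 𝐁), not an experiment from the paper:

```python
# Synthetic illustration (ours) of the r-local analysis: the relative error
# of the collapsed initialization (3) behaves like sqrt(1 - s/d), so it
# shrinks as the number of blocks s grows toward d.
import numpy as np

def rel_error(n, d, s, rng):
    B = rng.standard_normal((n, d))
    x = rng.standard_normal((d, 1))
    Bc = B.reshape(s, n // s, d).sum(axis=1)   # collapse equal-size blocks
    # Exact problem: collapsing removes the permutation, so Yc = Bc x.
    x0 = np.linalg.pinv(Bc) @ (Bc @ x)
    return np.linalg.norm(x0 - x) / np.linalg.norm(x)

rng = np.random.default_rng(3)
n, d = 200, 40
avg = {s: np.mean([rel_error(n, d, s, rng) for _ in range(50)]) for s in (10, 20)}
print(avg[20] < avg[10])   # more blocks, smaller initialization error
```

Under the theory above, the two averages should concentrate near √(1 − 10∕40) ≈ 0.87 and √(1 − 20∕40) ≈ 0.71, respectively.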
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 4.3.</head><p>Let x(0) &#119903; be as in (3), &#119857; * &#8712; R &#119889; be a zero-mean sub-Gaussian random vector, &#119809; be a fixed measurement matrix, and &#119823; * &#119903; be a fixed block-diagonal permutation with &#119904; blocks. Let &#119857; err denote the random variable</p><p>1. For &#119904; &lt; &#119889; and &#119905; &#8805; 0,</p><p>where</p></div>
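The collapsed estimate x(0)_r in (3), referenced by the lemma, can be sketched as follows; since (3) is not reproduced in this excerpt, the block-summing (collapsing) operator below is an assumed reading. Sums of measurements within a block are invariant to an r-local permutation, so the minimum-norm solution of the collapsed system equals the projection of x* onto the row space of the collapsed matrix, which is exactly the error decomposition used in the proof:

```python
import numpy as np

rng = np.random.default_rng(1)
d, s, r = 50, 10, 8            # ambient dim, number of blocks, block size
n = s * r

B = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)

# Block-diagonal (r-local) permutation: permute rows within each block.
perm = np.concatenate([i * r + rng.permutation(r) for i in range(s)])
y = (B @ x_star)[perm]

# Collapse: sum measurements within each block (permutation-invariant).
G = np.repeat(np.eye(s), r, axis=0)          # n x s block-indicator matrix
B_tilde = G.T @ B                            # s x d collapsed matrix
y_tilde = G.T @ y

# Minimum-norm least-squares solution of the underdetermined system.
x0 = np.linalg.lstsq(B_tilde, y_tilde, rcond=None)[0]

# x0 equals the projection of x* onto the row space of B_tilde, so the
# error is ||(I - V V^T) x*|| for an orthonormal basis V of that row space.
V, _ = np.linalg.qr(B_tilde.T)               # d x s orthonormal basis
proj_err = np.linalg.norm((np.eye(d) - V @ V.T) @ x_star)
print(np.linalg.norm(x0 - x_star), proj_err)
```

The two printed numbers agree to machine precision, illustrating that the initialization error is a projection of x* onto an (d − s)-dimensional subspace.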
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Define the random variable xerr as follows:</head><p>The random variable xerr is a sub-exponential random variable with sub-exponential norm &#8214;x err -&#119862; &#119889;-&#119904; &#8214; &#120593; 1 = &#119862; &#119870; &#119889;-&#119904; . In addition, xerr &#8805; &#119857; err , and</p><p>Proof. We prove each part separately.</p><p>1. To derive <ref type="bibr">(17)</ref>, set &#119808; = (&#119816; -&#7804; &#7804;&#8868; ) in <ref type="bibr">(7)</ref>, where &#7804; &#8712; R &#119904;&#215;&#119889; denotes the basis for the row space of the collapsed matrix</p><p>Here, &#119808; &#8712; R &#119889;&#215;&#119889; is a (&#119889; -&#119904;)-dimensional projection matrix, and &#931; = &#119808; &#8868; &#119808; = &#119808;&#119808; = &#119808;. Moreover, Tr (&#931;) = Tr (&#931; 2 ) = Tr (&#119808;) = &#119889; -&#119904; and the operator norm satisfies &#8214;&#931;&#8214; = 1. Applying the Hanson-Wright inequality in <ref type="bibr">(7)</ref>, we obtain:</p><p>The next step is to upper bound the term 2 &#8730; (&#119889; -&#119904;)&#119905;. To do so, we rely on the following inequality: 2 &#8730; &#119887;&#119905; &#8804; 2&#119905; + 1 2 &#119887;, for &#119887; &#8805; 0 and &#119905; &#8805; 0. To verify it, square both sides: the inequality is equivalent to 4&#119887;&#119905; &#8804; 4&#119905; 2 + 2&#119887;&#119905; + &#119887; 2 &#8725;4, that is, 0 &#8804; (2&#119905; -&#119887;&#8725;2) 2 , which holds for all &#119905; &#8805; 0 and &#119887; &#8805; 0.</p><p>Using this inequality with &#119887; = &#119889; -&#119904;, and noting that 2&#119905; &#8804; 2&#119905;(&#119889; -&#119904;) since &#119889; -&#119904; &#8805; 1, we obtain 2 &#8730; (&#119889; -&#119904;)&#119905; &#8804; 2&#119905;(&#119889; -&#119904;) + 1 2 (&#119889; -&#119904;). 
Substituting this bound in <ref type="bibr">(21)</ref>, we simplify the inequality as follows:</p><p>The above inequality is equivalent to:</p><p>A change of variable, &#119906; = 2&#119870; 2 &#119905;((&#119889; -&#119904;) + 1), and minor algebraic manipulation yields the final desired result.</p><p>Remark: To derive Theorem 4.2, we first show that xerr is a sub-exponential random variable. It cannot be concluded from ( <ref type="formula">17</ref>) that the shifted random variable &#119857; err -&#119862; &#119889;-&#119904; is sub-exponential because ( <ref type="formula">17</ref>) controls only the upper tail; the lower tail probability Pr [&#119857; err -&#119862; &#119889;-&#119904; &#8804; 0] is not controlled. Since our goal is to upper bound the probability that the error exceeds a certain value, we define the non-negative sub-exponential random variable xerr -&#119862; &#119889;-&#119904; , which equals 0 on the lower-tail event {&#119857; err &#8804; &#119862; &#119889;-&#119904; } (i.e., xerr = &#119862; &#119889;-&#119904; there), see <ref type="bibr">(18)</ref>. We will use the definition of xerr given in ( <ref type="formula">18</ref>), ( <ref type="formula">19</ref>). 2. Conditioning on the two complementary events in ( <ref type="formula">18</ref>), <ref type="bibr">(19)</ref>, by the law of total probability, &#8704; &#119905; &gt; 0, we have</p><p>= Pr [&#119857; err -&#119862; &#119889;-&#119904; &#8805; &#119905; | &#119857; err &gt; &#119862; &#119889;-&#119904; ] Pr [&#119857; err &gt; &#119862; &#119889;-&#119904; ] (25) = Pr [&#119857; err -&#119862; &#119889;-&#119904; &#8805; &#119905;], (26) (24) follows from (18); specifically, Pr [x err -&#119862; &#119889;-&#119904; &#8805; &#119905; | &#119857; err &#8804; &#119862; &#119889;-&#119904; ] = 0 &#8704; &#119905; &gt; 0. (25) follows from (19). (26) follows from noting that the event {&#119857; * | &#119857; err &#8805; &#119862; &#119889;-&#119904; + &#119905;} is a subset of {&#119857; * | &#119857; err &gt; &#119862; &#119889;-&#119904; }. 
From (26), for &#119905; &gt; 0, Pr [&#119857; err -&#119862; &#119889;-&#119904; &#8805; &#119905;] = Pr [|x err -&#119862; &#119889;-&#119904; | &#8805; &#119905;]. (27) (27) follows from noting that xerr -&#119862; &#119889;-&#119904; = |x err -&#119862; &#119889;-&#119904; |, since xerr &#8805; &#119862; &#119889;-&#119904; by definition. Substituting (17) in (27), for &#119905; &gt; 0, Pr [|x err -&#119862; &#119889;-&#119904; | &#8805; &#119905;] &#8804; exp ( -&#119888; &#119905;&#8725;&#119870; &#119889;-&#119904; ). (28)</p><p>Since exp(0) = 1, (28) holds for &#119905; &#8805; 0. In ( <ref type="formula">28</ref> ), we have verified the definition in ( <ref type="formula">6</ref> ) for xerr -&#119862; &#119889;-&#119904; with &#119870; &#119889;-&#119904; = 2&#119870; 2 ( &#8730; &#119889; -&#119904; + 1) = &#119862; &#119870; 2 &#8730; &#119889; -&#119904;. By Proposition 2.7.1 (a), (d) and Definition 2.7.5 in <ref type="bibr">[36]</ref> and ( <ref type="formula">28</ref> ), the sub-exponential norm is &#8214;x err -&#119862; &#119889;-&#119904; &#8214; &#120593; 1 = &#119862; &#119870; &#119889;-&#119904; . &#9633;</p></div>
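The truncation device in part 2 can be illustrated numerically (a sketch; x̄err = max(x_err, C_{d−s}) is an assumed reading of definitions (18), (19), which are not reproduced here): the truncated variable dominates x_err, is constant on the lower-tail event, and has exactly the same upper tail, which is the identity behind (27):

```python
import numpy as np

rng = np.random.default_rng(2)
C = 1.5                                        # stands in for C_{d-s}
x_err = np.abs(rng.standard_normal(100000))    # stand-in error variable

# Truncate from below at C: equals C on the lower-tail event {x_err <= C},
# so the shifted variable x_bar - C is non-negative.
x_bar = np.maximum(x_err, C)

t = 0.5
p_orig = np.mean(x_err - C >= t)               # upper tail of x_err - C
p_trunc = np.mean(np.abs(x_bar - C) >= t)      # upper tail of |x_bar - C|
print(p_orig, p_trunc)                         # identical for t > 0
```

The two tail probabilities coincide because {x_err ≥ C + t} and {x̄err − C ≥ t} are the same event for every t > 0.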
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Analysis for &#119896;-sparse permutation</head><p>In this section, we analyze the initialization step when the underlying permutation is &#119896;-sparse. Specifically, we study the relative error</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#119865;</head><p>. Assuming that &#119809; is a ''tall'' Gaussian matrix, Theorem 4.5</p><p>establishes that this error depends on the parameter &#119896;&#8725;&#119899;, which represents the proportion of shuffled entries relative to the total. More precisely, Theorem 4.5 shows that the probability of the error exceeding &#119874;(&#119896;&#8725;&#119899;) decays at a rate ranging from inverse linear to exponential. Central to Theorem 4.5 is the assumption that the measurement matrix &#119809; is a ''tall'' Gaussian matrix, which we define precisely below. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 4.4 ([42])</head><p>A Gaussian matrix &#119809; &#8712; R &#119899;&#215;&#119889; is considered ''tall'' if the aspect ratio &#120582; = &#119889;&#8725;&#119899; satisfies &#120582; &lt; &#120582; 0 for some sufficiently small constant &#120582; 0 &gt; 0. In that case, we have Pr [&#120590; min (&#119809;) &#8804; &#119888; &#8730; &#119899;] &#8804; &#119890; -&#119888; &#119899; , where &#119888; &gt; 0 is a constant that depends only on &#120582; 0 .</p><p>Theorem 4.5. Let &#119809; be a ''tall'' Gaussian measurement matrix. With &#119862; = 4(1 + &#8730; 3) and 1 &#8804; &#119896; &#8804; &#119899;, we have</p><p>For the proof of ( 29), see Appendix A.</p></div>
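Definition 4.4 can be checked numerically; the sketch below uses illustrative sizes, and the 0.5 threshold is an arbitrary check rather than the constant c of [42]. For a small aspect ratio, the smallest singular value of a Gaussian matrix is on the order of √n:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 4000, 40                   # aspect ratio lambda = d/n = 0.01

B = rng.standard_normal((n, d))
sigma_min = np.linalg.svd(B, compute_uv=False)[-1]

# For tall Gaussian matrices, sigma_min is close to sqrt(n) - sqrt(d),
# i.e., sigma_min / sqrt(n) is close to 1 - sqrt(lambda).
print(sigma_min / np.sqrt(n))
```

With λ = 0.01 the ratio lands near 0.9, far above zero, matching the high-probability lower bound σmin(B) ≥ c√n.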
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Remark:</head><p>In the above theorem, we consider the error bound at the lower and upper limits of the range of &#120598;. At the lower limit &#120598; = 2&#119862; log(&#119898;)&#8725; &#8730; &#119896;, it can be shown that the failure probability is 8&#8725;&#119898;; at the upper limit &#120598; = &#119862; &#119888; &#119899;&#8725; &#8730; &#119896;, the failure probability is of order exp(-&#119888; &#119899;). (29) gives a nontrivial (&lt; 1) relative error bound when &#119896; grows slowly with &#119899;, for example &#119896; = &#119899; &#120573; , &#120573; &lt; 1.</p></div>
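A Monte-Carlo sketch of the k∕n scaling (illustrative sizes; X(1) denotes the least-squares estimate computed from the shuffled measurements, as in the proof in Appendix A): when k grows slowly relative to n, the relative error of the naive least-squares fit stays well below 1, as the remark asserts.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 2000, 20, 40            # tall Gaussian B; k of n entries shuffled

B = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
y = B @ x_star

# k-sparse permutation: cyclically shuffle k randomly chosen entries.
idx = rng.choice(n, size=k, replace=False)
y_perm = y.copy()
y_perm[idx] = y[np.roll(idx, 1)]

# Least-squares estimate from the permuted measurements.
x1 = np.linalg.lstsq(B, y_perm, rcond=None)[0]
rel_err = np.linalg.norm(x1 - x_star) / np.linalg.norm(x_star)
print(rel_err)                    # well below 1 when k << n
```

The observed relative error is a small fraction, consistent with the O(k∕n) bound on the squared relative error in (29).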
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>We compare the proposed AltMin algorithm to several benchmark methods on both synthetic (Fig. <ref type="figure">3</ref>) and real (Table <ref type="table">3</ref>) datasets. We implemented all algorithms in MATLAB. The linear assignment problem to recover the permutation estimate &#119823; is solved using the MATLAB matchpairs function. The least squares estimate (line 7 of Algorithm 1 and line 4 of Algorithm 2) is computed via the Moore-Penrose pseudoinverse using the MATLAB function pinv. The convex optimization problem proposed in <ref type="bibr">[2]</ref> is solved using CVX <ref type="bibr">[43,</ref><ref type="bibr">44]</ref>. The specific solver we used is SeDuMi. We also use CVX to project onto the set of doubly stochastic matrices for the benchmark method 'ds+'. Baselines. We compare against six baseline methods, '&#120001; 2 -regularized' <ref type="bibr">[2]</ref>, 'ds+' <ref type="bibr">[2]</ref>, 'Spectral' <ref type="bibr">[19]</ref>, 'Biconvex' <ref type="bibr">[1]</ref>, 'RLUS' <ref type="bibr">[7]</ref>, and 'Stochastic AltMin' <ref type="bibr">[17]</ref>. The '&#120001; 2 -regularized' method considers the &#119896;-sparse permutation model and imposes a row-wise group sparse penalty (&#120001; 2 -norm regularization) on &#119820;, where &#374; = &#119809; X + &#119820;. Other methods are discussed in the following paragraphs. For the proposed algorithms, ( <ref type="formula">12</ref>), ( <ref type="formula">13</ref>) are alternated until the change in the objective value is less than 1 percent.</p></div>
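The AltMin iteration described above can be sketched in a few lines of Python, with scipy's linear_sum_assignment standing in for MATLAB's matchpairs and numpy's pinv for the least-squares step; the P(0) = I initialization and the 1-percent stopping rule follow the text, while the sizes and the k-sparse test instance are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def altmin(Y, B, max_iter=100, tol=0.01):
    """Alternate least squares and linear assignment for Y ~ P* B X*."""
    n = B.shape[0]
    row = np.arange(n)              # P(0) = I: Y[i] matched to (B X)[row[i]]
    prev = np.inf
    for _ in range(max_iter):
        # Least-squares update via the pseudoinverse.
        X = np.linalg.pinv(B[row]) @ Y
        # Permutation update: one linear assignment problem.
        cost = -Y @ (B @ X).T       # maximize row-wise inner products
        _, row = linear_sum_assignment(cost)
        obj = np.linalg.norm(Y - (B @ X)[row])
        if obj == 0 or prev - obj < tol * prev:
            break                   # objective changed by less than 1 percent
        prev = obj
    return X, row

rng = np.random.default_rng(5)
n, d, m, k = 60, 5, 10, 4
B = rng.standard_normal((n, d))
X_star = rng.standard_normal((d, m))

perm = np.arange(n)                 # k-sparse permutation: one k-cycle
idx = rng.choice(n, size=k, replace=False)
perm[idx] = idx[np.roll(np.arange(k), 1)]
Y = (B @ X_star)[perm]

X_hat, row_hat = altmin(Y, B)
print(np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star))
```

Both updates can only decrease the objective, so the iteration is monotone; on this small noiseless instance the estimate lands close to X*.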
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Synthetic data generation.</head><p>To simulate &#119832; = &#119823; * &#119809; &#119899;&#215;&#119889; &#119831; * &#119889;&#215;&#119898; + &#119830;, we generate data by drawing the entries of matrices &#119809;, &#119831; * and &#119830; from the normal distribution. Subsequently, &#119830; is scaled by &#120590; to set SNR &#8796; &#8214;&#119831; * &#8214; 2 &#119865; &#8725;(&#119898;&#120590; 2 ) = 100. The permutation matrices &#119823; * &#119903; and &#119823; * &#119896; are sampled uniformly from the set of &#119903;-local and &#119896;-sparse permutations, respectively. The results are averaged over 15 Monte-Carlo runs.</p></div>
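The synthetic data generation above can be sketched as follows (illustrative sizes; the noise scaling matches the stated convention SNR ≜ ‖X*‖²_F∕(mσ²) = 100, and the r-local permutation is sampled uniformly within blocks):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, m, snr = 500, 20, 5, 100.0

B = rng.standard_normal((n, d))
X_star = rng.standard_normal((d, m))
W = rng.standard_normal((n, m))

# Scale the noise so that SNR = ||X*||_F^2 / (m * sigma^2) = 100.
sigma = np.linalg.norm(X_star, 'fro') / np.sqrt(m * snr)
Y_clean = B @ X_star

# r-local permutation: permute rows uniformly within each of s blocks.
r, s = 50, 10
perm = np.concatenate([i * r + rng.permutation(r) for i in range(s)])
Y = Y_clean[perm] + sigma * W

emp_snr = np.linalg.norm(X_star, 'fro')**2 / (m * sigma**2)
print(emp_snr)    # 100 up to float rounding
```

Sampling a k-sparse permutation instead amounts to shuffling a random size-k subset of row indices, as in the previous sketches.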
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Enforcing block-diagonal constraints.</head><p>To adapt the 'Spectral', '&#120001; 2 -regularized', and 'Biconvex' methods to the &#119903;-local model, we add a constraint enforcing the permutation estimates to be block-diagonal.</p><p>&#8226; The Spectral method in <ref type="bibr">[3]</ref> studies the unlabeled sensing model &#119832; = &#119823; &#119894; &#119809;&#119831; + &#119830;, where &#119830; is additive Gaussian noise, with no assumption made on the structure of the permutation, i.e., the underlying permutation is generic. The proposed algorithm in <ref type="bibr">[3]</ref> is described below.</p><p>Table <ref type="table">3</ref>: &#119903; max denotes the largest block size of &#119823; * &#119903; and &#119904; denotes the number of blocks. For a description of the datasets, see Table <ref type="table">2</ref>. The methods are compared by the coefficient of determination &#119877; 2 and relative error (&#119877; 2 , relative error). The relative error is &#8214;&#119831; * -X&#8214; &#119865; &#8725;&#8214;&#119831; * &#8214; &#119865; , where &#119831; * = &#119809; &#8224; &#119832; * is the 'oracle' regression matrix given unpermuted data &#119832; * . The coefficient &#119877; 2 ( X) = 1 -(&#8214;&#119832; * -&#119809; X&#8214; &#119865; &#8725;&#8214;&#119832; * &#8214; &#119865; ) measures the goodness of fit for the unpermuted data. The 'naive' estimate from permuted data &#119832; is X = &#119809; &#8224; &#119832;. The coefficient &#119877; 2 is bounded 0 &#8804; &#119877; 2 &#8804; 1, and a higher value of &#119877; 2 indicates a better fit.</p></div>
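Enforcing the block-diagonal constraint reduces the permutation update to one small linear assignment problem per block; a sketch (scipy's linear_sum_assignment stands in for matchpairs, and the block sizes are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def blockwise_assignment(Y, Yhat, block_sizes):
    """Solve min_P ||Y - P Yhat||_F^2 over block-diagonal permutations by
    maximizing the inner products independently within each block."""
    perm = np.empty(sum(block_sizes), dtype=int)
    start = 0
    for r_i in block_sizes:
        sl = slice(start, start + r_i)
        cost = -Y[sl] @ Yhat[sl].T            # negative inner products
        _, cols = linear_sum_assignment(cost)
        perm[sl] = start + cols               # Y[start+i] ~ Yhat[start+cols[i]]
        start += r_i
    return perm

rng = np.random.default_rng(7)
sizes = [4, 6, 5]
offsets = np.cumsum([0] + sizes[:-1])
Yhat = rng.standard_normal((sum(sizes), 3))

# Ground-truth block-diagonal permutation, applied to the rows of Yhat.
true = np.concatenate([o + rng.permutation(r) for o, r in zip(offsets, sizes)])
Y = Yhat[true]

perm = blockwise_assignment(Y, Yhat, sizes)
print(np.array_equal(perm, true))   # True
```

Because the blocks decouple, each subproblem is over permutations of size r_i only, which is why the constrained update is cheap even for large n.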
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><table><row role="label"><cell>Dataset</cell><cell>&#119903; max</cell><cell>&#119904;</cell><cell>Oracle</cell><cell>Naive</cell><cell>RLUS</cell><cell>&#120001; 2 -reg</cell><cell>AltMin-&#119896;</cell><cell>AltMin-&#119903;</cell></row><row><cell>ftp</cell><cell>43</cell><cell>46</cell><cell>(0.88, 0)</cell><cell>(0.44, 0.70)</cell><cell>(0.62, 0.01)</cell><cell>(0.57, 0.59)</cell><cell>(0.73, 0.41)</cell><cell>(&#120782;.&#120790;&#120787;, &#120782;.&#120783;&#120789;)</cell></row><row><cell>ames</cell><cell>311</cell><cell>61</cell><cell>(0.85, 0.32)</cell><cell>(0.37, 0.75)</cell><cell>(0.38, 0.72)</cell><cell>(0.72, 0.89)</cell><cell>(0.69, 0.42)</cell><cell>(&#120782;.&#120789;&#120788;, &#120782;.&#120785;&#120784;)</cell></row><row><cell>scm</cell><cell>424</cell><cell>1137</cell><cell>(0.58, 0)</cell><cell>(0.21, 0.79)</cell><cell>(0.53, 0.29)</cell><cell>(0.46, 0.49)</cell><cell>(0.54, 0.29)</cell><cell>(&#120782;.&#120787;&#120787;, &#120782;.&#120784;&#120787;)</cell></row><row><cell>scs</cell><cell>3553</cell><cell>356</cell><cell>(0.81, 0)</cell><cell>(0.01, 0.99)</cell><cell>(0.74, 0.24)</cell><cell>(0.02, 0.98)</cell><cell>(0.73, 0.33)</cell><cell>(&#120782;.&#120789;&#120789;, &#120782;.&#120784;&#120783;)</cell></row><row><cell>air-qty</cell><cell>95</cell><cell>366</cell><cell>(0.69, 0)</cell><cell>(0.29, 0.78)</cell><cell>(0.65, 0.20)</cell><cell>(0.64, 0.21)</cell><cell>(&#120782;.&#120788;&#120788;, &#120782;.&#120783;&#120789;)</cell><cell>(0.65, 0.19)</cell></row><row><cell>air-qty</cell><cell>46</cell><cell>744</cell><cell>(0.69, 0)</cell><cell>(0.10, 0.94)</cell><cell>(0.60, 0.34)</cell><cell>(0.26, 0.79)</cell><cell>(0.57, 0.42)</cell><cell>(&#120782;.&#120788;&#120784;, &#120782;.&#120785;&#120784;)</cell></row></table><p>The Spectral method estimates &#119823; using</p><p>In the case that the underlying permutation is &#119903;-local, first recall that &#119904; denotes the number of blocks in the &#119903;-local permutation and &#119903; &#119894; is the size of the &#119894;th block. The unlabeled sensing model reduces to</p><p>where &#119832; &#119894; &#8712; R &#119903; &#119894; &#215;&#119898; , &#119809; &#119894; &#8712; R &#119903; &#119894; &#215;&#119889; and &#119830; &#119894; &#8712; R &#119903; &#119894; &#215;&#119898; respectively denote blocks of the matrices &#119832; &#8712; R &#119899;&#215;&#119898; , &#119809; &#8712; R &#119899;&#215;&#119889; and &#119830; &#8712; R &#119899;&#215;&#119898; . Therefore, accounting for the structure of the &#119903;-local permutation, we modify (30) as follows:</p><p>which are &#119904; linear assignment problems over the sets of permutation matrices of size &#119903; &#119894; . 
&#8226; For the 'Biconvex' algorithm, we modify the &#119823; 1 , &#119823; 2 updates in Algorithm 1 from <ref type="bibr">[1]</ref>.</p><p>where &#120583; and &#120588; are ADMM parameters, and &#119823; &#119809; is the projection matrix onto the column-span of &#119809;. Instead of updating &#119823; (&#119905;+1) 1 = argmin &#119823;&#8712;&#120561; &#119899; &#10216;&#119823;, &#119810; 1 &#10217;, we enforce the block diagonal constraint by updating &#119823; (&#119905;+1) 1,&#119894; = argmin &#119823; &#119894; &#8712;&#120561; &#119903; &#119894; &#10216;&#119823; &#119894; , &#119810; 1,&#119894; &#10217;, where &#119810; 1,&#119894; &#8712; R &#119903; &#119894; &#215;&#119903; &#119894; is the matrix comprised of the entries from the &#119903; &#119894; rows and &#119903; &#119894; columns in the &#119894;-th block of &#119810; 1 . We update &#119823; (&#119905;+1) 2 in a similar manner. &#8226; The '&#120001; 2 -regularized' method considers the &#119896;-sparse permutation model and imposes a row-wise group sparse penalty on &#119820;, where &#374; = &#119809; X + &#119820;. Specifically, the proposed estimate is obtained by solving the minimization problem: min &#119831;,&#119820; &#8214;&#119832; -&#119809;&#119831;&#8214; 2 &#119865; + &#120582; &#8721; &#119899; &#119894;=1 &#8214;&#119820; &#119894; &#8214; 2 , where &#119820; &#119894; denotes the &#119894;th row of &#119820;. We do not modify this minimization over &#119831; and &#119820;. Instead, we use the resulting estimate, denoted by X, and set &#374; = &#119809; X. To estimate the underlying permutation, we consider the following minimization problem min &#119823; &#8214;&#119832; -&#119823; &#374;&#8214; 2 &#119865; . 
We note the following equalities: argmin &#119823; &#8214;&#119832; -&#119823; &#374;&#8214; 2 &#119865; = argmin &#119823; &#8214;&#119832;&#8214; 2 &#119865; + &#8214;&#119823; &#374;&#8214; 2 &#119865; -2&#10216;&#119832;, &#119823; &#374;&#10217; = argmax &#119823; &#10216;&#119832;, &#119823; &#374;&#10217; = argmax &#119823; Tr (&#119832; &#8868; &#119823; &#374;) = argmax &#119823; &#10216;&#119832; &#374;&#8868; , &#119823;&#10217;. In the case that the underlying permutation is &#119903;-local, the above minimization problem decouples, and we obtain a block-wise linear assignment problem: P&#119894; &#8592; argmin &#119823; &#119894; &#8712;&#120561; &#119903; &#119894; -&#10216;&#119832; &#119894; &#374;&#8868; &#119894; , &#119823; &#119894; &#10217;, where &#374;&#119894; &#8712; R &#119903; &#119894; &#215;&#119898; and &#119832; &#119894; &#8712; R &#119903; &#119894; &#215;&#119898; denote blocks of the matrices &#374; and &#119832;, respectively.</p><p>The experimental results corroborate the conclusions of Theorems 4.1, 4.2, and 4.5, as the initialization improves with lower values of &#119903; and &#119896;.</p><p>The proposed AltMin algorithm is also applicable to both models and computationally scalable. For &#119903; = 125 (Fig. <ref type="figure">3(a)</ref>) and &#119896; = 850 (Fig. <ref type="figure">3(b</ref>)), the MATLAB runtime on a machine with 16 GB RAM and a 9th-generation 4-core processor is less than a second. Fig. <ref type="figure">3(c</ref>). The entries of the measurement matrix &#119809; are sampled i.i.d. from the uniform [0, 1] distribution. Compared to the case for Gaussian &#119809; (Fig. <ref type="figure">3(b)</ref>), the performance of the 'Spectral' and 'RLUS' methods deteriorates significantly. This is because both algorithms consider quadratic measurements &#119832;&#119832; &#8868; . Specifically, the 'Spectral' method is based on the spectral initialization technique <ref type="bibr">[45]</ref>, which assumes a Gaussian measurement matrix. 
In contrast, the performance of the proposed and '&#120001; 2 -regularized' methods does not deteriorate. Fig. <ref type="figure">3(d</ref>). The 'ds+' algorithm <ref type="bibr">[2]</ref> considers the convex relaxation of (2) by minimizing the objective over the set of doubly stochastic matrices. Assuming a known upper bound on the number of shuffles &#119896;, 'ds+' constrains &#10216;&#119816;, &#119823;&#10217; &#8805; &#119899; -&#119896;. To project onto the set of doubly stochastic matrices, each iteration of 'ds+' solves a linear program, which greatly increases its run-time. The proposed AltMin algorithm optimizes the same objective, but over the set of permutation matrices. This results in a simpler linear assignment problem, which is why AltMin is faster and outperforms 'ds+'. Fig. <ref type="figure">3(e)</ref>. We compare to the method in <ref type="bibr">[17]</ref> which considers the &#119898; = 1 single-view setup and proposes stochastic alternating minimization (S.AltMin) to optimize <ref type="bibr">(2)</ref>. S.AltMin updates &#119823; 50 times in each iteration and its run-time is 50 times that of the proposed algorithm. <ref type="bibr">[17]</ref> also proposes AltMin with multiple initializations for P(0) with a similarly high run-time. The results show that our algorithm (AltMin with P(0) = &#119816; initialization) outperforms both S.AltMin and AltMin with multiple initializations. Table <ref type="table">3</ref>. For the linked linear regression problem (Section 2.3), we report results on the three datasets from <ref type="bibr">[2]</ref>, available at <ref type="bibr">[46]</ref><ref type="bibr">[47]</ref><ref type="bibr">[48]</ref>, and the Ames <ref type="bibr">[49]</ref> housing dataset. For all datasets, the columns of the response variables &#119832; and the design matrix &#119809; are centered (zero-mean). &#119809; is also replaced by its top &#119889; principal components. 
For the 'scs' dataset, one of the 7 response variables is excluded to improve the model fit. For 'ftp', 'ames', 'scm', and 'scs', the values of a feature (column of the design matrix) are rounded and data points with the same feature value are assigned to the same block and permuted.</p><p>The 'air-qty' dataset contains time-stamped readings with year, month, day, and hour information, and we follow the experimental setup described in the 'Case Study' in Section 4 of <ref type="bibr">[2]</ref>: in row 4 (5) of Table <ref type="table">3</ref>, readings with the same month and day (day and hour) are assigned to the same block and permuted. The regression model for 'air-qty' is also as defined in <ref type="bibr">[2]</ref>, and we also use a moving-median filter with window size 32 to remove outliers. The proposed AltMin algorithm outperforms the competing baselines. 'AltMin-&#119896;', i.e., AltMin initialized to P(0) = &#119816;, is also competitive, possibly because permuting correlated rows does not greatly corrupt the design matrix &#119809;. Results for 'Spectral' and 'Biconvex' are omitted because the methods were not competitive.</p></div>
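The preprocessing step described above for the design matrix (column centering followed by replacement with the top d principal components) can be sketched via the SVD; the sizes and the choice of d are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((200, 30))       # raw design matrix
d = 10

# Center the columns (zero mean), as done for both B and Y.
Ac = A - A.mean(axis=0)

# Replace the design matrix by its top-d principal component scores,
# computed from the thin SVD of the centered matrix.
U, S, Vt = np.linalg.svd(Ac, full_matrices=False)
B = U[:, :d] * S[:d]                     # n x d scores matrix

print(B.shape, np.allclose(Ac @ Vt[:d].T, B))
```

Projecting onto the leading right singular vectors gives the same scores as multiplying the centered matrix by the top-d principal directions, which is the identity checked in the final line.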
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this paper, we studied a fast alternating minimization algorithm for the unlabeled sensing problem under two structured permutation models: &#119896;-sparse and &#119903;-local. The initialization of this non-convex algorithm plays a crucial role in its performance. To address this, we proposed two initialization strategies tailored to the respective permutation models. Our primary contribution lies in the characterization of the initialization error, providing theoretical insights into its impact on algorithm performance. Additionally, we show the competitive performance of the algorithm on both synthetic and real datasets. While the current work focuses on analyzing the initialization, one direction for future research is to study the rate of convergence and establish conditions under which the algorithm converges provably to the unknown parameters of the unlabeled sensing problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CRediT authorship contribution statement</head><p>Ahmed Ali Abbasi: Writing - review &amp; editing, Writing - original draft, Methodology, Conceptualization. Shuchin Aeron: Writing - review &amp; editing, Writing - original draft, Methodology. Abiy Tasissa: Writing - review &amp; editing, Writing - original draft, Methodology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix A</head><p>The cross terms in the initialization error can be expressed as the difference of two Chi-square random variables. Using a tail inequality for Chi-square distributed random variables (see Lemma 4.3), Lemma B.1 provides a bound for this quantity. With this result established, the crux of the proof of Theorem 4.5 is to relate &#8721;</p><p>Proof. Let &#119812; denote the error term such that</p><p>where X(1) = &#119809; &#8224; &#374;(0) , &#374;(0) = &#119832; = &#119823; * &#119896; &#119809;&#119831; * , and &#119812; is orthogonal to the range space &#8475;(&#119809;) of &#119809; since X(1) = argmin &#119831; &#8214; &#374;(0) -&#119809;&#119831;&#8214; &#119865; . Combining &#119832; * = &#119809;&#119831; * and (39), we obtain:</p><p>The inequality in <ref type="bibr">(40)</ref> can equivalently be rewritten as ( <ref type="formula">41</ref>). Applying the union bound to &#8746; &#119898; &#119895;=1 &#8496; &#119895; using <ref type="bibr">(47)</ref>,</p><p>For &#119905; &#8805; 2 log &#119898;,</p><p>By Theorem 1.1 in <ref type="bibr">[42]</ref>, for a ''tall'' Gaussian matrix &#119809; &#8712; R &#119899;&#215;&#119889; , where &#119809; is considered tall if the aspect ratio &#120582; = &#119889;&#8725;&#119899; satisfies &#120582; &lt; &#120582; 0 for some sufficiently small constant &#120582; 0 &gt; 0, we have Pr [&#120590; min (&#119809;) &#8804; &#119888; &#8730; &#119899;] &#8804; &#119890; -&#119888; &#119899; (see Subsection 1.2 of <ref type="bibr">[42]</ref>). We consider the union bound of the failure probabilities &#119890; -&#119888; &#119899; and 7&#119890; -&#119905;&#8725;2 . 
A minor calculation yields that &#119890; -&#119888; &#119899; + 7&#119890; -&#119905;&#8725;2 &#8804; 8&#119890; -&#119905;&#8725;2 for &#119905; &#8804; &#119888; &#119899;. Consequently, for 2 log(&#119898;) &#8804; &#119905; &#8804; &#119888; &#119899;, we obtain Pr [ &#8214;&#119831; * -X(1) &#8214; 2 &#119865; &#8725; &#8214;&#119831; * &#8214; 2 &#119865; -(2&#8725;&#119888; 2 )(&#119896;&#8725;&#119899;) &#8805; &#119862; ( &#8730; &#119896; &#8725;&#119899;) &#119905; ] &#8804; 8&#119890; -&#119905;&#8725;2 . (44) We now make a change of variables &#119905; &#8242; = &#119862; ( &#8730; &#119896; &#8725;&#119899;) &#119905;. For 2&#119862; &#8730; &#119896; log(&#119898;)&#8725;&#119899; &#8804; &#119905; &#8242; &#8804; &#119862; &#119888; &#8730; &#119896;, we have Pr [ &#8214;&#119831; * -X(1) &#8214; 2 &#119865; &#8725; &#8214;&#119831; * &#8214; 2 &#119865; -(2&#8725;&#119888; 2 )(&#119896;&#8725;&#119899;) &#8805; &#119905; &#8242; ] &#8804; 8 exp ( -&#119905; &#8242; &#119899;&#8725;(2&#119862; &#8730; &#119896; ) ) . (45) We make a further change of variable with &#119905; &#8242; = &#120598; &#119896;&#8725;&#119899;. For 2&#119862; log(&#119898;)&#8725; &#8730; &#119896; &#8804; &#120598; &#8804; &#119862; &#119888; &#119899;&#8725; &#8730; &#119896;, we obtain Pr [ &#8214;&#119831; * -X(1) &#8214; 2 &#119865; &#8725; &#8214;&#119831; * &#8214; 2 &#119865; &#8805; ( 2&#8725;&#119888; 2 + &#120598; ) &#119896;&#8725;&#119899; ] &#8804; 8 exp ( -&#120598; &#8730; &#119896; &#8725;(2&#119862;) ) . (46) (46) provides a useful (&lt; 1) relative error bound when &#119896; grows slowly with &#119899;, for example &#119896; = &#119899; &#120573; , &#120573; &lt; 1. &#9633; Lemma B.1. Let &#119823; * &#119896; be the fixed unknown &#119896;-sparse permutation matrix with &#10216;&#119816;, &#119823; * &#119896; &#10217; = &#119899; -&#119896;. Consider the exact unlabeled sensing problem. Let &#119857; * be the fixed unknown vector, &#119858; * = &#119809;&#119857; * , &#375;(0) = &#119823; * &#119896; &#119809;&#119857; * . 
Assuming Gaussian &#119809;, for &#119896; &#8712; [&#119899;] and &#119905; &#8805; 0, we have Pr [ &#8214;&#119858; * -&#375;(0) &#8214; 2 2 &#8805; ( 2&#119896; + 4(1 + &#8730; 3) &#8730; &#119896;&#119905; + 10&#119905; ) &#8214;&#119857; * &#8214; 2 2 ] &#8804; 7&#119890; -&#119905; . (47) Proof. Under the &#119896;-sparse assumption on &#119823; * , the observed vector &#119858; has &#119896; shuffled entries. Assuming, without loss of generality, that the first &#119896; rows of &#119809; are shuffled, &#8214;&#119858; * -&#375;(0) &#8214; 2 2 &#8725; &#8214;&#119857; * &#8214; 2 2 = (1&#8725;&#8214;&#119857; * &#8214; 2 2 ) &#8721; &#119896; &#119894;=1 ( &#119835; &#8868; &#119894; &#119857; * -&#119835; &#8868; &#119823;(&#119894;) &#119857; * ) 2 = &#119879; 1 -&#119879; 2 , where &#119879; 1 &#8796; 2 &#8721; &#119896; &#119894;=1 (&#119835; &#8868; &#119894; &#119857; * ) 2 &#8725;&#8214;&#119857; * &#8214; 2 2 and &#119879; 2 &#8796; 2 &#8721; &#119896; &#119894;=1 &#119835; &#8868; &#119894; &#119857; * &#119835; &#8868; &#119823;(&#119894;) &#119857; * &#8725;&#8214;&#119857; * &#8214; 2 2 .</p><p>&#119879; 1 , defined in <ref type="bibr">(48)</ref>, is the sum of &#119896; independent Chi-square random variables and is bounded, using <ref type="bibr">(11)</ref>, as follows:</p><p>The product random variables in &#119879; 2 are distributed as the difference of two independent &#120594; 2 random variables: &#119835; &#8868; &#119894; &#119857; * &#119835; &#8868; &#119823;(&#119894;) &#119857; * &#8725;&#8214;&#119857; * &#8214; 2 2 &#8764; 1 2 &#119885; 1 &#119894; -1 2 &#119885; 2 &#119894; &#8704; &#119894; &#8712; {1, &#8230; , &#119896;}. (50) The random variables (rv) in (50) are not mutually independent, but each rv depends on, at most, two other rvs. 
To see this, let &#119823; be a permutation with &#119823;(&#119894;) = &#119895;; then &#119835; &#8868; &#119894; &#119857; * &#119835; &#8868; &#119895; &#119857; * is not independent of &#119835; &#8868; &#119895; &#119857; * &#119835; &#8868; &#119823;(&#119895;) &#119857; * and &#119835; &#8868; &#119823; &#8868; (&#119894;) &#119857; * &#119835; &#8868; &#119894; &#119857; * . (51) The &#119896; rvs in (50) can therefore be partitioned into three sets &#119875; , &#119876;, &#119877; such that the rvs within each set are independent. Let &#119896; 1 be the number of rvs in set &#119875; . The sum &#119879; &#119875; , where &#119879; &#119875; &#8796; (1&#8725;&#8214;&#119857; * &#8214; 2 2 ) &#8721; &#119894;&#8712;&#119875; &#119835; &#8868; &#119894; &#119857; * &#119835; &#8868; &#119823;(&#119894;) &#119857; * = 1 2 &#8721; &#119894;&#8712;&#119875; &#119885; 1 &#119894; -1 2 &#8721; &#119894;&#8712;&#119875; &#119885; 2 &#119894; , (52) satisfies the lower-tail bound Pr [&#119879; &#119875; &#8804; -2 &#8730; &#119896; 1 &#119905; -&#119905;] &#8804; 2 exp(-&#119905;). (53) (53) follows from applying the union bound to the probabilities &#119901; 1 , &#119901; 2 given by &#119901; 1 = Pr [&#8721; &#119894;&#8712;&#119875; &#119885; 1 &#119894; &#8804; &#119896; 1 -2 &#8730; &#119896; 1 &#119905; ] &#8804; &#119890; -&#119905; , (54) &#119901; 2 = Pr [&#8721; &#119894;&#8712;&#119875; &#119885; 2 &#119894; &#8805; &#119896; 1 + 2 &#8730; &#119896; 1 &#119905; + 2&#119905; ] &#8804; &#119890; -&#119905; , (55) and bounding &#119901; 1 , &#119901; 2 using the tail inequalities (10) and (11), respectively. Defining &#119879; &#119876; , &#119879; &#119877; similarly to (52), with cardinalities &#119896; 2 , &#119896; 3 , and applying the union bound as in (54), (55) gives Pr [ &#119879; 2 &#8804; -2 &#8730; 3&#119896;&#119905; -3&#119905; ] &#8804; 6&#119890; -&#119905; . 
(57) Applying the union bound to (49), (57) gives the result in (47) with &#119862; = 4(1 + &#8730; 3). &#9633;</p></div></body>
		</text>
</TEI>
