<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>On Weighted Graph Sparsification by Linear Sketching</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10432482</idno>
					<idno type="doi">10.1109/FOCS54457.2022.00052</idno>
					<title level='j'>63rd IEEE Annual Symposium on Foundations of Computer Science</title>
					<author>Yu Chen</author><author>Sanjeev Khanna</author><author>Huan Li</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A seminal work of [Ahn-Guha-McGregor, PODS'12] showed that one can compute a cut sparsifier of an unweighted undirected graph by taking a near-linear number of linear measurements on the graph. Subsequent works also studied computing other graph sparsifiers using linear sketching, and obtained near-linear upper bounds for spectral sparsifiers [Kapralov-Lee-Musco-Musco-Sidford, FOCS'14] and first non-trivial upper bounds for spanners [Filtser-Kapralov-Nouri, SODA'21]. All these linear sketching algorithms, however, only work on unweighted graphs, and are extended to weighted graphs by weight grouping, a non-linear operation not implementable in, for instance, general turnstile streams. In this paper, we initiate the study of weighted graph sparsification by linear sketching by investigating a natural class of linear sketches that we call incidence sketches, in which each measurement is a linear combination of the weights of edges incident on a single vertex. This class captures all aforementioned linear sketches for unweighted sparsification. It also covers linear sketches implementable in the simultaneous communication model, where edges are distributed across n machines. Our results are: 1) Weighted cut sparsification: We give an algorithm that computes a (1 + ε)-cut sparsifier using Õ(nε^{-3}) linear measurements, which is nearly optimal. This also implies a turnstile streaming algorithm with Õ(nε^{-3}) space. Our algorithm is achieved by building a so-called "weighted edge sampler" for each vertex. 2) Weighted spectral sparsification: We give an algorithm that computes a (1 + ε)-spectral sparsifier using Õ(n^{6/5}ε^{-4}) linear measurements. This also implies a turnstile streaming algorithm with Õ(n^{6/5}ε^{-4}) space. Key to our algorithm is a novel analysis of how the effective resistances change under vertex sampling. Complementing our algorithm, we then prove a superlinear lower bound of Ω(n^{21/20-o(1)}) measurements for computing some O(1)-spectral sparsifier using incidence sketches. 3) Weighted spanner computation: We first show that any o(n^2) linear measurements can only recover a spanner of stretch that in general depends linearly on w_max/w_min. We thus focus on graphs with w_max/w_min = O(1) and study the stretch's dependence on n. On such graphs, the algorithm in [Filtser-Kapralov-Nouri, SODA'21] can obtain a spanner of stretch Õ(n^{(2/3)(1-α)}) using Õ(n^{1+α}) measurements for any α ∈ [0, 1]. We prove that, for incidence sketches, this tradeoff is optimal up to an n^{o(1)} factor for all α < 1/10. We prove both our lower bounds by analyzing the "effective resistances" in certain matrix-weighted graphs, where we develop a number of new tools for reasoning about such graphs - most notably (i) a matrix-weighted analog of the widely used expander decomposition, and (ii) a proof that a random vertex-induced subgraph of a matrix-weighted expander is also an expander with high probability.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Graph sparsification is a process that significantly reduces the number of edges in a dense graph while preserving certain useful properties. Besides being an interesting problem in its own right, graph sparsification has been used as a fundamental building block in many modern graph algorithms, such as maximum flow and minimum cut algorithms <ref type="bibr">[4]</ref>, <ref type="bibr">[20]</ref>, <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>, solvers for graph-structured linear systems <ref type="bibr">[6]</ref>, <ref type="bibr">[14]</ref>, <ref type="bibr">[34]</ref>, and graph clustering <ref type="bibr">[1]</ref>, <ref type="bibr">[5]</ref>.</p><p>In addition to designing fast algorithms in the classic computational model, a rich body of work has also studied graph sparsification by linear sketching. In this setting, we can only access the input graph by taking linear measurements, each of which returns a linear combination of the edge weights, and the goal is to compute a sparsifier of the input graph using as few measurements as possible. To state the previously known results in this setting, let us first recall the definitions of three extensively studied graph sparsifiers that we will study in this work.</p><p>Definition I.1 (Cut sparsifiers). Given a weighted graph G = (V, E, w) and a parameter &#949; &#8712; (0, 1), another weighted graph H = (V, F, w') with F &#8838; E is called a (1 + &#949;)-cut sparsifier of G if for every cut (S, V &#8722; S), its weight w_G(S, V &#8722; S) in G and its weight w_H(S, V &#8722; S) in H satisfy (1 &#8722; &#949;)w_G(S, V &#8722; S) &#8804; w_H(S, V &#8722; S) &#8804; (1 + &#949;)w_G(S, V &#8722; S).</p><p>Definition I.2 (Spectral sparsifiers). Given a weighted graph G = (V, E, w) with Laplacian matrix L_G and a parameter &#949; &#8712; (0, 1), another weighted graph H = (V, F, w') with Laplacian matrix L_H and F &#8838; E is called a (1 + &#949;)-spectral sparsifier of G if for every vector x &#8712; R^n we have (1 &#8722; &#949;)x^T L_G x &#8804; x^T L_H x &#8804; (1 + &#949;)x^T L_G x.</p><p>Definition I.3 (Spanners). Given a weighted graph G = (V, E, w) and a parameter t &#8805; 1 (called the stretch), another weighted graph H = (V, F, w') with F &#8838; E is called a t-spanner of G if for every vertex pair u, v, the shortest path length d_G(u, v)<ref type="foot">foot_0</ref> between u, v in G and the shortest path length d_H(u, v) between u, v in H satisfy d_G(u, v) &#8804; d_H(u, v) &#8804; t &#183; d_G(u, v).</p><p>A seminal work by Ahn, Guha, and McGregor <ref type="bibr">[2]</ref> showed that one can compute a (1 + &#949;)-cut sparsifier of an unweighted graph using &#213;(n&#949;^{-2})<ref type="foot">foot_1</ref> linear measurements, which is nearly optimal. Then subsequent works by Kapralov, Lee, Musco, Musco, and Sidford <ref type="bibr">[17]</ref> and Kapralov, Mousavifar, Musco, Musco, Nouri, Sidford, and Tardos <ref type="bibr">[19]</ref> showed that one can also compute a (1 + &#949;)-spectral sparsifier of an unweighted graph using &#213;(n&#949;^{-2}) linear measurements. Finally, a recent work by Filtser, Kapralov, and Nouri <ref type="bibr">[10]</ref> showed that one can compute an &#213;(n^{2/3})-spanner of an unweighted graph using &#213;(n) linear measurements. 
<ref type="bibr">[10]</ref> also showed that one can compute an &#213;(n^{(2/3)(1-&#945;)})-spanner of an unweighted graph using &#213;(n^{1+&#945;}) linear measurements for any &#945; &#8712; [0, 1], and conjectured that this tradeoff might be close to optimal.</p><p>In all of these works <ref type="bibr">[2]</ref>, <ref type="bibr">[10]</ref>, <ref type="bibr">[17]</ref>, <ref type="bibr">[19]</ref>, the authors also showed that their linear sketching algorithms can be applied to computing graph sparsifiers of weighted graphs in dynamic streams, where the input graph is given by a stream of insertions and deletions of weighted edges, and the goal is to compute a sparsifier of the input graph using a small amount of space. In all these works, this is achieved by grouping the edge weights geometrically, and then applying the linear sketching algorithm for unweighted graphs to the subgraph induced by the edges in each weight group. Note that this approach crucially requires that if an edge e is inserted with weight w_e, then any subsequent deletion of e again reveals its weight, which must be identical to w_e; this ensures that each edge e is inserted into and deleted from the same geometric group.</p><p>However, the operation of grouping edges by weight is not linear, and as a result, the above approach for extending unweighted sketches to weighted ones is not implementable in the more general turnstile streams, where the graph is given as a stream of arbitrary edge weight updates. Surprisingly, little seems to be known about graph sparsification in this setting.</p></div>
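As a toy illustration of Definition I.1, the following Python snippet checks the cut-sparsifier condition by brute force over all cuts; it is exponential in n and purely illustrative (the helper names `cut_weight` and `is_cut_sparsifier` are ours, not the paper's).

```python
# Brute-force check of Definition I.1 on toy graphs (illustration only,
# exponential in n): H is a (1 + eps)-cut sparsifier of G if every cut's
# weight in H is within a (1 +/- eps) factor of its weight in G.
from itertools import combinations

def cut_weight(edges, S):
    """Total weight of edges crossing the cut (S, V - S)."""
    S = set(S)
    return sum(w for (u, v, w) in edges if (u in S) != (v in S))

def is_cut_sparsifier(n, G_edges, H_edges, eps):
    for k in range(1, n):                      # all proper nonempty subsets S
        for S in combinations(range(n), k):
            wG = cut_weight(G_edges, S)
            wH = cut_weight(H_edges, S)
            if not (1 - eps) * wG <= wH <= (1 + eps) * wG:
                return False
    return True

# A triangle is trivially a sparsifier of itself; dropping edges is not.
G = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
assert is_cut_sparsifier(3, G, G, 0.1)
assert not is_cut_sparsifier(3, G, [(0, 1, 1.0)], 0.1)
```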
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Our results</head><p>In this paper, we initiate the study of weighted graph sparsification by linear sketching. To state our results, we need to introduce some notation first. Let w &#8712; R^{(n choose 2)}_{&#8805;0} denote the weights of the edges of the input graph, where w_e = 0 means that there is no edge in the edge slot e. A linear sketch consists of a (random) sketching matrix &#934; &#8712; R^{N&#215;(n choose 2)} and a (randomized) recovery algorithm A that takes as input &#934;w &#8712; R^N and outputs a sparsifier of the graph with edge weights w. Note that, by definition, the linear sketch is non-adaptive.</p><p>We focus on a natural class of linear sketches that we call incidence sketches, in which each row of the sketching matrix &#934; is supported<ref type="foot">2</ref> on edges incident on a single vertex (which could be different for different rows). This class captures all linear sketches that are implementable in a distributed computing setting, where the edges are stored across n machines such that machine i has all edges incident on the i-th vertex (a.k.a. the simultaneous communication model). Moreover, it also covers all aforementioned linear sketches used in previous works for unweighted cut sparsification <ref type="bibr">[2]</ref>, spectral sparsification <ref type="bibr">[18]</ref>, <ref type="bibr">[19]</ref>, and spanner computation <ref type="bibr">[10]</ref>.</p><p>We now present our results for computing these three kinds of sparsifiers in weighted graphs. When describing our results, we use w_max and w_min to denote the largest and the smallest non-zero edge weights, respectively, and always assume w_max &#8805; 1 &#8805; w_min. We also write &#213;(&#8226;) to hide polylog(n, &#949;^{-1}, w_max/w_min) factors. a) Weighted cut sparsification.: We design an incidence sketch with a near-linear number of measurements for computing a (1 + &#949;)-cut sparsifier of a weighted graph.</p><p>Theorem I.1 (Algorithm for weighted cut sparsification). For any &#949; &#8712; (0, 1), there exists an incidence sketch with random sketching matrix &#934;_1 &#8712; R^{N_1&#215;(n choose 2)} satisfying N_1 &#8804; &#213;(n&#949;^{-3}) and a recovery algorithm A_1, such that for any w &#8712; R^{(n choose 2)}_{&#8805;0}, A_1(&#934;_1 w) returns, with probability 1 &#8722; 1/poly(n), a (1 + &#949;)-cut sparsifier of the graph with edge weights w.</p><p>Thus, we achieve performance similar to that of the linear sketch for unweighted graphs given in <ref type="bibr">[2]</ref>, which uses O(n&#949;^{-2} poly(log n)) measurements. It is well known that even to detect the connectivity of a graph, &#8486;(n) linear measurements are needed. Therefore, our upper bound in Theorem I.1 is nearly optimal in n.</p><p>Similar to <ref type="bibr">[2]</ref>, <ref type="bibr">[10]</ref>, <ref type="bibr">[13]</ref>, <ref type="bibr">[17]</ref>, <ref type="bibr">[19]</ref>, by using Nisan's well-known pseudorandom generator <ref type="bibr">[30]</ref>, we can turn our linear sketching algorithm into a low-space streaming algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Corollary I.2 (of Theorem I.1).</head><p>There is a single-pass turnstile streaming algorithm with &#213;(n&#949;^{-3}) space that, at any given point of the stream, recovers a (1 + &#949;)-cut sparsifier of the current graph with high probability.</p><p>Note that in the turnstile model, the stream consists of arbitrary edge weight updates.</p><p>b) Weighted spectral sparsification.: We design an incidence sketch with about n^{6/5} measurements for computing a (1 + &#949;)-spectral sparsifier of a weighted graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theorem I.3 (Algorithm for weighted spectral sparsification).</head><p>For any &#949; &#8712; (0, 1), there exists an incidence sketch with random sketching matrix &#934;_2 &#8712; R^{N_2&#215;(n choose 2)} satisfying N_2 &#8804; &#213;(n^{6/5}&#949;^{-4}) and a recovery algorithm A_2, such that for any w &#8712; R^{(n choose 2)}_{&#8805;0}, A_2(&#934;_2 w) returns, with probability 1 &#8722; 1/poly(n), a (1 + &#949;)-spectral sparsifier of the graph with edge weights w.</p><p>Similar to the cut sparsification case, we have the following corollary:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Corollary I.4 (of Theorem I.3).</head><p>There is a single-pass turnstile streaming algorithm with &#213;(n^{6/5}&#949;^{-4}) space that, at any given point of the stream, recovers a (1 + &#949;)-spectral sparsifier of the current graph with high probability.</p><p>We complement this result by showing that a superlinear number of measurements is indeed necessary for any incidence sketch to recover some O(1)-spectral sparsifier.</p><p>Theorem I.5 (Lower bound for weighted spectral sparsification). There exist constants &#949;, &#948; &#8712; (0, 1) such that any incidence sketch of N measurements that computes a (1 + &#949;)-spectral sparsifier with probability &#8805; 1 &#8722; &#948; on any w must satisfy N &#8805; n^{21/20&#8722;o(1)}.</p><p>Note that this is in sharp contrast to the unweighted case, where a near-linear number of incidence sketch measurements is sufficient for computing an O(1)-spectral sparsifier <ref type="bibr">[18]</ref>, <ref type="bibr">[19]</ref>. Theorem I.5 also draws a distinction between spectral sparsification and cut sparsification, as for the latter a near-linear number of measurements is enough even in the weighted case. c) Weighted spanner computation.: We first show that any o(n^2) linear measurements can only recover a spanner of stretch that in general depends linearly on w_max/w_min. This differs fundamentally from the case of cut or spectral sparsification, where we can recover an O(1)-sparsifier, whose error is completely independent of w_max/w_min, using a number of measurements sublinear in n^2. Specifically, we prove the following proposition.</p><p>Proposition I.6. Any linear sketch (not necessarily an incidence sketch) of N measurements that computes an o(w_max/w_min)-spanner with probability &#8805; 0.9 on any w must satisfy N &#8805; &#8486;(n^2). This proposition is a consequence of the fact that edge weights serve as the edge lengths. 
More specifically, consider a complete graph where we weight a uniformly random edge by w_min and all other (n choose 2) &#8722; 1 edges by w_max. Then, while we can ignore the w_min-weight edge in an O(1)-cut or spectral sparsifier, we have to include it in any o(w_max/w_min)-spanner, since otherwise the shortest path length between its two endpoints would blow up by at least a factor of 2w_max/w_min. Now note that, as the weight of this edge is smaller than that of all other edges, in order to find it we have to essentially recover all entries of the edge weight vector w, which inevitably requires &#8486;(n^2) linear measurements<ref type="foot">3</ref>.</p><p>In light of the above proposition, we turn our focus to graphs with w_max/w_min = O(1), and study the stretch's optimal dependence on n. On such graphs, the approach in <ref type="bibr">[10]</ref> is able to obtain the following tradeoff between the stretch of the spanner and the number of linear measurements needed.</p><p>Theorem I.7 (Algorithm for weighted spanner computation <ref type="bibr">[10]</ref>). For any constant &#945; &#8712; [0, 1], there exists an incidence sketch with random sketching matrix &#934;_3 &#8712; R^{N_3&#215;(n choose 2)} satisfying N_3 &#8804; &#213;(n^{1+&#945;}) and a recovery algorithm A_3, such that for any w &#8712; R^{(n choose 2)}_{&#8805;0} with w_max/w_min &#8804; O(1), A_3(&#934;_3 w) returns, with probability 1 &#8722; 1/poly(n), an &#213;(n^{(2/3)(1-&#945;)})-spanner of the graph with edge weights w.</p><p><ref type="bibr">[10]</ref> also conjectured that on unweighted graphs, to obtain a spanner of stretch O(n^{2/3&#8722;&#949;}) for any constant &#949; &gt; 0, a superlinear number of measurements is needed for any linear sketch (in other words, the tradeoff is optimal at &#945; = 0). We make progress on this question by showing that this is indeed true for a natural class of linear sketches (i.e. incidence sketches) on "almost" unweighted graphs (i.e. those with w_max/w_min = O(1)). 
In fact, we show that in such a setting, the tradeoff obtained in the above theorem is optimal for all &#945; &lt; 1/10.</p><p>Theorem I.8 (Lower bound for weighted spanner computation). For any constant &#945; &#8712; (0, 1/10), there exist constants C &#8805; 1, &#948; &#8712; (0, 1) such that any incidence sketch of N measurements that computes an o(n^{(2/3)(1-&#945;)})-spanner with probability &#8805; 1 &#8722; &#948; on any w with w_max/w_min &#8804; C must satisfy N &#8805; n^{1+&#945;&#8722;o(1)}.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Roadmap</head><p>The rest of this paper is structured as follows. Section II gives an overview of our algorithms for weighted cut and spectral sparsification. Then in Section III, we give an overview of our lower bound for weighted spectral sparsification. We do not give a separate overview of our lower bound for weighted spanner computation, because the ideas are similar to the ones described in Section III. The proofs of our main results are omitted here due to space limitations and are included in the full version.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. OVERVIEW OF WEIGHTED CUT AND SPECTRAL SPARSIFICATION ALGORITHMS</head><p>A. Overview of the algorithm for weighted cut sparsification a) Recap of the unweighted cut sparsification algorithm in <ref type="bibr">[2]</ref>.: At a high level, the approach taken is to reduce cut sparsification to (repeatedly) recovering a spanning forest of a subgraph of the input graph, obtained by sampling edges uniformly at some rate p &#8712; (0, 1) known beforehand. This task is then further reduced to the task of sampling an edge connecting S to V &#8722; S for an arbitrary subset S of vertices, as this can be used to create a spanning forest by growing connected components. To implement this latter task, in the sketching phase, we apply an &#8467;_0-sampler sketch to the incidence vector of each vertex u (i.e. each column of the edge-vertex incidence matrix) in the sub-sampled graph. Then in the recovery phase, in order to recover an edge going out of a vertex set S, we add up the sketches of the vertices inside S. By linearity, this summed sketch is a sketch of the sum of the incidence vectors of the vertices inside S, and the latter is supported exactly on the edges going out of S, since the edges inside S cancel out. As a result, we can recover an edge going out of S, and create a spanning forest of the sub-sampled graph. Note that this approach crucially utilizes the fact that the edges are sampled uniformly at a rate that is known beforehand. This means that we can sample all (n choose 2) edge slots beforehand, and apply the linear sketch only to the sampled edge slots.</p><p>b) Our approach for weighted cut sparsification.: We also reduce the task to recovering a spanning forest in a sub-sampled graph. However, the latter graph is now obtained by sampling edges non-uniformly. Specifically, we need to recover a spanning forest in a subgraph obtained by sampling each edge e with probability min{w_e p, 1} for some parameter p &#8712; (0, 1) that is known beforehand. 
Therefore, in order to apply the same idea as in the unweighted case, we now need to design a variant of the &#8467;_0-sampler that, given a vector x &#8712; R^N_{&#8805;0} and a parameter p &#8712; (0, 1), recovers a nonzero entry of x after each entry i &#8712; [N] is sampled with probability min{x_i p, 1}. We call such a sampler a "weighted edge sampler".</p><p>Note that the edge weights are not known to us beforehand, so we cannot sample the edge slots with our desired probabilities as in the unweighted case. We instead build such a weighted edge sampler using a rejection sampling process, in which we sample edges uniformly at &#213;(1) geometric rates, use &#8467;_1-samplers to try to recover edges at each rate, and only output a recovered edge e if the sampling rate &#8776; w_e p. We then show that with high probability, we can efficiently find a desired edge.</p><p>Roughly, our analysis involves proving that there exists a geometric rate q such that, after uniformly sampling edges at rate q, the total weight of the edges e satisfying w_e p &#8776; q accounts for a large portion of the total weight of all sampled edges. As a result, by using a few independent &#8467;_1-samplers, we can find one such edge with high probability.</p></div>
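To illustrate the cancellation trick from part (a) above: summing the signed incidence vectors of the vertices in S leaves a vector supported exactly on the edges crossing S, from which an &#8467;_0-sampler can then sample. The toy numpy snippet below works with the exact vectors rather than with sketches (no actual &#8467;_0-sampler is implemented; the helper names are ours).

```python
import numpy as np
from itertools import combinations

def incidence_vectors(n, edges):
    """Signed vertex incidence vectors over the C(n,2) edge slots: for an
    edge in slot (i, j) with i < j, vertex i's vector gets +1 and vertex
    j's vector gets -1, so edges inside a summed vertex set cancel."""
    slots = list(combinations(range(n), 2))
    idx = {e: k for k, e in enumerate(slots)}
    A = np.zeros((n, len(slots)))
    for i, j in edges:
        i, j = min(i, j), max(i, j)
        A[i, idx[(i, j)]] = 1.0
        A[j, idx[(i, j)]] = -1.0
    return A, slots

# Path 0-1-2-3 plus chord (0, 2).  Summing the vectors of S = {0, 1, 2}
# cancels the edges inside S, leaving exactly the boundary edge (2, 3).
A, slots = incidence_vectors(4, [(0, 1), (1, 2), (2, 3), (0, 2)])
boundary = A[0] + A[1] + A[2]
surviving = [slots[k] for k in np.flatnonzero(boundary)]
assert surviving == [(2, 3)]
```

In the real algorithm, each vertex's vector is replaced by an &#8467;_0-sampler sketch of it, and by linearity the samplers can be summed instead of the vectors.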
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Overview of the algorithm for weighted spectral sparsification</head><p>As in previous linear sketches for unweighted graphs <ref type="bibr">[17]</ref>, <ref type="bibr">[19]</ref>, the key task is to recover edges with &#937;(1) effective resistances (or, in the weighted case, &#937;(1) leverage scores), which we refer to as heavy edges. The high-level idea used in previous works is to (i) compute, for each vertex pair s, t, a set of vertex potentials x_{s,t} &#8712; R^n induced by an electrical flow from s to t, and then (ii) apply an &#8467;_2-heavy hitter to B_G x_{s,t} to try to recover the edge (s, t), where B_G is the edge-vertex incidence matrix of G. They achieve (i) by simulating an iterative refinement process in <ref type="bibr">[28]</ref>. To achieve (ii), they make a key observation that ||B_G x_{s,t}||_2^2 is the energy of x_{s,t}, while the entry of B_G x_{s,t} at the edge slot (s, t) is the potential difference between s and t. Therefore, by the energy minimization characterization of effective resistances, whenever the effective resistance between s, t is &#937;(1), the entry at (s, t) accounts for an &#937;(1) fraction of ||B_G x_{s,t}||_2^2, and hence the entry (s, t) is an &#8467;_2-heavy hitter.</p><p>However, when the graph is weighted, we are only allowed to access the graph through linear measurements on its weight vector w_G. As a result, we can only apply &#8467;_2-heavy hitters to W_G B_G x_{s,t}, where W_G is the diagonal matrix of edge weights; its squared norm is x_{s,t}^T L_{G_sq} x_{s,t}, where L_{G_sq} is the Laplacian matrix of a "squared" graph (call it G_sq), which has the same edges as G, but whose edges are weighted by w_e^2 as opposed to w_e. Therefore, we will be recovering edges that are heavy in G_sq instead of in G if we apply the same approach as in previous works. Unfortunately, a heavy edge in G is not in general heavy in G_sq, since the energy on the edges with very large weights will blow up when we square the edge weights (i.e. w_e(x_u &#8722; x_v)^2 becomes w_e^2(x_u &#8722; x_v)^2), and hence make the total energy grow unboundedly. 
To see an intuitive example, suppose G is a "block cycle graph" on n vertices whose edges are generated as follows (see also Figure <ref type="figure">1</ref>):</p><p>1) Partition the vertices into n^{4/5} blocks S_0, . . . , S_{n^{4/5}-1}, each with n^{1/5} vertices. 2) For each 0 &#8804; i &lt; n^{4/5}, add on S_i a complete graph on n^{1/5} vertices with edge weights n^{2/5}, i.e. n^{2/5} K_{n^{1/5}}. 3) For each 0 &#8804; i &lt; n^{4/5}, add on (S_i, S_{i+1}) a complete bipartite graph on 2n^{1/5} vertices with edge weights n^{2/5} and bipartition (S_i, S_{i+1})<ref type="foot">foot_4</ref>, i.e. n^{2/5} K_{n^{1/5},n^{1/5}}. 4) Finally, add a "crossing edge" e* of weight 1 between a randomly chosen vertex pair s, t. Note that, typically, the crossing edge e* spans &#8486;(n^{4/5}) consecutive blocks, and therefore has effective resistance (and also leverage score) &#8486;(1).</p><p>Proposition II.1. If s &#8712; S_i, t &#8712; S_j are such that min{|i &#8722; j|, n^{4/5} &#8722; |i &#8722; j|} &#8805; &#8486;(n^{4/5}), then the effective resistance of e* satisfies r_{e*} &#8805; &#8486;(1). Proof. Let s, t be the endpoints of the crossing edge e*. By the energy minimization characterization of effective resistances, it suffices to show that there is a set of vertex potentials whose normalized energy with respect to s, t is O(1). Specifically, consider the set of potentials x &#8712; R^n such that x_u = i/n^{4/5} for all u &#8712; S_i. Then we have x_s &#8722; x_t = &#920;(1), and the total energy is at most n^{4/5} &#183; n^{2/5} &#183; n^{2/5} &#183; (1/n^{4/5})^2 = O(1), since only the bipartite edges between adjacent blocks (n^{2/5} edges of weight n^{2/5} for each of the n^{4/5} adjacent block pairs) have a nonzero potential difference, namely 1/n^{4/5}.</p><p>However, in the squared graph G_sq, all edge weights along the cycle are blown up by a factor of n^{2/5}, and thus e* only has leverage score O(n^{&#8722;2/5}) in G_sq. 
To recover in a vector x every entry with &#8467;_2-contribution &#8805; n^{&#8722;2/5}||x||_2^2, one needs an &#8486;(n^{2/5}) factor blowup in the number of linear measurements, resulting in a total of n^{7/5} measurements needed to recover e*.</p><p>We can in fact improve the number of linear measurements needed for recovering e* to &#213;(n^{6/5}) using the vertex sampling trick, an idea first used in <ref type="bibr">[10]</ref> for sketching spanners. Namely, consider sampling a vertex set C &#8834; V by including each vertex with probability n^{&#8722;1/5}/100, and looking at the vertex-induced subgraph G_sq[C]. Then one can show that, conditioned on e* &#8712; G_sq[C], with constant probability, the two endpoints of e* will be disconnected in G_sq[C] \ e*. As a result, the leverage score of e* becomes 1 in G_sq[C], and we can recover e* by recovering heavy edges in G_sq[C], which, as we will show, can be done using &#213;(|C|) &#8776; &#213;(n^{4/5}) measurements. Since e* &#8712; G[C] with probability &#8776; 1/n^{2/5}, repeating this sampling process independently &#213;(n^{2/5}) times allows us to recover e* in at least one vertex-induced subgraph. This results in a linear sketch of &#213;(n^{6/5}) measurements.</p><p>What if we slightly increase each block's size to n^{1/5+&#948;} and decrease the edge weights along the cycle to n^{2/5&#8722;3&#948;}? While one can still verify that the crossing edge e* has leverage score &#8486;(1), applying the same vertex sampling process as above will not disconnect the endpoints of e* with &#8486;(1) probability. However, one can alternatively show that, with constant probability, the number of edges along the cycle reduces by a factor of n^{2/5}. 
Since now the energy of each edge only blows up by a factor smaller than n^{2/5} in G_sq, this will also make the leverage score of e* become &#8486;(1) in G_sq[C], and thus we can apply the same linear sketch of &#213;(n^{6/5}) measurements.</p><p>The above warm-up suggests that a sampling rate of &#8776; n^{&#8722;1/5} is a sweet spot for recovering heavy edges in any graph with the block cycle structure. Indeed, we prove a key vertex sampling lemma showing that in any weighted graph G, a heavy edge e in G is also likely heavy in a vertex-induced subgraph of G_sq obtained by sampling vertices at rate &#8776; n^{&#8722;1/5}. This is proved by carefully analyzing the structures of the edges of different weights after vertex sampling, and then explicitly constructing a set of vertex potentials with small total energy in the induced subgraph. Finally, by integrating this lemma into an iterative refinement process in <ref type="bibr">[28]</ref> (as the authors did in <ref type="bibr">[17]</ref>, <ref type="bibr">[19]</ref>) and a spectral sparsification algorithm in <ref type="bibr">[21]</ref>, we are able to recover a spectral sparsifier of G using &#213;(n^{6/5}) linear measurements.</p><p>We note that this method of recovering heavy edges by vertex sampling is inspired by the one used in <ref type="bibr">[10]</ref> for spanners. However, for spectral sparsification, the correctness of such a method follows from fairly different reasoning, and the proof is arguably more involved.</p></div>
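The block cycle phenomenon above can be checked numerically on a small instance. The sketch below uses our own toy parameters (50 blocks of 2 vertices, cycle weight 10, rather than the paper's n^{1/5}/n^{2/5} scaling) and computes the crossing edge's leverage score via the Laplacian pseudoinverse, in G and in the squared graph G_sq, confirming that squaring the weights deflates it.

```python
import numpy as np

def laplacian(n, edges):
    """Weighted graph Laplacian from a list of (u, v, w) edges."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def leverage(n, edges, e):
    """Leverage score w_e * b_e^T L^+ b_e of edge e = (u, v, w_e)."""
    u, v, w = e
    b = np.zeros(n); b[u] = 1.0; b[v] = -1.0
    return w * b @ np.linalg.pinv(laplacian(n, edges)) @ b

def block_cycle(ell, s, w):
    """ell blocks of s vertices arranged in a cycle: a clique inside each
    block and a complete bipartite graph between adjacent blocks, all of
    weight w (a toy analog of the construction around Figure 1)."""
    block = lambda i: [(i % ell) * s + a for a in range(s)]
    edges = []
    for i in range(ell):
        B, Bn = block(i), block(i + 1)
        edges += [(B[a], B[b], w) for a in range(s) for b in range(a + 1, s)]
        edges += [(x, y, w) for x in B for y in Bn]
    return edges

ell, s, w = 50, 2, 10.0
n = ell * s
E = block_cycle(ell, s, w)
cross = (0, (ell // 2) * s, 1.0)              # unit-weight crossing edge
lev_G = leverage(n, E + [cross], cross)
E_sq = [(u, v, ww * ww) for (u, v, ww) in E]  # square all cycle weights
lev_sq = leverage(n, E_sq + [cross], cross)
assert lev_G > 4 * lev_sq                     # heavy in G, much lighter in G_sq
```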
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. OVERVIEW OF LOWER BOUND FOR WEIGHTED SPECTRAL SPARSIFICATION</head><p>In this section we give an overview of our lower bound for weighted spectral sparsification (Theorem I.5). We prove our lower bound on a family of hard instances that turn out to have the exact same structure as the one in Figure <ref type="figure">1</ref>, which we used to illustrate the difficulty of recovering spectral sparsifiers in weighted graphs. Specifically, our hard instances are weighted "block cycle graphs" plus an extra crossing edge that is included with probability 1/2. In a block cycle graph, the vertices are partitioned into blocks that are arranged in a cyclic manner. Each block is a complete graph, and the vertices of adjacent blocks are connected by a complete bipartite graph. Here we draw all edge weights from Gaussian distributions and permute the vertices uniformly at random. On such graphs, for a suitable choice of edge weights, computing a spectral sparsifier essentially boils down to detecting the presence/absence of the crossing edge. We then show that for the latter task, the success probability of any incidence sketch can be bounded by the "effective resistance" of the crossing edge in a matrix-weighted graph, where the matrix weights are in turn determined by the sketching matrix.</p><p>We note that this family of hard instances has a similar structure to the ones conjectured in <ref type="bibr">[10]</ref>. However, instead of using Bernoulli distributions on the edges as suggested in <ref type="bibr">[10]</ref>, we use Gaussian distributions, which makes it easier to build the connection to effective resistances.</p><p>In order to show that the effective resistance is small for any incidence sketch with a limited number of measurements, we develop a number of new tools for analyzing such matrix-weighted graphs. 
Most importantly, we present (i) a matrix-weighted analog of the widely used expander decomposition of ordinary graphs <ref type="bibr">[12]</ref>, <ref type="bibr">[15]</ref>, <ref type="bibr">[33]</ref>, and (ii) a proof that a random vertex-induced subgraph of a matrix-weighted expander is also an expander with high probability. We highly recommend reading Section III-C, where we use the ordinary-graph versions of (i) and (ii) to prove a lower bound for a simple class of sketches, to get intuition on why these two techniques are useful.</p><p>The rest of this section is structured as follows. In Section III-A we describe the distribution from which we generate our hard instances. In Section III-B we explain how we bound the success probability of an incidence sketch by the effective resistance in a matrix-weighted graph. In Section III-C we prove, as a warm-up, a lower bound for a simple class of sketches, where we only need to analyze effective resistances in ordinary graphs. Finally, in Section III-D we outline our proof for arbitrary incidence sketches, which requires analyzing matrix-weighted graphs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. The hard distribution</head><p>We first state how we generate the input weighted graph G = (V, E, w). Let n be the number of vertices and define s := n^{1/5} and &#8467; := n^{4/5}. We choose a random permutation &#960; : {1, . . . , n} &#8594; {1, . . . , n} and construct a block cycle graph as follows. The i-th block (where 0 &#8804; i &lt; &#8467;) consists of vertices &#960;(si + 1), . . . , &#960;(si + s). For simplicity we denote the a-th vertex in the i-th block (i.e. &#960;(si + a)) as u_{i,a}. The block index i will always be taken modulo &#8467; implicitly. We then add a complete graph to each block, and a complete bipartite graph between each pair of adjacent blocks. Namely, for each 0 &#8804; i &lt; &#8467;, we add a graph G_i with edges connecting u_{i,a}, u_{i,b} for all 1 &#8804; a &lt; b &#8804; s, and add another bipartite graph G_{i,i+1} with edges connecting u_{i,a}, u_{i+1,b} for all a, b &#8712; {1, . . . , s}. Finally, with probability 1/2, we add an edge between vertices &#960;(1) and &#960;(n/2 + 1) (assume n is even). We refer to this edge as the crossing edge with respect to &#960; and to any other edge in G as a non-crossing edge with respect to &#960;. We will omit "with respect to &#960;" when the underlying permutation &#960; is clear.</p><p>We next describe how the edge weights are determined. The weights of all non-crossing edges are drawn independently from N(8n^{2/5}, n^{4/5} log^{&#8722;1} n) (the Gaussian distribution with mean 8n^{2/5} and variance n^{4/5} log^{&#8722;1} n). The weight of the crossing edge is drawn from the standard Gaussian N(0, 1). If the crossing edge has negative weight, we say the input is invalid, and accept any sketch as a valid sketch. 
Our goal will be to detect the presence/absence of the crossing edge with high probability.</p><p>In the following, we call the conditional distribution given the presence of the crossing edge the Yes distribution, and the conditional distribution given its absence the No distribution. We then show that with high probability, the effective resistance of the crossing edge is large, and therefore any linear sketch for computing spectral sparsifiers must distinguish between the two distributions with good probability.</p><p>Proposition III.1. With probability at least 1 − 1/n, all non-crossing edges have weights in the range [4n^{2/5}, 12n^{2/5}], and as a result the effective resistance between vertices π(1) and π(n/2 + 1) is at least 1/48 in a No instance.</p><p>Proposition III.2. Any linear sketch that can compute a 1.0001-spectral sparsifier with probability 0.9 can distinguish between the Yes and No distributions with probability 0.6.</p><p>The first proposition follows from an application of the Chernoff bound. The proof of the second proposition is deferred to the full version.</p><p>In the following, we will assume, for ease of our analysis, that the sketch is given the permutation π after computing the linear sketch. That is, the recovery algorithm A takes as input both Φw and π. We will show that even with this extra piece of information, any incidence sketch with n^{21/20−ε} measurements for constant ε &gt; 0 cannot distinguish between the Yes and No distributions with high probability.</p></div>
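To make the construction of Section III-A concrete, the sampling procedure can be sketched in a few lines of Python/NumPy. This is our own illustration: the helper name and the tiny parameters are ours, and in the actual construction s = n^{1/5} and ℓ = n^{4/5}.

```python
import numpy as np

def build_hard_instance(s, ell, rng, with_crossing=True):
    """Sample the block cycle instance: ell blocks of s vertices each, a clique
    inside every block and a complete bipartite graph between adjacent blocks."""
    n = s * ell
    pi = rng.permutation(n)  # plays the role of the random permutation
    block = [pi[s * i : s * (i + 1)] for i in range(ell)]
    edges = {}
    for i in range(ell):
        for a in range(s):                      # clique inside block i
            for b in range(a + 1, s):
                edges[(block[i][a], block[i][b])] = None
        j = (i + 1) % ell                       # bipartite graph to the next block
        for a in range(s):
            for b in range(s):
                edges[(block[i][a], block[j][b])] = None
    mu, var = 8 * n ** 0.4, n ** 0.8 / np.log(n)  # non-crossing weight distribution
    for e in edges:
        edges[e] = rng.normal(mu, var ** 0.5)
    if with_crossing:
        # crossing edge between the images of 1 and n/2 + 1, weight N(0, 1)
        edges[(pi[0], pi[n // 2])] = rng.normal(0.0, 1.0)
    return edges

rng = np.random.default_rng(0)
E = build_hard_instance(s=4, ell=8, rng=rng)
```

Each of the ℓ blocks contributes one clique and one bipartite graph, so a Yes instance has ℓ·(s(s−1)/2 + s²) + 1 edges; the crossing pair always lies in two far-apart blocks of the cycle.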
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. A bound on the success probability via effective resistance</head><p>We first show that for our lower bound instance, any incidence sketch can be reduced to a more restricted class of linear sketches by only increasing the number of measurements by an O(log n) factor. Specifically, let us fix an arbitrary orientation of the edges, and consider sketches taken over the weighted signed edge-vertex incidence matrix B w &#8712; R ( n 2 )&#215;n , where the latter is given by</p><p>w e e &#8712; E and u is e's head -w e e &#8712; E and u is e's tail 0 otherwise.</p><p>That is, the algorithm must choose a (random) sketching matrix &#934; &#8712; R k&#215;( n 2 ) with the e th column &#966; e &#8712; R k corresponding to the edge slot e. The sketch obtained is then &#934;B w &#8712; R k&#215;n . Notice that the total number of measurements in &#934;B w is kn, as each vertex applies the sketching matrix &#934; to its incident edges. Let us call this class of sketch signed sketches. By Yao's minimax principle <ref type="bibr">[35]</ref>, to prove a lower bound for distinguishing the Yes and No distributions, it suffices to focus on deterministic sketches. The proof of the proposition below appears in the full version.</p><p>Proposition III.3 (Reduction to signed sketches). Consider any incidence sketch of N measurements with a deterministic sketching matrix &#934; &#8712; R N &#215;( n 2 ) and a recovery algorithm A that, given &#934;w and &#960;, distinguishes between the Yes and No distributions with probability at least 0.6. 
Then there exists a signed sketch with a sketching matrix Φ′ ∈ R^{k × (n choose 2)}, where k = O(1) · max{1, N log n / n}, and a recovery algorithm A′ that, given Φ′B_w and π, distinguishes between the Yes and No distributions with probability at least 0.55.</p><p>Let us now fix a sketching matrix Φ ∈ R^{k × (n choose 2)} and aim to upper bound the success probability of any signed sketch using Φ. For notational convenience, let us write (ΦB_w)_{yes} to denote ΦB_w conditioned on the presence of the crossing edge, and (ΦB_w)_{no} to denote ΦB_w conditioned on its absence. We will also write (ΦB_w)_{π,yes} or (ΦB_w)_{π,no} to denote an extra conditioning on the permutation being π, in addition to the presence/absence of the crossing edge. Then to bound the success probability of any signed sketch using Φ, it suffices to show that the total variation distance (TV-distance) between (ΦB_w)_{yes} and (ΦB_w)_{no} is small.</p><p>To state our upper bound on the TV-distance, we first introduce some notation. For an edge (u, v), define b_uv ∈ R^{nk} as the block vector (with block size k) whose u-th block is φ_uv, whose v-th block is −φ_uv, and whose remaining blocks are zero (1), where φ_uv ∈ R^k is the column of Φ corresponding to the edge slot (u, v). For a permutation π, we then define L_π := n^{4/5} log^{−1} n · Σ_{(u,v) non-crossing} b_uv b_uv^T. The following proposition is essentially a consequence of the result in <ref type="bibr">[8]</ref>, which bounds the TV-distance between multivariate Gaussians with the same mean. We give its proof in the full version. Note that we use † to denote taking the Moore-Penrose pseudoinverse of a matrix.</p><p>Proposition III.4. For any permutation π such that b_{π(1)π(n/2+1)} is in the range of L_π, we have d_TV((ΦB_w)_{π,yes}, (ΦB_w)_{π,no}) ≤ O(b_{π(1)π(n/2+1)}^T L_π^† b_{π(1)π(n/2+1)}).
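As a quick sanity check of this setup (our own sketch; `signed_incidence` is a hypothetical helper), one can materialize B_w for a toy weighted graph and apply a random Φ. The resulting sketch ΦB_w is a k × n matrix, i.e. kn measurements in total, with each vertex column depending only on its incident edge slots.

```python
import numpy as np

def signed_incidence(n, weighted_edges):
    """B_w with one row per edge slot (u, v), u below v: +w_e at the head u,
    -w_e at the tail v under a fixed orientation; absent slots stay zero."""
    slots = [(u, v) for u in range(n) for v in range(u + 1, n)]
    B = np.zeros((len(slots), n))
    for idx, (u, v) in enumerate(slots):
        w = weighted_edges.get((u, v), 0.0)
        B[idx, u] = w
        B[idx, v] = -w
    return B, slots

n, k = 5, 3
rng = np.random.default_rng(5)
B_w, slots = signed_incidence(n, {(0, 1): 2.0, (1, 2): 1.5, (3, 4): 0.5})
Phi = rng.standard_normal((k, len(slots)))   # one column phi_e per edge slot
sketch = Phi @ B_w                           # k x n, hence kn measurements
```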
Our plan is then to show that b_{π(1)π(n/2+1)}^T L_π^† b_{π(1)π(n/2+1)} is small on average for every choice of a signed sketch with k = n^{1/20−ε} for constant ε &gt; 0. (Recall that the range of a symmetric matrix is the linear span of its columns.)</p><p>Note that if k = 1 and each φ_uv ∈ {0, 1}, then L_π is exactly the Laplacian matrix of the graph (call it H_π) formed by the non-crossing edges (u, v) such that φ_uv = 1, where each edge is weighted by n^{4/5} log^{−1} n. Therefore, b_{π(1)π(n/2+1)}^T L_π^† b_{π(1)π(n/2+1)} is the effective resistance between π(1) and π(n/2 + 1) in H_π if φ_{π(1)π(n/2+1)} = 1 (otherwise b_{π(1)π(n/2+1)} is the zero vector).</p><p>In fact, to get quick intuition for why we should expect the effective resistance between π(1) and π(n/2 + 1) to be small, let us assume φ_uv = 1 for all edge slots (u, v). Then, for any permutation π, H_π is the graph formed by all non-crossing edges, each weighted by n^{4/5} log^{−1} n. Note that these weights are about n^{2/5} times larger than the weights Θ(n^{2/5}) in the actual input graph (Proposition III.1). As a result, the effective resistance between π(1) and π(n/2 + 1) in H_π is about n^{2/5} times smaller than the effective resistance between them in the input graph (the former roughly equals n^{−2/5}).</p><p>When k &gt; 1, we can view L_π as the Laplacian of a matrix-weighted graph (again, call it H_π) formed by the non-crossing edges, where each edge (u, v) has the k × k matrix weight n^{4/5} log^{−1} n · φ_uv φ_uv^T. Now b_{π(1)π(n/2+1)}^T L_π^† b_{π(1)π(n/2+1)} can be seen as the (generalized) effective resistance between π(1) and π(n/2 + 1) in H_π.</p></div>
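The k = 1 scaling intuition above is easy to verify numerically: b^T L^† b is the usual effective resistance, and multiplying every edge weight by a factor c divides every effective resistance by c. A minimal sketch (the helper function is ours):

```python
import numpy as np

def effective_resistance(n, weighted_edges, u, v):
    """b_uv^T L^dagger b_uv for a scalar-weighted graph (the k = 1 case)."""
    L = np.zeros((n, n))
    for (a, b), w in weighted_edges.items():
        L[a, a] += w
        L[b, b] += w
        L[a, b] -= w
        L[b, a] -= w
    b = np.zeros(n)
    b[u], b[v] = 1.0, -1.0
    return float(b @ np.linalg.pinv(L) @ b)

# path 0 - 1 - 2 with unit weights: resistances add in series
edges = {(0, 1): 1.0, (1, 2): 1.0}
r1 = effective_resistance(3, edges, 0, 2)
# scaling every weight by 10 shrinks every effective resistance by 10
r2 = effective_resistance(3, {e: 10.0 * w for e, w in edges.items()}, 0, 2)
```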
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Warm-up: one-row signed sketches have small TV-distance</head><p>As a warm-up, we show that for any signed sketch, in the case that k = 1 and the sketching matrix &#934; has 0/1 entries, we have, for any constant &gt; 0,</p><p>Proposition III.4, we know that d TV ((&#934;B w ) &#960;,yes , (&#934;B w ) &#960;,no ) can be bounded by the effective resistance between &#960;(1), &#960;(n/2 + 1) in H &#960; if &#966; &#960;(1)&#960;(n/2+1) = 1, and is zero otherwise. Here H &#960; is formed by the non-crossing edges whose &#966; uv = 1, where each edge (u, v) has scalar weight n 4/5 log -1 n. We can focus on the &#934;'s whose number of nonzero entries is at least n 9/5+ , since otherwise</p><p>and we would already have our desired result <ref type="bibr">(2)</ref>.</p><p>Our proof of (2) will rely on decomposing H &#960; into expanders with large minimum degree. Since H &#960; 's edges all have the same weight n 4/5 log -1 n, it is more convenient to work with the unweighted version of H &#960; , which we denote by H &#960; . We now briefly review the definition of unweighted expanders, as well as state a known expander decomposition lemma that we will utilize.</p><p>Definition III.1 (Expander). An unweighted graph H = (V, E) is a &#950;-expander for some &#950; &#8712; [0, 1] if its conductance is at least &#950;, namely, for every nonempty S &#8834; V , we have</p><p>where vol(S) is the total degree of vertices in S.</p><p>Note that in the lemma below, we slightly abuse the notion of "regular graphs". Specifically, we will say a graph is regular if its minimum vertex degree d min is not much smaller than the average degree d.</p><p>Lemma III.5 (Almost regular expander decomposition, see e.g. <ref type="bibr">[16]</ref>). 
Given an unweighted graph H = (V, E) with average degree d̄ ≥ 16, there exists a subgraph I = (U, F), where U ⊆ V and F ⊆ E, such that I is a (1/(16 log n))-expander with minimum degree d_min ≥ d̄/16. We will also need the following lemma, which shows that a random vertex-induced subgraph of an expander with large minimum degree is almost certainly an expander. We give the proof of this lemma in the full version. To the best of our knowledge, even this result was not known before.</p></div>
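This vertex-sampling phenomenon can also be probed numerically. In the sketch below (our own illustration), a dense random graph serves as a stand-in expander, expansion is measured by the smallest nonzero eigenvalue of the normalized Laplacian (the algebraic characterization recalled in Section III-D), and we check that a uniformly random half-size induced subgraph retains a large gap:

```python
import numpy as np

def normalized_laplacian(A):
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return np.eye(len(A)) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def spectral_gap(A):
    """Second-smallest normalized Laplacian eigenvalue (the smallest is 0
    for a connected graph)."""
    return float(np.sort(np.linalg.eigvalsh(normalized_laplacian(A)))[1])

rng = np.random.default_rng(1)
n = 60
A = np.where(rng.random((n, n)) > 0.5, 1.0, 0.0)   # Erdos-Renyi G(n, 1/2)
A = np.triu(A, 1)
A = A + A.T                                        # dense random graph: an expander w.h.p.
C = rng.choice(n, size=n // 2, replace=False)      # uniformly random vertex subset
gap_full = spectral_gap(A)
gap_sub = spectral_gap(A[np.ix_(C, C)])            # induced subgraph keeps a large gap
```

This is only a numerical illustration of the phenomenon, not the (much harder) proof of Lemma III.6, which must also handle the minimum-degree guarantee.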
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma III.6 (Expanders are preserved under vertex sampling).</head><p>There exists a &#952; = &#952;(n) = n o (1) with the following property. Consider an unweighted</p><p>vertex subset of size s. Then with probability at least 1-1/n 7 , the vertex-induced subgraph H[C] is a 1 n o(1) -expander with minimum degree at least s 2n &#8226; d min . Proof of (2) using Lemmas III.5,III.6. As argued above we can assume w.l.o.g. that nnz(&#934;) &#8805; n 9/5+ . We want to obtain, for each edge slot e satisfying &#966; e = 1, conditioned on e being the crossing edge w.r.t. &#960;, an upper bound (call it u e ) on the typical effective resistance between the endpoints of e in the graph H &#960; . In other words, conditioned on e being the crossing edge, u e should be an upper bound on the effective resistance between the endpoints of e in H &#960; with high probability over &#960;. Then the total variation distance between (&#934;B w ) yes and (&#934;B w ) no can be bounded by</p><p>To obtain the u e 's, let us define the unweighted graph H &#966; = (V, E &#966; ) where E &#966; contains all edges e whose &#966; e = 1 (including the ones not present in the input graph, i.e. |E &#966; | = nnz(&#934;)). Now consider the following process, where we repeatedly delete an expander subgraph from H &#966; and obtain u e 's for the edges in the expander.</p><p>To show that u e 's are valid upper bounds, let us consider a fixed iteration of the while loop. For i = 0, . . . , n 4/5 -1, let U i denote the vertices in I that are in the i th block of the input block cycle graph:</p><p>Then by Chernoff bounds, with probability at least 1 -1/n 5 over the random choice of &#960;, we have</p><p>Then by invoking Lemma III.6, with probability at least 1-1/n 4 over &#960;, all vertex-induced subgraphs I[U i &#8746;U i+1 ] are 1  n o(1) -expanders with minimum degree at least</p><p>16n 9/5 . 
Using this fact, we obtain the following claim, whose proof appears in the full version.</p><p>Claim III.7. For each edge f ∈ F, conditioned on f being the crossing edge, with probability at least 1 − 1/n^2 over π, the effective resistance between the endpoints of f in H_π is at most u_f.</p><p>Now let us divide the above process for obtaining the u_e's into O(log n) phases, indexed by i ∈ {1, ..., O(log n)}, and bound Σ_{e: φ_e=1} u_e phase by phase, using n^{O(ε)} to hide constant factors; the final bound uses the fact that in the last phase we have 2^i ≤ n^{1/5}. Plugging the resulting bound into (3) finishes the proof.</p><p>D. The general case: proof of Theorem I.5</p><p>Note that even though for k = 1 the TV-distance is Õ(n^{−1/5}), this does not imply that k must be large for the TV-distance to become Ω(1).</p><p>By Proposition III.3, in order to prove Theorem I.5, it suffices to prove the following:</p><p>Theorem III.8. For any fixed sketching matrix Φ ∈ R^{k × (n choose 2)} with k ≤ n^{1/20−ε} for some constant ε &gt; 0, the TV-distance between (ΦB_w)_{yes} and (ΦB_w)_{no} is o(1).</p><p>By Proposition III.4, our goal is to bound the "effective resistance" b_{π(1)π(n/2+1)}^T L_π^† b_{π(1)π(n/2+1)} between vertices π(1), π(n/2 + 1) in the matrix-weighted graph H_π consisting of the non-crossing edges, where edge (u, v) has matrix weight n^{4/5} log^{−1} n · φ_uv φ_uv^T ∈ R^{k × k}. We will do so by (significantly) generalizing our previous expander-decomposition approach for ordinary graphs from Section III-C.
Our approach for the k = 1 case essentially consists of two steps: (i) decomposing the graph H_φ into large expander subgraphs, and (ii) proving that a random vertex-induced subgraph of an expander is still an expander.</p><p>First note that there does not appear to be a combinatorial analog of conductance in matrix-weighted graphs, which suggests that we should define expanders in an algebraic way. Let us first recall the algebraic characterization of expanders for ordinary, unweighted graphs. The definition is based on the eigenvalues of the normalized Laplacian of the graph, which is given by N = D^{−1/2} L D^{−1/2}, where D is the diagonal matrix with D_uu equal to the degree d_u of u.</p><p>Definition III.2 (Algebraic definition of ordinary, unweighted expanders). An unweighted graph H is a ζ-expander for some ζ ∈ [0, 1] if the smallest nonzero eigenvalue of its normalized Laplacian matrix N is at least ζ. By Cheeger's inequality <ref type="bibr">[3]</ref>, for ζ = Ω(1), this definition says that the graph H is a union of vertex-disjoint combinatorial expanders, each with conductance Ω(1). To come up with an analogous definition for matrix-weighted graphs, let us first define their associated matrices formally.</p><p>a) Matrices associated with matrix-weighted graphs: We consider a k × k matrix-weighted graph H = (V, E) with |V| = n. For each edge (u, v) ∈ E, there is a vector φ_uv ∈ R^k, indicating that (u, v) is weighted by the k × k rank-1 matrix φ_uv φ_uv^T.</p><p>Definition III.3 (Degree matrices). For a vertex u, its generalized degree is given by D_u := Σ_{v: (u,v)∈E} φ_uv φ_uv^T. We then define the nk × nk degree matrix D as the block diagonal matrix (with block size k × k) whose u-th diagonal block is D_u.</p><p>Definition III.4 (Laplacian matrices).
The Laplacian matrix is given by L = Σ_{(u,v)∈E} b_uv b_uv^T, where the b_uv's are defined in <ref type="bibr">(1)</ref>. We will call b_uv the incidence vector of edge (u, v).</p><p>Definition III.5 (Normalized Laplacian matrices). The normalized Laplacian matrix is given by N = D^{†/2} L D^{†/2}. We will call D^{†/2} b_uv the normalized incidence vector of edge (u, v).</p><p>The following proposition says that, as for scalar-weighted graphs, the eigenvalues of the normalized Laplacian of a matrix-weighted graph lie in [0, 2]. The proof of this proposition appears in the full version.</p><p>Proposition III.9. The eigenvalues of N lie in [0, 2].</p><p>Now, a first attempt might be to define matrix-weighted expanders to also be graphs whose normalized Laplacians' nonzero eigenvalues are all large, and then try to decompose any matrix-weighted graph into large expander subgraphs. However, we show that the latter goal may not be achievable in general, by presenting in the full version a hard instance for which every large subgraph has a small nonzero eigenvalue.</p><p>b) Our approach: In light of the hard instance, we loosen the requirement of being an expander by allowing small eigenvalues, but requiring instead that no edge, compared to the average, contributes too much to the small eigenvectors. Formally, we want every edge's normalized incidence vector to have small (weighted) projection onto the bottom eigenspace. We will also need an analog of "almost regularity", which for ordinary, unweighted graphs says that the minimum degree is large. We give the formal definition of an almost regular matrix-weighted expander below.</p><p>Definition III.6 (Almost regular matrix-weighted expanders). For a k × k matrix-weighted graph H, let λ_1 ≤ ... ≤ λ_{nk} be the eigenvalues of its normalized Laplacian N, and let f_1, ..., f_{nk} ∈ R^{nk} be a set of corresponding orthonormal eigenvectors.
We say H is a (γ, ζ, ψ)-almost regular expander if: 1) (γ-almost regularity) For every vertex u and every incident edge (u, v) ∈ E, we have φ_uv^T D_u^† φ_uv ≤ γk/n (<ref type="formula">4</ref>). 2) ((ζ, ψ)-expander) For every edge (u, v) ∈ E, we have Σ_{i: λ_i ∈ (0,ζ]} λ_i^{−1} (f_i^T D^{†/2} b_uv)^2 ≤ ψℓ/n^2, where ℓ is the number of λ_i's in (0, ζ] (5).</p><p>The LHS of (<ref type="formula">4</ref>) is the so-called leverage score of φ_uv w.r.t. D_u. It is known that the sum of leverage scores equals the rank of the matrix:</p><p>Proposition III.10. For any fixed vertex u, Σ_{v: (u,v)∈E} φ_uv^T D_u^† φ_uv = rank(D_u).</p><p>Therefore, in the case that u has Ω(n) incident edges, (<ref type="formula">4</ref>) is essentially saying that no incident edge's leverage score exceeds the average by too much.</p><p>To get intuition for condition (5), we need the following two results, whose proofs appear in the full version.</p><p>Theorem III.11. Let H be a k × k matrix-weighted graph that is γ-almost regular (in the sense of (<ref type="formula">4</ref>)). Then for any ζ ∈ (0, 1), the number of eigenvalues of its normalized Laplacian that lie in (0, ζ] is at most γk^2/(1 − ζ)^2.</p><p>Proposition III.12. Let ℓ be the number of λ_i's that lie in (0, ζ]. Then Σ_{(u,v)∈E} Σ_{i: λ_i ∈ (0,ζ]} λ_i^{−1} (f_i^T D^{†/2} b_uv)^2 = ℓ.</p><p>Therefore, in the case that |E| = Ω(n^2), (5) is essentially saying that the LHS for every edge (u, v) does not exceed the average by too much.</p><p>We then show that every dense enough matrix-weighted graph can indeed be made into an expander by downscaling a small number of edges. To this end, let us define, for a scaling s : E → [0, 1], the rescaled graph H_s, which is obtained from H by rescaling each edge (u, v)'s weight to s_uv^2 φ_uv φ_uv^T. The proof of the following theorem appears in the full version.</p><p>Theorem III.13. There is an algorithm that, given any dense enough k × k matrix-weighted graph H, computes a scaling s : E → [0, 1] such that: 1) The rescaled graph H_s is a (γ, ζ, ψ)-almost regular expander. 2) The number of edges (u, v) ∈ E with s_uv &lt; 1 is o(n^2).</p><p>We next show that almost regular expanders are preserved under vertex sampling.
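The matrices of Definitions III.3–III.5 and Propositions III.9 and III.10 can be checked numerically on a small random instance (our own sketch; the graph and the vectors φ_uv are arbitrary):

```python
import numpy as np

def build_matrices(n, k, edges):
    """b_uv, D, L, N for a k x k matrix-weighted graph; `edges` maps
    (u, v) to the vector phi_uv in R^k."""
    L = np.zeros((n * k, n * k))
    D = np.zeros((n * k, n * k))
    for (u, v), phi in edges.items():
        b = np.zeros(n * k)
        b[u * k:(u + 1) * k] = phi            # u-th block:  phi_uv
        b[v * k:(v + 1) * k] = -phi           # v-th block: -phi_uv
        L += np.outer(b, b)                   # L = sum of b_uv b_uv^T
        W = np.outer(phi, phi)                # rank-1 weight phi_uv phi_uv^T
        D[u * k:(u + 1) * k, u * k:(u + 1) * k] += W
        D[v * k:(v + 1) * k, v * k:(v + 1) * k] += W
    w, V = np.linalg.eigh(D)                  # D^{dagger/2} via eigendecomposition
    inv_sqrt = np.where(w > 1e-9, 1.0 / np.sqrt(np.maximum(w, 1e-9)), 0.0)
    D_half_dag = (V * inv_sqrt) @ V.T
    return D, L, D_half_dag @ L @ D_half_dag

rng = np.random.default_rng(2)
n, k = 5, 2
edges = {(u, v): rng.standard_normal(k) for u in range(n) for v in range(u + 1, n)}
D, L, N = build_matrices(n, k, edges)
eigs = np.linalg.eigvalsh(N)                  # Proposition III.9: all lie in [0, 2]

# Proposition III.10: leverage scores at a vertex sum to rank(D_u)
u = 0
D_u = D[:k, :k]
lev_sum = sum(phi @ np.linalg.pinv(D_u) @ phi
              for (a, b), phi in edges.items() if a == u or b == u)
```

Here the leverage-score identity is just the trace identity tr(D_u^† Σ φφ^T) = tr(D_u^† D_u) = rank(D_u).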
However, we will now use a different notion of "preservation". To state our result, let us define some additional notation. For a vertex subset C ⊆ V, we write L_{G[C]} to denote the Laplacian of the vertex-induced subgraph G[C]. We also let D_CC be the submatrix of D (the degree matrix of the original graph H) with rows and columns restricted to the vertices in C, and let (f_i)_C denote, for an eigenvector f_i, the vector f_i with indices restricted to C. We then have the following theorem, whose proof appears in the full version.</p><p>Theorem III.14. There exists a θ = θ(n) ≤ n^{o(1)} with the following property. Let H = (V, E) be a k × k matrix-weighted, (γ, ζ, ψ)-almost regular expander where ζ ≤ 1/log n. For an s ≥ 2 · 10^5 · γψζ^{−1} k^2 θ(n), let C ⊆ V be a uniformly random vertex subset of size s. Then with probability at least 1 − 1/n^5, we have that 1) the null space of D_CC^{†/2} L_{G[C]} D_CC^{†/2} is the required one, and 2) the pseudoinverse bound (7) holds.</p><p>We argue that (<ref type="formula">7</ref>) roughly says that the pseudoinverse of the subgraph G[C] can be bounded by the pseudoinverse of the original graph, (i) restricted to the indices in C and (ii) rescaled in a certain way. For technical reasons, on the LHS of <ref type="bibr">(7)</ref> we normalize the Laplacian of the vertex-induced subgraph using the degree matrix of the original graph H. As for the RHS, it can be seen as a rescaled version of the pseudoinverse of N restricted to C: on the RHS of (7) we blow up the small eigenvalues quadratically in 1/(sampling rate), but the large eigenvalues only linearly in 1/(sampling rate). With these tools, we are finally able to prove Theorem I.5.</p><p>1) Techniques for proving Theorems III.11, III.13, III.14: We now explain, at a very high level, the techniques that we use to prove these three key theorems, as well as their connections to previous works.
More details can be found in the subsequent sections.</p><p>a) Proof of Theorem III.11: We consider the "spectral embedding" induced by the bottom eigenvectors of the normalized Laplacian, which maps each vertex to a rectangular matrix. Such a spectral embedding may be seen as the matrix-weighted counterpart of the embeddings used for scalar-weighted graphs, which map each vertex to a vector. The latter embeddings were previously used to prove higher-order Cheeger inequalities <ref type="bibr">[24]</ref>, <ref type="bibr">[29]</ref>. We show that, for matrix-weighted graphs that are almost regular (in the sense of (<ref type="formula">4</ref>)), the spectral embedding has vertex-wise bounded spectral norm, and as a result the number of bottom eigenvectors must be small.</p><p>b) Proof of Theorem III.13: Our proof consists of two steps: (i) decomposing the graph into an almost regular graph, and (ii) decomposing the graph into an almost regular expander. In achieving (ii), we in fact invoke (i) repeatedly to maintain the almost regularity of the graph.</p><p>As noted above, the almost regularity condition (4) is essentially saying that no incident edge has leverage score too large compared to the average. A task similar to (i) has in fact been investigated in a previous work <ref type="bibr">[7]</ref>, where the authors showed that given a set of vectors, one can, by downscaling a small number of them, make every vector's leverage score small compared to the average. This result is achieved by an algorithm that iteratively downscales vectors with large leverage scores, while analyzing how each vector's leverage score changes in the process. While it is possible to directly invoke the result from <ref type="bibr">[7]</ref> to get a large almost regular graph, its guarantee does not suffice for our purpose of smoothly incorporating (i) into (ii).
In particular, since we will repeatedly invoke (i) within (ii), we need, in addition to the number of rescaled edges being small, an extra bound on the number of completely deleted edges (i.e. those rescaled to 0) that is proportional to the rank change of the degree matrix D. As a result, we design a more involved algorithm for obtaining the scaling, and carry out a more careful analysis of it.</p><p>Achieving (ii) turns out to be much more challenging. Although the LHS of (5) may be seen as a leverage score, there is the intrinsic difficulty that whenever the edge weights change, so do the eigenvalues and eigenvectors of the normalized Laplacian, as well as the degree matrix itself (hence also the normalized incidence vectors). Thus it is not clear how the LHS of (5) changes. As a result, when trying to obtain a desired scaling, we have to use some global measure of progress. This is in contrast to (i), where we can track the leverage score change of each edge locally. We resolve this issue by considering, as a potential function, the determinant of the normalized Laplacian restricted to the bottom eigenspace. In other words, our potential function is the product of the eigenvalues of N that lie in (0, ζ]<ref type="foot" target="#foot_5">6</ref>. We show that, by a delicate global analysis of this potential function, we can make the graph an expander by downscaling only a small number of edges.</p><p>c) Proof of Theorem III.14: Our proof is motivated by the approximate Gaussian elimination of the Laplacian matrices of scalar-weighted graphs, which was previously used as an algorithmic tool for solving graph-structured linear systems <ref type="bibr">[22]</ref>, <ref type="bibr">[23]</ref> and for building data structures that dynamically maintain effective resistances <ref type="bibr">[9]</ref>, <ref type="bibr">[25]</ref>, <ref type="bibr">[27]</ref>.
Our approach also relies on analyzing matrix-valued martingales, which played a key role in <ref type="bibr">[23]</ref> and have also been used in constructing vertex/subspace sparsifiers <ref type="bibr">[11]</ref>, <ref type="bibr">[23]</ref>, <ref type="bibr">[26]</ref>.</p><p>Let us first briefly review the Gaussian elimination of the Laplacian matrix of a scalar-weighted graph. Roughly speaking, by eliminating the row and column of L corresponding to a vertex u, we obtain another Laplacian matrix L′ supported on V ∖ {u} whose pseudoinverse equals the pseudoinverse of the original L restricted to V ∖ {u} (i.e. (L′)^† = (L^†)_{V∖{u},V∖{u}}). Given a vertex subset C ⊆ V, one can likewise eliminate the vertices outside of C one by one and obtain a Laplacian matrix L′ supported on C with the same property that (L′)^† = (L^†)_{CC}. The matrix L′ is referred to as the Schur complement of L onto C. However, the graphs associated with the intermediate Laplacians could be dense, which is inefficient for algorithm design. Therefore <ref type="bibr">[23]</ref> showed that one can perform an approximate Gaussian elimination by, upon each elimination, implicitly sub-sampling the edges of L′. They then showed that this eventually yields a good approximation to L′, by analyzing a matrix-valued martingale induced by the process.</p><p>We now explain how we apply this idea to prove Theorem III.14. Since we are considering an induced subgraph G[C] where C is a uniformly random subset of size s, we can view the process of choosing C as deleting a sequence of n − s vertices from V uniformly at random. Our goal is to compare the pseudoinverse of G[C] with that of the original graph, and it therefore suffices to compare it with the Schur complement of L onto C. We will in fact perform such a comparison upon the elimination of every vertex.
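The exact elimination facts reviewed above are easy to verify numerically for scalar-weighted graphs. The sketch below (our own illustration) checks that the Schur complement onto C preserves effective resistances, i.e. quadratic forms of the pseudoinverse at difference vectors supported on C:

```python
import numpy as np

def schur_onto(L, C):
    """Schur complement of a graph Laplacian onto vertex set C
    (Gaussian elimination of all the other vertices)."""
    R = [i for i in range(len(L)) if i not in C]
    return (L[np.ix_(C, C)]
            - L[np.ix_(C, R)] @ np.linalg.inv(L[np.ix_(R, R)]) @ L[np.ix_(R, C)])

rng = np.random.default_rng(4)
n = 6
W = np.triu(rng.random((n, n)) + 0.1, 1)
W = W + W.T                                  # weighted complete graph
L = np.diag(W.sum(axis=1)) - W               # its Laplacian
C = [0, 1, 2]
S = schur_onto(L, C)
# effective resistance between vertices 0 and 2 (both in C), two ways
b_full = np.zeros(n)
b_full[0], b_full[2] = 1.0, -1.0
b_sub = np.array([1.0, 0.0, -1.0])
er_full = b_full @ np.linalg.pinv(L) @ b_full
er_schur = b_sub @ np.linalg.pinv(S) @ b_sub
```

The two quantities agree exactly (up to numerical error), which is the property that lets one compare the Laplacian of an induced subgraph against the Schur complement at every elimination step.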
That is, if we let C_i be the set of remaining vertices at the i-th step, then we want to compare the Laplacian of G[C_i] with the Schur complement of L onto C_i. At a high level, we do so by setting up a matrix-valued martingale and showing that it has good concentration when G is a matrix-weighted expander (in the sense of Definition III.6).</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>Note that in a weighted graph, the length of a path is defined as the total edge weight along the path.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>Recall that the support of a vector is the set of indices at which it is non-zero.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>In our actual proof of the proposition, we have to add some random Gaussian noise to each edge's weight for the lower bound to carry out.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_3"><p>Authorized licensed use limited to: University of Pennsylvania. Downloaded on July 17,2023 at 22:39:31 UTC from IEEE Xplore. Restrictions apply.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4"><p>Note that we consider i + 1 as 0 when i = n 4/5 -1.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5"><p>Due to technical reasons, the actual potential function slightly differs from the one stated here.</p></note>
		</body>
		</text>
</TEI>
