<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Convergence to Lexicographically Optimal Base in a (Contra)Polymatroid and Applications to Densest Subgraph and Tree Packing</title></titleStmt>
			<publicationStmt>
				<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>
				<date>01/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10500362</idno>
					<idno type="doi">10.4230/LIPICS.ESA.2023.56</idno>
					<title level='j'>31st Annual European Symposium on Algorithms, ESA 2023, September 4-6, 2023, Amsterdam, The Netherlands</title>

					<author>Elfarouk Harb</author><author>Kent Quanrud</author><author>Chandra Chekuri</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Boob et al. [Boob et al., 2020] described an iterative peeling algorithm called Greedy++ for the Densest Subgraph Problem (DSG) and conjectured that it converges to an optimum solution. Chekuri, Quanrud and Torres [Chandra Chekuri et al., 2022] extended the algorithm to supermodular density problems (of which DSG is a special case) and proved that the resulting algorithm Super-Greedy++ (and hence also Greedy++) converges. In this paper we revisit the convergence proof and provide a different perspective. This is done via a connection to Fujishige’s quadratic program for finding a lexicographically optimal base in a (contra) polymatroid [Satoru Fujishige, 1980], and a noisy version of the Frank-Wolfe method from convex optimization [Frank and Wolfe, 1956; Jaggi, 2013]. This yields a simpler convergence proof, and also shows a stronger property that Super-Greedy++ converges to the optimal dense decomposition vector, answering a question raised in Harb et al. [Harb et al., 2022]. A second contribution of the paper is to understand Thorup’s work on ideal tree packing and greedy tree packing [Thorup, 2007; Thorup, 2008] via the Frank-Wolfe algorithm applied to find a lexicographically optimum base in the graphic matroid. This yields a simpler and transparent proof. The two results appear disparate but are unified via Fujishige’s result and convex optimization.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In this paper we consider iterative greedy algorithms for two different combinatorial optimization problems and show that the convergence of these algorithms can be understood by combining two general tools, one coming from the theory of submodular functions, and the other coming from convex optimization. This yields simpler proofs via a unified perspective, while also yielding additional properties that were previously unknown.</p><p>Densest subgraph and supermodularity. We start with the problem that motivated this work, namely, the densest subgraph problem (DSG). The input to DSG is an undirected graph G = (V, E) with m = |E| and n = |V|. The goal is to return a subset S ⊆ V that maximizes the density |E(S)|/|S|</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>where E(S) = {uv ∈ E : u, v ∈ S} is the set of edges with both endpoints in S.</p><p>Faster algorithms, Greedy and Greedy++. Although DSG is polynomial-time solvable via maxflow or submodular function minimization, the corresponding algorithms are not yet practical for the large graphs that arise in many applications; this is despite the fact that we now have very fast theoretical algorithms for maxflow and mincost flow <ref type="bibr">[12]</ref>. For this reason there has been considerable interest in fast (approximation) algorithms. More than 20 years ago Charikar <ref type="bibr">[9]</ref> showed that a simple "peeling" algorithm (Greedy) yields a 1/2-approximation for DSG. An ordering of the vertices v_{i_1}, v_{i_2}, ..., v_{i_n} is computed as follows: v_{i_1} is a vertex of minimum degree in G (ties broken arbitrarily), v_{i_2} is a minimum degree vertex in G − v_{i_1}, and so on. After creating the ordering, the algorithm picks the best suffix, in terms of density, among the n possible suffixes of the ordering. Charikar also developed a simple exact LP relaxation for DSG. Charikar's results have been quite influential. Greedy can be implemented in (near)-linear time and has also been adapted to other variants. The LP relaxation has also been used in several algorithms that yield a (1 − ε)-approximate solution <ref type="bibr">[5,</ref><ref type="bibr">8]</ref>, and has led to a flow-based (1 − ε)-approximation <ref type="bibr">[10]</ref>. More recently, Boob et al. <ref type="bibr">[7]</ref> developed an algorithm called Greedy++ that is based on combining Greedy with ideas from multiplicative weight updates (MWU); the algorithm repeatedly applies a simple peeling procedure, with the first iteration coinciding with Greedy but later iterations depending on a weight vector that is maintained on the vertices; the formal algorithm is described in a later section. 
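Charikar's peeling algorithm is short enough to sketch in full. The following is a minimal Python sketch (our own illustration, not the authors' implementation; the helper name greedy_peel is ours): it repeatedly removes a minimum-degree vertex and returns the best-density suffix of the peeling order.

```python
from collections import defaultdict

def greedy_peel(edges):
    """1/2-approximation sketch for densest subgraph via min-degree peeling."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cur = set(adj)              # remaining vertices
    rem = len(edges)            # edges among remaining vertices
    best_density, best_set = 0.0, set(cur)
    while cur:
        density = rem / len(cur)
        if density >= best_density:   # ">=" keeps the smallest tied suffix
            best_density, best_set = density, set(cur)
        v = min(cur, key=lambda x: len(adj[x]))   # a min-degree vertex
        rem -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        adj[v].clear()
        cur.remove(v)
    return best_density, best_set
```

On a triangle with one pendant edge, for example, the sketch returns the triangle with density 1.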
The advantage of the algorithm is its simplicity, and Boob et al. <ref type="bibr">[7]</ref> showed that it has good empirical performance. Moreover, they conjectured that Greedy++ converges to a (1 − ε)-approximation in O(1/ε²) iterations.</p><p>Although their strong conjecture is as yet unverified, Chekuri, Quanrud and Torres <ref type="bibr">[10]</ref> proved that Greedy++ converges in O(Δ log |V| / (ε² λ*(G))) iterations, where Δ is the maximum degree of G and λ*(G) is the optimum density.</p><p>The convergence proof in <ref type="bibr">[10]</ref> is non-trivial and relies crucially on considering DSS and supermodularity. <ref type="bibr">[10]</ref> shows that Greedy and Greedy++ can be generalized to SuperGreedy and SuperGreedy++ for DSS, and that SuperGreedy++ converges to a (1 − ε)-approximate solution in O(c_f/ε²) iterations, where c_f depends (only) on the function f. Dense subgraph decomposition and connections. As we discussed, DSG is a special case of DSS and hence DSG inherits certain nice structural properties from supermodularity. One of these is the fact that the vertex set V of every graph G = (V, E) admits a unique decomposition into S_1, S_2, ..., S_k for some k using the following procedure: S_1 is the vertex set of the unique maximal densest subgraph, S_2 is the unique maximal densest subgraph after "contracting" S_1, and so on. The existence of such a unique decomposition is more transparent in the setting of DSS. The fact that there is a unique maximal densest set S_1 follows from supermodularity; if A and B are optimum dense sets then so is A ∪ B. One can then consider a new supermodular function f_{S_1} : 2^{V∖S_1} → ℝ_+ defined by f_{S_1}(X) = f(S_1 ∪ X) − f(S_1).</p><p>The new function is also supermodular. Then S_2 is the unique maximal densest set for f_{S_1}. We iterate this process until we obtain an empty set. The decomposition also allows us to assign a density value λ_v to each v ∈ V (which corresponds to the density of the set when v is in the maximal set). 
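The recursive decomposition just described can be made concrete by brute force on tiny ground sets. The following Python sketch is our own illustration (exponential time; it assumes oracle access to a supermodular f as a callable on frozensets, and uses the fact that density maximizers are closed under union):

```python
from fractions import Fraction
from itertools import combinations

def dense_decomposition(V, f):
    """Brute-force dense decomposition of a supermodular set function f.
    Repeatedly finds the maximal densest set of the contracted function;
    the union of all maximizers is the unique maximal one."""
    V = frozenset(V)
    parts, densities = [], []
    A = frozenset()                      # elements contracted so far
    while A != V:
        rest = sorted(V - A)
        fA = f(A)
        best_d, best_S = None, frozenset()
        for r in range(1, len(rest) + 1):
            for S in combinations(rest, r):
                S = frozenset(S)
                d = Fraction(f(A | S) - fA, len(S))
                if best_d is None or d > best_d:
                    best_d, best_S = d, S
                elif d == best_d:
                    best_S |= S          # maximizers are closed under union
        parts.append(best_S)
        densities.append(best_d)
        A |= best_S
    return parts, densities
```

For instance, with f(S) = |E(S)| on a K4 plus a pendant vertex, the sketch returns the K4 (density 3/2) followed by the pendant (density 1).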
We call this the density vector associated with f. Dense decompositions follow from the theory of principal partitions of submodular functions <ref type="bibr">[35,</ref><ref type="bibr">36,</ref><ref type="bibr">20]</ref>. In the context of graphs and DSG this was rediscovered by Tatti and Gionis who called it the locally-dense decomposition <ref type="bibr">[45,</ref><ref type="bibr">44]</ref>, and gave algorithms for computing it. Subsequently, Danisch et al. <ref type="bibr">[14]</ref> applied the well-known Frank-Wolfe algorithm for constrained convex optimization to a quadratic program derived from Charikar's LP relaxation for DSG. More recently, Harb et al. <ref type="bibr">[24]</ref> obtained faster algorithms for computing the dense decomposition in graphs via Charikar's LP; they used a different method called FISTA for constrained convex optimization based on acceleration. Although DSS was not the main focus, <ref type="bibr">[24]</ref> also made an important connection to Fujishige's result on the lexicographically optimal base in polymatroids <ref type="bibr">[18]</ref> which elucidated the work of Danisch et al. on DSG. We describe this next.</p><p>Lexicographically optimal base and dense decomposition. We briefly describe Fujishige's result <ref type="bibr">[18]</ref> and its connection to dense decompositions. Let f : 2^V → ℝ_+ be a monotone submodular function.</p><p>Following Edmonds, the polymatroid associated with f, denoted by P_f, is the polyhedron P_f = {x ∈ ℝ^V_+ : x(S) ≤ f(S) for all S ⊆ V}.</p><p>The base polyhedron associated with f, denoted by B_f, is the polyhedron obtained by intersecting P_f with the equality constraint x(V) = f(V).</p><p>For a supermodular f the contrapolymatroid P_f is defined analogously (the inequalities are reversed), and similarly B_f is the base contrapolymatroid obtained by intersecting P_f with the equality constraint x(V) = f(V). Fujishige proved that there exists a unique lexicographically optimal base in any polymatroid, and moreover it can be found by solving the quadratic program: min ∑_v x_v² s.t. x ∈ B_f. In the context of supermodular functions, one obtains a similar result:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The quadratic program min ∑_v x_v² s.t. x ∈ B_f, where B_f is the contrapolymatroid associated with f, has a unique solution. As observed explicitly in <ref type="bibr">[24]</ref>, the lexicographically optimal base gives the dense decomposition vector for DSS. That is, if x* is the optimal solution to the quadratic program then for each v, x*_v = λ_v. In particular, as noted in <ref type="bibr">[24]</ref>, one can apply the well-known Frank-Wolfe algorithm to the quadratic program and it converges to the dense decomposition vector. As we will see later, each iteration corresponds to finding a minimum weight base in a contrapolymatroid, which is easy to find via the greedy algorithm.</p><p>(Ideal) Tree packings in graphs and the Tutte-Nash-Williams theorem. Our discussion so far focused on DSG. Now we describe a different problem on graphs and relevant background. Our goal is to present a unified perspective on these two problems. The well-known Tutte-Nash-Williams theorem in graph theory (see <ref type="bibr">[41]</ref>) establishes a min-max result for the maximum number of edge-disjoint spanning trees in a multi-graph G. Given an undirected graph G = (V, E), and a partition P of the vertices, let E(P) denote the set of edges crossing the partition. The strength of a partition P is defined as |E(P)|/(|P| − 1). We note that the graph theoretic result is a special case of matroid base packing. Tree packings are useful for a number of applications. In particular, Karger <ref type="bibr">[26]</ref> used tree packings and other ideas in his well-known near-linear randomized algorithm for computing the global minimum cut of a graph. We are mainly concerned here with Thorup's work in <ref type="bibr">[46,</ref><ref type="bibr">47]</ref> that was motivated by dynamic mincut and k-cut problems. 
He defined the so-called ideal edge loads and ideal tree packing (details in later section) by recursively decomposing the graph via Tutte-Nash-Williams partitions <ref type="bibr">[46]</ref>. He also proved that a simple iterative greedy tree packing algorithm converges to the ideal loads <ref type="bibr">[47]</ref>. He used the approximate ideal tree packing to obtain new deterministic algorithms for the k-cut problem, and his approach has been quite influential in a number of subsequent results <ref type="bibr">[21,</ref><ref type="bibr">11,</ref><ref type="bibr">31,</ref><ref type="bibr">29,</ref><ref type="bibr">23]</ref>. Thorup obtained his tree packing result from first principles. We ask: is there a connection between ideal tree packing and DSG?</p></div>
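Greedy tree packing admits a compact sketch: repeatedly add a spanning tree that is cheapest with respect to how often each edge has been used so far. The following Python sketch is our own illustration (not Thorup's code), using Kruskal with union-find as the minimum-cost tree subroutine and assuming a connected unweighted graph:

```python
def greedy_tree_packing(n, edges, iters):
    """Pack iters spanning trees greedily; return average edge loads."""
    count = [0] * len(edges)        # how often each edge has been packed
    for _ in range(iters):
        parent = list(range(n))     # fresh union-find each round
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        # Kruskal: scan edges in order of current usage (ties by index).
        for i in sorted(range(len(edges)), key=lambda i: count[i]):
            u, v = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                count[i] += 1
    return [c / iters for c in count]   # average load per edge
```

On a triangle, every spanning tree uses two of the three edges, and the averaged loads settle at the ideal load of 2/3 per edge.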
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Contributions of the paper</head><p>This paper has two main contributions. The first is a new proof of the convergence of SuperGreedy++ for DSS. Our proof is based on showing that SuperGreedy++ can be viewed as a "noisy" or "approximate" variant of the Frank-Wolfe algorithm applied to the quadratic program defined by Fujishige. The advantage of the new proof is twofold. First, it shows that SuperGreedy++ not only converges to a (1 − ε)-approximation to the densest set, but that in fact it converges to the dense decomposition vector. This was empirically observed in <ref type="bibr">[24]</ref> for DSG, and was left as an open problem to resolve. The proof in <ref type="bibr">[10]</ref> on convergence of SuperGreedy++ is based on the MWU method via LPs, and does not exploit Fujishige's result, which is key to the stronger property that we prove here. Second, the proof connects two powerful tools directly and at a high level: Fujishige's result on submodular functions, and a standard method for constrained convex optimization.</p><p>▶ Theorem 1. Let b* be the dense decomposition vector for a non-negative monotone supermodular set function f : 2^V → ℝ_+ where |V| = n. Then, SuperGreedy++ converges in O(c_f/ε²) iterations to a vector b such that ‖b − b*‖_2 ≤ ε, where c_f depends only on f. For a graph with m edges and n vertices, Greedy++ converges in O(mn²/ε²) iterations for unweighted multigraphs. ▶ Remark 2. The new convergence analysis gives a weaker bound than the one in <ref type="bibr">[10]</ref> in terms of convergence to a (1 − ε) relative approximation to the maximum density. However, it gives a strong additive guarantee on the entire dense decomposition vector.</p><p>Our second contribution builds on our insights on DSG and DSS, and applies them towards understanding ideal tree packing and greedy tree packing. 
We connect the ideal tree packing of Thorup to the dense decomposition associated with the rank function of the underlying graphic matroid (which is submodular). We then show that the greedy tree packing algorithm can be viewed as the Frank-Wolfe algorithm applied to the quadratic program defined by Fujishige, and this easily yields a convergence guarantee.</p><p>▶ Theorem 3. Let G = (V, E) be a graph. The ideal edge load vector ℓ* : E → ℝ_+ for G is given by the lexicographically minimal base in the polymatroid associated with the rank function of the graphic matroid of G. The Frank-Wolfe algorithm with step size 1/(k+1), when applied to the quadratic program for computing the lexicographically minimal base in the graphic matroid of G, coincides with the greedy tree packing algorithm. For unweighted graphs on m edges, the generic analysis of the Frank-Wolfe method's convergence shows that greedy tree packing converges to a load vector ℓ within additive ℓ_2-error ε of ℓ*.</p><p>Although the algorithm is the same (greedy tree packing), Thorup's analysis guarantees a strongly polynomial bound even in the capacitated case <ref type="bibr">[47]</ref>. However, we obtain a stronger additive guarantee via a generic Frank-Wolfe analysis, and our analysis has a 1/ε² dependence while Thorup's has a 1/ε³ dependence. We give a more detailed comparison in Section 5.</p><p>Organization. The rest of the paper is devoted to proving the two theorems. The paper relies on tools from the theory of submodular functions and an adaptation of the analysis of Frank-Wolfe. We first describe the relevant background and then prove the two results in separate sections. Due to space constraints, most of the proofs are provided in the full version.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background on Frank-Wolfe algorithm and a variation</head><p>Let D ⊆ ℝ^d be a compact convex set, and f : D → ℝ be a convex, differentiable function. Consider the problem min_{x∈D} f(x). Frank-Wolfe <ref type="bibr">[17]</ref> is a first-order method and it relies on access to a linear minimization oracle, LMO, for f that can answer LMO(w) = arg min_{s∈D} ⟨s, ∇f(w)⟩ for any given w ∈ D. In several applications such oracles with fast running times exist. Given f, D as above, the Frank-Wolfe algorithm is an iterative algorithm that converges to the minimizer x* ∈ D of f. See Algorithm 1. The algorithm starts with a guess of the minimizer b^(0) ∈ D. In each iteration, it finds a direction d^(k+1) to move towards by calling the linear minimization oracle on the current guess b^(k). It then moves slightly towards that direction using a convex combination to ensure that the new point is in D. The amount the algorithm moves towards the new direction decreases as k increases, signifying the "confidence" in its current guess as the minimizer.</p><p>The original convergence analysis for the Frank-Wolfe algorithm is from <ref type="bibr">[17]</ref>. Jaggi <ref type="bibr">[25]</ref> gave an elegant and simpler analysis. His analysis characterizes the convergence rate in terms of the curvature constant C_f of the function f.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Algorithm 1 Frank-Wolfe-Original.</p><p>▶ Definition 5. Let D ⊆ ℝ^d be a compact convex set, and f : D → ℝ be a convex, differentiable function. The curvature constant C_f of f is defined as C_f = sup (2/γ²)(f(y) − f(x) − ⟨y − x, ∇f(x)⟩), where the supremum is over x, s ∈ D, γ ∈ [0, 1], and y = x + γ(s − x).</p><p>▶ Definition 6. Let g : D → ℝ be a differentiable function. Then g is Lipschitz with constant L if ‖g(x) − g(y)‖ ≤ L‖x − y‖ for all x, y ∈ D.</p><p>Jaggi's proof technique can be used to prove the convergence rate of "noisy/approximate" variants of the Frank-Wolfe algorithm. This motivates the following definition. An ε-approximate linear minimization oracle is an oracle that, for any w ∈ D, returns s̃ such that ⟨s̃, ∇f(w)⟩ ≤ ⟨s*, ∇f(w)⟩ + ε, where s* = LMO(w). While an efficient exact linear minimization oracle exists in some applications, in others one can only ε-approximate it (using numerical methods or otherwise). Jaggi's proof technique extends to show that an approximate linear minimization oracle suffices for convergence as long as the approximation quality improves with the iterations. Suppose the oracle, in iteration k, provides a δC_f/(k+2)-approximate solution, where δ &gt; 0 is some fixed constant. The convergence rate will only deteriorate by a (1 + δ) multiplicative factor. Qualitatively, this says that we can afford to be inaccurate in computing the Frank-Wolfe direction in early iterations, but the approximation should approach LMO(b^(k)) as k → ∞.</p><p>Another question of interest is the resilience of the Frank-Wolfe algorithm to changes in the learning rate γ_k = 2/(k+2). Indeed, the variants we will look at will require γ_k = 1/(k+1). Jaggi's proof can again be adapted to handle this case, with only an O(log k) multiplicative deterioration in the convergence rate. We state the following theorem whose proof we defer to the appendix.</p><p>▶ Theorem 8. Let D ⊆ ℝ^d be a compact convex set, and f : D → ℝ be a convex, differentiable function with minimizer b*. 
Suppose instead of computing d^(k+1) by calling LMO(b^(k)) in iteration k, we call a δC_f/(k+2)-approximate linear minimization oracle, for some fixed δ &gt; 0. Also, suppose instead of using γ_k = 2/(k+2), we use γ_k = 1/(k+1). Then the algorithm still converges to b*, with the error bound after k iterations deteriorating by a factor proportional to H_k, where H_n is the n-th harmonic number.</p><p>We refer to the variant of the Frank-Wolfe algorithm described by Theorem 8 as noisy Frank-Wolfe.</p></div>
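As a concrete illustration of the vanilla method (Algorithm 1, with step size 2/(k+2) and an exact LMO), here is a minimal Python sketch applied to minimizing ∑_i x_i² over the probability simplex, whose minimizer is the uniform vector. All names are ours, and the tolerance in the usage comment is empirical, not a proven bound:

```python
def frank_wolfe(grad, lmo, x0, iters):
    """Generic Frank-Wolfe: move toward the LMO answer with step 2/(k+2)."""
    x = list(x0)
    for k in range(iters):
        s = lmo(grad(x))                    # linear minimization step
        gamma = 2.0 / (k + 2)               # standard step size
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

# Example: minimize sum_i x_i^2 over the probability simplex in R^4.
grad = lambda x: [2 * xi for xi in x]

def simplex_lmo(g):
    # The best simplex vertex puts all mass on a minimum-gradient coordinate.
    j = min(range(len(g)), key=lambda i: g[i])
    return [1.0 if i == j else 0.0 for i in range(len(g))]

x = frank_wolfe(grad, simplex_lmo, [1.0, 0.0, 0.0, 0.0], 2000)
# x is now close to the uniform vector (0.25, 0.25, 0.25, 0.25)
```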
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Sub and supermodular functions, and dense decompositions</head><p>We already defined submodular and supermodular set functions, polymatroids and contrapolymatroids. We restrict attention to functions satisfying f(∅) = 0, which together with supermodularity and non-negativity implies monotonicity, that is, f(A) ≤ f(B) whenever A ⊆ B.</p><p>Recall that f is submodular if f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all A, B ⊆ V; the inequality is reversed for supermodular set functions. We need the following simple lemma.</p><p>▶ Lemma 9. Let f : 2^V → ℝ be a submodular set function. Then g : 2^V → ℝ defined by g(X) = f(V) − f(V ∖ X) is supermodular. In particular, if f is a normalized monotone submodular function then g is a normalized monotone supermodular function.</p><p>Deletion and contraction, and non-negative summation. Sub and supermodular functions are closed under a few simple operations. Given f : 2^V → ℝ, restricting it to a subset V′ corresponds to deleting V ∖ V′. Given A ⊆ V, contracting f by A yields the function g : 2^{V∖A} → ℝ where g(X) = f(X ∪ A) − f(A). Given two functions f and g we can take their non-negative sum af + bg where a, b ≥ 0. Monotonicity and normalization are also preserved under these operations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dense decompositions for submodular and supermodular functions</head><p>Following the discussion in the introduction, we are interested in decompositions of supermodular and submodular functions. Dense decompositions follow from the theory of principal partitions of submodular functions that have been explored extensively. We refer the reader to Fujishige's survey <ref type="bibr">[20]</ref> as well as Narayanan's work <ref type="bibr">[35,</ref><ref type="bibr">36]</ref>. The standard perspective comes from considering the minimizers of the function f_λ for a scalar λ, where f_λ(S) = f(S) − λ|S|. As λ varies from −∞ to ∞ the minimizers change only at a finite number of break points. In this paper we are interested in the notion of density, in the form of ratios, for non-negative submodular and supermodular functions. For this reason we follow the notation from recent work <ref type="bibr">[44,</ref><ref type="bibr">14,</ref><ref type="bibr">10,</ref><ref type="bibr">24]</ref> and state lemmas in a convenient form, and provide proofs in the appendix for the sake of completeness.</p><p>Supermodular function dense decomposition. The basic observation is the following. ▶ Lemma 10. Let f : 2^V → ℝ_+ be a non-negative supermodular set function. There exists a unique maximal set S ⊆ V that maximizes f(S)/|S|.</p><p>The preceding lemma can be used in a simple fashion to derive the following corollary (this was explicitly noted in <ref type="bibr">[10]</ref> for instance).</p><p>▶ Corollary 11. Let f : 2^V → ℝ_+ be a non-negative supermodular set function. There is a unique partition S_1, S_2, ..., S_h of V with the following property. Let V_i = V ∖ ∪_{j&lt;i} S_j and let A_i = ∪_{j&lt;i} S_j. Then, for each i = 1 to h, S_i is the unique maximal densest set for the function f_{A_i} : 2^{V_i} → ℝ_+ obtained by contracting f by A_i. 
Moreover, letting λ_i be the optimum density of f_{A_i}, we have λ_1 &gt; λ_2 &gt; ⋯ &gt; λ_h.</p><p>Based on the preceding corollary, we can associate with each v ∈ V a value λ(v): λ(v) = λ_i where v ∈ S_i. See Figure <ref type="figure">1</ref> for an illustration.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Dense decomposition for submodular functions. We now discuss submodular functions. We consider two variants. We start with a basic observation. ▶ Lemma 12. Let f : 2^V → ℝ_+ be a monotone non-negative submodular set function such that f(v) &gt; 0 for all v ∈ V. There is a unique minimal set S ⊆ V that minimizes |S|/f(S).</p><p>Consider the following variant of a decomposition of a submodular function f. We let S_0 = V and find S_1 as the unique minimal set S ⊆ V that minimizes (|V| − |S|)/(f(V) − f(S)). Then we "delete" S̄_1 = V ∖ S_1, and find the minimal set S_2 ⊆ S_1 that minimizes (|S_1| − |S|)/(f(S_1) − f(S)). In iteration i, we find the unique minimal set S_i ⊆ S_{i−1} that minimizes (|S_{i−1}| − |S|)/(f(S_{i−1}) − f(S)); let λ̄_i denote this minimum ratio. For u ∈ S̄_i, we say the density of u is λ̄_u = λ̄_i. Hence the dense decomposition of f is S̄_1, ..., S̄_k with densities λ̄_1, ..., λ̄_k. We refer to this decomposition as the first variant, which is based on deletions.</p><p>We now describe a second dense decomposition for submodular functions. Let f : 2^V → ℝ_+ be a monotone submodular function. Consider the supermodular function g : 2^V → ℝ_+ where g(X) = f(V) − f(V ∖ X) for all X ⊆ V. From Lemma 9, g is monotone supermodular. We can then apply Corollary 11 to obtain a dense decomposition of g. Let T_1, T_2, ..., T_{k′} be the unique decomposition obtained by considering g and let λ′_1, ..., λ′_{k′} be the corresponding densities. Note that this second decomposition is based on contractions.</p><p>Not too surprisingly, the two decompositions coincide, as we show in the next theorem. The main reason to consider them separately is for technical ease in applications where one or the other view is more natural. ▶ Theorem 13. 
Let S̄_1, ..., S̄_k be the dense decomposition (using the deletion variant) of a submodular function f with densities λ̄_1, ..., λ̄_k. Let T_1, ..., T_{k′} be the dense decomposition (using the contraction variant) of the same function with densities λ′_1, ..., λ′_{k′}. We have (i) k′ = k, (ii) S̄_1, ..., S̄_k is exactly T_1, ..., T_k, and (iii) λ̄_i = 1/λ′_i for 1 ≤ i ≤ k.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Fujishige's results on lexicographically optimal bases</head><p>Fujishige <ref type="bibr">[18]</ref> gave a polyhedral view of the dense decomposition which is the central ingredient in our work. He stated his theorem for polymatroids; however, it can be easily generalized to contrapolymatroids. We restrict attention to the unweighted case for notational ease; <ref type="bibr">[18]</ref> treats the weighted case. Vectors in ℝ^n can be totally ordered by sorting the coordinates in increasing order of value and considering the lexicographical ordering of the two sorted sequences of length n. In the following, for a, b ∈ ℝ^n we use a ≺ b and a ⪯ b to refer to this order. We say that a vector x in a set D is lexicographically maximum if for all y ∈ D we have y ⪯ x.</p><p>Fujishige proved the following theorem for polymatroids.</p><p>▶ Theorem 14 (<ref type="bibr">[18]</ref>). Let f : 2^V → ℝ_+ be a monotone submodular function (a polymatroid) and let B_f be its base polytope. Then there is a unique lexicographically maximum base b* ∈ B_f, and for each v ∈ V, b*_v equals the density λ̄_v of v in the dense decomposition of f. Moreover, b* is the optimum solution to the quadratic program: min ∑_v x_v² subject to x ∈ B_f.</p><p>Another ordering is to sort the coordinates in decreasing order of value and then take the lexicographic ordering on the two sorted sequences. We denote this ordering by ≺↓ and ⪯↓ for strict and non-strict ordering respectively. We say that a vector x in a set D is lexicographically minimum if for all y ∈ D we have x ⪯↓ y. The preceding theorem can be generalized to contrapolymatroids in a straightforward fashion, and this was explicitly pointed out in <ref type="bibr">[24]</ref>. We paraphrase it to be similar to the preceding theorem statement. ▶ Theorem 15. Let f : 2^V → ℝ_+ be a monotone supermodular function (a contrapolymatroid) and let B_f be its base polytope. 
Then there is a unique lexicographically minimum base b* ∈ B_f, and for each v ∈ V, b*_v = λ_v. Moreover, b* is the optimum solution to the quadratic program: min ∑_v x_v² subject to x ∈ B_f.</p></div>
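The sorted-coordinate ordering used in Theorems 14 and 15 is easy to operationalize. A small Python sketch (our illustration; all names are ours):

```python
def lex_key(a):
    # Sort coordinates in increasing order; vectors are then compared
    # lexicographically on these sorted sequences.
    return sorted(a)

def lex_max(vectors):
    """Lexicographically maximum vector under the increasing-sort order."""
    return max(vectors, key=lex_key)
```

For the rank-2 uniform matroid on three elements, the minimum-norm base (2/3, 2/3, 2/3) is also the lexicographic maximum among the bases we list, matching Theorem 14.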
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Approximating a lexicographically optimal base using Frank-Wolfe</head><p>Consider the convex quadratic program min ∑_{v∈V} x_v² subject to x ∈ B_f, where B_f is the base polytope of f (which could be submodular or supermodular). We can use the Frank-Wolfe method to approximately solve this optimization problem. The gradient of the quadratic function is 2x, and it follows that in each iteration we need to answer the linear minimization oracle query LMO(w) = arg min_{s∈B_f} ⟨s, 2w⟩ for w ∈ B_f. This is equivalent to arg min_{s∈B_f} ⟨s, w⟩, in other words optimizing a linear objective over the base polytope. Edmonds <ref type="bibr">[15]</ref> showed that the simple greedy algorithm is an O(|V| log |V|) time exact algorithm (assuming O(1) time oracle access to f).</p><p>▶ Theorem 16 (<ref type="bibr">[15]</ref>). Fix a polymatroid f : 2^V → ℝ_+. Given a weight vector w ∈ ℝ^n, let v_{j_1}, v_{j_2}, ..., v_{j_n} be a sorting of V = {v_1, ..., v_n} in ascending order of w values. Let A_i = {v_{j_1}, ..., v_{j_i}} and s*_{v_{j_i}} = f(A_i) − f(A_{i−1}). Then s* = arg min_{s∈B_f} ⟨s, w⟩.</p><p>The theorem also holds for supermodular functions, but by reversing the order from ascending to descending order of w and complementing the set A_i.</p><p>▶ Theorem 17 (<ref type="bibr">[15]</ref>). Fix a contrapolymatroid f : 2^V → ℝ_+. Given a weight vector w ∈ ℝ^n, let v_{j_1}, v_{j_2}, ..., v_{j_n} be a sorting of V = {v_1, ..., v_n} in descending order of w values. Let A_i = {v_{j_1}, ..., v_{j_i}} and s*_{v_{j_i}} = f(A_i) − f(A_{i−1}). Then s* = arg min_{s∈B_f} ⟨s, w⟩.</p><p>Both algorithms are dominated by the sorting step and thus take O(|V| log |V|) time. These simple algorithms imply that the Frank-Wolfe algorithm can be used on the quadratic program to obtain an approximation to the lexicographically maximum (respectively minimum) bases of submodular (respectively supermodular) functions. The standard Frank-Wolfe algorithm would need O(1/ε²) iterations (with a constant depending on the curvature of the quadratic program) to converge to a vector b satisfying ‖b − b*‖_2 ≤ ε.</p></div>
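Edmonds' greedy step of Theorem 16 can be sketched directly (our own illustration, assuming oracle access to f as a Python callable on frozensets):

```python
def greedy_lmo(V, f, w):
    """Base minimizing <s, w> over the base polytope of a monotone
    submodular f: sort V ascending by w, take marginal gains."""
    order = sorted(V, key=lambda v: w[v])   # ascending weight
    s, prefix = {}, []
    prev = f(frozenset())
    for v in order:
        prefix.append(v)
        cur = f(frozenset(prefix))
        s[v] = cur - prev                   # marginal gain of v
        prev = cur
    return s
```

For the rank function f(S) = min(|S|, 2) of the rank-2 uniform matroid, the two lightest elements receive value 1 and the rest receive 0, which is indeed the cheapest base.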
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Application 1: Convergence of GREEDY++ and SUPERGREEDY++</head><p>We begin by describing Greedy++ <ref type="bibr">[7]</ref> and its generalization SuperGreedy++ <ref type="bibr">[10]</ref>.</p><p>Greedy++ is built upon the peeling idea of Greedy, and applies it over several iterations. The algorithm initializes a weight/load on each v ∈ V, denoted by w(v), to 0. In each iteration it creates an ordering by peeling the vertices: the next vertex to be chosen is arg min_v (w(v) + deg_{G′}(v)), where G′ is the current graph (after removing the previously peeled vertices). At the end of the iteration, w(v) is increased by the degree of v when it was peeled in the current iteration. A precise description can be found below. SuperGreedy is a natural generalization of Greedy, and SuperGreedy++ generalizes Greedy++. A formal description of SuperGreedy++ is given below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Algorithm 2 Greedy++(G(V, E), T) <ref type="bibr">[7]</ref>.</p><p>The goal of this section is to prove Theorem 1 on the convergence of SuperGreedy++ and Greedy++ to the lexicographically optimal base.</p></div>
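A compact sketch of the peeling-with-weights loop of Greedy++ (our own illustration, not the authors' pseudocode; for brevity it returns only the best suffix density seen across rounds):

```python
from collections import defaultdict

def greedy_plus_plus(edges, T):
    """T rounds of weighted peeling: peel the vertex minimizing
    w(v) + deg(v) in the remaining graph, then charge its peel-time
    degree to w(v)."""
    verts = {u for e in edges for u in e}
    w = defaultdict(float)          # loads accumulated across rounds
    best = 0.0
    for _ in range(T):
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        cur = set(verts)
        rem = len(edges)            # edges among remaining vertices
        while cur:
            best = max(best, rem / len(cur))   # density of current suffix
            v = min(cur, key=lambda x: w[x] + len(adj[x]))
            w[v] += len(adj[v])                # charge peel-time degree
            rem -= len(adj[v])
            for u in adj[v]:
                adj[u].discard(v)
            adj[v].clear()
            cur.remove(v)
    return best
```

On a K4 plus a pendant vertex, for example, the sketch finds the optimum density 3/2 already in the first round (which coincides with Greedy).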
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Intuition and main technical lemmas</head><p>As we saw in Section 3.3, if one applies the Frank-Wolfe algorithm to solve the quadratic program min ∑_{v∈V} x_v² subject to x ∈ B_f, each iteration corresponds to finding a minimum weight base of f where the weights are given by the current vector x. Finding a minimum weight base corresponds to sorting V by x. However, SuperGreedy++ and Greedy++ use a more involved peeling algorithm in each iteration; the peeling is based on the weights as well as the degrees of the vertices and it is not a static ordering (the degrees change as peeling proceeds). This is the difficulty in formally analyzing these algorithms. In <ref type="bibr">[10]</ref>, the authors used a connection to the multiplicative weight update method via LP relaxations. Here we rely on the quadratic program and noisy Frank-Wolfe. The high-level intuition, which originates in <ref type="bibr">[10]</ref>, is the following. As the algorithm proceeds in iterations, the weights on the vertices accumulate; recall that the total increase in the weight in the case of DSG is m = |E|. The degree term, which influences the peeling, is dominant in early iterations, but its influence on the ordering of the vertices decreases eventually as the weights of the vertices get larger. It is then plausible to conjecture that the algorithm behaves like the standard Frank-Wolfe method in the limit. The main question is how to make this intuition precise. <ref type="bibr">[10]</ref> relies on a connection to the MWU method while we use a connection to noisy Frank-Wolfe.</p><p>For this purpose, consider an iteration of Greedy++ and SuperGreedy++. The algorithm peels based on the current weight vector and the degrees. We isolate and abstract this peeling algorithm and refer to it as Weighted-Greedy and Weighted-SuperGreedy respectively, and formally describe them with the weight vector w as a parameter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 4 Weighted-Greedy(G, w).</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Input: G(V, E) and w(u) for</head><p>The peeling algorithms also compute a base d oe B f . In the case of graphs and DSG, d(u) is set to the degree of the vertex u when it is peeled. One can alternatively view the base as an orientation of the edges of E. Define for each edge uv oe G two weights x uv , x vu . We say that x is valid if x uv + x vu = 1 and x uv , x vu &#216; 0 for all {u, v} oe E(G). For b oe &#376; |V | , we say x induces b if b u = q voe"(u) x uv for all u oe V . We say a vector d is an orientation if there is a valid x that induces it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>and only if b is an orientation.</head><p>Recall that the Frank-Wolfe algorithm, for a given weight vector w : V ae R + , computes the minimum-weight base b with respect to w since &#200;w, b&#205; = min yoeB f &#200;w, y&#205;. It is worth taking a moment to note that this base (or orientation due to Lemma 18) is easily computable: we orient each edge integrally (i.e x vu = 1, x uv = 0) from v to u if w(u) &#216; w(v), and from u to v otherwise. A simple exchange argument yields a proof of correctness and is implicit in many works <ref type="bibr">[14]</ref> 2 . This induces an optimal base d &#250; w with respect to w. Our goal is to compare how the peeling order created by Weighted-Greedy (and Weighted-SuperGreedy) compares with the best base. The following two technical lemmas formalize the key idea. The first is tailored to DSG and the second applies to DSS. I Lemma 19. Let d be the output from Weighted-Greedy(G, w) and d &#250; w be the optimal orientation with respect to w. Then &#200;w, d&#205; AE &#200;w,</p><p>. In particular, the additive error does not depend on the weight vector w. I Lemma 20. For a supermodular function f : 2 V ae &#376; + , let d be the output from Weighted-SuperGreedy(f, w) (Algorithm 5) and d &#250; w be the optimal vector with respect to w as described in Theorem 17. Then &#200;w, d&#205; AE &#200;w,</p><p>. In particular, the additive error does not depend on the weight vector w.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Convergence proof for Greedy++</head><p>Why is Lemma 19 crucial? First, observe that the minimizer d &#250; w of &#200;w, d&#205; is exactly the same minimizer as &#200;Kw, d&#205; for any constant K &gt; 0 (and vice-versa).</p><p>I Lemma 21. Let dK be the output of Weighted-Greedy(G, Kw). Then &#200;w, dK &#205; AE &#200;w,</p><p>. 2 Since the optimal orientation is easily computable, one can replace the "peeling" iteration of Greedy++ with the optimum base. This would result in the Frank-Wolfe based algorithm of <ref type="bibr">[14]</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E S</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>!q</head><p>uoeV Kw(u)d(u)</p><p>Dividing by K implies the claim. J</p><p>We are now ready to view Greedy++ as a noisy Frank-Wolfe algorithm. Algorithm 6 shows how Greedy++ could be interpreted. Input: G = (V, E) and w(u) for u oe V</p><p>The algorithm is exactly the same as the one described in Algorithm 2. Indeed, one can prove that kb (k) is precisely the weights that Greedy++ ends with at round k by induction. Observe that (k + 1)b (k+1) = kb (k) + d (k+1) which is precisely the load as described in Algorithm 2 (via induction). We note that " &#937; 1/(k + 1) is crucial here to ensure we are taking the average. Lemma 25 in the appendix (full version) implies that each peel in Algorithm 2 is "C f k+2 -approximate linear minimization oracle. Using Theorem 8, this implies that Greedy++ (as described in Algorithm 2) converges to b &#250; in &#213;( mn 2 ' 2 ) iterations since</p><p>). We use the probabilistic method to bound C f in the full version.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Extension to SuperGreedy++. An essentially similar analysis works for</head><p>SuperGreedy++. Instead of Lemma 19, we rely on Lemma 20. For technical reasons, the convergence analysis of SuperGreedy++ is slightly weaker than for Greedy++.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Application 2: Greedy Tree Packing interpreted via Frank-Wolfe</head><p>Let G = (V, E) be a graph with non-negative edge capacities. The goal of this section is to view Thorup's definitions of ideal edge loads and the associated tree packing from a di erent perspective, and to derive an alternate convergence analysis of his greedy tree packing algorithm <ref type="bibr">[46,</ref><ref type="bibr">47]</ref>. In previous work, Chekuri, Quanrud and Xu <ref type="bibr">[11]</ref> obtained a di erent tree packing based on an LP relaxation for k-cut, and used it in place of ideal tree packing. Despite this, there was a gap in our understanding which we address here. We restrict our attention to unweighted multi-graphs throughout this section, and comment on the capacitated case at the end of the section. Let G = (V, E) be a connected multi-graph, with n vertices and m edges. Consider the graphic matroid M G (E, F) induced by G; E is the ground set, and F consists of all sub-forests of G. The bases of the matroid are precisely the spanning trees of G. Consider the rank function r : 2 E ae Z + of M G . r is submodular, and it is well-known that for a edge subset X &#8482; E, r(X) = n &#8800; &#376;(X) where &#376;(X) is the number of connected components induced by X.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Thorup's recursive algorithm as dense decomposition</head><p>For consistency with previous notation, we use f to denote the submodular rank function r. We first describe ideal loads as defined by Thorup. Consider the Tutte-Nash-Williams partition P for G. Recall that P minimizes the ratio |E(P )| |P |&#8800;1 among all partitions, and this ratio is &#8226; (G). For each edge e oe E(P ), assign &#184;&#250;(e) = 1</p><p>&#8226; (G) . Remove the edges in E(P ) to obtain a graph G &#213; which now consists of several disconnected components. Recursively compute ideal loads for the edges in each connected component of G &#213; (the process stops when G has no edges).</p><p>We claim that Thorup's recursive decomposition coincides with the dense decomposition of f (the first variant). To see this, it su ces to see the first step of the dense decomposition. We find the minimal set S 1 &#8482; E that minimizes |E|&#8800;|S| f (E)&#8800;f (S) . We let &#348;1 = E \ S 1 and assign the edges in &#348;1 the density f (E)&#8800;f (S)</p><p>|E|&#8800;|S| . Then, we "delete" &#348;1 . Observe that &#348;1 = E \ S 1 is just the edges crossing the partition P (S 1 ) defined by the &#376;(S 1 ) connected components spanned by S 1 . Also, recall that f (E)&#8800;f (S1)</p><p>Hence, the density assigned to edges in &#348;1 is exactly 1</p><p>&#8226; (G) by the Tutte-Nash-Williams theorem. The next step is deleting &#348;1 = E \ S 1 , which, as discussed above, are the edges crossing the partition P (S 1 ).</p><p>Via induction we prove the following lemma.</p><p>I Lemma 22. The weights given to the by the dense decomposition algorithm on f coincide with &#184;&#250;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Greedy tree packing converge to ideal relative loads</head><p>Thorup considered the following greedy tree packing algorithm. For each edge define a load &#184;(e) which is initialized to 0. The algorithm proceeds in iterations. In iteration i the algorithm computes an MST T i in G with respect to edge weights w(e) = &#184;(e). The load of each edge e oe T i is increased by 1. Thorup showed that as k ae OE, the quantity &#184;(e)/k converges to &#184;&#250;(e) for each edge e. His proof is fairly technical. In this section, we present a di erent proof of this fact that uses the machinery we have built thus far. I Lemma 23. The vector &#184;&#250; is the lexicographically maximal base of the spanning tree polytope.</p><p>Proof. We showed that Thorup's definition of ideal loads is obtained by simply running the dense decomposition on the rank function of the graphic matroid induced by G. The bases of the graphic matroid are the spanning trees of G and hence the base polytope of f is the spanning tree polytope of G. The dense decomposition of f gives us the lexicographically maximum base, and hence &#184;&#250; is the lexicographically maximal base in the spanning tree polytope of G. J Hence, &#184;&#250; is the unique solution to the quadratic program: min q e &#184;(e) 2 subject to &#184;oe SPT(G) where SPT(G) is the spanning tree base polytope. We can thus apply a noisy Frank-Wolfe algorithm to the quadratic program to obtain Algorithm 7.</p><p>The main observation is that this algorithm is exactly the same as Thorup's greedy tree packing algorithm. Indeed, observe that (k + 1)&#184;( k+1) &#937; k&#184;( k) + d (k+1) = k&#184;( k) + 1{e oe MST(G, &#184;(k) )} where MST(G, w) is a minimum spanning tree of G with respect to edge weights w. Since noisy Frank-Wolfe converges, then &#184;(k) converges to &#184;&#250;(e), and greedy tree packing converges. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Input: G(V, E)</head><p>Initialize l (0) (u) = 1{e oe T } for any spanning tree T . for k &#937; 0 to T &#8800; 1 do</p><p>&#200;l (k) , s&#205; &#219; This is the minimum spanning tree with respect to l (k)</p><p>We now establish the convergence guarantee for greedy tree packing. For the spanning tree polytope of an m edge graph, the curvature constant C f AE 4m because for x, y oe B f , 2(x &#8800; y) T (x &#8800; y) = q eoeE (x e &#8800; y e ) 2 AE 4m. Plugging this bound into Theorem 8, after k = O( m log(m/') ' 2</p><p>) iterations, .</p><p>. &#184;(k) &#8800; &#184;&#250;.</p><p>. 2 AE '. we run the standard Frank-Wolfe algorithm with " = 2/(k + 2). Then, the convergence guarantee improves to O( m ' 2 ). Note that each iteration still corresponds to finding an MST in the graph with weights. However, the load vector is no longer a simple average of the trees taken so far.</p><p>Comparison to Thorup's bound and analysis. <ref type="bibr">Thorup [47]</ref> considered ideal tree packings in capacitated graphs; let c(e) &#216; 1 (via scaling) denote the capacity of edge e. Via <ref type="bibr">[18]</ref>, one sees that the optimum solution of the quadratic program q e x 2 e /c(e) subject to x oe SP (G) is the ideal load vector &#184;&#250;. Greedy tree packing generalizes to the capacitated case easily; in each iteration we compute the MST with respect to weights w(e) = &#184;(e)c(e). Thorup proved the following.</p><p>I Theorem 24 <ref type="bibr">([47]</ref>). Let G = (V, E) be capacitated graph. Greedy tree packing after O( m log(mn/') ' 3</p><p>) iterations ouputs a load vector &#184;such that for each edge e oe E, &#184;(e) AE (1 + ')&#184;&#250;(e).</p><p>We observe that if all capacities are 1 (or identical) then Thorup's guarantee is that &#184;(e) &#8800; &#184;&#250;(e) AE O(') for each edge e. 
For this case, via Frank-Wolfe, we obtain the much stronger guarantee that ||&#184;&#8800; &#184;&#250;|| 2 AE ' which easily implies the per edge condition, however the per edge guarantee does not imply a guarantee on the norm. Further, in the unweighted case, our iteration complexity dependence on ' is 1/' 2 while Thorup's is 1/' 3 . Thorup's guarantee works for the capacitated case in strongly polynomial number of iterations. We can adapt the Frank-Wolfe analysis to the capacitated case but it would yield a bound that depends on C = q e c(e) (in the unweighted case C = m); on the other hand the guarantee provided by Frank-Wolfe is stronger.</p><p>It may seem surprising that the same greedy tree packing algorithm yields di erent types of guarantees based on the type of analysis used. We do not have a completely satisfactory explanation but we point out the following. Thorup's analysis is a non-trivial refinement of the standard MWU type analysis of tree packing <ref type="bibr">[38,</ref><ref type="bibr">50,</ref><ref type="bibr">3]</ref>. As already noted in <ref type="bibr">[24]</ref>, if one uses Frank-Wolfe (with " = 1/(k + 1)) with the softmax potential function that is standard in the MWU framework, then the resulting algorithm would also be greedy tree packing. Fujishige's uses a quadratic objective to guarantee that the optimum solution is the unique maximal base but in fact any increasing strongly convex function would su ce. In the context of optimizing a linear function over B f , due to the optimality of the greedy algorithm, the only thing that determines the base is the ordering of the elements of V according to the weight vector; the weights themselves do not matter. Thus, Frank-Wolfe applied to di erent convex objectives can result in the same greedy tree/base packing algorithm. However, the specific objective can determine the guarantee one obtains after a number of iterations. 
The softmax objective is better suited for obtaining relative error guarantees while the quadratic objective is better suited for obtaining additive error guarantees. Thorup's analysis is more sophisticated due to the per edge guarantee in the capacitated setting. A unified analysis that explains both the relative and additive guarantees is desirable. We leave this is an interesting direction for future research.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>Reid Andersen and Kumar Chellapilla. Finding dense subgraphs with size bounds. In Konstantin Avrachenkov, Debora Donato, and Nelly Litvak, editors, Algorithms and Models</p></note>
		</body>
		</text>
</TEI>
