<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Practical Parallel Algorithms for Near-Optimal Densest Subgraphs on Massive Graphs</title></titleStmt>
			<publicationStmt>
				<publisher>SIAM</publisher>
				<date>01/12/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10526828</idno>
					<idno type="doi"></idno>
					
					<author>Pattara Sukprasert</author><author>Quanquan C Liu</author><author>Laxman Dhulipala</author><author>Julian Shun</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The densest subgraph problem has received significant attention, both in theory and in practice, due to its applications in problems such as community detection, social network analysis, and spam detection. Due to the high cost of obtaining exact solutions, much attention has focused on designing approximate densest subgraph algorithms. However, existing approaches are not able to scale to massive graphs with billions of edges. In this paper, we introduce a new framework that combines approximate densest subgraph algorithms with a pruning optimization. We design new parallel variants of the state-of-the-art sequential Greedy++ algorithm, and plug them into our framework in conjunction with a parallel pruning technique based on k-core decomposition to obtain parallel (1+ε)-approximate densest subgraph algorithms. On a single thread, our algorithms achieve 2.6-34× speedup over Greedy++, and obtain up to 22.37× self-relative parallel speedup on a 30-core machine with two-way hyper-threading. Compared with the state-of-the-art parallel algorithm by Harb et al. [NeurIPS'22], we achieve up to a 114× speedup on the same machine. Finally, against the recent sequential algorithm of Xu et al. [PACMMOD'23], we achieve up to a 25.9× speedup. The scalability of our algorithms enables us to obtain near-optimal density statistics on the hyperlink2012 (with roughly 113 billion edges) and clueweb (with roughly 37 billion edges) graphs for the first time in the literature.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The densest subgraph problem is a fundamental problem in graph mining that has been studied extensively for decades, both because of its theoretical challenges and its practical importance. The numerous applications of the problem include community detection and visualization in social networks <ref type="bibr">[1,</ref><ref type="bibr">15,</ref><ref type="bibr">27,</ref><ref type="bibr">31,</ref><ref type="bibr">33,</ref><ref type="bibr">42]</ref>, motif discovery in protein and DNA <ref type="bibr">[21,</ref><ref type="bibr">25,</ref><ref type="bibr">45]</ref>, and pattern identification <ref type="bibr">[2,</ref><ref type="bibr">22,</ref><ref type="bibr">29]</ref>.</p><p>Significant effort has been made in the theoretical computer science community on computing exact and approximate densest subgraphs under various models of computation, in particular in the static <ref type="bibr">[10,</ref><ref type="bibr">11,</ref><ref type="bibr">13,</ref><ref type="bibr">32,</ref><ref type="bibr">46]</ref>, streaming <ref type="bibr">[7]</ref>, distributed <ref type="bibr">[4,</ref><ref type="bibr">26,</ref><ref type="bibr">44]</ref>, parallel <ref type="bibr">[5,</ref><ref type="bibr">19,</ref><ref type="bibr">17,</ref><ref type="bibr">28]</ref>, dynamic <ref type="bibr">[7,</ref><ref type="bibr">12,</ref><ref type="bibr">14,</ref><ref type="bibr">43]</ref>, and privacy-preserving <ref type="bibr">[20,</ref><ref type="bibr">24,</ref><ref type="bibr">39]</ref> settings. However, despite a plethora of theoretical improvements on these fronts, there still do not exist practical near-optimal densest subgraph algorithms that can scale up to the largest publicly-available graphs with tens to hundreds of billions of edges.
In particular, for the largest such graphs, hyperlink2012 (with roughly 113 billion edges) and clueweb (with roughly 37 billion edges), no previous approximation of the densest subgraph better than a 2-approximation was known.</p><p>There are two typical approaches for solving the densest subgraph problem exactly. The first is to solve a combinatorial optimization problem using a linear program solver. The other is to set up a flow network with size polynomial in the size of the original graph, and then run a maximum flow algorithm on it. However, the caveat to both approaches is that they are not scalable to modern massive graphs; namely, both approaches have large polynomial runtimes, and the best theoretical algorithms for these approaches are often not practical. Because of this bottleneck, many have instead investigated approaches for approximate densest subgraphs.</p><p>The best-known approximation algorithms for the densest subgraph problem fall into two categories. The first category contains parallel approximation algorithms, which work by iteratively removing carefully chosen subsets of low-degree vertices (those with degree smaller than some threshold) while computing the density of the induced subgraph of the remaining vertices; the induced subgraph with the largest density is then taken as the approximate densest subgraph <ref type="bibr">[5,</ref><ref type="bibr">7,</ref><ref type="bibr">11]</ref>. These algorithms use poly(log n) rounds of peeling. Unfortunately, such methods give (2 + ε)-approximations at best, and no method has thus far achieved a better approximation within poly(log n) rounds.</p><p>The second category consists of algorithms obtained from the multiplicative weight update (MWU) method.
The multiplicative weight update framework approximately solves an optimization problem by using expert oracles to update the weights assigned to the variables multiplicatively and iteratively over several rounds, depending on how the experts performed in previous rounds. The MWU framework allows for obtaining (1 + ε)-approximate densest subgraphs in poly(log n) iterations; however, it requires more work per iteration than the peeling algorithm to update the weights of the variables. As such, neither approach is particularly scalable to massive graphs.</p><p>In terms of practical solutions, Boob et al. <ref type="bibr">[10]</ref> present a fast, sequential, iterative peeling algorithm called Greedy++ that combines peeling with the MWU framework. Chekuri et al. <ref type="bibr">[13]</ref> show that running Greedy++ for Θ(Δ log n / (ρ* ε²)) iterations results in a (1 + ε)-approximation of the densest subgraph, where ρ* is the density of the densest subgraph. However, Greedy++ is not parallel, and does not take advantage of modern multi-core and multiprocessor architectures. Recently, Harb et al. <ref type="bibr">[28]</ref> proposed an iterative algorithm based on projections that solves a quadratic objective function with linear constraints derived from the dual of the densest subgraph linear program of Charikar <ref type="bibr">[11]</ref>. For a graph with m edges and maximum degree Δ, they prove that their algorithm converges to a (1 + ε)-approximation in O(√(mΔ)/ε) iterations, where each iteration takes O(m) work.</p><p>Xu et al. <ref type="bibr">[48]</ref> recently introduced a framework for a generalized version of the densest subgraph problem that includes variants like the densest-at-least-k-subgraph problem. Their framework alternates between iteratively using maximum flow to obtain denser subgraphs and then peeling according to the k-core to shrink the graph for the next maximum flow iteration.
However, their algorithm is not parallel and, thus, cannot scale to the largest publicly-available graphs. Parallel implementations exist that give 2-approximations to the densest subgraph <ref type="bibr">[18,</ref><ref type="bibr">19,</ref><ref type="bibr">36]</ref>, but such algorithms and implementations achieve worse theoretical approximation guarantees than our (1 + ε)-approximation algorithms. We also demonstrate that they obtain worse empirical approximations.</p><p>In our work, we design fast practical algorithms that simultaneously make use of parallelism as well as the closely related concept of the k-core decomposition. The k-core decomposition decomposes the graph into k-cores for different values of k. Within the induced subgraph of each k-core, each vertex has degree at least k. It is a well-known fact that the density of the densest subgraph is within a factor of 2 of the maximum core value. However, it is less clear how to make use of this fact in creating scalable algorithms for the largest publicly-available graphs. In this paper, we design a pruning framework that, combined with our parallel densest subgraph subroutines, results in both theoretical as well as practical improvements over the state-of-the-art. The main idea of our framework is to iteratively prune the graph using lower bounds on the density of the densest subgraph computed from our parallel densest subgraph subroutines, while preserving the densest subgraph.</p><p>The concept of using pruning to obtain a smaller subgraph from which to approximate the densest subgraph also appears in some recent works <ref type="bibr">[23,</ref><ref type="bibr">48]</ref>. However, the pruning procedures used in those works are inherently sequential. In contrast, we introduce a parallel, iterative pruning approach in this paper and demonstrate via comprehensive experiments that our algorithms are more efficient and more scalable than all previous baselines.
Specifically, we give parallel peeling-based MWU and sorting-based MWU iterative algorithms that use pruning and are based on Greedy++ <ref type="bibr">[10]</ref>. Our algorithms achieve the same theoretical number of iterations as Chekuri et al. <ref type="bibr">[13]</ref>, but are more amenable to parallelization. Experimentally, on a 30-core machine with hyper-threading, our parallel sorting-based algorithm outperforms our parallel peeling-based algorithm, as well as previous state-of-the-art algorithms on most graphs. For instance, compared with the state-of-the-art parallel algorithm by Harb et al. <ref type="bibr">[28]</ref>, we achieve up to a 114× speedup on the same machine.</p><p>Leveraging the scalability of our parallel algorithms, we provide a number of previously unknown graph statistics and graph mining results on the largest of today's publicly available graphs, hyperlink2012 and clueweb, using commodity multicore machines. We also provide statistics (such as the empirical width) that may prove to be interesting and useful in aiding future work on this topic.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head><p>Given an undirected, unweighted graph G = (V, E) with n = |V| vertices and m = |E| edges, the density of a vertex set S ⊆ V is ρ(S) = |E(S)|/|S|, where E(S) is the set of edges in the subgraph induced by S.</p><p>The goal of the densest subgraph problem is to find a subgraph S ⊆ G, such that ρ(S) is maximized. We will use S* to denote a densest subgraph of G with maximum density ρ*.</p><p>A central structure that we study is the k-core of an undirected graph. Formally, the k-core of G, denoted core(G, k), is the maximal induced subgraph in which every vertex has degree at least k.</p><p>It is well known that to find core(G, k), one can repeatedly peel<ref type="foot">foot_0</ref> an arbitrary vertex v from G so long as deg_G(v) &lt; k. This process terminates when all remaining vertices have degree at least k, or the graph becomes empty. If the remaining graph is not empty, then it is the unique subgraph core(G, k). Next, we define the coreness or core number of a vertex v: core(v) is the largest k such that v belongs to core(G, k).</p><p>An easy modification of the peeling algorithm described above yields core(v) for all vertices v. We call this peeling-based algorithm Coreness. In this algorithm, we repeatedly pick a vertex with minimum degree and peel it, one vertex at a time, until there are no vertices left. Let D be a variable that tracks the maximum degree of peeled vertices at the time we peel them. Initially, D = 0. When v is about to be peeled, we set D ← max(D, deg_G(v)). We then set core(v) ← D and peel v from G. We refer to the ordering of vertices that we peel in this process as a degeneracy ordering of the graph, which is unique up to permuting vertices in the order with the same coreness.</p><p>We also use the notion of a c-approximate k-core decomposition, which can be computed more efficiently than an exact k-core decomposition: each vertex receives an estimate of its core number that is within a factor of c of its true coreness.</p><p>Later on, we will want to find an ordering that is similar to the degeneracy ordering, but where loads on the vertices are also given as input. Let ℓ(v) be the load of v. At each step, we peel the vertex that minimizes the quantity ℓ(v) + deg_G(v).
Note that after v is peeled, the induced degrees deg_G(v′) of v's neighbors v′ decrease. As a special case, we obtain the degeneracy ordering by setting ℓ(v) = 0 for all v. For the remainder of this paper, we refer to the ordering obtained using ℓ(v) + deg_G(v) as the load ordering. Model Definitions. We analyze the theoretical efficiency of our parallel algorithms in the work-depth model <ref type="bibr">[16,</ref><ref type="bibr">30]</ref>. In this model, the work is the total number of operations executed by the algorithm and the depth (parallel time) is the longest chain of sequential dependencies. We assume that concurrent reads and writes are supported in O(1) work/depth. A work-efficient parallel algorithm is one whose work asymptotically matches the best-known sequential time complexity for the problem. All of our algorithms presented in this paper are work-efficient. We say that a bound holds with high probability (whp) if it holds with probability at least 1 − 1/n^c for any c ≥ 1.</p><p>We use the following parallel primitives in our algorithms: ParFor, SuffixSum, FindMax, Bucketing, and IntegerSort. Each primitive takes a sequence A of length n. ParFor is a parallel version of a for-loop that we use to apply a function f to each element in the sequence. FindMax returns an element with maximum value among those in the sequence. SuffixSum replaces each element with the sum of all elements from its position to the end of the sequence. Both SuffixSum and FindMax can be implemented to take O(n) work and O(log n) depth. IntegerSort returns a sequence in sorted order (either non-increasing or non-decreasing) according to integer keys. We use two different implementations of IntegerSort: the first is an algorithm by Raman <ref type="bibr">[40]</ref> which takes O(n log log n) expected work and O(log n) depth whp, and the second is a folklore algorithm that takes O(n/ε) work and O(n^ε) depth for 0 &lt; ε &lt; 1 <ref type="bibr">[47]</ref>.
The decision to use one of these two sorting algorithms depends on whether work or depth is more important. We state the complexity of our algorithms in both ways when necessary.</p><p>Lemma 2.1 states that removing a vertex v with deg_G(v) &lt; ρ(G) from G yields a graph G′ with ρ(G′) &gt; ρ(G). To see this, write ρ(G) as a weighted average: with n = |V(G)|, we have ρ(G) = (|E(G′)| + deg_G(v))/n = ((n − 1)/n) · ρ(G′) + (1/n) · deg_G(v) = x · ρ(G′) + (1 − x) · deg_G(v) for the real number x = (n − 1)/n ∈ (0, 1). The last line can be viewed as a weighted average between ρ(G′) and deg_G(v). Since deg_G(v) &lt; ρ(G), it has to be the case that ρ(G′) &gt; ρ(G) so that their average equals ρ(G).</p><p>As a corollary, any vertex in a densest subgraph has induced degree at least ρ(S*).</p><p>Corollary 2.1 follows immediately from Lemma 2.1 because vertices v with deg_G(v) &lt; ρ* can be peeled while increasing the density of the remaining subgraph. Thus, a natural procedure for obtaining the densest subgraph is to iteratively remove any vertex v that has degree less than the current density of the subgraph. Notice that this process is very similar to the algorithm for computing core(G, k) described above. In fact, we can relate the k-core to the densest subgraph.</p><p>Lemma 2.2. For any k ≤ ⌈ρ*⌉, let C = core(G, k) be the k-core of G. It must be the case that S* ⊆ C.</p><p>Proof. We prove this by contradiction. Assume that S* \ C is non-empty (i.e., some vertex of S* is not in C). Consider the peeling process that produces core(G, k), and let v be the first vertex of S* that is peeled. At that moment, all of S* is still present, so v's degree is at least deg_{S*}(v) ≥ ρ* by Corollary 2.1, and hence at least ⌈ρ*⌉ ≥ k since degrees are integers. This contradicts the fact that only vertices of degree less than k are peeled.</p><p>(Figure: the density of each successive S′_i is increasing, and the cores C_i decrease in size.)</p><p>Corollary 2.2. Let kmax be the maximum integer such that core(G, kmax) is not empty, and let C be the ⌈kmax/2⌉-core. Then S* ⊆ C.</p><p>Similarly, the largest kmax for which the c-approximate kmax-core is non-empty also gives us a lower bound on ρ*, in terms of the density of a (potentially larger) approximate core with smaller approximate core number: Corollary 2.3. Let kmax be the maximum integer such that apxcore(G, kmax) is not empty. Let C be the ⌈kmax/(2c)⌉-approximate core. Then S* ⊆ C. Proof. The proof is identical to that of Corollary 2.2; the only difference is that since the core is approximate, the lower bound on deg_S(v) for any v in apxcore(G, kmax) is kmax/c.</p></div>
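The peeling-based Coreness procedure above can be illustrated with a minimal sequential sketch (Python for readability; the paper's implementations are parallel C++, and the function and variable names here are our own):

```python
import heapq

def coreness(adj):
    """Sequential peeling: repeatedly remove a minimum-degree vertex.
    Returns (core numbers, degeneracy ordering)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, core, order = set(), {}, []
    D = 0  # max degree seen at peel time so far
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry; current degree was pushed later
        D = max(D, d)
        core[v] = D
        removed.add(v)
        order.append(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return core, order

# Example: a 4-clique {0,1,2,3} with a pendant vertex 4 attached to 0.
adj = {0: [1, 2, 3, 4], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2], 4: [0]}
core, order = coreness(adj)
```

Here the pendant vertex is peeled first with coreness 1, and the clique vertices all receive coreness 3, consistent with the degeneracy ordering described above.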
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Pruning-and-Refining Framework</head><p>Based on the properties described in Section 2, any algorithm that yields a lower bound on ρ* can be used for pruning the graph while retaining the densest subgraph. The main idea of the framework is as follows. Let L be a lower bound on ρ*. We can prune the input graph G by computing G′, the ⌈L⌉-core of G, and then search for the densest subgraph in G′ instead of G. This process can be repeated multiple times, making it useful in algorithms that iteratively refine (tighten) the lower bounds on ρ* over a sequence of steps.</p><p>To the best of our knowledge, the idea of using cores to prune the graph adaptively while refining the approximate densest subgraph solution has not previously appeared in the literature. The closest ideas are from Fang et al. <ref type="bibr">[23]</ref> and Xu et al. <ref type="bibr">[48]</ref>. In <ref type="bibr">[23]</ref>, their pruning rules first compute Coreness, and then inspect connected components of the ⌈kmax/2⌉-core. They take the maximum density found among the connected components as a lower bound and use kmax as an upper bound. They then run a flow-based algorithm on each connected component separately. Note that a flow-based algorithm can only decide whether a graph has a subgraph of a given density ρ, so a binary search over the optimal density is required to solve the densest subgraph problem. Their pruning rules do not help much if there is only a single component in the ⌈kmax/2⌉-core. Then, in <ref type="bibr">[48]</ref>, they give a sequential pruning-based algorithm based on flow, where their implementation prunes the graph at the beginning and runs a flow-based algorithm to find approximate densities.</p></div>
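The pruning step itself is simple; the following sequential Python sketch (helper names are our own, not the paper's) extracts the ⌈L⌉-core given a density lower bound L:

```python
from math import ceil

def kcore(adj, k):
    """Vertex set of core(G, k): repeatedly peel vertices of degree < k."""
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] < k:
                    stack.append(u)
    return alive

# 4-clique {0,1,2,3} plus pendant 4: the whole-graph density 7/5 lower-bounds
# rho*, so by Lemma 2.2 the ceil(7/5)-core = 2-core still contains S* (the clique).
adj = {0: [1, 2, 3, 4], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2], 4: [0]}
pruned = kcore(adj, ceil(7 / 5))
```

Any subgraph density found along the way can serve as L, so pruning can be reapplied whenever the lower bound improves.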
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Framework Overview</head><p>We apply this idea in an algorithmic framework for computing an approximate densest subgraph, which is shown in Algorithm 1. The pseudocode uses exact pruning, i.e., it uses the value of the exact kmax-core, but we also describe how to use approximate k-cores below. On Lines 1-4, we compute the lower bound L by applying Corollary 2.2 and either an exact k-core algorithm or an approximate k-core algorithm. Both algorithms take O(m + n) work, but approximate k-core has provably poly-logarithmic depth. For exact k-core, we use the bucketing-based k-core implementation of <ref type="bibr">[18,</ref><ref type="bibr">19]</ref>. The algorithm iteratively peels all vertices with degree at most d in parallel, starting with d = 0, and incrementing d whenever there are no more vertices with degree at most d. The algorithm takes O(m + n) expected work and O(c_p log n) depth with high probability, where c_p is the peeling complexity, defined as the number of rounds needed to completely peel the graph. For approximate k-core, we use the implementation of Liu et al. <ref type="bibr">[35]</ref>, which gives a (2 + δ)-approximation to all core numbers and takes O(m + n) expected work and O(log³ n) depth whp. Line 2 shows the lower bound L given kmax, but we can alternatively compute the lower bound for the approximate k-core approach using Corollary 2.3. Given the coreness values in cores, we extract the j-core from G using getCore(G, cores, j) on Line 3.</p><p>On Lines 5-10, we iterate for T rounds, where each round calls a function Refine which computes subgraphs with potentially higher density. Note that for the algorithms that we use in our paper, the Refine step on Line 6 is oblivious to the parameter L. However, knowing L might be useful for other algorithms (e.g., flow-based algorithms).
After each refinement step, we obtain a potentially better solution, which we record on Lines 7-9. On Line 10, we leverage this lower bound L by shrinking G to its ⌈L⌉-core. By Lemma 2.2, the ⌈L⌉-core contains the densest subgraph S*. We then return the approximate densest subgraph on Line 11. In Section 3.2, we describe various options for the Refine function.</p></div>
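The loop structure can be sketched end-to-end in sequential Python (a simplification under our own names: a single Charikar-style peeling pass stands in for Refine, whereas the paper's Refine variants and pruning are parallel):

```python
import heapq
from math import ceil

def kcore_within(vertices, adj, k):
    """core of the induced subgraph on `vertices`, by repeated peeling."""
    alive = set(vertices)
    deg = {v: sum(1 for u in adj[v] if u in alive) for v in alive}
    stack = [v for v in alive if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] < k:
                    stack.append(u)
    return alive

def peel_refine(vertices, adj):
    """One peeling pass by minimum induced degree; returns the densest
    suffix subgraph seen and its density (a 2-approximation)."""
    alive = set(vertices)
    deg = {v: sum(1 for u in adj[v] if u in alive) for v in alive}
    edges = sum(deg.values()) // 2
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    best, best_d = set(alive), edges / len(alive)
    while heap:
        d, v = heapq.heappop(heap)
        if v not in alive or d != deg[v]:
            continue  # stale heap entry
        alive.remove(v)
        edges -= d
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
        if alive and edges / len(alive) > best_d:
            best, best_d = set(alive), edges / len(alive)
    return best, best_d

def prune_and_refine(adj, T):
    cand, best, L = set(adj), set(adj), 0.0
    for _ in range(T):
        S, d = peel_refine(cand, adj)            # Refine step
        if d > L:
            best, L = S, d                       # record the better solution
        cand = kcore_within(cand, adj, ceil(L))  # prune to the ceil(L)-core
        if not cand:
            break
    return best, L

# 4-clique {0,1,2,3} plus pendant 4; the clique is the densest subgraph.
adj = {0: [1, 2, 3, 4], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2], 4: [0]}
best, L = prune_and_refine(adj, 3)
```

On this toy graph the first refinement already finds the clique (density 1.5), and the pruning step then discards the pendant vertex, so later rounds work on a smaller graph.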
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1: Pruning-and-Refining Framework</head><p>(Algorithm 1 pseudocode not shown.)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Refinement Algorithms</head><p>Next, we describe algorithms that can be used for the Refine function in Algorithm 1. We first describe two existing sequential algorithms, the peeling algorithm and Greedy++, and then introduce our parallel algorithms.</p><p>Peeling Algorithm <ref type="bibr">[11]</ref>. At each step, we compute the density of the current graph. Then, we pick a vertex v with the minimum induced degree and remove it from the graph. We continue until there are no vertices remaining, and return the subgraph with the maximum density found in this process. The peeling algorithm can be parallelized <ref type="bibr">[18]</ref>, but can have linear depth in the worst case. Charikar <ref type="bibr">[11]</ref> proves that the subgraph returned by this peeling algorithm has density at least half the optimum density, i.e., it gives a 2-approximation to the densest subgraph.</p><p>Greedy++ <ref type="bibr">[10]</ref>. Algorithm 2 presents a greedy load-based densest subgraph algorithm, for which the state-of-the-art Greedy++ algorithm is a special case. Initially, each vertex v is associated with a load ℓ(v) = 0 (Lines 2-3). The algorithm runs for T iterations (Lines 4-11). In each iteration, we compute the degeneracy order O with respect to the load ℓ on graph H to obtain the ordered set of vertices v_1, . . . , v_n (Line 6). Then, on Lines 7-11, we peel vertices in this order. When v_i is peeled, we compare the density of the remaining subgraph to the density of the best subgraph found so far, and save the denser of the two. We also update the loads, setting ℓ(v_i) ← ℓ(v_i) + deg*_H(v_i), where deg*_H(v_i) is the induced degree of v_i when it is peeled.
We return the best subgraph found after T = Θ(Δ log n / (ρ* ε²)) iterations, where Δ is the maximum degree and 0 &lt; ε &lt; 1 is an adjustable parameter. This algorithm yields a (1 + ε)-approximation of the densest subgraph, as shown in <ref type="bibr">[13]</ref>.</p><p>The first iteration of Greedy++ is exactly the peeling algorithm of Charikar.</p><p>The original Greedy++ algorithm is implemented in a way where the degeneracy ordering and the update steps are fused together. The reason we decouple these two steps, to obtain greater parallelism, will become clear once we introduce our algorithm below.</p><p>GreedySorting++ (our algorithm). Our second algorithm uses a simpler method for computing O than Greedy++, in that it orders vertices based on their loads at the beginning of the iteration. The motivation for this algorithm is that sorting is highly parallelizable and is also faster in practice than the iterative peeling process used in Greedy++ (which has linear depth). Therefore, on Line 6 of Algorithm 2, we compute v_1, . . . , v_n such that ℓ(v_1) ≤ ℓ(v_2) ≤ . . . ≤ ℓ(v_n). Because of the way we decouple the ordering and update steps in Greedy++, Line 6 is the only difference between the two algorithms. Next, we argue that GreedySorting++ has the same guarantees in terms of the approximation and number of rounds as Greedy++.</p><p>Theorem 4. After T = Θ(Δ log n / (ρ* ε²)) iterations, GreedySorting++ outputs a (1 + ε)-approximation to the densest subgraph problem.</p><p>Proof. The proof follows almost immediately from Section 4 of <ref type="bibr">[13]</ref>. To prove that Greedy++ works, they define an exponential-sized linear program where each variable corresponds to one possible peeling order (i.e., a permutation). They then utilize the multiplicative weight update (MWU) framework on the linear program.<ref type="foot">2</ref> By the way they formulate their linear program, the subproblem that we need to solve w.r.t.
MWU is to find a good ordering. Lemma 4.6 of <ref type="bibr">[13]</ref> shows that the ordering obtained by Greedy++ is a good approximate ordering. The proof of Lemma 4.6 works for any order v_1, . . . , v_n that satisfies the following property: ℓ(v_i) ≤ ℓ(v_j) + Δ if i &lt; j. This is true for the ordering used in GreedySorting++, where we sort by the initial load of the vertices. Hence, by plugging in this ordering, all of the proofs in <ref type="bibr">[13]</ref> go through.</p></div>
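A sequential sketch of GreedySorting++ under these simplifications (Python for readability; in the actual algorithm the sort, the peeling pass, and the load updates are all parallelized, and the names below are ours):

```python
def greedy_sorting_pp(adj, T):
    """Each iteration: sort vertices by current load, peel in that order,
    track the best suffix density, and add each vertex's induced degree
    at peel time to its load (the Greedy++-style load update)."""
    loads = {v: 0 for v in adj}
    best_d, best = 0.0, set(adj)
    for _ in range(T):
        order = sorted(adj, key=lambda v: (loads[v], v))  # ties broken by id
        alive = set(adj)
        deg = {v: len(adj[v]) for v in adj}  # induced degrees, kept up to date
        edges = sum(deg.values()) // 2
        for v in order:
            d = edges / len(alive)           # density before peeling v
            if d > best_d:
                best_d, best = d, set(alive)
            loads[v] += deg[v]               # load update
            edges -= deg[v]
            alive.remove(v)
            for u in adj[v]:
                if u in alive:
                    deg[u] -= 1
    return best, best_d

# 4-clique {0,1,2,3} plus pendant 4; the densest subgraph is the clique (1.5).
adj = {0: [1, 2, 3, 4], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2], 4: [0]}
best, best_d = greedy_sorting_pp(adj, 5)
```

On this example the first iteration only finds density 1.4, but the pendant vertex accumulates the smallest load, moves to the front of the order within a few iterations, and the clique (density 1.5) is then found.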
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Parallel Implementation</head><p>In this subsection, we present our parallelizations of Greedy++ and GreedySorting++. We still run both algorithms for T iterations, one iteration at a time, and our aim is to achieve low depth within each iteration. Parallelization of Algorithm 2. We first describe how to parallelize all parts of Algorithm 2 except for Line 6. Let v_1, . . . , v_n be an ordering of vertices for peeling. In the ith iteration of the for-loop on Line 7, the induced subgraph of G that we use is S_i = G(v_i, . . . , v_n). This holds for all 1 ≤ i ≤ n. An edge e = (v_j, v_k) contributes to the density of S_i if and only if i ≤ j and i ≤ k.</p><p>Our implementation for computing the densities and updating the loads in parallel is shown in Algorithm 3. We first initialize an empty array A of size n (Line 1). Then, for each edge e = (v_j, v_k), we add 1 to A[min(j, k)] (Lines 2-3). Let B be the suffix sum array of A (Line 4). Then, B[i] corresponds to the number of edges remaining in the graph after vertices v_1, . . . , v_{i-1} are peeled. To see why this is the case, consider the remaining subgraph after v_1, . . . , v_{i-1} are peeled. An edge e = (v_j, v_k) appears in this subgraph if and only if both v_j and v_k are not yet peeled, i.e., i ≤ j and i ≤ k. Adding 1 to A[min(j, k)] thus accounts for the presence of e in every subgraph S_i with i ≤ min(j, k). We then compute the densities in parallel and take the maximum density on Lines 5-7. We update the loads in parallel on Lines 8-9. The work and the depth of this implementation are O(n + m) and O(log n), respectively. As described in Section 2, ParFor, SuffixSum, and FindMax all take linear work. SuffixSum and FindMax have O(log n) depth, and ParFor has O(1) depth, so our depth bound follows. ParallelGreedy++.
To parallelize Greedy++, it remains to parallelize the computation of the degeneracy ordering, which can be done with a k-core decomposition algorithm in O(m + n) expected work and O(c_p log n) depth, with high probability. When we peel multiple vertices at the same time, the degeneracy ordering can differ from the order obtained sequentially. The reason is that when a vertex is peeled in the sequential algorithm, it affects its neighbors' degrees immediately, whereas in the parallel version this effect is delayed until the end of the peeling step, where multiple vertices may be peeled together. We claim that this does not significantly affect the order. Consider a pair of vertices v_i and v_j. To make the proof of Theorem 4 go through, it suffices to show that i &lt; j implies ℓ(v_i) ≤ ℓ(v_j) + Δ. We prove the contrapositive. Suppose ℓ(v_i) &gt; ℓ(v_j) + Δ. Because the number of neighbors of each of v_i and v_j is bounded by Δ, even if all of v_i's neighbors are peeled and none of v_j's neighbors are peeled, v_i will still be peeled after v_j, implying that i &gt; j. Therefore, for i &lt; j we have that ℓ(v_i) ≤ ℓ(v_j) + Δ. Note that for our algorithms, the number of steps T depends on ρ*, but we can choose T based on kmax instead, as it is within a factor of 2 of ρ*. Combining Refinement with Pruning.</p><p>Using the framework in Algorithm 1, we can combine a pruning method with one iteration of Greedy++, GreedySorting++, ParallelGreedy++, or ParallelGreedySorting++ as the Refine function.</p><p>For example, we can combine approximate k-core pruning with one iteration of ParallelGreedySorting++, which would give an algorithm with either O(T(n log log n + m)) expected work and O(T log n + log³ n) depth, or O(T(n + m)) work and O(T log n + n^ε) depth for any 0 &lt; ε &lt; 1.
Remark. While pruning gives speedups in practice, it does not improve the theoretical complexity of the algorithm, as there exist graphs where the core that we prune down to covers most of the original graph. However, as we observe in our experiments below, pruning results in massive improvements in runtime in practice, as most real-world graphs exhibit a densest subgraph that is a small percentage of the input (in some graphs, the densest subgraph contains fewer than 1% of the vertices in the input).</p></div>
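The density computation at the heart of Algorithm 3 can be sketched as follows (sequential Python; in our implementation the two loops become a ParFor and a SuffixSum, and the function name is ours):

```python
def suffix_densities(order, edges):
    """Density of S_i = G(v_i, ..., v_n) for every i, in O(n + m) time.
    order: the peeling order; edges: list of (u, w) pairs."""
    n = len(order)
    pos = {v: i for i, v in enumerate(order)}
    A = [0] * n
    for u, w in edges:
        # edge (u, w) is present exactly in the suffixes that still contain
        # both endpoints, i.e., those starting at index <= min(pos[u], pos[w])
        A[min(pos[u], pos[w])] += 1
    B = [0] * n  # B[i] = number of edges remaining after peeling order[:i]
    running = 0
    for i in range(n - 1, -1, -1):  # suffix sum of A
        running += A[i]
        B[i] = running
    return [B[i] / (n - i) for i in range(n)]

# 4-clique {0,1,2,3} plus pendant 4 attached to 0, peeled in order 0..4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
dens = suffix_densities([0, 1, 2, 3, 4], edges)
```

Because each edge is charged to a single array slot and the suffix sum is a standard logarithmic-depth primitive, this matches the O(n + m) work and O(log n) depth bounds stated for Algorithm 3.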
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>In this section, we implement and benchmark different instantiations of the Pruning-and-Refining framework on real-world datasets. We also compare our algorithms with existing algorithms. We demonstrate that our approach is practical and scalable to the largest publicly-available graphs. In addition, we provide interesting statistics on large-scale graphs, in particular near-optimal densities of the largest graphs, previously not reported to such accuracy in the literature. All of our code is provided at this link.<ref type="foot">3</ref> Some experimental data is omitted due to space constraints; it is included in the full version of our paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Implementations.</head><p>We implement Greedy++, GreedySorting++, and their parallel instantiations as our refinement algorithms. We consider two algorithms for pruning: pruning using exact k-cores, and pruning using approximate k-cores. Both pruning algorithms are parallel and are modular across all of the refinement algorithms. We use the exact k-core decomposition algorithm from Dhulipala et al. <ref type="bibr">[18]</ref> and the approximate k-core decomposition algorithm from Liu et al. <ref type="bibr">[35]</ref> for our pruning step. The algorithm of Liu et al. <ref type="bibr">[35]</ref> allows us to specify the approximation ratio (c in Definition 3). When c is higher, the algorithm tends to be more parallelizable but less accurate. We run our experiments with c = 1.5. We also combine approximate k-core decomposition with exact k-core decomposition for further speedups. In particular, our combined pruning algorithm first uses approximate k-core decomposition to shrink the graph and then uses exact k-core decomposition for greater accuracy in the refine step on the smaller graph. Such a procedure results in speedups for a peeling-based algorithm like ParallelGreedy++ since, in each iteration, the algorithm already performs much of the necessary work (with minimal modification) to find the k-core decomposition. We name our algorithms as follows: 1. PaRGreedy++: ParallelGreedy++ combined with our Pruning-and-Refining framework with exact k-core pruning. 2. PaRSorting++: ParallelGreedySorting++ combined with our Pruning-and-Refining framework with exact k-core pruning. 3. PaRApxGreedy++: ParallelGreedy++ combined with our Pruning-and-Refining framework with approximate k-core pruning followed by exact k-core pruning. 4. PaRApxSorting++: ParallelGreedySorting++ combined with our Pruning-and-Refining framework with approximate k-core pruning.
We present experimental results for each of our methods and also compare with existing implementations.</p><p>Existing Algorithms. We compare with the state-of-the-art (1 + ε)-approximation algorithms: the sequential algorithms of Fang et al. (CoreExact <ref type="bibr">[23]</ref> and CoreApp <ref type="bibr">[23]</ref>), the sequential algorithm of Boob et al. (Greedy++ <ref type="bibr">[10]</ref>), the parallel algorithms of Harb et al. (FISTA <ref type="bibr">[28]</ref>, Frank-Wolfe <ref type="bibr">[17]</ref>, and MWU <ref type="bibr">[4]</ref>), and the sequential algorithm of Xu et al. <ref type="bibr">[48]</ref>. We also compare with the state-of-the-art parallel 2-approximation algorithms of Luo et al. <ref type="bibr">[36]</ref> and Dhulipala et al. <ref type="bibr">[19]</ref>.</p><p>Fang et al. <ref type="bibr">[23]</ref> and Xu et al. <ref type="bibr">[48]</ref> implement variants of maximum-flow algorithms to find the densest subgraph. Their CoreExact <ref type="bibr">[23]</ref> and cCoreExact <ref type="bibr">[48]</ref> implementations are flow algorithms with a binary search over the density. They perform a density lower-bound estimation using cores and approximate maximum flow, which we described in more detail in Section 2.1. If a graph has many connected components, then a subgraph with maximum density lies entirely within one component; hence, they run the flow algorithm on each connected component separately. The densities of cores are then used to determine the lower and upper bounds of the binary search needed for the flow computation. cCoreExact <ref type="bibr">[48]</ref> improves on CoreExact <ref type="bibr">[23]</ref> by using cores to shrink the input graph. Their approximation algorithm, CoreApp <ref type="bibr">[23]</ref>, finds core(G, kmax) directly. Once the maximum core is found, they return the component with the highest density. 
This algorithm yields a 2-approximation for the densest subgraph problem. cCoreG++ <ref type="bibr">[48]</ref> is their implementation of Greedy++ <ref type="bibr">[10]</ref> that uses one iteration of a k-core decomposition algorithm to shrink the input graph.</p><p>Harb et al. <ref type="bibr">[28]</ref> propose a gradient-descent-based algorithm called FISTA <ref type="bibr">[28]</ref>, where the number of iterations needed is O(√(Δm)/ε). They use accelerated proximal gradient descent, which is faster than the standard gradient descent approach <ref type="bibr">[6,</ref><ref type="bibr">38]</ref>. The algorithm is iterative, and each iteration can be parallelized. The output of each iteration is a feasible solution to a linear program for the densest subgraph problem. They then use Greedy++-inspired rounding, which they call fractional peeling, to round the linear program solution into an integral solution.</p><p>Finally, we benchmark against the recent parallel algorithms Julienne <ref type="bibr">[19]</ref> and PKMC <ref type="bibr">[36]</ref> for computing the exact k-core decomposition, where the maximum core gives a 2-approximation of the densest subgraph. Although these algorithms achieve high parallelism, their approximations are worse than those of the (1 + ε)-approximation algorithms, both theoretically and empirically.</p><p>Setup. We use c2-standard-60 Google Cloud instances (3.1 GHz Intel Xeon Cascade Lake CPUs with a total of 30 cores with two-way hyper-threading, and 236 GiB RAM) and m1-megamem-96 Google Cloud instances (2.0 GHz Intel Xeon Skylake CPUs with a total of 48 cores with two-way hyper-threading, and 1433.6 GB RAM). We use hyper-threading in our parallel experiments by default. Our programs are written in C++. We use parallel primitives from the GBBS <ref type="bibr">[18]</ref> and Parlay <ref type="bibr">[8]</ref> libraries. The source code is compiled using g++ (version 10) with the -O3 flag. 
We terminate experiments that take over 1 hour. We run each experiment three times and take the average for the runtime and accuracy analyses. With enough threads, Greedy++, GreedySorting++, and their parallel versions within our framework finished within 1 hour in all of our experiments. However, all other (1 + ε)-approximation algorithms took longer than 1 hour on all of the large graphs (clueweb, twitter, friendster, and hyperlink2012), so we omit the entries for these datasets for these algorithms.</p><p>Datasets. We run our experiments on various synthetic and real-world datasets. The real-world datasets are obtained from SNAP <ref type="bibr">[34]</ref> (cahepth, ascaida, hepph, dblp, wiki, youtube, stackoverflow, livejournal, orkut, twitter, and friendster), Network Repository <ref type="bibr">[41]</ref> (brain), the Lemur project at CMU <ref type="bibr">[9]</ref> (clueweb), and WebDataCommons <ref type="bibr">[37]</ref> (hyperlink2012). The hyperlink2012 graph is the largest publicly-available real-world graph today. closecliques is a synthetic dataset designed to be challenging for Greedy++ <ref type="bibr">[10,</ref><ref type="bibr">13,</ref><ref type="bibr">28]</ref>. We remove self-loops and zero-degree vertices from all graphs, and symmetrize any directed graphs. We run most of our experiments on c2-standard-60 machines. However, on the larger graphs (namely, twitter, friendster, clueweb, and hyperlink2012), we use m1-megamem-96 machines as more memory is required. The sizes of our inputs and their maximum core values are included in our full paper.</p><p>Overview of Results. We show the following experimental results in this section.</p><list rend="bulleted"><item>Our pruning strategy is very efficient in practice: our pruned graphs contain 175× fewer edges and 3,596× fewer vertices on average.</item><item>Our algorithms, like the state of the art, take only a few iterations to converge. PaRSorting++ takes more iterations, but it still converges to a &lt; 1.01-approximation within 10-20 iterations, and each of its iterations is significantly faster than those of all other algorithms.</item><item>Our algorithms are faster than existing algorithms by a large margin.</item><item>Our algorithms are highly parallelizable, achieving up to 22.37× self-relative parallel speedup on a 30-core machine with two-way hyper-threading.</item><item>We measure the empirical "width", a parameter that correlates with the number of iterations needed to converge. We observe that the empirical width is much smaller than the upper bound used to analyze the algorithm. This may lead to more fine-grained analyses of many MWU-inspired algorithms.</item></list></div>
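As a concrete illustration of the Pruning-and-Refining pattern described above, the following is a minimal sequential Python sketch. The helper names `core_numbers`, `prune_to_core`, and `densest_by_peeling` are ours for illustration only and do not appear in our C++ implementation, which is parallel and uses bucket-based peeling.

```python
def core_numbers(adj):
    """Peel vertices in nondecreasing degree order; returns {v: core number}.
    Quadratic-time for clarity; real implementations use bucket queues."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    core, alive, cur = {}, set(adj), 0
    while alive:
        v = min(alive, key=lambda u: deg[u])  # lowest remaining degree
        cur = max(cur, deg[v])
        core[v] = cur
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
    return core

def prune_to_core(adj, k):
    """Pruning step: induced subgraph on vertices with core number >= k."""
    core = core_numbers(adj)
    keep = {v for v, c in core.items() if c >= k}
    return {v: [u for u in adj[v] if u in keep] for v in keep}

def densest_by_peeling(adj):
    """Refinement sketch: greedy peel, tracking the best density |E(S)|/|S| seen."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(deg.values()) // 2
    alive = set(adj)
    best = m / len(alive) if alive else 0.0
    while len(alive) > 1:
        v = min(alive, key=lambda u: deg[u])
        m -= deg[v]
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
        best = max(best, m / len(alive))
    return best
```

A single greedy peel like `densest_by_peeling` only guarantees a 2-approximation; the refinement algorithms in this paper iterate a load-based variant of this peel to drive the approximation toward (1 + ε).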
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Core-Based Pruning</head><p>In this section, we present experimental results for various pruning methods using exact and approximate k-core decomposition (and combinations thereof).</p><p>Pruning with core(G, ⌈kmax/2⌉). We first study the benefit of performing pruning using the exact k-core computation. The data for this experiment across all graphs can be found in our full paper. For the real-world graphs, the cores contain between 2-282× fewer edges than the original graphs (48.3× fewer on average), and between 4.5-14227× fewer vertices (2420× fewer on average). The only exception is the brain dataset, where the core is half the size of the original graph; even in this case, the number of vertices left in the core is around 25% of the original graph. For the synthetic dataset closecliques, the input is designed so that the maximum core is identical to the original graph, so there is no benefit to pruning. However, this situation is very unlikely to occur in real-world datasets.</p><p>Due to the significant reduction in graph sizes in terms of both the number of vertices and the number of edges, using core(G, ⌈kmax/2⌉) is almost always preferable to using G, especially since computing all cores of G (a linear-work algorithm with reasonably high parallelism in practice <ref type="bibr">[18]</ref>) is inexpensive compared to the cost of running any of the refinement algorithms, which mostly require super-linear work. To summarize, we find that pruning is nearly always beneficial and should be applied prior to refinement.</p><p>Pruning with Highest Cores. As our algorithms progress, we perform additional pruning to shrink the graph even further. We report the sizes of the final graphs in our full paper. In many cases, the sizes of the final graphs (after pruning to the highest cores) are less than half the sizes of their ⌈kmax/2⌉-cores. 
Across all datasets, we find that iterative pruning yields up to a 30.3× reduction in the number of vertices over core(G, ⌈kmax/2⌉), and up to a 200× reduction in the number of edges, when comparing these quantities in core(G, ⌈kmax/2⌉) and core(G, ⌈ρ⌉), where ρ is the best density found by our algorithms. Thus, we see advantages in performing multiple rounds of pruning on certain graphs. Note that there are cases where this additional pruning step is not helpful, e.g., on dblp and clueweb. In each of these cases, we notice that the best density found is very close to ⌈kmax/2⌉, so there is little room for further pruning.</p></div>
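The reason core(G, ⌈kmax/2⌉) is a safe pruning target is that the optimum density ρ* is at least kmax/2 (the kmax-core has minimum degree kmax, so its density is at least kmax/2), while every vertex of a densest subgraph has induced degree at least ρ* (otherwise removing it would increase the density); hence the densest subgraph survives pruning. The toy check below verifies this on a small graph by brute force. It is a sequential Python sketch with our own helper names, unrelated to the paper's C++ code, and `densest_exact` enumerates all subsets, so it is only viable on tiny graphs.

```python
from itertools import combinations

def kcore_prune(adj, k):
    """Repeatedly delete vertices of degree < k; returns the k-core as {v: set of nbrs}."""
    adj = {v: set(ns) for v, ns in adj.items()}
    while True:
        low = [v for v, ns in adj.items() if len(ns) < k]
        if not low:
            return adj
        for v in low:
            for u in adj[v]:
                if u in adj:          # neighbor may already be deleted this round
                    adj[u].discard(v)
            del adj[v]

def densest_exact(adj):
    """Brute-force max of |E(S)|/|S| over all nonempty subsets S."""
    V = list(adj)
    best = 0.0
    for r in range(1, len(V) + 1):
        for S in combinations(V, r):
            s = set(S)
            edges = sum(1 for v in S for u in adj[v] if u in s) // 2
            best = max(best, edges / r)
    return best
```

On a 4-clique with a pendant path, kmax = 3, and pruning to the ⌈3/2⌉ = 2-core removes the path while leaving the optimum density (the 4-clique, density 1.5) unchanged.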
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Number of Iterations Versus Density</head><p>Next, we study the progress that the different refinement algorithms in our framework make toward finding the maximum density, compared with other implementations. We perform this experiment on all variants of our algorithms (PaRGreedy++, PaRSorting++, PaRApxGreedy++, and PaRApxSorting++) and on all other benchmarks that use iterations: FISTA, Greedy++, FrankWolfe, MWU, and cCoreG++. We also include PKMC as a baseline for comparison against a 2-approximation algorithm. All algorithms were run for at least 20 iterations. The results are illustrated in Fig. <ref type="figure">2</ref>. Most algorithms converge very early, with our algorithms PaRGreedy++ and PaRApxGreedy++ converging no later than the fastest-converging algorithms. In fact, on most graphs, Greedy++, PaRGreedy++, and PaRSorting++ took the fewest iterations to converge. Two algorithms, PaRSorting++ and MWU, take more iterations on many graphs; this matches our understanding of the width of MWU as discussed below. Furthermore, PaRApxGreedy++ and PaRApxSorting++ closely match the convergence rates of PaRGreedy++ and PaRSorting++, respectively; this is expected, as using approximate rather than exact pruning affects the runtime, not the accuracy. PaRApxSorting++ may need more iterations to converge since it uses only approximate k-core pruning.</p></div>
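The refinement loop being iterated above can be sketched as Greedy++-style load-based peeling: each iteration peels vertices in increasing order of load plus current degree, adds the peel-time degree to the vertex's load, and tracks the best density over all peeling prefixes and all iterations. This is a minimal sequential Python sketch of the main loop (the helper name `greedy_plus_plus` is ours; the actual ParallelGreedy++ and ParallelGreedySorting++ peel in parallel rounds).

```python
def greedy_plus_plus(adj, iters=10):
    """Iterated load-based peeling (sequential sketch of a Greedy++-style loop).

    Returns the best density |E(S)|/|S| seen over all peeling prefixes in all
    iterations; as iters grows, this approaches a (1 + eps)-approximation."""
    load = {v: 0 for v in adj}
    best = 0.0
    for _ in range(iters):
        deg = {v: len(nbrs) for v, nbrs in adj.items()}
        m = sum(deg.values()) // 2      # edges remaining in the current subgraph
        alive = set(adj)
        while alive:
            best = max(best, m / len(alive))
            v = min(alive, key=lambda u: load[u] + deg[u])  # smallest load + degree
            load[v] += deg[v]           # accumulate peel-time degree into the load
            m -= deg[v]
            alive.remove(v)
            for u in adj[v]:
                if u in alive:
                    deg[u] -= 1
    return best
```

On simple inputs such as a clique with a pendant path, even a single iteration already finds the optimum; the extra iterations matter on adversarial inputs like closecliques.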
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Approximation Ratio</head><p>We compare the densities returned from various algorithms at iteration 10 with the best density currently known in the literature. Except for brain, twitter, friendster, clueweb, and hyperlink2012, the best known density is equal to the optimum. To compute the optimum density, we run a linear program solver on core(G, &#8968;&#961;&#8969;).</p><p>Except for FrankWolfe, all algorithms have approximation ratios less than 1.02 after 10 iterations. Our PaRGreedy++ algorithm achieves the best approximation ratio after 10 iterations for all four of the largest graphs, twitter, friendster, clueweb, and hyperlink2012. For the rest of the graphs, our algorithm achieves an approximation ratio no worse than 1.0001&#215; the smallest approximation ratio. We also include a table that compares densities after iteration 20 in Table <ref type="table">2</ref>. After 20 iterations, most approximation ratios are less than 1.001. Our algorithm PaRGreedy++ obtains the best approximation for 8 out of the 12 tested graphs and obtains an approximation ratio no worse than 1.0000002&#215; the best for the remaining graphs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Empirical Widths</head><p>As mentioned at the end of Section 3.2, in the multiplicative weight update framework, the width is a parameter that is correlated with the number of rounds needed for a solution to converge. In our context, the width ω corresponds to the maximum increase in the load of a single vertex in any iteration. See, e.g., <ref type="bibr">[3,</ref><ref type="bibr">13]</ref> for more details on width and its analysis. ω is lower bounded by ρ* because when we peel the first vertex from the densest subgraph, its degree must be at least ρ*. The width is also upper bounded by the maximum degree Δ, since the increase in the load of a vertex is bounded by its degree; i.e., ρ* ≤ ω ≤ Δ. This upper bound is reflected in the T = O(Δ log n / (ρ*ε²)) iterations needed by our algorithms in the worst case. However, this bound does not reflect reality, as most of our iterative algorithms usually converge in just a few iterations. A sharper bound on the number of iterations would be T = O(ω log n / (ρ*ε²)), which is a significant improvement if ω ≪ Δ. Here, we partially explain this phenomenon by measuring the width empirically. In the full paper, we show information about the width across multiple datasets gathered from our experiments. We observe that the widths when running PaRGreedy++ are much closer to the best density found than to the maximum degree, so few iterations should suffice. On the other hand, if ω ≫ ρ*, our algorithms should take more iterations to converge. This supports what we observed in our experiments, and also explains why it takes very few iterations, e.g., fewer than 10-20 iterations, for PaRGreedy++ to converge.</p></div>
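The empirical width can be measured by instrumenting the same load-based peeling loop: each vertex's load increases exactly once per iteration, by its degree at peel time, so the per-iteration width is simply the maximum peel-time degree. The sketch below is our own instrumentation in sequential Python, not taken from the paper's code.

```python
def empirical_width(adj, iters=5):
    """Per-iteration width of load-based peeling: the max load increase
    (= max peel-time degree) of any single vertex in each iteration."""
    load = {v: 0 for v in adj}
    widths = []
    for _ in range(iters):
        deg = {v: len(nbrs) for v, nbrs in adj.items()}
        alive = set(adj)
        w = 0
        while alive:
            v = min(alive, key=lambda u: load[u] + deg[u])
            load[v] += deg[v]
            w = max(w, deg[v])          # load increase of v in this iteration
            alive.remove(v)
            for u in adj[v]:
                if u in alive:
                    deg[u] -= 1
        widths.append(w)
    return widths
```

By the bounds above, every measured width falls in [ρ*, Δ]; on a 4-clique with a pendant path (ρ* = 1.5, Δ = 4), the first-iteration width is 3, already below Δ.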
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Scalability</head><p>Here, we show the scalability of our algorithms compared to the parallel version of FISTA and the parallel 2-approximation algorithms, Julienne and PKMC, in Fig. <ref type="figure">3</ref>, which shows the running time of the algorithms (in milliseconds) versus the number of threads when each algorithm is run for 5 iterations. We show additional plots for 10 and 20 iterations in the full version of our paper. We see that our PaRSorting++ algorithm achieves greater self-relative speedups than FISTA and PaRGreedy++. Specifically, PaRSorting++ achieves up to a 10.6× self-relative speedup (on livejournal), while FISTA achieves up to a 14× self-relative speedup (on dblp) and PaRGreedy++ achieves up to a 5.51× self-relative speedup (on orkut). Furthermore, both of our algorithms run faster than parallel FISTA regardless of the number of threads. Our PaRApxGreedy++ and PaRApxSorting++ achieve the greatest self-relative speedups. Specifically, on livejournal, PaRApxGreedy++ and PaRApxSorting++ achieve up to a 17.6× and a 20.51× self-relative speedup, respectively.</p><p>PaRApxSorting++ achieves greater self-relative speedup than the rest of the implementations on 8 of the 12 tested graphs. Although PKMC achieves greater self-relative speedups on dblp, hepph, stackoverflow, and friendster, it obtains worse approximation guarantees, since it only guarantees a 2-approximation on the density. As we discussed previously, it is much easier to obtain greater parallelism when the approximation guarantee is relaxed to a 2-approximation (we see in Fig. <ref type="figure">2</ref> that PKMC obtains noticeably worse approximations on most graphs).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6">Comparing the Total Running Time</head><p>We first compare PaRSorting++ with the algorithms given in <ref type="bibr">[23]</ref>. We ran experiments on two of their algorithms, namely, CoreExact and CoreApp. CoreExact took too long to run on most datasets. CoreApp is faster than our implementation on some graphs; however, since it uses core(G, kmax), it gives a 2-approximation and is less accurate than our algorithms. More details are included in the full version of this paper.</p><p>We include plots that compare the total runtime of our algorithms (PaRGreedy++, PaRSorting++, PaRApxGreedy++, and PaRApxSorting++) with Greedy++ <ref type="bibr">[10]</ref>, FISTA, FrankWolfe, PKMC, cCoreG++, cCoreExact, and MWU <ref type="bibr">[28]</ref> in Fig. <ref type="figure">4</ref>. The detailed results can be found in our full paper. In short, when measuring solution quality against running time, our algorithms outperform all existing algorithms by significant margins, and are up to 25.9× faster than the fastest (1 + ε)-approximation algorithm on each graph. On many of the graphs that we tested, we achieve a 2× improvement in runtime when using approximate k-core pruning compared to exact k-core pruning.</p><figure><head>Figure 4: Runtimes of different densest subgraph algorithms on our small graph inputs. The algorithms are run for 20 iterations. Parallel algorithms use 60 hyper-threads.</head></figure><table><head>Table 2: Best known densities ρ and approximation ratios after 20 iterations.</head><row role="label"><cell>Graph Dataset</cell><cell>ρ</cell><cell>FISTA</cell><cell>MWU</cell><cell>FrankWolfe</cell><cell>Greedy++</cell><cell>PaRGreedy++</cell><cell>PaRSorting++</cell><cell>PaRApxSorting++</cell></row><row><cell>hepph*</cell><cell>265.969</cell><cell>1.00043</cell><cell>1.00011</cell><cell>1.00011</cell><cell>1</cell><cell>1</cell><cell>1.00031</cell><cell>1.00001</cell></row><row><cell>dblp*</cell><cell>56.565</cell><cell>1</cell><cell>1</cell><cell>1</cell><cell>1</cell><cell>1</cell><cell>1</cell><cell>1</cell></row><row><cell>brain</cell><cell>1057.458</cell><cell>1.00011</cell><cell>1.00005</cell><cell>1.00031</cell><cell>1</cell><cell>1</cell><cell>1.00026</cell><cell>1.0001</cell></row><row><cell>wiki*</cell><cell>108.59</cell><cell>1</cell><cell>1</cell><cell>1.00723</cell><cell>1</cell><cell>1</cell><cell>1.00002</cell><cell>1</cell></row><row><cell>youtube*</cell><cell>45.599</cell><cell>1.00023</cell><cell>1.00104</cell><cell>1.01522</cell><cell>1.00007</cell><cell>1</cell><cell>1.00079</cell><cell>1.00063</cell></row><row><cell>stackoverflow*</cell><cell>181.587</cell><cell>1</cell><cell>1.00001</cell><cell>1.0031</cell><cell>1</cell><cell>1</cell><cell>1.00002</cell><cell>1.00001</cell></row><row><cell>livejournal*</cell><cell>229.846</cell><cell>1.00113</cell><cell>1.00003</cell><cell>1.03671</cell><cell>1</cell><cell>1</cell><cell>1.00019</cell><cell>1.00006</cell></row><row><cell>orkut*</cell><cell>227.874</cell><cell>1</cell><cell>1.00011</cell><cell>1.00123</cell><cell>1</cell><cell>1</cell><cell>1.00026</cell><cell>1.00026</cell></row><row><cell>twitter</cell><cell>1643.301</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell><cell>1</cell><cell>1</cell><cell>1.00003</cell><cell>1.00006</cell></row><row><cell>friendster</cell><cell>273.519</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell><cell>1</cell><cell>1</cell><cell>1.00006</cell></row><row><cell>clueweb</cell><cell>2122.5</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell><cell>1</cell><cell>1</cell><cell>1</cell></row><row><cell>hyperlink2012</cell><cell>6496.649</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell><cell>1</cell><cell>1</cell><cell>1.00002</cell></row></table><p>Large Graph Runtime and Accuracy Results.</p><p>Even when using multi-threading, our algorithms are the only ones to finish processing the large graphs (twitter, friendster, clueweb, and hyperlink2012) within 1 hour. Specifically, for 20 iterations on 60 threads, our fastest algorithms PaRGreedy++ and PaRSorting++ require 10.44, 15.35, 112.64, and 352.65 seconds, and 8.41, 10.54, 83.91, and 270.39 seconds on twitter, friendster, clueweb, and hyperlink2012, respectively. 
When using approximate k-core pruning, PaRApxGreedy++ and PaRApxSorting++ require 9.77, 15.15, 120.97, and 375.71 seconds, and 8.27, 8.62, 85.29, and 287.27 seconds on twitter, friendster, clueweb, and hyperlink2012, respectively. Moreover, we obtain the best densities known in the literature for these graphs (shown in Table <ref type="table">2</ref>). For massive real-world graphs, our experiments show that using approximate k-core pruning does not yield much benefit. This is because larger graphs offer more parallelism to the exact k-core decomposition algorithm, so the exact k-core algorithm exhibits similar parallelism to the approximate k-core algorithm on a 30-core machine.</p><p>Initialization time. We measure the initialization time of our algorithms (i.e., the time to perform the k-core decomposition step) and report the results in our full paper. We find that a significant portion of the total runtime is spent on this initialization step for all of the graphs. This makes sense, as the initial graph tends to be much larger than the graph obtained after pruning. To illustrate this point, on brain, when running on 60 threads, the initialization step takes at least 2655 ms for PaRGreedy++, PaRSorting++, and PaRApxGreedy++, and 1074 ms for PaRApxSorting++. The average time spent on each iteration by these algorithms is 138, 3, 148, and 2.3 ms, respectively. PaRApxSorting++ is still faster than all other algorithms despite running for 300 more iterations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>We introduced a framework that combines pruning and refinement for solving the approximate densest subgraph problem. We designed new parallel variants of the sequential Greedy++ algorithm, and achieved state-of-the-art performance by plugging them into our framework. We showed that our algorithms can scale to the large hyperlink2012 and clueweb graphs and obtain near-optimal approximations of their densest subgraphs for the first time in the literature.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>Throughout this paper, we say a vertex v is peeled from G when v and all its adjacent edges are deleted.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>See, e.g.,<ref type="bibr">[3]</ref> for a survey on this topic.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3"><p>https://github.com/PattaraS/gbbs/tree/ALENEX</p></note>
		</body>
		</text>
</TEI>
