<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Subcubic Equivalences between Graph Centrality Problems, APSP, and Diameter</title></titleStmt>
			<publicationStmt>
				<publisher>Association for Computing Machinery</publisher>
				<date>01/31/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10565797</idno>
					<idno type="doi">10.1145/3563393</idno>
					<title level='j'>ACM Transactions on Algorithms</title>
<idno>1549-6325</idno>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Amir Abboud</author><author>Fabrizio Grandoni</author><author>Virginia Vassilevska_Williams</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p>Measuring the importance of a node in a network is a major goal in the analysis of social networks, biological systems, transportation networks, and so forth. Different<italic>centrality</italic>measures have been proposed to capture the notion of node importance. For example, the<italic>center</italic>of a graph is a node that minimizes the maximum distance to any other node (the latter distance is the<italic>radius</italic>of the graph). The<italic>median</italic>of a graph is a node that minimizes the sum of the distances to all other nodes. Informally, the<italic>betweenness centrality</italic>of a node<italic>w</italic>measures the fraction of shortest paths that have<italic>w</italic>as an intermediate node. Finally, the<italic>reach centrality</italic>of a node<italic>w</italic>is the smallest distance<italic>r</italic>such that any<italic>s</italic>-<italic>t</italic>shortest path passing through<italic>w</italic>has either<italic>s</italic>or<italic>t</italic>in the ball of radius<italic>r</italic>around<italic>w</italic>.</p> <p>The fastest known algorithms to compute the center and the median of a graph and to compute the betweenness or reach centrality even of a single node take roughly cubic time in the number<italic>n</italic>of nodes in the input graph. It is open whether these problems admit truly subcubic algorithms, i.e., algorithms with running time Õ(n<sup>3-δ</sup>) for some constant δ > 0.<xref ref-type='fn'><sup>1</sup></xref></p> <p>We relate the complexity of the mentioned centrality problems to two classical problems for which no truly subcubic algorithm is known, namely All Pairs Shortest Paths (APSP) and Diameter. We show that Radius, Median, and Betweenness Centrality are<italic>equivalent under subcubic reductions</italic>to APSP, i.e., that a truly subcubic algorithm for any of these problems implies a truly subcubic algorithm for all of them. We then show that Reach Centrality is equivalent to Diameter under subcubic reductions. The same holds for the problem of approximating Betweenness Centrality within any finite factor. Thus, the latter two centrality problems could potentially be solved in truly subcubic time, even if APSP required essentially cubic time.</p> <p>On the positive side, our reductions for Reach Centrality imply an improved Õ(Mn<sup>ω</sup>)-time algorithm for this problem in case of non-negative integer weights upper bounded by<italic>M</italic>, where ω is a fast matrix multiplication exponent.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Identifying the importance of nodes in networks is a major goal in the analysis of social networks (e.g., citation networks, recommendation networks, or friendship circles), biological systems (e.g., protein interaction networks), computer networks (e.g., the Internet or peer-to-peer networks), transportation networks (e.g., public transportation or road networks), and so forth. A variety of graph theoretic notions of node importance have been proposed; among the most relevant ones are betweenness centrality <ref type="bibr">[25]</ref>, graph centrality <ref type="bibr">[36]</ref>, closeness centrality <ref type="bibr">[54]</ref>, and reach centrality <ref type="bibr">[35]</ref>.</p><p>The graph centrality of a node w is the inverse of its maximum distance to any other node. The closeness centrality of w is the inverse of the total distance of w to all the other nodes. The reach centrality of w is the maximum distance between w and the closest endpoint of any s-t shortest path passing through w. Informally, the betweenness centrality of w measures the fraction of shortest paths having w as an intermediate node.</p><p>In this article we study four fundamental graph centrality computational problems associated with the mentioned centrality measures. Let G = (V , E) be an n-node m-edge (directed or undirected) graph, with integer edge weights w : E &#8594; {0, . . . , M } for some M &#8805; 1. Though we focus here on non-negative weights, part of our results can be extended to the case of directed graphs with possibly negative weights and no negative cycles. Let d G (s, t ) denote the distance from node s to node t, and let us use d (s, t ) instead when G is clear from the context.</p><p>&#8226; The Radius problem is to compute R * := min r * &#8712;V max v &#8712;V d (r * , v) (radius of the graph). &#8226; The Betweenness Centrality problem (for a given node b) is to compute the number BC (b) of shortest paths that have b as an intermediate node. <ref type="foot">2</ref>All of these notions are related in one way or another to shortest paths. In particular, we can solve the first three problems by running an algorithm for the classical All-Pairs Shortest Paths (APSP) problem on the underlying graph and doing a negligible amount of post-processing. The same holds for Betweenness Centrality by assuming that shortest paths are unique by a simple algorithm. This was recently extended to the case of (possibly) non-unique shortest paths in unweighted graphs <ref type="bibr">[12]</ref>. Part of our results for Betweenness Centrality assume the uniqueness of shortest paths. Using the best known algorithms for APSP <ref type="bibr">[61]</ref>, this leads to a slightly subcubic (by an n o (1) factor) running time for the considered problems, and no faster algorithm is known.</p><p>Each of these problems, however, only asks for the computation of a single number. It is natural to ask, is solving APSP necessary? Could it be that these problems admit much more efficient solutions? In particular, do they admit a truly subcubic <ref type="foot">3</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>algorithm?</head><p>Subcubic Equivalences between Graph Centrality Problems, APSP, and Diameter</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3:3</head><p>Besides the fundamental interest in understanding the relations between such basic computational problems (can Radius be solved truly faster than APSP?), these questions are well motivated from a practical viewpoint. As evidence to the necessity of faster algorithms for the mentioned centrality problems, we remark that some papers presenting algorithms for Betweenness Centrality <ref type="bibr">[8]</ref> and Median <ref type="bibr">[37]</ref> have received more than 1,000 citations each.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Approach</head><p>The techniques of this article fall within the realm of fine-grained complexity (see <ref type="bibr">[58]</ref> for a survey on the topic). A refinement of NP-hardness, the fine-grained approach strives to prove, via "finegrained" reductions, that improving on a given upper bound for a computational problem B would yield breakthrough algorithms for many other famous and well-studied problems. At a high level, the idea is to consider two problems A and B for which the fastest known algorithms have running times O (a(n)) and O (b (n)) (here n is a size parameter such as the number of nodes in a graph), respectively. Typically A is a problem that is widely believed to need a(n) 1-o (1) time. The approach then uses special reductions to transform an instance of A to instances of B, so that if there were an algorithm for B with running time O (b (n) 1-&#949; ) for some &#949; &gt; 0, then composing this algorithm with the reduction would yield an algorithm for A running in time O (a(n) 1-&#948; ) for &#948; &gt; 0. Since A is widely believed to not have such an algorithm, this can be used as evidence that a O (b (n) 1-&#949; ) time algorithm for problem B is unlikely to exist (or at least very hard to find). When a(n) = b (n) = n 3 , a reduction of the above kind is called a subcubic reduction <ref type="bibr">[64]</ref> from A to B. We say that two problems A and B are equivalent under subcubic reductions if there exists a subcubic reduction from A to B and from B to A. In other terms, a truly subcubic time algorithm for one problem implies a truly subcubic time algorithm for the other and vice versa.</p><p>In this article we will also consider randomized reductions of the above type. In more detail, there exists a Monte-Carlo subcubic reduction with success probability p from A to B if, given a truly subcubic algorithm for B, we can solve A in truly subcubic time and the answer is correct with probability at least p. If p &#8805; 1 -1/n O (1) , the above Monte-Carlo reduction is a high-probability one. Equivalence under such Monte-Carlo reductions is defined similarly.</p><p>Vassilevska Williams and Williams <ref type="bibr">[64]</ref> introduced this approach to the realm of graph algorithms to show the subcubic equivalence between APSP and a list of seven other problems, including deciding if an edge-weighted graph has a triangle with negative total weight (Negative Triangle), deciding if a given matrix defines a metric, and the Replacement Paths problem <ref type="bibr">[33,</ref><ref type="bibr">34,</ref><ref type="bibr">53,</ref><ref type="bibr">59,</ref><ref type="bibr">62]</ref>. Other examples of this approach <ref type="bibr">[1,</ref><ref type="bibr">3,</ref><ref type="bibr">48]</ref> include the famous results on 3-SUM hardness starting with the work of Gajentaan and Overmars <ref type="bibr">[26]</ref>. More recently, the fine-grained approach has gained popularity. The main prototypical hard problems used are CNF-SAT, APSP, and 3SUM, but also some others such as k-Clique and more. Many incredibly diverse problems are now known to have fine-grained reductions from these prototypical hard problems. See the survey by Vassilevska Williams <ref type="bibr">[58]</ref>.</p><p>In this article we exploit both APSP and Diameter as our prototypical problem and prove a collection of subcubic equivalences with the above graph centrality problems. Recall that the Diameter problem is to compute the largest distance in the graph. There is a trivial subcubic reduction from Diameter to APSP, and although no truly subcubic algorithm is known for Diameter, finding a reduction in the opposite direction is one of the big open questions in this area: can we compute the largest distance faster than we can compute all the distances?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Subcubic Equivalences with APSP</head><p>Our first main result is to show that Radius, Median, and Betweenness Centrality are equivalent to APSP under subcubic reductions. Therefore, we add three relevant problems to the list of APSP-hard problems <ref type="bibr">[64]</ref>, and if any of these problems can be solved in truly subcubic time, then all of them can.</p><p>Theorem 1.1. Radius is equivalent to APSP under subcubic reductions. Theorem 1.2. Median is equivalent to APSP under subcubic reductions. Theorem 1.3. Betweenness Centrality (with unique shortest paths) is equivalent to APSP under high-probability Monte-Carlo subcubic reductions.</p><p>We remark that, in the proof of Theorem 1.3, randomization is used only to guarantee the uniqueness of shortest paths in the reduction from APSP to Betweenness Centrality. In particular, dropping the uniqueness requirement, the same reduction would be deterministic. However, the converse reduction would not work as we mentioned earlier since the number of alternative shortest paths could be exponentially large.</p><p>Unfortunately, this is strong evidence that a truly subcubic algorithm for computing these centrality measures is unlikely to exist (or at least is very hard to find) since it would imply a huge and unexpected algorithmic breakthrough.</p><p>We find the APSP-hardness result for Radius quite interesting since, prior to our work, there was no good reason to believe that Radius might be a truly harder problem than Diameter. Indeed, in terms of approximation algorithms, any known algorithm to approximate the diameter can be converted to also approximate the radius in undirected graphs within the same factor <ref type="bibr">[4,</ref><ref type="bibr">7,</ref><ref type="bibr">14,</ref><ref type="bibr">52]</ref>. Furthermore, the exact algorithms for Diameter and Radius in graphs with small integer weights are also extremely similar <ref type="bibr">[17]</ref>. The same holds for the lower bounds on fast approximation algorithms for Radius and Diameter in sparse graphs <ref type="bibr">[2,</ref><ref type="bibr">52]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Subcubic Equivalence between Reach Centrality and Diameter</head><p>Our second main result is to show that Reach Centrality and Diameter are equivalent under subcubic reductions. Theorem 1.4. Diameter and Reach Centrality are equivalent under subcubic reductions.</p><p>On the positive side, it is within the realm of possibility that Diameter is a truly easier problem than APSP, which would imply the same for Reach Centrality. On the negative side, Theorem 1.4 shows that finding a subcubic algorithm for Reach Centrality is as hard as finding a subcubic algorithm for Diameter-a big open problem.</p><p>As a consequence of the tightness of our reductions, namely not only the number of nodes but also the largest absolute weight is roughly preserved, we also obtain a faster algorithm for Reach Centrality in directed graphs with small integer weights. Theorem 1.5. There exists an &#213; (Mn &#969; ) time algorithm for Reach Centrality in directed graphs.</p><p>Above &#969; &#8712; [2, 2.373) <ref type="bibr">[16,</ref><ref type="bibr">19,</ref><ref type="bibr">27,</ref><ref type="bibr">28,</ref><ref type="bibr">63]</ref> denotes fast matrix multiplication exponent. The previous best algorithm for small integer weights, which is based on the solution of APSP, takes time &#213; (M 0.752 n 2.529 ) <ref type="bibr">[66]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4">Approximation Algorithms</head><p>An approximate value of the mentioned graph centrality measures might be sufficiently good in practice. This is indeed the topic of several empirical works on Betweenness Centrality <ref type="bibr">[6,</ref><ref type="bibr">9,</ref><ref type="bibr">29]</ref>. Furthermore, there are practically fast shortest paths algorithms based on reach centrality <ref type="bibr">[30,</ref><ref type="bibr">31,</ref><ref type="bibr">35]</ref>: these algorithms can be adapted to work with approximate values of the reach centrality as well. In this article we formally study the approximability of the mentioned problems. In more detail, given a quantity X (e.g., a graph centrality measure), an &#945;-approximation algorithm computes a quantity x such that 1 &#945; X &#8804; x &#8804; &#945;X for some &#945; &#8805; 1 (&#945; is the approximation factor). A polynomial-time approximation scheme (PTAS) for a given measure X is an algorithm that, given an input parameter &#949; &gt; 0, computes a 1 + &#949; approximate solution x in the above sense. Furthermore, the running time is polynomial for every fixed constant &#949; &gt; 0. Our high-level goal is to design fast &#945;-approximation algorithms with &#945; as close to 1 as possible. It is known how to solve APSP within a multiplicative error (1 + &#949; ) in time &#213; (n &#969; ) for any constant &#949; <ref type="bibr">[65]</ref>. This provides truly subcubic (1 + &#949; ) approximation algorithms for Radius and Median. However, this approach does not help with Reach/Betweenness Centrality, since in those measures almost shortest paths are irrelevant. Here we present some negative and (conditionally) positive results on the approximability of the latter two problems.</p><p>We define the Approximate Betweenness Centrality problem as the problem of computing an &#945;-approximation of BC (b) for some finite &#945; &gt; 0. The Approximate Reach Centrality problem is defined analogously. We present reductions from Approximate Reach/Betweenness Centrality to the following Positive Betweenness Centrality problem: determine whether there exists some shortest path using b as an intermediate node. To the best of our knowledge, the latter problem was not studied before and it might be of independent interest. We show that Positive Betweenness Centrality is equivalent to Diameter (under subcubic reductions), while the corresponding All-Nodes version (where we solve the problem for all possible b) is equivalent to APSP! This explains why it has been difficult to develop approximation algorithms for Betweenness Centrality and Reach Centrality that are at the same time fast and provably accurate.</p><p>On the positive side, we show that a truly subcubic algorithm for Diameter implies a truly subcubic Monte-Carlo PTAS for Betweenness Centrality. Analogously to the case of Reach Centrality, this gives some more hope that a truly subcubic PTAS for Betweenness Centrality exists; however, such algorithm is probably not easy to find. Part of the mentioned reductions are summarized in Figure <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.5">SETH Hardness</head><p>We consider the problem of solving Approximate Reach/Betweenness Centrality in sparse graphs.</p><p>Here we can prove, again passing through Positive Betweenness Centrality, that O (m 2-&#949; ) time algorithms do not exist unless the Strong Exponential Time Hypothesis (SETH) fails. Our reduction can be adapted to the stronger Orthogonal Vector Conjecture (OVC).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.6">Related Work</head><p>APSP is among the best-studied problems in Computer Science. If the edge weights are nonnegative, one can run Dijkstra's algorithm <ref type="bibr">[21]</ref> from every source node and solve the problem in time O (mn + n 2 log n) (by implementing Dijkstra's algorithm with Fibonacci heaps <ref type="bibr">[24]</ref>). Johnson <ref type="bibr">[43]</ref> showed how to obtain the same running time in the case of negative weights also (but no negative cycles). Pettie <ref type="bibr">[49]</ref> improved the running time to O (mn + n 2 log log n) and together with Ramachandran to O (mn log &#945; (m, n)) <ref type="bibr">[50]</ref>. If the graph is undirected and the edge weights are integers fitting in a word, one can solve the problem in time O (mn) in the word-RAM model <ref type="bibr">[57]</ref>. In dense graphs the running time of these algorithms is O (n 3 ). Slightly subcubic algorithms were developed as well, starting with the work of Fredman <ref type="bibr">[23]</ref>. Following a long sequence of improvements (among others, <ref type="bibr">[11,</ref><ref type="bibr">38]</ref>), Williams <ref type="bibr">[61]</ref> obtained an algorithm with running time &#213; (n 3 /2 &#937;( &#8730; log n) ). Faster algorithms are known for small integer weights bounded in absolute value by M: in undirected graphs APSP can be solved in &#213; (Mn &#969; ) time <ref type="bibr">[56]</ref> and in directed graphs in &#213; (n 2 (Mn) 1 4-&#969; ) time <ref type="bibr">[66]</ref>. The result for the directed case can be refined to &#213; (M 0.752 n 2.529 ) using fast rectangular matrix multiplication <ref type="bibr">[39]</ref>.</p><p>As we already mentioned, for general edge-weights the fastest known algorithms for Diameter and Radius solve APSP (hence taking roughly cubic time). In the case of directed graphs with small integer weights bounded by M there are faster, &#213; (Mn &#969; ) time algorithms (see <ref type="bibr">[17]</ref> and the references therein). Faster approximation algorithms are known. Aingworth et al. <ref type="bibr">[4]</ref> showed how to compute a (roughly) 3/2 approximation of the diameter in time &#213; (m &#8730; n + n 2 ). The same approximation factor and running time can be achieved for Radius in undirected graphs <ref type="bibr">[7]</ref>. The running time for both Radius and Diameter was reduced to &#213; (m &#8730; n) by Roditty and Vassilevska Williams <ref type="bibr">[52]</ref> (see also <ref type="bibr">[14]</ref> for a refinement of the approximation factor). The authors also show that a 3/2&#949; approximation for Diameter running in time O (m 2-&#949; ) (for any constant &#949; &gt; 0) would imply that the SETH of <ref type="bibr">[40]</ref> fails, thus showing that improving on the 3/2-approximation factor while still using a fast algorithm would be difficult. A similar hardness result for Radius was shown in <ref type="bibr">[2]</ref> under the Hitting Set Conjecture. Under SETH, there is no better than 5/3 approximation for Diameter in time O (m 3/2-&#949; ) <ref type="bibr">[5]</ref>. See also <ref type="bibr">[10]</ref> for related results on Diameter and Radius. Upper and lower bounds on fast approximation algorithms to compute the Eccentricity of all nodes are given in <ref type="bibr">[2,</ref><ref type="bibr">5,</ref><ref type="bibr">10,</ref><ref type="bibr">14]</ref>. Some more recent fine-grained complexity results on the fast approximability of Diameter are given in <ref type="bibr">[18]</ref>.</p><p>The notion of betweenness centrality was introduced by Freeman in the context of social networks <ref type="bibr">[25]</ref> and since then became one of the most important graph centrality measures in the applications. For example, this notion is used in the analysis of protein networks <ref type="bibr">[20,</ref><ref type="bibr">42]</ref>, social networks <ref type="bibr">[47,</ref><ref type="bibr">51]</ref>, sexual networks <ref type="bibr">[45]</ref>, and terrorist networks <ref type="bibr">[15,</ref><ref type="bibr">44]</ref>. From an algorithmic point of view, betweenness centrality was used to identify a highway-node hierarchy for routing in road networks <ref type="bibr">[55]</ref>. Brandes's algorithm <ref type="bibr">[8]</ref> computes the betweenness centrality of all nodes in time O (mn +n 2 log n). This result is based on a counting variant of Dijkstra's algorithm. We remark that <ref type="bibr">[8]</ref>, similarly to other papers in the area, neglects the bit complexity of the counters that store the number of pairwise shortest paths. This is reasonable in practice since the maximum number N of alternative shortest paths between two nodes tends to be small in many of the applications. By considering also N , the running time grows by a factor of O (log N ) = O (n log n). Indeed, in some applications one can even assume that shortest paths are unique (as we do in some parts of this article). The uniqueness of shortest paths is either a consequence of tie-breaking rules (Canonical-Path Betweenness Centrality problem <ref type="bibr">[29]</ref>) or can be enforced by perturbing edge weights <ref type="bibr">[30]</ref>. Chan et al. <ref type="bibr">[12]</ref> obtain an &#213; (n 3 ) time algorithm for the case of non-unique shortest paths in unweighted graphs. The running time to compute the exact betweenness centrality can be prohibitive in practice for very large networks even assuming the uniqueness of shortest paths. For this reason, some work was devoted to the fast approximation of the betweenness centrality of all nodes <ref type="bibr">[6,</ref><ref type="bibr">9,</ref><ref type="bibr">29]</ref>. Those works are based on random pivot-sampling techniques. They do not provide any theoretical bound on the approximation factor: this is not surprising a posteriori, in view of our APSP-hardness results. In contrast, our results suggest a candidate way to obtain a provably fast and accurate algorithm for Approximate Betweenness Centrality (for a single node). Our approach deviates substantially from <ref type="bibr">[6,</ref><ref type="bibr">9,</ref><ref type="bibr">29]</ref> for small values of the betweenness centrality.</p><p>The Reach Centrality notion was introduced by Gutman <ref type="bibr">[35]</ref> in the framework of practically fast algorithms to solve the Single-source Shortest Paths problem. In particular, the values RC (b) can be used to filter out some nodes during an execution of Dijkstra's algorithm. The notion of Reach Centrality is also used in other works on the same topic <ref type="bibr">[30,</ref><ref type="bibr">31]</ref>.</p><p>Eppstein and Wang <ref type="bibr">[22]</ref> consider the problem of approximating the closeness centrality of all nodes. They present a random-sampling-based O ((m + n log n) log n &#949; 2 ) time algorithm that w.h.p. computes estimates within an additive error &#949;D * , where D * is the diameter of the graph. The same problem is investigated in <ref type="bibr">[9]</ref> from an experimental point of view. The Median problem was also studied in a distance-oracle query model <ref type="bibr">[13,</ref><ref type="bibr">32,</ref><ref type="bibr">41]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.7">Preliminaries and Notation</head><p>W.l.o.g. we assume that the considered graph G = (V , E) is connected, hence m &#8805; n -1. We make the usual assumption that the nodes of the considered graph are labeled with integers between 0 and n -1, and where needed we implicitly assume that n is lower bounded by a sufficiently large constant. For two nodes u, v &#8712; V , by uv we indicate either an undirected edge between u and v or an edge directed from u to v. The interpretation will be clear from the context.</p><p>For a given node w &#8712; V , we let Rad (w ) := max v &#8712;V {d (w, v)} (eccentricity of w) and Med (w</p><p>A node w minimizing Rad (w ) and Med (w ) is a center and a median of the graph, respectively. By BC s,t (b) we denote the number of shortest s-t paths that have b as an internal node. In particular, BC s,s (b) = BC s,b (b) = BC b,t (b) = 0. Furthermore, BC s,t (b) &#8712; {0, 1} in the case of unique shortest paths. Notice that BC (b) = s,t &#8712;V BC s,t (v). <ref type="foot">4</ref> In the literature the betweenness centrality is sometimes defined differently as BC (b) = s,t &#8712;V -{b },s t &#963; s, t (b ) &#963; s, t , where &#963; s,t is the number of distinct shortest paths from s to t, and &#963; s,t (b) is the number of such paths that use node b as an intermediate node. Here when &#963; s,t = 0 (hence &#963; s,t (b) = 0),</p><p>&#963; s, t is assumed to be 0. Notice that this is equivalent to our definition in the case of unique shortest paths.</p><p>We remark that, in our subcubic reductions, it would be sufficient to preserve (modulo polylogarithmic factors) the number n of nodes only. However, whenever possible, we will also try to preserve (in the same sense) also m and M. In many cases we obtain extremely tight reductions that even allow us to obtain new faster algorithms, as is the case with Reach Centrality via our tight reduction to Diameter. In some claims we assume that a T (n, m) time, T (n, M ) time, or T (n, m, M ) time algorithm for some problem is given. In all those claims we implicitly assume that those running times are polynomial functions of the input parameters lower bounded by &#937;(m). This way, one has that</p><p>). We will use this fact multiple times along the article. We remark that this is without loss of generality since all the considered problems admit a polynomial-time algorithm in the mentioned parameters, and a lower bound of &#937;(m) is implied by the input size.</p><p>Throughout this article, with high probability (w.h.p.) means with probability at least 1 -1/n O (1) .</p><p>In some reductions involving Betweenness Centrality we will need to enforce the uniqueness of shortest paths. This can be enforced w.h.p. using the Isolation Lemma from <ref type="bibr">[46]</ref>. <ref type="foot">5</ref>Lemma 1.6 (Isolation Lemma <ref type="bibr">[46]</ref>). Consider a set system (U , S) over a universe U of h elements. Let us assign an integer weight w (i) &#8712; {1, . . . , q} chosen uniformly and independently at random to each i &#8712; U and define the weight of each set S &#8712; S as w (S ) = i &#8712;S w (i). Then there exists a unique set of minimum weight with probability at least 1h/q. Corollary 1.7. Let G = (V , E) be a directed or undirected graph with edge weights w : E &#8594; {0, . . . , M } and let c &#8805; 5 be an integer. Consider the random weight function w : E &#8594; {1, . . . , n c + n c+1 M } given by w (e) = n c+1 w (e) + r (e), where each r (e) &#8712; {1, . . . , n c } is chosen independently and uniformly at random (random perturbation). Then with probability at least 1 -1/n c-4 all shortest paths induced on G by weights w are unique. Furthermore, any such path is deterministically also a shortest path w.r.t. weights w.</p><p>Proof. Consider the directed case, the undirected one being analogous (with slightly better bounds). We first observe that deterministically any shortest path for (G, w ) has to be a shortest path also for (G, w ). Indeed, any such shortest path of length W in (G, w ) has length at most (n -1)n c +n c+1 W in (G, w ), while any non-shortest path would have length at least 1+n c+1 (W +1) in (G, w ).</p><p>For each pair of distinct nodes (a, b), we consider the set system (E, S ab ), where S ab is the set of shortest a-b paths in (G, w ) (interpreted as subsets of edges), of (common) length W . By the previous observation, any shortest a-b path in (G, w ) must belong to S ab . Define r (S ) = e &#8712;S r (e) for each S &#8712; S ab . The Isolation Lemma 1.6 implies that there exists exactly one S &#8712; S ab with minimum r (S ) with probability at least 1</p><p>deterministically for each S &#8712; S ab , this implies that there exists exactly one shortest path in S ab (hence in G) according to weights w with the same probability. The claim follows by applying the union bound over the possible pairs (a, b).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">SUBCUBIC EQUIVALENCE WITH APSP</head><p>In this section we prove the subcubic equivalence between APSP and the following problems: Radius, Median, and Betweenness Centrality. As mentioned in the introduction, reducing these problems to APSP is fairly straightforward and here we will focus on the opposite reductions.</p><p>We exploit Negative Triangle as an intermediate sub-problem: determine whether a given undirected graph G = (V , E), with integer edge weights w : E &#8594; {-M, . . . , M }, contains a triangle whose edges sum to a negative number; such a triangle is called a negative triangle. The latter problem was shown to be equivalent to APSP under subcubic reductions in <ref type="bibr">[64]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 2.1 ([64]). Negative Triangle and APSP (in directed or undirected graphs) are equivalent under subcubic reductions.</head><p>In order to simplify our proofs, we assume that the input instance of Negative Triangle satisfies the following properties:</p><p>(1) Path lengths are even. This can be achieved by multiplying the weights by a factor of 2.</p><p>(2) Any two nodes are connected by a path containing at most 2 edges. This can be achieved by adding a dummy node r , and n edges of weight 2M between r and any other node. Observe that no new negative triangle is created this way. (3) By appending at most n + 1 leaf nodes to r with edges of cost 2M, we can assume w.l.o.g.</p><p>that n is a power of 2.</p><p>These reductions can be performed in linear time; they increase the number of nodes by O (n), the number of edges by O (n), and the maximum absolute weight by a factor of 2. Therefore, any algorithm with (polynomial and at least linear in m) running time &#213; (T (n, m, M )) for the  Combining the reductions below with Lemma 2.1 proves Theorem 1.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Betweenness Centrality</head><p>We start with the reduction to Betweenness Centrality. We obtain slightly different results assuming that the algorithm for Betweenness Centrality works on general instances or only under the restriction that shortest paths are unique. Later when we talk about the case of non-unique shortest paths, we mean that the shortest paths might not be unique.</p><p>Lemma 2.2. Given a T (n, m) time algorithm for Betweenness Centrality in directed or undirected graphs in the case of non-unique (resp., unique) shortest paths, there exists a deterministic (resp., highprobability Monte-Carlo) &#213; (T (n, m)) time algorithm for Negative Triangle.</p><p>Proof. Let (G = (V , E), w ) be the input instance of Negative Triangle (reduced as described above). Let n = 2 k+1 be the number of nodes of G, for some non-negative integer k. We initially consider the case of non-unique shortest paths.</p><p>We start with the simpler directed case (see also Figure <ref type="figure">2</ref>). We construct a weighted directed graph (G , w ) as follows. Graph G contains four sets of nodes I , J , K, and L (layers). Each layer contains a copy of each node v &#8712; V . Let v I be the copy of v in I , and define analogously v J , v K , and v L . Let Q = &#920;(M ) be a sufficiently large integer. For each edge uv &#8712; E, we add to G the edges u I v J , u J v K , and u K v L and assign to those edges weight 2Q+w (uv). We add to G a dummy node b, and edges v I b and bv L for any v &#8712; V , of weight 3Q -1 and 3Q, respectively. We also add to G two sets of nodes Z = {z 0 , . . . , z k } and O = {o 0 , . . . , o k }. For any v &#8712; V , we add the following edges of weight 3Q -1 to G . Let v 0 , v 1 , . . . ,v k be a binary representation of v (interpreted as an integer between 0 and n -1 = 2 k +1 -1). For each j = 0, . . . , k, we add edges v I z j and o j v L if v j = 0, and edges v I o j and z j v L otherwise. We also add edges o j z j and z j o j of weight 3Q -1 for j = 0, . . . , k. Observe that k = O (log n); hence there are O (n log n) edges of the latter type.</p><p>On (G , w ) we compute BC (b) and output YES to the input Negative Triangle instance if and only if BC (b) &lt; n. Let us prove the correctness of this reduction. The only paths passing through b are of the form s I , b, t L and have weight 6Q -1. For s t, there must exist a node w &#8712; Z &#8746; O such that s I , w, t L is a path of cost 6Q -2. Therefore, the only pairs of nodes that can contribute to BC (b) are of the form (s I , s L ). The shortest path of type s I , v J , w K , s L has weight at most 6Q -2 if s belongs to a negative triangle, and at least 6Q otherwise. Therefore, BC s I ,s L (b) = 1 if s does not belong to any negative triangle, and BC s I ,s L (b) = 0 otherwise. The correctness follows.</p><p>In the undirected case, we use the same weighted graph (G , w ) as before, but removing edge directions (and leaving one copy of parallel edges). The rest of the reduction is as before, with the difference that now the answer is YES if and only if BC (v) &lt; 2n (the extra factor 2 here is due to the fact that there are potentially 2n shortest paths passing through b). Proving correctness requires a slightly more complicated case analysis. Consider any pair s, t &#8712; V -{b}. Suppose (s, t ) (I &#215; L) &#8746; (L &#215; I ). Then any s-t path passing through b costs at least 2(3Q -1) + (2Q -M ). On the other hand, any s &#8712; Z &#8746; O can reach any t &#8712; Z &#8746; O within distance 2(3Q -1), and any</p><p>there exists an s-t path of length at most 3(2Q + M ). It remains to consider the case that s = s I &#8712; I and t = t L &#8712; L. The path s I , b, t L has cost 6Q -1. If s t, analogously to the directed case there exists w &#8712; Z &#8746; O such that s I , w, t L is a path of weight 6Q -2. We can conclude that, like in the directed case, the only pairs that can contribute to BC (b) are of the form (s I , s L ). The shortest path of the form s I , v J , w k , s L has weight at most 6Q -2 if s belongs to a negative triangle, and at least 6Q otherwise. Any other path avoiding b contains at least 4 edges, and therefore costs at least 4(2Q -M ). We can conclude that BC s I ,s L (b) = 1 if s is not contained in a negative triangle of (G, w ), and BC s I ,s L (b) = 0 otherwise. The correctness follows.</p><p>It remains to consider the case of unique shortest paths. Observe that in the above reduction shortest paths are not necessarily unique. The latter property can, however, be enforced w.h.p. by modifying weights as in Corollary 1.7. Notice that this randomized reduction gives the right answer (at least) whenever shortest paths are unique; hence this happens w.h.p. Since weights increase by a polynomial factor in n while n and m are asymptotically preserved, the running time is &#213; (T (n, m)), as required.</p><p>We remark that in the reduction in Lemma 2.2 the blow-up of the weights happens only when we need to enforce the uniqueness of shortest paths. In particular, if we had a &#213; (T (n, m, M )) time algorithm for the variant of Betweenness Centrality not requiring such uniqueness, this would imply a &#213; (T (n, m, M )) time algorithm for Negative Triangle.</p><p>Proof of Theorem 1.3. One direction is obtained by chaining Lemmas 2.1 and 2.2. The other direction is trivial: simply solve APSP and count (in O (n 2 ) total time) how many pairs (s, t ), s, t &#8712;</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Radius</head><p>Our reduction from Negative Triangle to Radius is similar to the one in Lemma 2.2. Consider the same construction when we remove the node b from the graph. The key observation is that a node s I has distance at most 6Q -2 to every node t L (including s L ) if and only if s is in a negative triangle in G. Intuitively, this allows us to show that an algorithm distinguishing between radius 6Q -2 and radius 6Q -1 can solve Negative Triangle. To complete the reduction we need to make sure  Proof. Let (G = (V , E), w ) be the considered instance of Negative Triangle (modified as described before). We start with the directed case (see also Figure <ref type="figure">3</ref>). Let Q = &#920;(M ) be a sufficiently large integer. We construct a directed weighted graph (G , w ) as follows. Similarly to the proof of Lemma 2.2, graph G contains four copies I , J , K, and L of the node set V (layers). Let v X be the copy of v &#8712; V in layer X . For each edge uv &#8712; E, we add to G edges u I v J , u J v K , and u K v L of weight Q + w (vu). We also add to G two sets of nodes Z = {z 0 , . . . , z k } and O = {o 0 , . . . , o k }. We add edges incident to nodes Z &#8746;O in the same way as in Lemma 2.2, using edges of cost Q. In more detail, let v 0 , v 1 , . . . ,v k be the binary representation of node v: we add the edges v I z j and o j v L if v j = 0, and the edges v I o j and z j v L otherwise. We also add edges z j o j and o j z j of weight Q for all j = 0, . . . , k. Finally, we add nodes x and y, and for any v &#8712; V we add edges v I x, xv I , and xv J of weight Q, and edges v I y of weight 3Q -1.</p><p>We compute the radius R * of (G , w ) and output YES to the input instance of Negative Triangle if and only if R * &#8804; 3Q-1. The running time of the algorithm is &#213;</p><p>Let us prove its correctness. We first observe that the center r * of the graph belongs to I &#8746; {x } since the other nodes cannot reach any node in I . Observe that d (x, y) = 4Q -1. On the other hand, any node s I is at distance at most 2Q to nodes in Z &#8746; O &#8746; J &#8746; {x } &#8746; (L -{s L }), at most 2Q + 2M to nodes in K (using the copy r J of the root node r ), and exactly 3Q -1 to node y. Note also that if s belongs to a negative triangle, there exists an s I -s L path of the form s I , v J , w K , s L with length at most 3Q -2. Otherwise, one shortest s I -s L path passes through nodes in Z &#8746; O and has length 3Q. We can conclude that the center of the graph belongs to I , and that the corresponding radius is upper bounded by 3Q -1 if and only if there exists a negative triangle in (G, w ). In the undirected case we use precisely the same construction, but removing edge directions (and leaving only one copy of parallel edges). The algorithm is analogous as well as its running time analysis. Its correctness can also be proved analogously. In more detail, similarly to the directed case, nodes in I can reach any other node within distance at most 3Q + 3M. Since d (y, x ) = 4Q -1, and d (s, y) &#8805; (3Q -1) + (Q -M ) for s I &#8746; {y}, we can conclude that r * &#8712; I . Also in this case, for any node s I , its maximum distance to any other node is d (s I , y) = 3Q -1 if s belongs to a negative triangle, and d (s I , s L ) &#8805; 3Q otherwise.</p><p>Proof of Theorem 1.1. One direction is trivial, and the other is given by Lemmas 2.1 and 2.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Median</head><p>The reduction to Median is based on a rather different approach. Lemma 2.4. Given a T (n, M ) time algorithm for Median in undirected or directed graphs, there exists a &#213; (T (n, M )) time algorithm for Negative Triangle.</p><p>Proof. Let (G = (V , E), w ) be the given instance of Negative Triangle. First, consider the directed case (see also Figure <ref type="figure">4</ref>). We create a weighted directed graph (G , w ). Graph G contains five copies A, B, B , C, C of V . With the usual notation, v A is the copy of v in A and similarly for the other sets. Let Q = &#920;(M ) be a large enough integer. For any pair of nodes u, v, we add the edges</p><p>In this construction, when uv E (including the special case u = v), we simply assume w (uv) = 2M. Furthermore, we add a dummy node r and edges rv A and v A r of weight Q/4 for any v &#8712; V .</p><p>In this graph we compute the median value Med and output YES to the input instance of Negative Triangle if and only if Med &lt; Q/4 + (n -1)Q/2 + 6nQ. The running time of the algorithm is</p><p>The median node has to be in A &#8746; {r } since the remaining nodes cannot reach r . Recall that, for a node w, Med (w ) := v &#8712;V d (w, v). Note that</p><p>In the first inequality above we lower bounded the distances to nodes in A, B, B , C, and C with</p><p>and Q/4 + 2Q, respectively. In the second inequality above we used the assumption that Q is sufficiently larger than M. On the other hand, for any node v A ,</p><p>Therefore, the median is in A. In the last inequality we upper bounded</p><p>Here a strict inequality holds if there exists a third node z B such that w</p><p>). Therefore, we can conclude that the strict inequality holds if and only if there exists a triangle {v, z, u} in G such that Q+w (vz) + Q+w (zu) &lt; 2Qw (vu), i.e., a negative triangle. The claim follows.</p><p>Consider next the undirected case. We construct the same weighted graph (G , w ) as in the directed case, but removing edge directions (and leaving one copy of parallel edges). The rest of the algorithm is as in the directed case, and the running time remains &#213; (T (n, M )). In order to prove correctness, we need a slightly more complicated case analysis. Like in the directed case,</p><p>, where a strict inequality holds if and only if v belongs to a negative triangle. For any u B &#8712; B,</p><p>Similarly,</p><p>We can conclude that the median is in A. The correctness follows.</p><p>Proof of Theorem 1.2. One direction is trivial, and the other is given by Lemmas 2.1 and 2.4.</p><p>Finally, we also prove a similar reduction for the following All-Nodes Median Parity problem: compute Med (v) (mod 2) for all nodes v. Lemma 2.5. Given a T (n, M ) time algorithm for the All-Nodes Median Parity problem in a directed or undirected graph, there exists a &#213; (T (n, M )) time algorithm for Negative Triangle.</p><p>Proof. Let (G = (V , E), w ) be the considered instance of Negative Triangle. Let us start with the directed case. Let Q = &#920;(M ) be a sufficiently large even integer. Similarly to the proofs of Lemmas 2.2 and 2.3 and with a similar notation, we construct a four-layer weighted directed graph (G , w ) with layers I , J , K, and L, and edges v I u J , v J u K , and v K u L of weight 2Q + w (vu) for any uv &#8712; E. We also introduce a fifth copy B of V , and for any v B &#8712; B we add edges v I v B and v B v L of weight 3Q and 3Q -1, respectively. We also add edges v I u B of weight 3Q + 3M + 2 for any u v. Finally, we add a node r , and edges v I r and rv I of weight Q for all v &#8712; V . Observe that the edges of type v B v L are the only edges of odd weight (by the preprocessing of the Negative Triangle instance).</p><p>In this graph we compute Med (v) (mod 2) for all v &#8712; V (G ) and we output YES to the input Negative Triangle instance if and only if Med (v I ) (mod 2) = 0 for some v I &#8712; I (i.e., some Med (v I ) is even). The running time is &#213; (T (O (n), O (M ))) = &#213; (T (n, M )). Let us prove correctness. Consider any v I &#8712; I . Any node is reachable from v I ; hence Med (v I ) is finite. Any path of type v I , u , u L , u v, cannot be a shortest path since it has length 6Q + 3M + 2 -1 while there exists a v I -u L path of length at most 6Q + 3M avoiding B. Therefore, the unique candidate shortest path of odd weight is v I , v , v L of length 6Q -1. However, by the usual argument, this is not a shortest path if v is contained in some negative triangle. The claim follows.</p><p>In the undirected case we can use the same graph (G , w ), but removing edge directions (and leaving one copy of parallel edges). The rest of the algorithm is the same and its analysis is analogous to the directed case. Corollary 2.6. Given a truly subcubic algorithm for All-Nodes Median Parity, there exists a truly subcubic algorithm for APSP.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">SUBCUBIC EQUIVALENCE BETWEEN REACH CENTRALITY AND DIAMETER</head><p>In this section we show that Diameter is equivalent to Reach Centrality under subcubic reductions. We start with the simple reductions from Diameter. For the undirected case, we use the same auxiliary weighted graph, but without edge directions (and leaving one copy of parallel edges). The algorithm is the same. The running time is &#213; ((m + T Observe that, if the answer is YES, there must be two nodes s, t &#8712; V -{b} such that some shortest s-t path passes through b, K + M &gt; d (s, b) &#8805; K, and K + M &gt; d (b, t ) &#8805; K. We construct an instance (G , w ) of Diameter as follows. We add to G a copy of G. Furthermore, we add a set of nodes A that contains a node v A for each node v &#8712; V such that K + M &gt; d (v, b) &#8805; K. Symmetrically, we add a set of nodes B that contains a node v B for each node v &#8712; V such that K + M &gt; d (b, v) &#8805; K. We also add edges v A v and vv B of weight K + Md (v, b) and K + Md (b, v), respectively. Note that the weight of the latter edges is in [1, M] by construction. Finally, we add a directed path P = v 0 , . . . ,v q , q = (2K + 2M -2)/M , whose edge weights are chosen arbitrarily in [1, M] so that the length of P is exactly 2K + 2M -2. For every v &#8712; V , we add edges vv 0 and v q v of weight zero. We also add edges av 0 of weight 1 and v q a of weight 0 for any a &#8712; A. Symmetrically, we add edges v q b of weight 1 and bv 0 of weight 0 for any b &#8712; B.</p><p>We compute the diameter D * of (G , w ) and output that RC (b) &#8805; K if and only if <ref type="figure">T (n</ref>, <ref type="figure">m</ref>, <ref type="figure">M</ref> )). Consider its correctness. The distance between any two nodes in G &#8746; P is at most 2K + 2M -2. The distance between any node in G &#8746; P and any other node is at most 2K + 2M -1. The distance between any node in B and any other node is at most 2K + 2M -1. The distance between any node in A and any node in G &#8746; P &#8746; A is at most 2K + 2M -1.</p><p>Consider next any pair s A &#8712; A and t B &#8712; B. An s A -t B path using P would cost at least 2K + 2M. A shortest s A -t B path avoiding Proof of Theorem 1.5. It follows from Lemma 3.2 by exploiting the &#213; (Mn &#969; ) time algorithm for Diameter in directed graphs in <ref type="bibr">[17]</ref>.</p><p>Notice that Lemma 3.2 works only for directed graphs. In the next section we will prove the following reduction, which works also for undirected graphs at a cost of not preserving asymptotically the edge weights. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">APPROXIMATION OF REACH AND BETWEENNESS CENTRALITY</head><p>In this section we present our results about the approximability of Reach and Betweenness Centrality. A key idea in our approach is to consider the following Positive Betweenness Centrality problem, which might be of independent interest: determine whether, for a given node b, there exists some shortest path using b as an intermediate node. We let PosBC (b) denote the answer to this problem (YES or NO).</p><p>The following two lemmas show that Approximate Betweenness and Reach Centrality are at least as hard as Positive Betweenness Centrality under subcubic reductions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 4.1. Given a T (n, m) time algorithm for Approximate Betweenness Centrality in the case of non-unique (resp., unique) shortest paths, there exists a deterministic (resp., high-probability Monte-Carlo) &#213; (T (n, m)) time algorithm for Positive Betweenness Centrality with non-unique (hence unique) shortest paths.</head><p>Proof. Let us initially modify the edge weights of the input Positive Betweenness Centrality instance as follows. We first multiply edge weights by 3n. Then we add 1 to the weights of edges incident to b (considering both ingoing and outgoing edges for directed graphs), and we add 3 to all other edges. Let w be the new edge weights. Observe that any shortest path w.r.t. w is also a shortest path w.r.t. w by an argument similar to Corollary 1.7. In more detail, let W be the length of an a-c shortest path for some pair of distinct nodes a and c w. Otherwise, we first randomly perturb the weights w of the input Positive Betweenness Centrality instance as in Corollary 1.7. Let w be the perturbed weights. Next assume that shortest paths are unique w.r.t. weights w , which happens w.h.p., and let BC (b) be the value of BC (b) w.r.t. weights w . Then we apply the approximation algorithm for Betweenness Centrality and declare PosBC (b) = NO if and only if the approximate value is 0. Clearly the running time is as in the claim since m and n are preserved, while the largest edge weight is increased by a polynomial factor in n. By the above arguments, if PosBC (b) = NO, it must be the case that BC (b) = 0 since the perturbation from Corollary 1.7 does not create alternative shortest paths using b. Hence the approximate algorithm would return 0. Otherwise, there will be some pair (a, c) such that all shortest a-c paths w.r.t. weights w use node b; hence one such path will cause BC (b) &gt; 0. Therefore, the approximation algorithm has to return a positive value. We apply the approximation algorithm for Reach Centrality to the resulting instance and return PosBC (b) = NO if and only if the answer is 0. The running time satisfies the claim since m and n are preserved, while the largest edge weight is increased by a polynomial factor in n. For the correctness, observe that PosBC (b) = PosBC (b) = NO implies that RC (b) = 0; hence the approximation algorithm has to return 0. Otherwise, since all weights are at least 1, the mentioned pair (a, c) guarantees that RC (b) &#8805; 1; hence the approximation algorithm has to return a positive value.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Some Results on Positive Betweenness Centrality</head><p>A simple observation is that on unweighted graphs, Positive Betweenness Centrality is asking whether there is an in-neighbor x of b and an out-neighbor y of b such that xy E, and therefore can be solved in O (m) time. We next show that, on weighted graphs, Positive Betweenness Centrality and Diameter are equivalent under subcubic reductions. Proof. Let us focus on the deterministic case, the other case being analogous. This proof is similar in spirit to the proof of Lemma 3.1. Let (G = (V , E), w ) be the input instance of Diameter, where M is the largest integer weight. Consider first the directed case (see also Figure <ref type="figure">5</ref>). Let D be an integer in [1, (n-1)M]. Let (G , w (D)) denote the auxiliary weighted graph consisting of a copy of (G, w ) plus a dummy node b and dummy edges vb and bv of weight<ref type="foot">foot_5</ref> D/2 for any v &#8712; V . Observe that any pair of nodes s, t &#8712; V is connected by a path of length D using b. By performing a binary search on D and solving each time the resulting instance (G , w (D), b) of Positive Betweenness Centrality, we determine the largest value D of D such that the answer is YES (i.e., BC (b) &gt; 0). The output value of the diameter is D . The running time of the algorithm is &#213; ((m +T (n +1, 2n +m)) log(nM )) = &#213; (T (n, m)). Let (s * , t * ) be a witness pair for the diameter D * . In any execution where D * &#8805; D, there exists a shortest s *t * path using node b and hence the answer is YES. In any other execution (where D * &lt; D), any shortest s-t path avoiding b has length at most D * &#8804; D -1, while passing through b would cost at least D (thus the answer is NO). The correctness of the algorithm follows.</p><p>For the undirected case, we use the same auxiliary weighted graph, but without edge directions (and leaving one copy of parallel edges). The algorithm and its analysis are analogous to the directed case. Lemma 4.5. Given a T (n, m, M ) time algorithm for Diameter in directed or undirected graphs, there is a &#213; (T (n, m, M )) time algorithm for Positive Betweenness Centrality with non-unique (hence unique) shortest paths in the same graph class.</p><p>Proof. Let (G, w, b) be the input instance of Positive Betweenness Centrality. Observe that the answer is YES if and only if there exists a shortest path of the form s, b, t.</p><p>Let us consider the directed case first. By adding a dummy node r and dummy edges vr and rv of weight M for any v &#8712; V -{b}, we can assume that the diameter of G is at most D = 3M (w.l.o.g., b has at least one in-neighbor and one out-neighbor). Note that we did not introduce new paths of the form s, b, t. Furthermore, the new graph has n + 1 nodes, m + 2n edges, and maximum weight M. Hence a &#213; (T (n, m, M )) time algorithm for the modified instance implies the same running time for the original one.</p><p>We construct an instance (G , w ) of Diameter as follows (see also Figure <ref type="figure">5</ref>). Initially G = G. We add a copy A of V . Let v A be the copy of v &#8712; V . For every v &#8712; V , we add edges v A v and vv A of weight D + 1w (vb) and D + 1w (bv), respectively. If edges vb or bv are missing (including the case v = b), we set the weight of the corresponding edges v A v and vv A , respectively, to 0. Observe that edge weights are O (M ).</p><p>In this graph we compute the diameter D * and output YES to the input Positive Betweenness Centrality instance if and only if</p><p>Consider a witness pair s * , t * for the value of the diameter. copies I , J , K, L, and B of the node set V . With the usual notation v X is the copy of node v &#8712; V in set X . Let Q = &#920;(M ) be a sufficiently large integer. For every edge uv &#8712; E we add the edges u I v J , u J v K , u K v L to G and set their weight to 2Q+w (uv). We also add edges u I u B and u B u L for every node u in G and set the weight of these edges to 3Q.</p><p>The algorithm solves the All-Nodes Positive Betweenness Centrality problem on (G , w ) in time &#213; (T (n, m, M )) and outputs YES to the input Negative Triangle instance if and only if BC (u B ) &gt; 0 for some u B &#8712; B. To show correctness, observe that the only path through u B is from u I to u L and it has weight 6Q, while every path of type u I , v J , w K , u L corresponds to a triangle {u, v, w } in G and the weight of the path equals the weight of the triangle plus 6Q. The claim follows.</p><p>The same construction, without edge directions, proves the claim for undirected graphs.</p><p>Corollary 4.8. Given a truly subcubic algorithm for All-Nodes Approximate Reach Centrality or for All-Nodes Approximate Betweenness Centrality with non-unique shortest paths, there exists a truly subcubic algorithm for APSP.</p><p>Proof. In case of strictly positive weights, a truly subcubic algorithm for All-Nodes Approximate Reach Centrality or for All-Nodes Approximate Betweenness Centrality with non-unique shortest paths directly implies a truly subcubic algorithm for All-Nodes Positive Betweenness Centrality with non-unique shortest paths (the answer for a node b is YES if and only if the associate approximate value is strictly positive). Notice that in the reduction of Lemma 4.7 all weights are positive; hence this implies a truly subcubic algorithm for Negative Triangle. The claim follows by the subcubic equivalence between Negative Triangle and APSP <ref type="bibr">[64]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">A PTAS for Betweenness Centrality</head><p>In this section we prove the subcubic equivalence between Approximate Betweenness Centrality and Diameter. Theorem 4.9. Diameter and Approximate Betweenness Centrality with unique shortest paths are equivalent under subcubic high-probability Monte-Carlo reductions.</p><p>The main result in this section is the proof of the following lemma. Lemma 4.10. Given a truly subcubic algorithm for Diameter, there exists a truly subcubic highprobability Monte-Carlo PTAS for Betweenness Centrality with unique shortest paths.</p><p>We recall that a PTAS for the problem of estimating a value X is an algorithm that takes in input an instance of the problem and a parameter &#949; &gt; 0 and outputs a (1 + &#949; ) approximation x or X , i.e., 1 1+&#949; X &#8804; x &#8804; (1 + &#949;)X . Furthermore, the running time of the algorithm is polynomial whenever &#949; is lower bounded by some constant. The proof of Theorem 4.9 follows easily.</p><p>Proof of Theorem 4.9. Lemma 4.10 gives one direction. The other direction is obtained by chaining Lemmas 4.1 and 4.4.</p><p>It remains to prove Lemma 4.10. Let (G, w, b) be the considered instance of Betweenness Centrality, and define B * = BC (b). Observe that, under the assumption that shortest paths are unique, BC s,t (b) &#8712; {0, 1} and therefore B * &#8712; {0, . . . , (n-1)(n-2)}. Given s, t &#8712; V -{b} such that BC s,t (b) = 1, we call (s, t ) a witness pair, s a witness source, and t a witness target (of BC (b)).</p><p>Let also B med &#8712; {0, . . . , (n -1)(n -2)} be an integer parameter to be fixed later. Our PTAS is based on two different algorithms: one for B * &#8804; B med (small B * ) and the other for B * &gt; B med (large B * ).  Proof. We use a construction similar to the one in the proof of Lemma 4.5 (see also Figure <ref type="figure">6</ref>). Let (G, w, b, S,T ) be the considered instance of Positive (S,T )-Betweenness Centrality.</p><p>We start with the directed case. Let us construct a directed weighted graph (G , w ). Graph G contains a copy of G. Furthermore, it contains a copy S of S and a copy T of T . Let v S be the copy of node v in S, and define v T analogously. Let K := 2 + A, where A is the maximum distance of type d G (s, b) and d G (b, t ), with s &#8712; S and t &#8712; T . For each s &#8712; S and t &#8712; T , we add edges s S s and tt T of weight Kd G (s, b) and Kd G (b, t ), respectively. We add one dummy node r (resp., r ) and bidirected <ref type="foot">7</ref> edges r v for all v &#8712; S &#8746; V (resp., r v for all v &#8712; T &#8746; V ). We also add edges r v for each v &#8712; S (in particular these edges are not bidirected). Finally, we add bidirected edges r r . All edges incident on r and r have weight K -1 (dummy edges). We compute the diameter D * of (G , w ) and output YES if and only if D * = 2K.</p><p>The running time of the algorithm is &#213; (m</p><p>Let us prove its correctness. Let s * , t * be a witness pair for the diameter. If s * &#8712; V &#8746;T &#8746; {r , r }, then D * &#8804; 2(K -1). Hence we can assume s * = s S &#8712; S for some s &#8712; S. If t * &#8712; S &#8746; V &#8746; {r , r }, then D * &#8804; 2(K -1). So we can also assume t * = t T &#8712; T .</p><p>Any s S -t T path using dummy edges has to use at least two such edges. If it uses three such edges, it costs at least 3(K -1) &gt; 2K. Otherwise, it costs at least</p><p>, where the equality holds if and only if b belongs to some shortest s-t path in G. Summarizing, if there exists a shortest s-t path passing through b (in which case the answer is YES), then the diameter is 2K. Otherwise, the diameter is at most 2K -1. source (resp., witness target) if BC w,V &gt; 0 (resp., BC V ,w &gt; 0). At high level, our algorithm is based on the computation of the contribution BC s,V to BC of a random sample of candidate witness sources s. Then we exploit Chernoff's bound to prove that the approximation factor is small w.h.p. One technical difficulty here is that some witness sources might give a very large contribution to BC, which is problematic since we need concentrated results. In order to circumvent this problem, we first sample a random subset of candidate witness targets to identify the problematic witness sources (which are considered separately).</p><p>In more detail, we sample a random subset T of p med</p><p>&#8226; n nodes, where p med = C log n &#8730; B med and C is a sufficiently large constant (more precisely C = O (1/&#949; 2 ) is sufficient). We compute all the shortest paths ending in T and use them to derive BC s,T for all s &#8712; V . We partition V into sets S lar&#1076;e and S small , where s &#8712; V belongs to S lar&#1076;e if and only if BC s,T &#8805; C log n. Then we sample a random subset R small of p med |S small | nodes in S small and compute BC s,V for all s &#8712; R small . Finally, we output the estimate B = 1 p med s &#8712;S l ar &#1076;e BC s,T + s &#8712;R small BC s,V . It is easy to see that the running time of the algorithm is &#213; ( Cnm &#8730; B med ). It is also not hard to see that E[ 1 p med s &#8712;S l ar &#1076;e BC s,T ] = s &#8712;S l ar &#1076;e BC s,V and E[ 1 p med s &#8712;R small BC s,V ] = s &#8712;S small BC s,V . Therefore, E[ B] = B * . The following lemma shows that B is concentrated around its mean. Lemma 4.14. For C = O (1/&#949; 2 ) large enough, w.h.p. B &#8712; [(1 -2&#949; )B * , (1 + 2&#949; )B * ]. Proof. We start by showing that w.h.p., for any s &#8712; V , if s &#8712; S lar&#1076;e , then BC s,V &#8805; &#8730; B med /(1 +&#949; ), and otherwise BC s,V &#8804; &#8730; B med /(1&#949; ). Define B = BC s,T and B = BC s,V . Note that E[B ] = C log n &#8730; B med B. Note also that B = BC s,T = t &#8712;V X s,t , where X s,t = 0 if t T and X s,t = BC s,t otherwise. Since the variables X s,t are negatively correlated, we can apply Chernoff's bound to BC s,T . In particular, conditioning implicitly on B &lt; &#8730; B med 1+&#949; , one obtains</p><p>Above we used the fact that the function xe 1-x is increasing for x &#8712; [0, 1 1+&#949; ] (and strictly smaller than 1 in the same range). Similarly, conditioning implicitly on the event that B &gt;</p><p>Thus summarizing, for a fixed s,</p><p>From the union bound and assuming that the constant C = O (1/&#949;</p><p>2 ) is large enough, we conclude that w.h.p. for all s &#8712; S lar&#1076;e one has BC s,V &#8805; &#8730; B med /(1 + &#949; ) and for all s &#8712; S small one has BC s,V &#8804; &#8730; B med /(1&#949; ). Next assume that the mentioned high-probability event happens for all s &#8712; V . Define B * lar&#1076;e = s &#8712;S l ar &#1076;e BC s,V and B * small = s &#8712;S small BC s,V . Clearly B * = B * lar&#1076;e + B * small . Define also Blar&#1076;e := 1 p med s &#8712;S l ar &#1076;e BC s,T and Bsmall := 1 p med s &#8712;R small BC s,V , so that B = Blar&#1076;e + Bsmall . Consider any s &#8712; S lar&#1076;e , and define B = BC s,T and B = BC s,V . Recall that by assumption B &#8805; &#8730; B med 1+&#949; and observe that E[B ] = p med B &#8805; C log n 1+&#949; . Then, by Chernoff's bound,</p><p>Since E[ Blar&#1076;e ] = 1 p med [ s &#8712;S l ar &#1076;e BC s,T ] = B * lar&#1076;e , we can conclude that w.h.p. Blar&#1076;e &#8712; [(1&#949;)B * lar&#1076;e , (1 + &#949;)B * lar&#1076;e ]. Consider next Bsmall . Define B = p med Bsmall = s &#8712;R small BC s,V . Observe that E[B ] = p med B * small . Furthermore, B is the sum of independent random variables each one of value at most &#8730; B med 1-&#949; by the assumption on S small . Therefore, by Chernoff's bound,</p><p>small &#8805; &#949;B med /2 and observing that B * &#8805; B * small , one obtains</p><p>Otherwise, B * small &lt; &#949;B med /2 &#8804; &#949;B * /2 and thus</p><p>Similarly,</p><p>Therefore, w.h.p. Bsmall &#8712; [B * small -&#949;B * , B * small + &#949;B * ]. Altogether, w.h.p. one has</p><p>The following lemma summarizes the above discussion. Lemma 4.15. Given an instance (G, w, b) of Betweenness Centrality with unique shortest paths and BC</p><p>) time algorithm that returns a (1 + &#949; ) approximation of B * w.h.p.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proof. Consider the above algorithm. Its running time is &#213; (</head><p>. By Lemma 4.14, the estimate B of B * that it outputs satisfies the claim (modulo scaling &#949; by a constant factor).</p><p>Combining the algorithms for small and large B * , we obtain Lemma 4.10.</p><p>Proof of Lemma 4.10. Let &#213; (n 3-&#948; ) be the running time of the given Diameter algorithm, for some constant &#948; &gt; 0. From Lemmas 4.13 and 4.15, we can use it to compute w.h.p. a (1 + &#949; ) approximation of the betweenness centrality of a given node in time &#213; (B med n</p><p>). Choosing B med = n 2&#948; /3 &#949; 4/3 gives a truly subcubic running time in &#213; ( n 3-&#948; /3 &#949; 4/3 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Reductions Based on SETH</head><p>We are able to show that, assuming the SETH <ref type="bibr">[40]</ref>, a subquadratic algorithm for Positive Betweenness Centrality does not exist even in sparse graphs. We recall that SETH claims that CNF-SAT on n variables cannot be solved in time O ((2&#948; ) n ) for any constant &#948; &gt; 0. One obtains as a corollary a lower bound on the running time of any approximation algorithm for Betweenness/Reach Centrality by the reductions in Lemmas 4.1 and 4.2.</p><p>Theorem 4.16. Suppose that there is an O (m 2-&#949; ) time algorithm, for some constant &#949; &gt; 0, that solves Positive Betweenness Centrality with non-unique shortest paths in directed or undirected graphs with edge weights in {1, 2}. Then SETH is false.</p><p>Proof. Let F be a CNF-SAT formula on n variables. Our goal is to show that we can determine whether F is satisfiable in O * (2 (1-&#948; )n ) time <ref type="foot">8</ref> for some constant &#948; &gt; 0. Using the sparsification lemma of <ref type="bibr">[40]</ref> (as, e.g., in <ref type="bibr">[14]</ref>), we can assume w.l.o.g. that F contains O (n) clauses.</p><p>Let us consider the undirected case first (see also Figure <ref type="figure">7</ref>). We partition the variables into two sets A and B, which differ by at most 1 in cardinality, and create a node &#981; A (resp., &#981; B ) for each partial assignment &#981; A of the variables in A (resp., &#981; B of the variables in B). We also add a node for each clause c and add one edge of weight 1 between each clause c and any partial assignment &#981; of A or B that does not satisfy any literal of c (including the special case that c does not contain any variable in A or B). We also add two nodes x A and x B and add one edge of weight 1 between them and any node in A and B, respectively. Finally, we add a node b and add one edge of weight 2 between b and each assignment of A and B. The algorithm returns YES (i.e., F is satisfiable) if and only if BC (b) &gt; 0.</p><p>Let us prove correctness. The distance between any clause node c and any other node is at most 4, while any path passing through b would cost at least 5. Hence the corresponding shortest paths do not use b. The same claim holds for x A and x B . The distance between any two assignments of A or of B is at most 2, while passing through b would cost at least 4. Hence also the corresponding shortest paths do not use b. It remains to consider shortest paths from some node of type &#981; A to some node of type &#981; B . Observe that there exists one such path of length 2 (hence BC &#981; A ,&#981; B (b) = 0) if and only if there exists a clause c that is not satisfied by &#981; A or by &#981; B . Otherwise (i.e., &#981; A and &#981; B together satisfy F ), &#981; A , b, &#981; B is a shortest such path (hence BC (b) &gt; 0). The graph has O (2 n/2 n) edges, leading to a running time of the form O * (2 (1-&#949; /2)n ). The claim follows.</p><p>In the directed case we use a similar construction (with a similar notation), without nodes x A and x B , and orienting the edges from the assignments of A to the clause nodes and to b, and from the latter nodes to the assignments of B. The algorithm is the same. The proof of correctness is simpler: the only shortest paths that can use b are from a node of type &#981; A to a node of type &#981; B . Similarly to the undirected case, &#981; A and &#981; B together satisfy F if and only if &#981; A , b, &#981; B is a shortest path (hence BC (b) &gt; 0). Also in this case the running time is O * (2 (1-&#949; /2)n ), implying the claim. Corollary 4.17. Suppose that there is an O (m 2-&#949; ) time algorithm for Approximate Betweenness Centrality with non-unique shortest paths or for Approximate Reach Centrality, for some constant &#949; &gt; 0. Then SETH is false. For Reach Centrality we can also show an approximation lower bound for unweighted undirected graphs. Similarly to the proof of Theorem 4.16, we partition the variables into two subsets A and B that differ by at most 1 in cardinality and create a node for each partial assignment of the variables in A and B. We also create a node c for each clause c and connect c to each partial assignment that does not satisfy any literal in c. We also add nodes x A and x B and add edges between them and any node in A and B, respectively. Finally, we add a node b and connect it to x A and x B (note that the final part of the construction deviates from Theorem 4.16).</p><p>To show correctness, note that b is on the shortest path between x A and x B and therefore RC (b) &#8805; 1. Furthermore, b cannot be on the shortest path between a clause node c and another node in G, and therefore RC (b) = 2 if and only if b is on the shortest path between an assignment &#981; A of A and an assignment &#981; B of B. But a shortest path between &#981; A and &#981; B goes through b if and only if for every clause node c either &#981; A c is not an edge or &#981; B c is not an edge. By definition of these edges, this implies that for every clause c, either &#981; A or &#981; B satisfies c (i.e., &#981; A and &#981; B induce a satisfying assignment of F ). The claim follows.</p><p>As observed by one careful reviewer, the above reductions can be adapted to the Orthogonal Vector Conjecture (OVC). In the Orthogonal Vector (OV) problems we are given a set on n binary vectors of dimension D = O (log n). The goal is to determine whether there exists a pair of orthogonal vectors in the set. OVC states that there is no O (n 2-&#948; ) time algorithm for OV where &#948; &gt; 0 is a fixed constant. We remark that SETH implies OVC; i.e., OVC is a stronger conjecture <ref type="bibr">[60]</ref>. Our reductions can be adapted as follows. For each vector v we create a node v A in the set A (resp., v B in the set B). The set C contains one node w for each dimension/entry w. We connect each vector node v A &#8712; A (resp., v B &#8712; B) to each dimension node w such that the wth entry of v is 1. Now a length 2 path between v A &#8712; A and u B &#8712; B through a node in C means that the vectors v and u are not orthogonal. The rest of the construction is similar. The simple details are left to the reader.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">CONCLUSIONS AND OPEN PROBLEMS</head><p>There are many interesting problems that we left open. The main one is probably whether Diameter and APSP are equivalent under subcubic reductions. By our reductions, on one hand a positive answer would indicate that truly subcubic algorithms for Reach Centrality and for Approximate Betweenness Centrality are unlikely to exist. On the other hand, a negative answer would give truly subcubic algorithms for the latter problems as well.</p><p>We have shown that Reach Centrality can be solved in &#213; (Mn &#969; ) time in directed graphs, improving on the previous best algorithm based on APSP. Similar running times are known for Diameter and Radius <ref type="bibr">[17]</ref>. To the best of our knowledge, it is open whether an &#213; (Mn &#969; ) time algorithm exists also for Median and Betweenness Centrality in directed graphs.</p><p>We proved that a subquadratic 2&#949; approximation algorithm for Reach Centrality in sparse graphs is unlikely to exist. In <ref type="bibr">[2,</ref><ref type="bibr">52]</ref> analogous results are proved for Diameter and Radius. It would be interesting to show similar negative results for Betweenness Centrality and Median (or find faster approximation algorithms in sparse graphs for those problems).</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>Another slightly different definition of the problem is used in the literature; this is discussed later.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1"><p>We recall that a truly subcubic algorithm is an algorithm with running time &#213; (n 3-&#948; ) for some constant &#948; &gt; 0.ACM Transactions on Algorithms, Vol. 19, No. 1, Article 3. Publication date: February 2023.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_2"><p>ACM Transactions on Algorithms, Vol. 19, No. 1, Article 3. Publication date: February 2023.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>We remark that the s-t pairs are ordered; in particular, in undirected graphs shortest s-t paths are counted twice.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4"><p>In<ref type="bibr">[46]</ref> the lemma is stated in a slightly less general form, but the proof extends trivially. ACM Transactions on Algorithms, Vol. 19, No. 1, Article 3. Publication date: February 2023.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5"><p>Fractional weights can be avoided similarly to the proof of Lemma 3.1. ACM Transactions on Algorithms, Vol. 19, No. 1, Article 3. Publication date: February 2023.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6"><p>By a bidirected edge uv of weight w , we mean a directed edge uv and a directed edge vu, both of weight w . ACM Transactions on Algorithms, Vol. 19, No. 1, Article 3. Publication date: February 2023.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7"><p>The O * notation suppresses polynomial factors. ACM Transactions on Algorithms, Vol. 19, No. 1, Article 3. Publication date: February 2023.</p></note>
		</body>
		</text>
</TEI>
