Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared to meet varied demands across different domains. As the starting point for the present work, we observe that two recently introduced families of iterative partitioners---those based on restreaming and those based on balanced label propagation (including Facebook's Social Hash Partitioner)---can be viewed through a common modular framework of design decisions. With the help of this modular perspective, we find that a key combination of design decisions leads to a novel family of algorithms with notably better empirical performance than any existing highly-scalable algorithm on a broad range of real-world graphs. The resulting prioritized restreaming algorithms employ a constraint management strategy based on multiplicative weights, borrowed from the restreaming literature, while adopting notions of priority from balanced label propagation to optimize the ordering of the streaming process. Our experimental results consider a range of stream orders, where a dynamic ordering based on what we call ambivalence is broadly the most performative in terms of the cut quality of the resulting balanced partitions, with a static ordering based on degree being nearly as good.
more »
« less
More Recent Advances in (Hyper)Graph Partitioning
In recent years, significant advances have been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the past decade in practical algorithms for balanced (hyper)graph partitioning together with future research directions. Our work serves as an update to a previous survey on the topic [29]. In particular, the survey extends the previous survey by also covering hypergraph partitioning and has an additional focus on parallel algorithms.
more »
« less
- Award ID(s):
- 1919021
- PAR ID:
- 10473712
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- ACM Computing Surveys
- Volume:
- 55
- Issue:
- 12
- ISSN:
- 0360-0300
- Page Range / eLocation ID:
- 1 to 38
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to most of the previous work, we study the multi-dimensional variant in which balance according to multiple weight functions is required. As we demonstrate by experimental evaluation, such multi-dimensional balance is essential for achieving performance improvements for typical distributed graph processing workloads. We propose a new scalable technique for the multidimensional balanced graph partitioning problem. It is based on applying randomized projected gradient descent to a non-convex continuous relaxation of the objective. We show how to implement the new algorithm efficiently in both theory and practice utilizing various approaches for the projection step. Experiments with large-scale graphs containing up to hundreds of billions of edges indicate that our algorithm has superior performance compared to the state of the art.more » « less
-
Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that a data-driven approach to parameter tuning can lead to significant improvements in performance. This approach uses atraining setof problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide techniques for derivinggeneralization guaranteesthat bound the difference between the algorithm’s average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research [e.g.,12,16,20,62] has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We streamline these analyses with a general theorem that applies whenever an algorithm’s performance is a piecewise-constant, piecewise-linear, or—more generally—piecewise-structuredfunction of its parameters. Our results, which are tight up to logarithmic factors in the worst case, also imply novel bounds for configuring dynamic programming algorithms from computational biology.more » « less
-
This paper introduces a novel concept of intelligent partitioning for group-based distributed optimization (DO) algorithms applied to optimal power flow (OPF) problems. Radial partitioning of the graph of a network is introduced as a systematic way to split a large-scale problem into more tractable sub-problems, which can potentially be solved efficiently with methods such as convex relaxations. Spectral parameter selection is introduced for group-based DO as a crucial hyper-parameter selection step in DO. A software package DiCARP is created, which is implemented in Python using the Pyomo optimization package. Through several numerical examples, we compare the proposed group-based algorithm to component-based approaches, evaluate our radial partitioning method against other partitioning strategies, and assess adaptive parameter selection in comparison to non-adaptive methods. The results highlight the critical role of effective partitioning and parameter selection in solving large-scale network optimization problems.more » « less
-
null (Ed.)We consider the classical Minimum Balanced Cut problem: given a graph $$G$$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets. We present the first {\em deterministic, almost-linear time} approximation algorithm for this problem. Specifically, our algorithm, given an $$n$$-vertex $$m$$-edge graph $$G$$ and any parameter $$1\leq r\leq O(\log n)$$, computes a $$(\log m)^{r^2}$$-approximation for Minimum Balanced Cut on $$G$$, in time $$O\left ( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right )$$. In particular, we obtain a $$(\log m)^{1/\epsilon}$$-approximation in time $$m^{1+O(1/\sqrt{\epsilon})}$$ for any constant $$\epsilon$$, and a $$(\log m)^{f(m)}$$-approximation in time $$m^{1+o(1)}$$, for any slowly growing function $$m$$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $$G$$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $$n$$-vertex graph is $$n^{o(1)}$$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to subpolynomial in $$n$$ factors, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs.more » « less
An official website of the United States government

