skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: fastball: a fast algorithm to randomly sample bipartite graphs with fixed degree sequences
Abstract Many applications require randomly sampling bipartite graphs with fixed degrees or randomly sampling incidence matrices with fixed row and column sums. Although several sampling algorithms exist, the ‘curveball’ algorithm is the most efficient with an asymptotic time complexity of $O(n~log~n)$ and has been proven to sample uniformly at random. In this article, we introduce the ‘fastball’ algorithm, which adopts a similar approach but has an asymptotic time complexity of $O(n)$. We show that a C$$\texttt{++}$$ implementation of fastball randomly samples large bipartite graphs with fixed degrees faster than curveball, and illustrate the value of this faster algorithm in the context of the fixed degree sequence model for backbone extraction.  more » « less
Award ID(s):
2211744
PAR ID:
10384730
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of Complex Networks
Volume:
10
Issue:
6
ISSN:
2051-1329
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The maximum bipartite matching problem is among the most fundamental and well-studied problems in combinatorial optimization. A beautiful and celebrated combinatorial algorithm of Hopcroft and Karp [26] shows that maximum bipartite matching can be solved in O(m√n) time on a graph with n vertices and m edges. For the case of very dense graphs, a different approach based on fast matrix multiplication was subsequently developed [27, 39], that achieves a running time of O(n2.371). For the next several decades, these results represented the fastest known algorithms for the problem until in 2013, a ground-breaking work of Madry [36] gave a significantly faster algorithm for sparse graphs. Subsequently, a sequence of works developed increasingly faster algorithms for solving maximum bipartite matching, and more generally directed maximum flow, culminating in a spectacular recent breakthrough [9] that gives an m1+o(1) time algorithm for maximum bipartite matching (and more generally, for min cost flows). These more recent developments collectively represented a departure from earlier combinatorial approaches: they all utilized continuous techniques based on interior-point methods for solving linear programs. This raises a natural question: are continuous techniques essential to obtaining fast algorithms for the bipartite matching problem? Our work makes progress on this question by presenting a new, purely combinatorial algorithm for bipartite matching, that, on moderately dense graphs outperforms both Hopcroft- Karp and the fast matrix multiplication based algorithms. Similar to the classical algorithms for bipartite matching, our approach is based on iteratively augmenting a current matching using augmenting paths in the (directed) residual flow network. A common method for designing fast algorithms for directed flow problems is via the multiplicative weights update (MWU) framework, that effectively reduces the flow problem to decremental single-source shortest paths (SSSP) in directed graphs. Our main observation is that a slight modification of this reduction results in a special case of SSSP that appears significantly easier than general decremental directed SSSP. Our main technical contribution is an efficient algorithm for this special case of SSSP, that outperforms the current state of the art algorithms for general decremental SSSP with adaptive adversary, leading to a deterministic algorithm for bipartite matching, whose running time is Õ(m1/3n5/3). This new algorithm thus starts to outperform the Hopcroft-Karp algorithm in graphs with m = ω(n7/4), and it also outperforms the fast matrix multiplication based algorithms on dense graphs. We believe that this framework for obtaining faster combinatorial algorithms for bipartite matching by exploiting the special properties of the resulting decremental SSSP instances is one of the main conceptual contributions of our work that we hope paves the way for even faster combinatorial algorithms for bipartite matching. Finally, using a standard reduction from the maximum vertex-capacitated s-t flow problem in directed graphs to maximum bipartite matching, we also obtain an O(m1/3n5/3) time deterministic algorithm for maximum vertex-capacitated s-t flow when all vertex capacities are identical. 
    more » « less
  2. null (Ed.)
    We present an $$\tilde O(m+n^{1.5})$$-time randomized algorithm for maximum cardinality bipartite matching and related problems (e.g. transshipment, negative-weight shortest paths, and optimal transport) on $$m$$-edge, $$n$$-node graphs. For maximum cardinality bipartite matching on moderately dense graphs, i.e. $$m = \Omega(n^{1.5})$$, our algorithm runs in time nearly linear in the input size and constitutes the first improvement over the classic $$O(m\sqrt{n})$$-time [Dinic 1970; Hopcroft-Karp 1971; Karzanov 1973] and $$\tilde O(n^\omega)$$-time algorithms [Ibarra-Moran 1981] (where currently $$\omega\approx 2.373$$). On sparser graphs, i.e. when $$m = n^{9/8 + \delta}$$ for any constant $$\delta>0$$, our result improves upon the recent advances of [Madry 2013] and [Liu-Sidford 2020b, 2020a] which achieve an $$\tilde O(m^{4/3+o(1)})$$ runtime. We obtain these results by combining and advancing recent lines of research in interior point methods (IPMs) and dynamic graph algorithms. First, we simplify and improve the IPM of [v.d.Brand-Lee-Sidford-Song 2020], providing a general primal-dual IPM framework and new sampling-based techniques for handling infeasibility induced by approximate linear system solvers. Second, we provide a simple sublinear-time algorithm for detecting and sampling high-energy edges in electric flows on expanders and show that when combined with recent advances in dynamic expander decompositions, this yields efficient data structures for maintaining the iterates of both [v.d.Brand~et~al.] and our new IPMs. Combining this general machinery yields a simpler $$\tilde O(n \sqrt{m})$$ time algorithm for matching based on the logarithmic barrier function, and our state-of-the-art $$\tilde O(m+n^{1.5})$$ time algorithm for matching based on the [Lee-Sidford 2014] barrier (as regularized in [v.d.Brand~et~al.]). 
    more » « less
  3. Data augmentation is widely used for training a neural network given little labeled data. A common practice of augmentation training is applying a composition of multiple transformations sequentially to the data. Existing augmentation methods such as RandAugment randomly sample from a list of pre-selected transformations, while methods such as AutoAugment apply advanced search to optimize over an augmentation set of size $k^d$, which is the number of transformation sequences of length $$d$$, given a list of $$k$$ transformations. In this paper, we design efficient algorithms whose running time complexity is much faster than the worst-case complexity of $O(k^d)$, provably. We propose a new algorithm to search for a binary tree-structured composition of $$k$$ transformations, where each tree node corresponds to one transformation. The binary tree generalizes sequential augmentations, such as the SimCLR augmentation scheme for contrastive learning. Using a top-down, recursive search procedure, our algorithm achieves a runtime complexity of $O(2^d k)$, which is much faster than $O(k^d)$ as $$k$$ increases above $$2$$. We apply our algorithm to tackle data distributions with heterogeneous subpopulations by searching for one tree in each subpopulation and then learning a weighted combination, resulting in a \emph{forest} of trees. We validate our proposed algorithms on numerous graph and image datasets, including a multi-label graph classification dataset we collected. The dataset exhibits significant variations in the sizes of graphs and their average degrees, making it ideal for studying data augmentation. We show that our approach can reduce the computation cost by 43% over existing search methods while improving performance by 4.3%. The tree structures can be used to interpret the relative importance of each transformation, such as identifying the important transformations on small vs. large graphs. 
    more » « less
  4. Abstract We provide a deterministic algorithm that finds, in ɛ - O (1) n 2 time, an ɛ-regular Frieze–Kannan partition of a graph on n vertices. The algorithm outputs an approximation of a given graph as a weighted sum of ɛ - O (1) many complete bipartite graphs. As a corollary, we give a deterministic algorithm for estimating the number of copies of H in an n-vertex graph G up to an additive error of at most ɛn v(H) , in time ɛ - O H (1) n 2 . 
    more » « less
  5. Given a set of points $$P = (P^+ \sqcup P^-) \subset \mathbb{R}^d$$ for some constant $$d$$ and a supply function $$\mu:P\to \mathbb{R}$$ such that $$\mu(p) > 0~\forall p \in P^+$$, $$\mu(p) < 0~\forall p \in P^-$$, and $$\sum_{p\in P}{\mu(p)} = 0$$, the geometric transportation problem asks one to find a transportation map $$\tau: P^+\times P^-\to \mathbb{R}_{\ge 0}$$ such that $$\sum_{q\in P^-}{\tau(p, q)} = \mu(p)~\forall p \in P^+$$, $$\sum_{p\in P^+}{\tau(p, q)} = -\mu(q) \forall q \in P^-$$, and the weighted sum of Euclidean distances for the pairs $$\sum_{(p,q)\in P^+\times P^-}\tau(p, q)\cdot ||q-p||_2$$ is minimized. We present the first deterministic algorithm that computes, in near-linear time, a transportation map whose cost is within a $$(1 + \varepsilon)$$ factor of optimal. More precisely, our algorithm runs in $$O(n\varepsilon^{-(d+2)}\log^5{n}\log{\log{n}})$$ time for any constant $$\varepsilon > 0$$. While a randomized $$n\varepsilon^{-O(d)}\log^{O(d)}{n}$$ time algorithm for this problem was discovered in the last few years, all previously known deterministic $$(1 + \varepsilon)$$-approximation algorithms run in~$$\Omega(n^{3/2})$$ time. A similar situation existed for geometric bipartite matching, the special case of geometric transportation where all supplies are unit, until a deterministic $$n\varepsilon^{-O(d)}\log^{O(d)}{n}$$ time $$(1 + \varepsilon)$$-approximation algorithm was presented at STOC 2022. Surprisingly, our result is not only a generalization of the bipartite matching one to arbitrary instances of geometric transportation, but it also reduces the running time for all previously known $$(1 + \varepsilon)$$-approximation algorithms, randomized or deterministic, even for geometric bipartite matching. In particular, we give the first $$(1 + \varepsilon)$$-approximate deterministic algorithm for geometric bipartite matching and the first $$(1 + \varepsilon)$$-approximate deterministic or randomized algorithm for geometric transportation with no dependence on $$d$$ in the exponent of the running time's polylog. As an additional application of our main ideas, we also give the first randomized near-linear $$O(\varepsilon^{-2} m \log^{O(1)} n)$$ time $$(1 + \varepsilon)$$-approximation algorithm for the uncapacitated minimum cost flow (transshipment) problem in undirected graphs with arbitrary \emph{real} edge costs. 
    more » « less