 NSFPAR ID:
 10057815
 Date Published:
 Journal Name:
 SIAM: ACMSIAM Symposium on Discrete Algorithms (SODA17)
 Page Range / eLocation ID:
 2351 to 2363
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

We study the problem of estimating the value of sums of the form Sp≜∑(xip) when one has the ability to sample xi≥0 with probability proportional to its magnitude. When p=2, this problem is equivalent to estimating the selectivity of a selfjoin query in database systems when one can sample rows randomly. We also study the special case when {xi} is the degree sequence of a graph, which corresponds to counting the number of pstars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1±ε)multiplicative approximation of Sp has query and time complexities O(mloglognϵ2S1/pp). Here, m=∑xi/2 is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {xi} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where Sp≤n, and p=2, our upper bound is O~(n/S1/2p), in contrast to their Ω(n/S1/3p) lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between indegree and outdegree is bounded—or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded—we give a sublinear time algorithm via a reduction to the undirected case.more » « less

In bipartite matching problems, vertices on one side of a bipartite graph are paired with those on the other. In its online variant, one side of the graph is available offline, while the vertices on the other side arrive online. When a vertex arrives, an irrevocable and immediate decision should be made by the algorithm; either match it to an available vertex or drop it. Examples of such problems include matching workers to firms, advertisers to keywords, organs to patients, and so on. Much of the literature focuses on maximizing the total relevance—modeled via total weight—of the matching. However, in many realworld problems, it is also important to consider contributions of diversity: hiring a diverse pool of candidates, displaying a relevant but diverse set of ads, and so on. In this paper, we propose the Online Submodular Bipartite Matching (OSBM) problem, where the goal is to maximize a submodular function f over the set of matched edges. This objective is general enough to capture the notion of both diversity (e.g., a weighted coverage function) and relevance (e.g., the traditional linear function)—as well as many other natural objective functions occurring in practice (e.g., limited total budget in advertising settings). We propose novel algorithms that have provable guarantees and are essentially optimal when restricted to various special cases. We also run experiments on realworld and synthetic datasets to validate our algorithms.more » « less

Hindsight Optimality in TwoWay Matching Networks
In “On the Optimality of Greedy Policies in Dynamic Matching”, Kerimov, Ashlagi, and Gurvich study centralized dynamic matching markets with finitely many agent types and heterogeneous match values. A matching policy is hindsight optimal if the policy can (nearly) maximize the total value simultaneously at all times. The article establishes that suitably designed greedy policies are hindsight optimal in twoway matching networks. This implies that there is essentially no positive externality from having agents waiting to form future matches. Proposed policies include the greedy longestqueue policy, with a minor variation, as well as a greedy static priority policy. The matching networks considered in this work satisfy a general position condition. General position is a weak (but necessary) condition that holds when the staticplanning problem (a linear program that optimizes the firstorder matching rates) has a unique and nondegenerate optimal solution.

The online matching problem was introduced by Karp, Vazirani and Vazirani nearly three decades ago. In that seminal work, they studied this problem in bipartite graphs with vertices arriving only on one side, and presented optimal deterministic and randomized algorithms for this setting. In comparison, more general arrival models, such as edge arrivals and general vertex arrivals, have proven more challenging, and positive results are known only for various relaxations of the problem. In particular, even the basic question of whether randomization allows one to beat the triviallyoptimal deterministic competitive ratio of 1/2 for either of these models was open. In this paper, we resolve this question for both these natural arrival models, and show the following. For edge arrivals, randomization does not help  no randomized algorithm is better than 1/2 competitive. For general vertex arrivals, randomization helps  there exists a randomized (1/2+ Ω(1))competitive online matching algorithm.more » « less

null (Ed.)We introduce the General Pairwise Model (GPM), a general parametric framework for pairwise comparison. Under the umbrella of the exponential family, the GPM unifies many pop ular models with discrete observations, including the Thurstone (Case V), BerryTerryLuce (BTL) and Ordinal Models, along with models with continuous observations, such as the Gaussian Pairwise Cardinal Model. Using information theoretic techniques, we establish minimax lower bounds with tight topological dependence. When applied as a special case to the Ordinal Model, our results uniformly improve upon previously known lower bounds and confirms one direction of a conjecture put forth by Shah et al. (2016). Performance guarantees of the MLE for a broad class of GPMs with subgaussian assumptions are given and compared against our lower bounds, showing that in many natural settings the MLE is optimal up to constants. Matching lower and upper bounds (up to constants) are achieved by the Gaussian Pairwise Cardinal Model, suggesting that our lower bounds are bestpossible under the few assumptions we adopt.more » « less