We study the problem of estimating the value of sums of the form $S_p \triangleq \sum_i x_i^p$ when one has the ability to sample $x_i \ge 0$ with probability proportional to its magnitude. When $p = 2$, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when $\{x_i\}$ is the degree sequence of a graph, which corresponds to counting the number of $p$-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a $(1 \pm \varepsilon)$-multiplicative approximation of $S_p$ has query and time complexities $O\!\left(\frac{m \log\log n}{\varepsilon^2 S_p^{1/p}}\right)$. Here, $m = \frac{1}{2}\sum_i x_i$ is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, $n$ is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when $\{x_i\}$ is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where $S_p \le n$ and $p = 2$, our upper bound is $\tilde{O}(n / S_p^{1/2})$, in contrast to their $\Omega(n / S_p^{1/3})$ lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded, or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded, we give a sublinear-time algorithm via a reduction to the undirected case.
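The estimator underlying this kind of magnitude-proportional sampling can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes $S_1 = \sum_i x_i$ (equivalently $2m$) is known and uses a fixed number of samples, whereas the stated complexity bound comes from a more careful, adaptive choice of sample size.

```python
import random

def estimate_sp(x, p, num_samples=10_000, rng=random):
    """Estimate S_p = sum_i x_i**p from indices sampled with probability
    proportional to x_i.  Minimal sketch of the unbiased estimator behind
    magnitude-proportional sampling; assumes S_1 = sum(x) is known
    (S_1 = 2m in the degree-sequence setting) and uses a fixed sample size."""
    s1 = sum(x)
    # Draw index i with probability x_i / S_1.
    samples = rng.choices(range(len(x)), weights=x, k=num_samples)
    # If Pr[i] = x_i / S_1, then E[x_i^(p-1)] = S_p / S_1, so
    # S_1 * average(x_i^(p-1)) is an unbiased estimate of S_p.
    return s1 * sum(x[i] ** (p - 1) for i in samples) / num_samples

# Example: a degree sequence; compare the estimate to the exact S_2.
degrees = [5, 3, 3, 2, 1, 1, 1]
print(estimate_sp(degrees, p=2), sum(d ** 2 for d in degrees))
```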
Opting Into Optimal Matchings
We revisit the problem of designing optimal, individually rational matching mechanisms (in
a general sense, allowing for cycles in directed graphs), where each player — who is associated
with a subset of vertices — matches as many of his own vertices when he opts into the matching
mechanism as when he opts out. We offer a new perspective on this problem by considering an
arbitrary graph, but assuming that vertices are associated with players at random. Our main
result asserts that, under certain conditions, any fixed optimal matching is likely to be individually
rational up to lower-order terms. We also show that a simple and practical mechanism is
(fully) individually rational, and likely to be optimal up to lower-order terms. We discuss the
implications of our results for market design in general, and kidney exchange in particular.
NSF-PAR ID: 10057815
Journal Name: SIAM: ACM-SIAM Symposium on Discrete Algorithms (SODA'17)
Page Range / eLocation ID: 2351 to 2363
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


We consider the maximum vertex-weighted matching problem (MVM), in which nonnegative weights are assigned to the vertices of a graph, and the weight of a matching is the sum of the weights of the matched vertices. Although exact algorithms for MVM are faster than exact algorithms for the maximum edge-weighted matching problem, there are graphs on which these exact algorithms could take hundreds of hours. For a natural number k, we design a k/(k+1)-approximation algorithm for MVM on non-bipartite graphs that updates the matching along certain short paths in the graph: either augmenting paths of length at most 2k+1 or weight-increasing paths of length at most 2k. The choice of k = 2 leads to a 2/3-approximation algorithm that computes nearly optimal weights fast. This algorithm could be initialized with a 2/3-approximate maximum cardinality matching to reduce its runtime in practice. A 1/2-approximation algorithm may be obtained using k = 1, which is faster than the 2/3-approximation algorithm but computes lower weights. The 2/3-approximation algorithm has time complexity O(Δ²m), while the time complexity of the 1/2-approximation algorithm is O(Δm), where m is the number of edges and Δ is the maximum degree of a vertex. Results from our serial implementations show that on average the 1/2-approximation algorithm runs faster than the Suitor algorithm (currently the fastest 1/2-approximation algorithm), while the 2/3-approximation algorithm runs as fast as the Suitor algorithm but obtains higher weights for the matching. One advantage of the proposed algorithms is that they are well-suited for parallel implementation, since they can process vertices to match in any order. The 1/2- and 2/3-approximation algorithms have been implemented on a shared-memory parallel computer, and both approximation algorithms obtain good speedups, while the former algorithm runs faster on average than the parallel Suitor algorithm. Care is needed to design the parallel algorithm to avoid cyclic waits, and we prove that it is livelock-free.
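As a point of reference only, the sketch below makes the MVM objective concrete with a simple greedy heuristic that visits vertices in decreasing weight and matches each to its heaviest free neighbor. This is not the paper's path-based k/(k+1)-approximation algorithms, and no approximation guarantee is claimed for it.

```python
def greedy_mvm(weights, adj):
    """Greedy baseline for maximum vertex-weighted matching: visit vertices
    in decreasing weight and match each unmatched vertex to its heaviest
    unmatched neighbor.  Illustrates the objective only; it is not the
    paper's algorithms based on short augmenting / weight-increasing paths."""
    mate = {v: None for v in weights}
    for v in sorted(weights, key=weights.get, reverse=True):
        if mate[v] is None:
            free = [u for u in adj[v] if mate[u] is None]
            if free:
                u = max(free, key=weights.get)
                mate[v], mate[u] = u, v
    # Weight of the matching = sum of the weights of the matched vertices.
    return mate, sum(weights[v] for v in weights if mate[v] is not None)

# Tiny example: a path a-b-c-d with vertex weights.
w = {"a": 4, "b": 1, "c": 3, "d": 2}
g = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(greedy_mvm(w, g))
```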

In bipartite matching problems, vertices on one side of a bipartite graph are paired with those on the other. In its online variant, one side of the graph is available offline, while the vertices on the other side arrive online. When a vertex arrives, an irrevocable and immediate decision should be made by the algorithm: either match it to an available vertex or drop it. Examples of such problems include matching workers to firms, advertisers to keywords, organs to patients, and so on. Much of the literature focuses on maximizing the total relevance (modeled via total weight) of the matching. However, in many real-world problems, it is also important to consider contributions of diversity: hiring a diverse pool of candidates, displaying a relevant but diverse set of ads, and so on. In this paper, we propose the Online Submodular Bipartite Matching (OSBM) problem, where the goal is to maximize a submodular function f over the set of matched edges. This objective is general enough to capture the notion of both diversity (e.g., a weighted coverage function) and relevance (e.g., the traditional linear function), as well as many other natural objective functions occurring in practice (e.g., limited total budget in advertising settings). We propose novel algorithms that have provable guarantees and are essentially optimal when restricted to various special cases. We also run experiments on real-world and synthetic datasets to validate our algorithms.
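A natural baseline that makes the objective concrete is an online greedy that matches each arriving vertex to the free offline vertex with the largest marginal gain of f; the sketch below instantiates f as a weighted coverage function, one of the diversity objectives mentioned above. The names and data are hypothetical, and this greedy is only an illustration, not the algorithms or guarantees from the paper.

```python
def greedy_osbm(arrivals, coverage, weight):
    """Online greedy for submodular bipartite matching with a weighted
    coverage objective f(M) = total weight of elements covered by the
    matched edges.  Illustrative baseline only."""
    covered, matched_offline, matching = set(), set(), []
    for v, neighbors in arrivals:              # online vertices arrive one by one
        best_u, best_gain = None, 0.0
        for u in neighbors:
            if u in matched_offline:
                continue
            gain = sum(weight[e] for e in coverage[(u, v)] - covered)
            if gain > best_gain:               # marginal gain of adding edge (u, v)
                best_u, best_gain = u, gain
        if best_u is not None:                 # match irrevocably, or drop v
            matched_offline.add(best_u)
            matching.append((best_u, v))
            covered |= coverage[(best_u, v)]
    return matching, sum(weight[e] for e in covered)

# Hypothetical ad-display example: elements are topics, weights model diversity value.
weight = {"sports": 2.0, "tech": 1.0, "travel": 1.5}
coverage = {("u1", "v1"): {"sports"}, ("u2", "v1"): {"sports", "tech"},
            ("u1", "v2"): {"travel"}, ("u2", "v2"): {"tech"}}
arrivals = [("v1", ["u1", "u2"]), ("v2", ["u1", "u2"])]
print(greedy_osbm(arrivals, coverage, weight))
```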

We study the revenue guarantees and approximability of item pricing. Recent work shows that with n heterogeneous items, item pricing guarantees an O(log n) approximation to the optimal revenue achievable by any (buy-many) mechanism, even when buyers have arbitrarily combinatorial valuations. However, finding good item prices is challenging: it is known that even under unit-demand valuations, it is NP-hard to find item prices that approximate the revenue of the optimal item pricing better than O(√n). Our work provides a more fine-grained analysis of the revenue guarantees and computational complexity in terms of the number of item "categories," which may be significantly fewer than n. We assume the items are partitioned into k categories so that items within a category are totally ordered and a buyer's value for a bundle depends only on the best item contained from every category. We show that item pricing guarantees an O(log k) approximation to the optimal (buy-many) revenue and provide a PTAS for computing the optimal item pricing when k is constant. We also provide a matching lower bound showing that the problem is (strongly) NP-hard even when k = 1. Our results naturally extend to the case where items are only partially ordered, in which case the revenue guarantees and computational complexity depend on the width of the partial ordering, i.e., the largest set for which no two items are comparable.
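To make this valuation class concrete, here is a brute-force demand computation for a buyer whose value depends only on the best item taken from each of k categories; because it enumerates one item (or nothing) per category, it is exponential in k and only meaningful in the constant-k regime discussed above. All names and the example value function are hypothetical, and this is not the paper's PTAS.

```python
from itertools import product

def best_response(categories, prices, value_of_bests):
    """Brute-force utility-maximizing purchase for a buyer whose value for
    a bundle depends only on the best item taken from each category.
    Buying more than one item per category is dominated, so we enumerate
    one item or nothing per category (exponential in k)."""
    best_bundle, best_utility = [], float("-inf")
    for choice in product(*[[None] + items for items in categories]):
        bought = [item for item in choice if item is not None]
        utility = value_of_bests(choice) - sum(prices[item] for item in bought)
        if utility > best_utility:
            best_bundle, best_utility = bought, utility
    return best_bundle, best_utility

# Hypothetical instance with k = 2 categories, each listed best item first.
categories = [["gold", "silver"], ["large", "small"]]
prices = {"gold": 5.0, "silver": 2.0, "large": 3.0, "small": 1.0}
values = {"gold": 6.0, "silver": 3.0, "large": 4.0, "small": 2.0, None: 0.0}
value_of_bests = lambda choice: sum(values[item] for item in choice)  # additive example
print(best_response(categories, prices, value_of_bests))
```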

We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass and AshleyMadison. All four organizations were using key-stretching to protect user passwords. In fact, LastPass' use of PBKDF2-SHA256 with 10^5 hash iterations exceeds the 2017 NIST minimum recommendation by an order of magnitude. Nevertheless, our analysis paints a bleak picture: the adopted key-stretching levels provide insufficient protection for user passwords. In particular, we present strong evidence that most user passwords follow a Zipf's law distribution, and characterize the behavior of a rational attacker when user passwords are selected from a Zipf's law distribution. We show that there is a finite threshold, depending on the Zipf's law parameters, that characterizes the behavior of a rational attacker: if the value of a cracked password (normalized by the cost of computing the password hash function) exceeds this threshold, then the adversary's optimal strategy is always to continue attacking until each user password has been cracked. In all cases (Yahoo!, Dropbox, LastPass and AshleyMadison) we find that the value of a cracked password almost certainly exceeds this threshold, meaning that a rational attacker would crack all passwords that are selected from the Zipf's law distribution (i.e., most user passwords). This prediction holds even if we incorporate an aggressive model of diminishing returns for the attacker (e.g., the total value of 500 million cracked passwords is less than 100 times the total value of 5 million passwords). On a positive note, our analysis demonstrates that memory-hard functions (MHFs) such as SCRYPT or Argon2i can significantly reduce the damage of an offline attack. In particular, we find that because MHFs substantially increase guessing costs, a rational attacker will give up well before he cracks most user passwords, and this prediction holds even if the attacker does not encounter diminishing returns for additional cracked passwords. Based on our analysis, we advocate that password hashing standards should be updated to require the use of memory-hard functions for password hashing and disallow the use of non-memory-hard functions such as BCRYPT or PBKDF2.
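The flavor of the marginal analysis can be illustrated with a deliberately simplified sketch: the attacker tries passwords in decreasing-probability order under a Zipf's law distribution and keeps guessing while the expected value of the next guess exceeds its expected cost. The Zipf exponent, password-space size, value, and per-guess cost below are placeholders, and the sketch omits the paper's diminishing-returns refinement and its calibration to the actual breaches.

```python
def fraction_cracked(n_passwords, zipf_s, value, guess_cost):
    """Expected fraction of users a rational attacker cracks when passwords
    follow a Zipf's law distribution (Pr[rank i] proportional to i**-s) and
    guesses are tried in decreasing-probability order.  Simplified marginal
    analysis: keep guessing rank i while the expected benefit value * p_i
    exceeds the expected cost guess_cost * Pr[account still uncracked]."""
    norm = sum(i ** -zipf_s for i in range(1, n_passwords + 1))
    cracked_mass, survival = 0.0, 1.0
    for i in range(1, n_passwords + 1):
        p_i = i ** -zipf_s / norm
        if value * p_i < guess_cost * survival:   # marginal benefit < marginal cost
            break
        cracked_mass += p_i
        survival -= p_i
    return cracked_mass

# Placeholder numbers: one million distinct passwords, Zipf exponent 0.8,
# value of a cracked password normalized to 10^7 hash evaluations, unit guess cost.
print(fraction_cracked(10 ** 6, 0.8, value=1e7, guess_cost=1.0))
```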