Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
In this paper, we consider the classic fair division problem of allocating m divisible items to n agents with linear valuations over the items. We define novel notions of fair shares from the perspective of individual agents via the cake-cutting process. These shares generalize the notion of proportionality by taking into account the valuations of other agents via constraints capturing envy. We study what fraction (approximation) of these shares are achievable in the worst case, and present tight and non-trivial approximation bounds as a function of n and m. In particular, we show a tight approximation bound of Ξ(βn) for various notions of such shares. We show this bound via a novel application of dual fitting, which may be of independent interest. We also present a bound of O(m^(2/3)) for a strict notion of share, with an almost matching lower bound. We further develop weaker notions of shares whose approximation bounds interpolate smoothly between proportionality and the shares described above. We finally present empirical results showing that our definitions lead to more reasonable shares than the standard fair share notion of proportionality.more » « lessFree, publicly-accessible full text available April 11, 2026
-
Bun, Mark (Ed.)We consider the setting where a user with sensitive features wishes to obtain a recommendation from a server in a differentially private fashion. We propose a "multi-selection" architecture where the server can send back multiple recommendations and the user chooses one from these that matches best with their private features. When the user feature is one-dimensional - on an infinite line - and the accuracy measure is defined w.r.t some increasing function π₯(.) of the distance on the line, we precisely characterize the optimal mechanism that satisfies differential privacy. The specification of the optimal mechanism includes both the distribution of the noise that the user adds to its private value, and the algorithm used by the server to determine the set of results to send back as a response. We show that Laplace is an optimal noise distribution in this setting. Furthermore, we show that this optimal mechanism results in an error that is inversely proportional to the number of results returned when the function π₯(.) is the identity function.more » « less
-
Bun, Mark (Ed.)We consider the problem of assigning students to schools when students have different utilities for schools and schools have limited capacities. The students belong to demographic groups, and fairness over these groups is captured either by concave objectives, or additional constraints on the utility of the groups. We present approximation algorithms for this assignment problem with group fairness via convex program rounding. These algorithms achieve various trade-offs between capacity violation and running time. We also show that our techniques easily extend to the setting where there are arbitrary constraints on the feasible assignment, capturing multi-criteria optimization. We present simulation results that demonstrate that the rounding methods are practical even on large problem instances, with the empirical capacity violation being much better than the theoretical bounds.more » « less
-
Data summarization is a powerful approach to deal with large-scale data analytics, which has wide applications in web search, recommendation systems, approximate query processing, etc. It computes a small, compact summary that preserves vital properties of the original data. In this paper, we study the data summarization problem of conjunctive query results, i.e., computing a k-size subset of a conjunctive query output, for any given k>0, that optimizes a certain objective. More specifically, we are interested in two commonly studied objectives: cohesion, which measures the maximum distance between a tuple in the query result tuples and its closest tuple in the summary (k-center clustering); and diversity, which measures the pairwise distances between the summary items. A simple approach that computes the entire query output and then applies existing algorithms on top of these materialized tuples suffers from high computational complexity because the query output can be large, e.g., for a relational database of N tuples, the number of result tuples can be NO(1).We propose O(1)-approximation algorithms that compute well-representative summaries of size k in time O(N*kO(1)), or even O(N+ kO(1)) in some cases, without computing all result tuples. We also propose the first efficient (2+\eps)-approximation algorithm for the k-center clustering problem over relational data. Our main idea is to formulate a few oracles that enable us to access specific query result tuples with certain properties, to show how these oracles can be implemented efficiently, and to compute desired summaries with few invocations of these oracles.more » « less
-
In this paper, we address clustering problems in scenarios where accurate direct access to the full dataset is impractical or impossible. Instead, we leverage oracle-based methods, which are particularly valuable in real-world applications where the data may be noisy, restricted due to privacy concerns or sheer volume. We utilize two oracles, the quadruplet and the distance oracle. The quadruplet oracle is a weaker oracle that only approximately compares the distances of two pairs of vertices. In practice, these oracles can be implemented using crowdsourcing or training classifiers or other predictive models. On the other hand, the distance oracle returns exactly the distance of two vertices, so it is a stronger and more expensive oracle to implement. We consider two noise models for the quadruplet oracle. In the adversarial noise model, if two pairs have similar distances, the response is chosen by an adversary. In the probabilistic noise model, the pair with the smaller distance is returned with a constant probability. We consider a set V of n vertices in a metric space that supports the quadruplet and the distance oracle. For each of the k-center, k-median, and k-means clustering problem on V, we design constant approximation algorithms that perform roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle in both noise models. When the dataset has low intrinsic dimension, we significantly improve the approximation factors of our algorithms by performing a few additional calls to the distance oracle. We also show that for k-median and k-means clustering there is no hope to return any sublinear approximation using only the quadruplet oracle. Finally, we give constant approximation algorithms for estimating the clustering cost induced by any set of k vertices, performing roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle.more » « less
-
Finding patterns in graphs is a fundamental problem in databases and data mining. In many applications, graphs are temporal and evolve over time, so we are interested in finding durable patterns, such as triangles and paths, which persist over a long time. While there has been work on finding durable simple patterns, existing algorithms do not have provable guarantees and run in strictly super-linear time. The paper leverages the observation that many graphs arising in practice are naturally proximity graphs or can be approximated as such, where nodes are embedded as points in some high-dimensional space, and two nodes are connected by an edge if they are close to each other. We work with an implicit representation of the proximity graph, where nodes are additionally annotated by time intervals, and design near-linear-time algorithms for finding (approximately) durable patterns above a given durability threshold. We also consider an interactive setting where a client experiments with different durability thresholds in a sequence of queries; we show how to compute incremental changes to result patterns efficiently in time near-linear to the size of the changes.more » « less
-
Cormode, Graham; Shekelyan, Michael (Ed.)We are given a set π΅ = {(R_1,s_1), β¦, (R_n,s_n)}, where each R_i is a range in β^d, such as rectangle or ball, and s_i β [0,1] denotes its selectivity. The goal is to compute a small-size discrete data distribution π = {(qβ,wβ),β¦, (q_m,w_m)}, where q_j β β^d and w_j β [0,1] for each 1 β€ j β€ m, and β_{1β€jβ€m} w_j = 1, such that π is the most consistent with π΅, i.e., err_p(π,π΅) = 1/n β_{i = 1}βΏ |s_i - β_{j=1}^m w_jβ 1(q_j β R_i)|^p is minimized. In a database setting, π΅ corresponds to a workload of range queries over some table, together with their observed selectivities (i.e., fraction of tuples returned), and π can be used as compact model for approximating the data distribution within the table without accessing the underlying contents. In this paper, we obtain both upper and lower bounds for this problem. In particular, we show that the problem of finding the best data distribution from selectivity queries is NP-complete. On the positive side, we describe a Monte Carlo algorithm that constructs, in time O((n+Ξ΄^{-d}) Ξ΄^{-2} polylog n), a discrete distribution πΜ of size O(Ξ΄^{-2}), such that err_p(πΜ,π΅) β€ min_π err_p(π,π΅)+Ξ΄ (for p = 1,2,β) where the minimum is taken over all discrete distributions. We also establish conditional lower bounds, which strongly indicate the infeasibility of relative approximations as well as removal of the exponential dependency on the dimension for additive approximations. This suggests that significant improvements to our algorithm are unlikely.more » « less
An official website of the United States government

Full Text Available