Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Semiring provenance is a successful approach, originating in database theory, to providing detailed information on how atomic facts combine to yield the result of a query. In particular, general provenance semirings of polynomials or formal power series provide precise descriptions of the evaluation strategies or “proof trees” for the query. By evaluating these descriptions in specific application semirings, one can extract practical information for instance about the confidence of a query or the cost of its evaluation. This paper develops semiring provenance for very general logical languages featuring the full interaction between negation and fixedpoint inductions or, equivalently, arbitrary interleavings of least and greatest fixed points. This also opens the door to provenance analysis applications for modal μcalculus and temporal logics, as well as for finite and infinite modelchecking games. Interestingly, the common approach based on Kleene’s FixedPoint Theorem for ωcontinuous semirings is not sufficient for these general languages. We show that an adequate framework for the provenance analysis of full fixedpoint logics is provided by semirings that are (1) fully continuous, and (2) absorptive. Full continuity guarantees that provenance values of least and greatest fixedpoints are welldefined. Absorptive semirings provide a symmetry between least and greatest fixedpoints and makemore »

The ubiquitous use of machine learning algorithms brings new challenges to traditional database problems such as incremental view update. Much effort is being put in better understanding and debugging machine learning models, as well as in identifying and repairing errors in training datasets. Our focus is on how to assist these activities when they have to retrain the machine learning model after removing problematic training samples in cleaning or selecting different subsets of training data for interpretability. This paper presents an efficient provenancebased approach, PrIU, and its optimized version, PrIUopt, for incrementally updating model parameters without sacrificing prediction accuracy. We prove the correctness and convergence of the incrementally updated model parameters, and validate it experimentally. Experimental results show that up to two orders of magnitude speedups can be achieved by PrIUopt compared to simply retraining the model from scratch, yet obtaining highly similar models.

This paper introduces a new cryptographic primitive called a private resource allocator (PRA) that can be used to allocate resources (e.g., network bandwidth, CPUs) to a set of clients without revealing to the clients whether any other clients received resources. We give several constructions of PRAs that provide guarantees ranging from informationtheoretic to differential privacy. PRAs are useful in preventing a new class of attacks that we call allocationbased sidechannel attacks. These attacks can be used, for example, to break the privacy guarantees of anonymous messaging systems that were designed specifically to defend against sidechannel and traffic analysis attacks. Our implementation of PRAs in Alpenhorn, which is a recent anonymous messaging system, shows that PRAs increase the network resources required to start a conversation by up to 16× (can be made as low as 4×in some cases), but add no overhead once the conversation has been established.

Suppose a graph G is stochastically created by uniformly sampling vertices along a line segment and connecting each pair of vertices with a probability that is a known decreasing function of their distance. We ask if it is possible to reconstruct the actual positions of the vertices in G by only observing the generated unlabeled graph. We study this question for two natural edge probability functions — one where the probability of an edge decays exponentially with the distance and another where this probability decays only linearly. We initiate our study with the weaker goal of recovering only the order in which vertices appear on the line segment. For a segment of length n and a precision parameter δ, we show that for both exponential and linear decay edge probability functions, there is an efficient algorithm that correctly recovers (up to reflection symmetry) the order of all vertices that are at least δ apart, using only ˜ O( n / δ^2) samples (vertices). Building on this result, we then show that O( n^2 log n / δ^2) vertices (samples) are sufficient to additionally recover the location of each vertex on the line to within a precision of δ. We complementmore »

We consider the problem of designing sublinear time algorithms for estimating the cost of minimum] metric traveling salesman (TSP) tour. Specifically, given access to a n × n distance matrix D that specifies pairwise distances between n points, the goal is to estimate the TSP cost by performing only sublinear (in the size of D) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any epsilon > 0, there exists an O^~(n/epsilon^O(1))time algorithm that returns a (1+epsilon)approximate estimate of the MST cost. This result immediately implies an O^~(n/epsilon^O(1)) time algorithm to estimate the TSP cost to within a (2 + epsilon) factor for any epsilon > 0. However, no o(n^2)time algorithms are known to approximate metric TSP to a factor that is strictly better than 2. On the other hand, there were also no known barriers that rule out existence of (1 + epsilon)approximate estimation algorithms for metric TSP with O^~ (n) time for any fixed epsilon > 0. In this paper, we make progress on both algorithms and lower bounds for estimating metric TSP cost. On the algorithmic side, we first consider the graphic TSP problemmore »

We initiate a systematic study of linear sketching over F_2. For a given Boolean function treated as f : F_2^n > F_2 a randomized F_2sketch is a distribution M over d x n matrices with elements over F_2 such that Mx suffices for computing f(x) with high probability. Such sketches for d << n can be used to design smallspace distributed and streaming algorithms. Motivated by these applications we study a connection between F_2sketching and a twoplayer oneway communication game for the corresponding XORfunction. We conjecture that F_2sketching is optimal for this communication game. Our results confirm this conjecture for multiple important classes of functions: 1) lowdegree F_2polynomials, 2) functions with sparse Fourier spectrum, 3) most symmetric functions, 4) recursive majority function. These results rely on a new structural theorem that shows that F_2sketching is optimal (up to constant factors) for uniformly distributed inputs. Furthermore, we show that (nonuniform) streaming algorithms that have to process random updates over F_2 can be constructed as F_2sketches for the uniform distribution. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14) who show an analogous result for linear sketches over integers in the adversarial setting our result does not require themore »