Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
Real-world problems often have parameters that are uncertain during the optimization phase; stochastic optimization or stochastic programming is a key approach introduced by Beale and by Dantzig in the 1950s to address such uncertainty. Matching is a classical problem in combinatorial optimization. Modern stochastic versions of this problem model problems in kidney exchange, for instance. We improve upon the current-best approximation bound of 3.709 for stochastic matching due to Adamczyk et al. (in: Algorithms-ESA 2015, Springer, Berlin, 2015) to 3.224; we also present improvements on Bansal et al. (Algorithmica 63(4):733–762, 2012) for hypergraph matching and for relaxed versions of the problem. These results are obtained by improved analyses and/or algorithms for rounding linear-programming relaxations of these problems.
The increasing reliance on robust data-driven decision-making across many domains has made it necessary for data management systems to manage many thousands to millions of versions of datasets, acquired or constructed at various stages of analysis pipelines over time. Delta encoding is an effective and widely-used solution to compactly store a large number of datasets, that simultaneously exploits redundancies across them and keeps the average retrieval cost of reconstructing any dataset low. However, supporting any kind of rich retrieval or querying functionality, beyond single dataset checkout, is challenging in such storage engines. In this paper, we initiate a systematic study of this problem, and present DEX, a novel stand-alone delta-oriented execution engine, whose goal is to take advantage of the already computed deltas between the datasets for efficient query processing. In this work, we study how to execute checkout, intersection, union and t-threshold queries over record-based files; we show that processing of even these basic queries leads to many new and unexplored challenges and trade-offs. Starting from a query plan that confines query execution to a small set of deltas, we introduce new transformation rules based on the algebraic properties of the deltas, that allow us to explore the searchmore »
As data-driven methods are becoming pervasive in a wide variety of disciplines, there is an urgent need to develop scalable and sustainable tools to simplify the process of data science, to make it easier to keep track of the analyses being performed and datasets being generated, and to enable introspection of the workflows. In this paper, we describe our vision of a unified provenance and metadata management system to support lifecycle management of complex collaborative data science workflows. We argue that a large amount of information about the analysis processes and data artifacts can, and should be, captured in a semi-passive manner; and we show that querying and analyzing this information can not only simplify bookkeeping and debugging tasks for data analysts but can also enable a rich new set of capabilities like identifying flaws in the data science process itself. It can also significantly reduce the time spent in fixing post-deployment problems through automated analysis and monitoring. We have implemented an initial prototype of our system, called ProvDB, on top of git (a version control system) and Neo4j (a graph database), and we describe its key features and capabilities.