skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Provenance for Probabilistic Logic Programs
Despite the emergence of probabilistic logic programming (PLP) languages for data driven applications, there are currently no debugging tools based on provenance for PLP programs. In this paper, we propose a novel provenance model and system, called P3 (Provenance for Probabilistic logic Programs) for analyzing PLP programs. P3 enables four types of provenance queries: traditional explanation queries, queries for finding the set of most important derivations within an approximate error, top-K most influential queries, and modification queries that enable us to modify tuple probabilities with fewest modifications to program or input data. We apply these queries into real-world scenarios and present theoretical analysis and practical algorithms for such queries. We have developed a prototype of P3, and our evaluation on real-world data demonstrates that the system can support a wide-range of provenance queries with explainable results. Moreover, the system maintains provenance and execute queries efficiently with low overhead.  more » « less
Award ID(s):
1704189
PAR ID:
10485034
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
OpenProceedings
Date Published:
Journal Name:
23rd International Conference on Extending Database Technology (EDBT)
ISSN:
2367-2005
ISBN:
978-3-89318-083-7
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep learning and symbolic reasoning are complementary techniques for an intelligent system. However, principled combinations of these techniques are typically limited in scalability, rendering them ill-suited for real-world applications. We propose Scallop, a system that builds upon probabilistic deductive databases, to bridge this gap. The key insight underlying Scallop is a provenance framework that introduces a tunable parameter to specify the level of reasoning granularity. Scallop thereby i) generalizes exact probabilistic reasoning, ii) asymptotically reduces computational cost, and iii) provides relative accuracy guarantees. On synthetic tasks involving mathematical and logical reasoning, Scallop scales significantly better without sacrificing accuracy compared to DeepProbLog, a principled neural logic programming approach. Scallop also scales to a newly created real-world Visual Question Answering (VQA) benchmark that requires multi-hop reasoning, achieving 84.22% accuracy and outperforming two VQA-tailored models based on Neural Module Networks and transformers by 12.42% and 21.66% respectively. 
    more » « less
  2. Answering complex First-Order Logical (FOL) queries on large-scale incomplete knowledge graphs (KGs) is an important yet challenging task. Recent advances embed logical queries and KG entities in the same space and conduct query answering via dense similarity search. However, most logical operators designed in previous studies do not satisfy the axiomatic system of classical logic, limiting their performance. Moreover, these logical operators are parameterized and thus require many complex FOL queries as training data, which are often arduous to collect or even inaccessible in most real-world KGs. We thus present FuzzQE, a fuzzy logic based logical query embedding framework for answering FOL queries over KGs. FuzzQE follows fuzzy logic to define logical operators in a principled and learning-free manner, where only entity and relation embeddings require learning. FuzzQE can further benefit from labeled complex logical queries for training. Extensive experiments on two benchmark datasets demonstrate that FuzzQE provides significantly better performance in answering FOL queries compared to state-of-the-art methods. In addition, FuzzQE trained with only KG link prediction can achieve comparable performance to those trained with extra complex query data. 
    more » « less
  3. Endrullis, Jörg; Schmitz, Sylvain (Ed.)
    Kleene Algebra with Tests (KAT) provides a framework for algebraic equational reasoning about imperative programs. The recent variant Guarded KAT (GKAT) allows to reason on non-probabilistic properties of probabilistic programs. Here we introduce an extension of this framework called approximate GKAT (aGKAT), which equips GKAT with a partially ordered monoid (real numbers) enabling to express satisfaction of (deterministic) properties except with a probability up to a certain bound. This allows to represent in equational reasoning "à la KAT" proofs of probabilistic programs based on the union bound, a technique from basic probability theory. We show how a propositional variant of approximate Hoare Logic (aHL), a program logic for union bound, can be soundly encoded in our system aGKAT. We then illustrate the use of aGKAT with an example of accuracy analysis from the field of differential privacy. 
    more » « less
  4. Using synchronous dynamical systems (SyDSs) as a formal model for networked social systems, we study the problem of inferring users’ choices in such systems. We observe that SyDSs with deterministic and probabilistic threshold functions as local functions can capture users’ choices in the context of contagion propagation in social networks. We use an active query mechanism where a user interacts with a system by submitting queries, and the responses to the queries are used to infer the local functions. We develop methods that provide provably efficient query sets for inferring both deterministic and probabilistic forms of threshold functions. We also present experimental results using real world social networks. 
    more » « less
  5. Formal reasoning about hashing-based probabilistic data structures often requires reasoning about random variables where when one variable gets larger (such as the number of elements hashed into one bucket), the others tend to be smaller (like the number of elements hashed into the other buckets). This is an example ofnegative dependence, a generalization of probabilistic independence that has recently found interesting applications in algorithm design and machine learning. Despite the usefulness of negative dependence for the analyses of probabilistic data structures, existing verification methods cannot establish this property for randomized programs. To fill this gap, we design LINA, a probabilistic separation logic for reasoning about negative dependence. Following recent works on probabilistic separation logic usingseparating conjunctionto reason about the probabilistic independence of random variables, we use separating conjunction to reason about negative dependence. Our assertion logic features two separating conjunctions, one for independence and one for negative dependence. We generalize the logic of bunched implications (BI) to support multiple separating conjunctions, and provide a sound and complete proof system. Notably, the semantics for separating conjunction relies on anon-deterministic, rather than partial, operation for combining resources. By drawing on closure properties for negative dependence, our program logic supports a Frame-like rule for negative dependence andmonotoneoperations. We demonstrate how LINA can verify probabilistic properties of hash-based data structures and balls-into-bins processes. 
    more » « less