Title: Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
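To make the abstract's central definition concrete: an FD X → Y holds exactly in a distribution if and only if the conditional entropy H(Y | X) equals 0, and an MVD or CI holds exactly if and only if the corresponding conditional mutual information equals 0, so these nonnegative quantities serve as degrees of satisfaction. A sketch of the relaxation's general shape, writing h(·) for the degree of satisfaction (the notation follows the abstract's description, not necessarily the paper's exact conventions):

\[
  h(X \to Y) = H(Y \mid X), \qquad h(Y \perp Z \mid X) = I(Y; Z \mid X),
\]
\[
  \sigma_1, \ldots, \sigma_k \Rightarrow \tau \ \text{relaxes with factor } \lambda
  \quad\text{iff}\quad
  h(\tau) \;\le\; \lambda \cdot \bigl( h(\sigma_1) + \cdots + h(\sigma_k) \bigr)
  \ \text{for all distributions.}
\]

The results above bound λ by a quadratic in the number of variables for MVDs+FDs, and by 1 when the consequent τ is an FD.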
Award ID(s):
2109922
PAR ID:
10340956
Author(s) / Creator(s):
Date Published:
Journal Name:
Logical Methods in Computer Science
Volume:
Volume 18, Issue 1
ISSN:
1860-5974
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes “in the limit”. Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.
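    The degrees of satisfaction in this abstract can be computed directly from a relation's empirical distribution. A minimal self-contained sketch (the relation, the attribute encoding by column position, and the helper names are illustrative, not from the paper):

    import math
    from collections import Counter

    def H(rows, attrs):
        """Empirical joint entropy (in bits) of the given column positions,
        treating each tuple of the relation as equally likely."""
        n = len(rows)
        counts = Counter(tuple(r[a] for a in attrs) for r in rows)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def fd_degree(rows, X, Y):
        """Degree of satisfaction of the FD X -> Y, i.e. H(Y|X) = H(XY) - H(X);
        it is 0 exactly when the FD holds."""
        return H(rows, X + Y) - H(rows, X)

    def mvd_degree(rows, X, Y, Z):
        """Degree of satisfaction of the MVD X ->> Y (Z = remaining attributes),
        i.e. I(Y;Z|X) = H(XY) + H(XZ) - H(XYZ) - H(X); 0 exactly when it holds."""
        return H(rows, X + Y) + H(rows, X + Z) - H(rows, X + Y + Z) - H(rows, X)

    # Relation over attributes (A, B, C), encoded as column positions 0, 1, 2.
    R = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)]
    print(fd_degree(R, [0], [1]))        # 0.5 > 0: FD A -> B holds only approximately
    print(mvd_degree(R, [0], [1], [2]))  # 0.5 > 0: MVD A ->> B holds only approximately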
  2. What is a minimal set of tuples to delete from a database in order to eliminate all query answers? This problem is called the resilience of a query and is one of the key algorithmic problems underlying various forms of reverse data management, such as view maintenance, deletion propagation and causal responsibility. A long-open question is determining the conjunctive queries (CQs) for which resilience can be solved in PTIME. We shed new light on this problem by proposing a unified Integer Linear Programming (ILP) formulation. It is unified in that it can solve both previously studied restrictions (e.g., self-join-free CQs under set semantics that allow a PTIME solution) and new cases (all CQs under set or bag semantics). It is also unified in that all queries and all database instances are treated with the same approach, yet the algorithm is guaranteed to terminate in PTIME for all known PTIME cases. In particular, we prove that for all known easy cases, the optimal solution to our ILP is identical to a simpler Linear Programming (LP) relaxation, which implies that standard ILP solvers return the optimal solution to the original ILP in PTIME. Our approach allows us to explore new variants and obtain new complexity results. 1) It works under bag semantics, for which we give the first dichotomy results in the problem space. 2) We extend our approach to the related problem of causal responsibility and give a more fine-grained analysis of its complexity. 3) We recover easy instances for generally hard queries, including instances with read-once provenance and instances that become easy because of Functional Dependencies in the data. 4) We solve an open conjecture about a unified hardness criterion from PODS 2020 and prove the hardness of several queries of previously unknown complexity. 5) Experiments confirm that our findings accurately predict the asymptotic running times, and that our universal ILP is at times even quicker than a previously proposed dedicated flow algorithm. 
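    The flavor of the natural ILP can be seen on a toy instance: one binary deletion variable per database tuple, and one covering constraint per witness (join combination) of the query. A sketch using the PuLP library (the query q :- R(x), S(x,y), T(y), the data, and this plain hitting-set formulation are illustrative choices; the paper's unified ILP is more refined):

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

    # Toy database for the query q :- R(x), S(x, y), T(y).
    R = {(1,), (2,)}
    S = {(1, 10), (2, 10)}
    T = {(10,)}

    # One 0/1 variable per tuple: value 1 means "delete this tuple".
    tuples = [("R", t) for t in R] + [("S", t) for t in S] + [("T", t) for t in T]
    x = {t: LpVariable(f"x{i}", cat="Binary") for i, t in enumerate(tuples)}

    prob = LpProblem("resilience", LpMinimize)
    prob += lpSum(x.values())  # minimize the number of deleted tuples

    # Every witness must lose at least one of its contributing tuples.
    for r in R:
        for s in S:
            for t in T:
                if r[0] == s[0] and s[1] == t[0]:
                    prob += x[("R", r)] + x[("S", s)] + x[("T", t)] >= 1

    prob.solve()
    print(int(value(prob.objective)))  # 1: deleting T(10) eliminates all answers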
  3. We study the question of when we can provide direct access to the k-th answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is, those orders that allow for such complexity guarantees. To better understand the computational challenge at hand, we also investigate the more modest task of providing access to only a single answer (i.e., finding the answer at a given position), a task that we refer to as the selection problem, and ask when it can be performed in quasilinear time. We also explore the question of when selection is indeed easier than ranked direct access. We begin with lexicographic orders. For each of the two problems, we give a decidable characterization (under conventional complexity assumptions) of the class of tractable lexicographic orders for every CQ without self-joins. We then continue to the more general orders by the sum of attribute weights and establish the corresponding decidable characterizations, for each of the two problems, of the tractable CQs without self-joins. Finally, we explore the question of when the satisfaction of Functional Dependencies (FDs) can be utilized for tractability and establish the corresponding generalizations of our characterizations for every set of unary FDs.
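    As a toy illustration of ranked direct access (not the paper's construction: the query q(x, y) :- R(x), S(x, y) and the prefix-sum scheme below handle just one join step): after sorting and building prefix counts in quasilinear time, the k-th answer in lexicographic (x, y) order is returned with a single binary search.

    from bisect import bisect_right
    from itertools import accumulate

    R = [1, 2, 3]
    S = {1: [10, 20], 2: [30], 3: [40, 50, 60]}

    # Quasilinear preprocessing: sort, then prefix counts of answers per x.
    xs = sorted(R)
    for v in xs:
        S[v].sort()
    prefix = list(accumulate((len(S[v]) for v in xs), initial=0))

    def kth_answer(k):
        """0-indexed ranked direct access in O(log |R|) per call."""
        i = bisect_right(prefix, k) - 1   # which x-block the k-th answer is in
        return xs[i], S[xs[i]][k - prefix[i]]

    print([kth_answer(k) for k in range(prefix[-1])])
    # [(1, 10), (1, 20), (2, 30), (3, 40), (3, 50), (3, 60)]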
  4. Megow, Nicole; Smith, Adam (Eds.)
    We study the mixing time of the single-site update Markov chain, known as the Glauber dynamics, for generating a random independent set of a tree. Our focus is obtaining optimal convergence results for arbitrary trees. We consider the more general problem of sampling from the Gibbs distribution in the hard-core model where independent sets are weighted by a parameter λ > 0; the special case λ = 1 corresponds to the uniform distribution over all independent sets. Previous work of Martinelli, Sinclair and Weitz (2004) obtained optimal mixing time bounds for the complete Δ-regular tree for all λ. However, Restrepo et al. (2014) showed that for sufficiently large λ there are bounded-degree trees where optimal mixing does not hold. Recent work of Eppstein and Frishberg (2022) proved a polynomial mixing time bound for the Glauber dynamics for arbitrary trees, and more generally for graphs of bounded tree-width. We establish an optimal bound on the relaxation time (i.e., inverse spectral gap) of O(n) for the Glauber dynamics for unweighted independent sets on arbitrary trees. Moreover, for λ ≤ .44 we prove an optimal mixing time bound of O(n log n). We stress that our results hold for arbitrary trees and there is no dependence on the maximum degree Δ. Interestingly, our results extend (far) beyond the uniqueness threshold which is on the order λ = O(1/Δ). Our proof approach is inspired by recent work on spectral independence. In fact, we prove that spectral independence holds with a constant independent of the maximum degree for any tree, but this does not imply mixing for general trees as the optimal mixing results of Chen, Liu, and Vigoda (2021) only apply for bounded degree graphs. We instead utilize the combinatorial nature of independent sets to directly prove approximate tensorization of variance/entropy via a non-trivial inductive proof. 
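    A minimal sketch of the chain being analyzed, the single-site Glauber dynamics for the hard-core model (the tree, λ, seed, and step count are arbitrary illustrative choices):

    import random

    def glauber_step(adj, config, lam, rng):
        """One update: pick a uniformly random vertex and resample it from the
        hard-core conditional distribution given its neighbors' current spins."""
        v = rng.randrange(len(adj))
        if any(config[u] for u in adj[v]):
            config[v] = 0  # some neighbor is occupied: v must stay unoccupied
        else:
            config[v] = 1 if rng.random() < lam / (1 + lam) else 0

    # A small tree on 7 vertices, given by adjacency lists.
    adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1], 4: [1], 5: [2], 6: [2]}
    rng = random.Random(0)
    config = [0] * len(adj)                  # start from the empty independent set
    for _ in range(10_000):
        glauber_step(adj, config, 1.0, rng)  # λ = 1: uniform over independent sets
    print(config)                            # an (approximately random) independent set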
  5. Braverman, Mark (Ed.)
    Grothendieck’s inequality [Grothendieck, 1953] states that there is an absolute constant K > 1 such that for any n × n matrix A, ‖A‖_{∞→1} := max_{s,t ∈ {±1}ⁿ} ∑_{i,j} A[i,j]⋅s(i)⋅t(j) ≥ 1/K ⋅ max_{u_i,v_j ∈ S^{n-1}} ∑_{i,j} A[i,j]⋅⟨u_i,v_j⟩. In addition to having a tremendous impact on Banach space theory, this inequality has found applications in several unrelated fields like quantum information, regularity partitioning, communication complexity, etc. Let K_G (known as Grothendieck’s constant) denote the smallest constant K above. Grothendieck’s inequality implies that a natural semidefinite programming relaxation obtains a constant-factor approximation to ‖A‖_{∞→1}. The exact value of K_G is still unknown, with the best lower bound (1.67…) due to Reeds and the best upper bound (1.78…) due to Braverman, Makarychev, Makarychev and Naor [Braverman et al., 2013]. In contrast, the little Grothendieck inequality states that under the assumption that A is PSD, the constant K above can be improved to π/2, and moreover this is tight. The inapproximability of ‖A‖_{∞→1} has been studied in several papers, culminating in a tight UGC-based hardness result due to Raghavendra and Steurer (remarkably, they achieve this without knowing the value of K_G). Briët, Regev and Saket [Briët et al., 2015] proved tight NP-hardness of approximating the little Grothendieck problem within π/2, based on a framework by Guruswami, Raghavendra, Saket and Wu [Guruswami et al., 2016] for bypassing UGC for geometric problems. This also remained the best known NP-hardness for the general Grothendieck problem due to the nature of the Guruswami et al. framework, which utilized a projection operator onto the degree-1 Fourier coefficients of long code encodings, which naturally yielded a PSD matrix A. We show how to extend the above framework to go beyond the degree-1 Fourier coefficients, using the global structure of optimal solutions to the Grothendieck problem. As a result, we obtain a separation between the NP-hardness results for the two problems, obtaining an inapproximability result for the Grothendieck problem of a factor π/2 + ε₀ for a fixed constant ε₀ > 0.
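    For small matrices, ‖A‖_{∞→1} can be evaluated by brute force over the sign vectors s, t (an illustrative sketch only; the enumeration is exponential in n, and the example matrix is arbitrary):

    from itertools import product

    def inf_to_one_norm(A):
        """Brute-force ||A||_{inf->1}: the maximum over s, t in {±1}^n of
        sum_{i,j} A[i][j] * s[i] * t[j]."""
        n = len(A)
        return max(
            sum(A[i][j] * s[i] * t[j] for i in range(n) for j in range(n))
            for s in product((-1, 1), repeat=n)
            for t in product((-1, 1), repeat=n)
        )

    A = [[1, -2], [-3, 4]]
    print(inf_to_one_norm(A))  # 10, attained at s = (-1, 1), t = (-1, 1)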