skip to main content


Title: A separation logic for negative dependence
Formal reasoning about hashing-based probabilistic data structures often requires reasoning about random variables where when one variable gets larger (such as the number of elements hashed into one bucket), the others tend to be smaller (like the number of elements hashed into the other buckets). This is an example of negative dependence , a generalization of probabilistic independence that has recently found interesting applications in algorithm design and machine learning. Despite the usefulness of negative dependence for the analyses of probabilistic data structures, existing verification methods cannot establish this property for randomized programs. To fill this gap, we design LINA, a probabilistic separation logic for reasoning about negative dependence. Following recent works on probabilistic separation logic using separating conjunction to reason about the probabilistic independence of random variables, we use separating conjunction to reason about negative dependence. Our assertion logic features two separating conjunctions, one for independence and one for negative dependence. We generalize the logic of bunched implications (BI) to support multiple separating conjunctions, and provide a sound and complete proof system. Notably, the semantics for separating conjunction relies on a non-deterministic , rather than partial, operation for combining resources. By drawing on closure properties for negative dependence, our program logic supports a Frame-like rule for negative dependence and monotone operations. We demonstrate how LINA can verify probabilistic properties of hash-based data structures and balls-into-bins processes.  more » « less
Award ID(s):
2153916 2040249 2040222 2035314
NSF-PAR ID:
10318970
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
6
Issue:
POPL
ISSN:
2475-1421
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure. 
    more » « less
  2. Probabilistic couplings are the foundation for many probabilistic relational program logics and arise when relating random sampling statements across two programs. In relational program logics, this manifests as dedicated coupling rules that, e.g., say we may reason as if two sampling statements return the same value. However, this approach fundamentally requires aligning or synchronizing the sampling statements of the two programs which is not always possible.

    In this paper, we develop Clutch, a higher-order probabilistic relational separation logic that addresses this issue by supporting asynchronous probabilistic couplings. We use Clutch to develop a logical step-indexed logical relation to reason about contextual refinement and equivalence of higher-order programs written in a rich language with a probabilistic choice operator, higher-order local state, and impredicative polymorphism. Finally, we demonstrate our approach on a number of case studies.

    All the results that appear in the paper have been formalized in the Coq proof assistant using the Coquelicot library and the Iris separation logic framework. 

    more » « less
  3. Summary

    Most cancer research now involves one or more assays profiling various biological molecules, e.g., messenger RNA and micro RNA, in samples collected on the same individuals. The main interest with these genomic data sets lies in the identification of a subset of features that are active in explaining the dependence between platforms. To quantify the strength of the dependency between two variables, correlation is often preferred. However, expression data obtained from next-generation sequencing platforms are integer with very low counts for some important features. In this case, the sample Pearson correlation is not a valid estimate of the true correlation matrix, because the sample correlation estimate between two features/variables with low counts will often be close to zero, even when the natural parameters of the Poisson distribution are, in actuality, highly correlated. We propose a model-based approach to correlation estimation between two non-normal data sets, via a method we call Probabilistic Correlations ANalysis, or PCAN. PCAN takes into consideration the distributional assumption about both data sets and suggests that correlations estimated at the model natural parameter level are more appropriate than correlations estimated directly on the observed data. We demonstrate through a simulation study that PCAN outperforms other standard approaches in estimating the true correlation between the natural parameters. We then apply PCAN to the joint analysis of a microRNA (miRNA) and a messenger RNA (mRNA) expression data set from a squamous cell lung cancer study, finding a large number of negative correlation pairs when compared to the standard approaches.

     
    more » « less
  4. S. S. Karunakaran ; A. Higgins (Ed.)
    Vector spaces are often taught with an axiomatic focus, but this has been shown to rely on knowledge many students have not yet developed. In this paper, we examine two students’ conceptual resources for reasoning about null spaces drawing on data from a paired teaching experiment. The task sequence is set in the context of a school with one directional hallways. Students’ informal reasoning about paths that leave the room populations unchanged supported more formal reasoning about null spaces. We found that one student used context-based resources (such as ‘loops’ in hallway) to reason about null spaces, while the other student drew largely on previously formalized mathematical resources (e.g. free variables, linear dependence). The use of formal resources sometimes required recontextualization, which may function to constrain student sense-making or afford opportunities for broadening students’ formal prior knowledge. 
    more » « less
  5. Karunakaran, S. S. ; Higgins, A. (Ed.)
    Vector spaces are often taught with an axiomatic focus, but this has been shown to rely on knowledge many students have not yet developed. In this paper, we examine two students’ conceptual resources for reasoning about null spaces drawing on data from a paired teaching experiment. The task sequence is set in the context of a school with one directional hallways. Students’ informal reasoning about paths that leave the room populations unchanged supported more formal reasoning about null spaces. We found that one student used context-based resources (such as ‘loops’ in hallway) to reason about null spaces, while the other student drew largely on previously formalized mathematical resources (e.g. free variables, linear dependence). The use of formal resources sometimes required recontextualization, which may function to constrain student sense-making or afford opportunities for broadening students’ formal prior knowledge. 
    more » « less