skip to main content


Title: Lower bounds for testing graphical models: colorings and antiferromagnetic Ising models
We study the identity testing problem in the context of spin systems or undirected graphical models, where it takes the following form: given the parameter specification of the model M and a sampling oracle for the distribution \mu_{M^*} of an unknown model M^*, can we efficiently determine if the two models M and M^* are the same? We consider identity testing for both soft-constraint and hard-constraint systems. In particular, we prove hardness results in two prototypical cases, the Ising model and proper colorings, and explore whether identity testing is any easier than structure learning. For the ferromagnetic (attractive) Ising model, Daskalasis et al. (2018) presented a polynomial time algorithm for identity testing. We prove hardness results in the antiferromagnetic (repulsive) setting in the same regime of parameters where structure learning is known to require a super-polynomial number of samples. In particular, for n-vertex graphs of maximum degree d, we prove that if |\beta| d = \omega(\log n) (where \beta is the inverse temperature parameter), then there is no identity testing algorithm for the antiferromagnetic Ising model that runs in polynomial time unless RP = NP. We also establish computational lower bounds for a broader set of parameters under the (randomized) exponential time hypothesis. In our proofs, we use random graphs as gadgets; this is inspired by similar constructions in seminal works on the hardness of approximate counting. In the hard-constraint setting, we present hardness results for identity testing for proper colorings. Our results are based on the presumed hardness of #BIS, the problem of (approximately) counting independent sets in bipartite graphs. In particular, we prove that identity testing for colorings is hard in the same range of parameters where structure learning is known to be hard, which in turn matches the parameter regime for NP-hardness of the corresponding decision problem.  more » « less
Award ID(s):
1819546
NSF-PAR ID:
10113943
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
99
ISSN:
2640-3498
Page Range / eLocation ID:
283 - 298
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the computational complexity of estimating local observables for Gibbs distributions. A simple combinatorial example is the average size of an independent set in a graph. A recent work of Galanis et al (2021) established NP-hardness of approximating the average size of an independent set utilizing hardness of the corresponding optimization problem and the related phase transition behavior. We instead consider settings where the underlying optimization problem is easily solvable. Our main contribution is to classify the complexity of approximating a wide class of observables via a generic reduction from approximate counting to the problem of estimating local observables. The key idea is to use the observables to interpolate the counting problem. Using this new approach, we are able to study observables on bipartite graphs where the underlying optimization problem is easy but the counting problem is believed to be hard. The most-well studied class of graphs that was excluded from previous hardness results were bipartite graphs. We establish hardness for estimating the average size of the independent set in bipartite graphs of maximum degree 6; more generally, we show tight hardness results for general vertex-edge observables for antiferromagnetic 2-spin systems on bipartite graphs. Our techniques go beyond 2-spin systems, and for the ferromagnetic Potts model we establish hardness of approximating the number of monochromatic edges in the same region as known hardness of approximate counting results. 
    more » « less
  2. Saraf, Shubhangi (Ed.)
    We initiate a study of the classification of approximation complexity of the eight-vertex model defined over 4-regular graphs. The eight-vertex model, together with its special case the six-vertex model, is one of the most extensively studied models in statistical physics, and can be stated as a problem of counting weighted orientations in graph theory. Our result concerns the approximability of the partition function on all 4-regular graphs, classified according to the parameters of the model. Our complexity results conform to the phase transition phenomenon from physics. We introduce a quantum decomposition of the eight-vertex model and prove a set of closure properties in various regions of the parameter space. Furthermore, we show that there are extra closure properties on 4-regular planar graphs. These regions of the parameter space are concordant with the phase transition threshold. Using these closure properties, we derive polynomial time approximation algorithms via Markov chain Monte Carlo. We also show that the eight-vertex model is NP-hard to approximate on the other side of the phase transition threshold. 
    more » « less
  3. Czumaj, Artur Dawar (Ed.)
    We consider the complexity of counting weighted graph homomorphisms defined by a symmetric matrix A. Each symmetric matrix A defines a graph homomorphism function Z A (·), also known as the partition function. Dyer and Greenhill [10] established a complexity dichotomy of Z A (·) for symmetric {0, 1}-matrices A, and they further proved that its #P-hardness part also holds for bounded degree graphs. Bulatov and Grohe [4] extended the Dyer-Greenhill dichotomy to nonnegative symmetric matrices A. However, their hardness proof requires graphs of arbitrarily large degree, and whether the bounded degree part of the Dyer-Greenhill dichotomy can be extended has been an open problem for 15 years. We resolve this open problem and prove that for nonnegative symmetric A, either Z A (G) is in polynomial time for all graphs G, or it is #P-hard for bounded degree (and simple) graphs G. We further extend the complexity dichotomy to include nonnegative vertex weights. Additionally, we prove that the #P-hardness part of the dichotomy by Goldberg et al. [12] for Z A (·) also holds for simple graphs, where A is any real symmetric matrix. 
    more » « less
  4. We study the identity testing problem for high-dimensional distributions. Given as input an explicit distribution μ, an ε>0, and access to sampling oracle(s) for a hidden distribution π, the goal in identity testing is to distinguish whether the two distributions μ and π are identical or are at least ε-far apart. When there is only access to full samples from the hidden distribution π, it is known that exponentially many samples (in the dimension) may be needed for identity testing, and hence previous works have studied identity testing with additional access to various “conditional” sampling oracles. We consider a significantly weaker conditional sampling oracle, which we call the Coordinate Oracle, and provide a computational and statistical characterization of the identity testing problem in this new model. We prove that if an analytic property known as approximate tensorization of entropy holds for an n-dimensional visible distribution μ, then there is an efficient identity testing algorithm for any hidden distribution π using O˜(n/ε) queries to the Coordinate Oracle. Approximate tensorization of entropy is a pertinent condition as recent works have established it for a large class of high-dimensional distributions. We also prove a computational phase transition: for a well-studied class of n-dimensional distributions, specifically sparse antiferromagnetic Ising models over {+1,−1}^n, we show that in the regime where approximate tensorization of entropy fails, there is no efficient identity testing algorithm unless RP=NP. We complement our results with a matching Ω(n/ε) statistical lower bound for the sample complexity of identity testing in the model. 
    more » « less
  5. In the Maximum Independent Set problem we are asked to find a set of pairwise nonadjacent vertices in a given graph with the maximum possible cardinality. In general graphs, this classical problem is known to be NP-hard and hard to approximate within a factor of n1−ε for any ε > 0. Due to this, investigating the complexity of Maximum Independent Set in various graph classes in hope of finding better tractability results is an active research direction. In H-free graphs, that is, graphs not containing a fixed graph H as an induced subgraph, the problem is known to remain NP-hard and APX-hard whenever H contains a cycle, a vertex of degree at least four, or two vertices of degree at least three in one connected component. For the remaining cases, where every component of H is a path or a subdivided claw, the complexity of Maximum Independent Set remains widely open, with only a handful of polynomial-time solvability results for small graphs H such as P5, P6, the claw, or the fork. We prove that for every such “possibly tractable” graph H there exists an algorithm that, given an H-free graph G and an accuracy parameter ε > 0, finds an independent set in G of cardinality within a factor of (1 – ε) of the optimum in time exponential in a polynomial of log | V(G) | and ε−1. That is, we show that for every graph H for which Maximum Independent Set is not known to be APX-hard in H-free graphs, the problem admits a quasi-polynomial time approximation scheme in this graph class. Our algorithm works also in the more general weighted setting, where the input graph is supplied with a weight function on vertices and we are maximizing the total weight of an independent set. 
    more » « less