skip to main content


Title: Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods
We consider Ising models on the hypercube with a general interaction matrix 𝐽, and give a polynomial time sampling algorithm when all but 𝑂(1) eigenvalues of 𝐽 lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when \emph{all} eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial time sampling algorithms for low-rank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with low-dimensional contexts, and greatly improves the polynomial time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation algorithm results based on the naive mean-field approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational inference worlds. As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the Hubbard-Stratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGD-based sampler to solve the full problem.  more » « less
Award ID(s):
1704417
NSF-PAR ID:
10354700
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR
Volume:
178
Page Range / eLocation ID:
4945-4988
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a notion called entropic independence that is an entropic analog of spectral notions of high-dimensional expansion. Informally, entropic independence of a background distribution $\mu$ on $k$-sized subsets of a ground set of elements says that for any (possibly randomly chosen) set $S$, the relative entropy of a single element of $S$ drawn uniformly at random carries at most $O(1/k)$ fraction of the relative entropy of $S$. Entropic independence is the analog of the notion of spectral independence, if one replaces variance by entropy. We use entropic independence to derive tight mixing time bounds, overcoming the lossy nature of spectral analysis of Markov chains on exponential-sized state spaces. In our main technical result, we show a general way of deriving entropy contraction, a.k.a. modified log-Sobolev inequalities, for down-up random walks from spectral notions. We show that spectral independence of a distribution under arbitrary external fields automatically implies entropic independence. We furthermore extend our theory to the case where spectral independence does not hold under arbitrary external fields. To do this, we introduce a framework for obtaining tight mixing time bounds for Markov chains based on what we call restricted modified log-Sobolev inequalities, which guarantee entropy contraction not for all distributions, but for those in a sufficiently large neighborhood of the stationary distribution. To derive our results, we relate entropic independence to properties of polynomials: $\mu$ is entropically independent exactly when a transformed version of the generating polynomial of $\mu$ is upper bounded by its linear tangent; this property is implied by concavity of the said transformation, which was shown by prior work to be locally equivalent to spectral independence. We apply our results to obtain (1) tight modified log-Sobolev inequalities and mixing times for multi-step down-up walks on fractionally log-concave distributions, (2) the tight mixing time of $O(n\log n)$ for Glauber dynamics on Ising models whose interaction matrix has eigenspectrum lying within an interval of length smaller than $1$, improving upon the prior quadratic dependence on $n$, and (3) nearly-linear time $\widetilde O_{\delta}(n)$ samplers for the hardcore and Ising models on $n$-node graphs that have $\delta$-relative gap to the tree-uniqueness threshold. In the last application, our bound on the running time does not depend on the maximum degree $\Delta$ of the graph, and is therefore optimal even for high-degree graphs, and in fact, is sublinear in the size of the graph for high-degree graphs. 
    more » « less
  2. Abstract The $p$-tensor Ising model is a one-parameter discrete exponential family for modeling dependent binary data, where the sufficient statistic is a multi-linear form of degree $p \geqslant 2$. This is a natural generalization of the matrix Ising model that provides a convenient mathematical framework for capturing, not just pairwise, but higher-order dependencies in complex relational data. In this paper, we consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes. Our estimate is based on the maximum pseudolikelihood (MPL) method, which provides a computationally efficient algorithm for estimating the parameter that avoids computing the intractable partition function. We derive general conditions under which the MPL estimate is $\sqrt N$-consistent, that is, it converges to the true parameter at rate $1/\sqrt N$. Our conditions are robust enough to handle a variety of commonly used tensor Ising models, including spin glass models with random interactions and models where the rate of estimation undergoes a phase transition. In particular, this includes results on $\sqrt N$-consistency of the MPL estimate in the well-known $p$-spin Sherrington–Kirkpatrick model, spin systems on general $p$-uniform hypergraphs and Ising models on the hypergraph stochastic block model (HSBM). In fact, for the HSBM we pin down the exact location of the phase transition threshold, which is determined by the positivity of a certain mean-field variational problem, such that above this threshold the MPL estimate is $\sqrt N$-consistent, whereas below the threshold no estimator is consistent. Finally, we derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie–Weiss model, which is the Ising model on the complete $p$-uniform hypergraph. An interesting consequence of our results is that the MPL estimate in the Curie–Weiss model saturates the Cramer–Rao lower bound at all points above the estimation threshold, that is, the MPL estimate incurs no loss in asymptotic statistical efficiency in the estimability regime, even though it is obtained by minimizing only an approximation of the true likelihood function for computational tractability. 
    more » « less
  3. For general spin systems, we prove that a contractive coupling for an arbitrary local Markov chain implies optimal bounds on the mixing time and the modified log-Sobolev constant for a large class of Markov chains including the Glauber dynamics, arbitrary heat-bath block dynamics, and the Swendsen-Wang dynamics. This reveals a novel connection between probabilistic techniques for bounding the convergence to stationarity and analytic tools for analyzing the decay of relative entropy. As a corollary of our general results, we obtain O(n log n) mixing time and Ω(1/n) modified log-Sobolev constant of the Glauber dynamics for sampling random q-colorings of an n-vertex graph with constant maximum degree Δ when q > (11/6–∊0)Δ for some fixed ∊0 > 0. We also obtain O(log n) mixing time and Ω(1) modified log-Sobolev constant of the Swendsen-Wang dynamics for the ferromagnetic Ising model on an n-vertex graph of constant maximum degree when the parameters of the system lie in the tree uniqueness region. At the heart of our results are new techniques for establishing spectral independence of the spin system and block factorization of the relative entropy. On one hand we prove that a contractive coupling of any local Markov chain implies spectral independence of the Gibbs distribution. On the other hand we show that spectral independence implies factorization of entropy for arbitrary blocks, establishing optimal bounds on the modified log-Sobolev constant of the corresponding block dynamics. 
    more » « less
  4. null (Ed.)
    The primary objectives of International Ocean Discovery Program (IODP) Expedition 367/368 to the northern South China Sea (SCS) margin were to (1) examine its history of continental breakup and (2) compare it with other nonvolcanic or magma-poor rifted margins with the broader goal of testing models for continental breakup. A secondary objective was to further our understanding of the paleoceanographic and environmental development of the SCS and southeast Asia during the Cenozoic. Four primary sites were selected for the overall program: one in the outer margin high (OMH) and three seaward of the OMH on distinct, margin-parallel basement ridges. These three ridges are informally labeled A, B, and C and are located in the continent–ocean transition (COT) zone ranging from the OMH to the interpreted steady-state oceanic crust (Ridge C) of the SCS. The main scientific objectives include the following: Determining the nature of the basement in crustal units across the COT of the SCS that are critical to constrain style of rifting, Constraining the time interval from initial crustal extension and plate rupture to the initial generation of igneous ocean crust, Constraining vertical crustal movements during breakup, and Examining the nature of igneous activity from rifting to seafloor spreading. In addition, the sediment cores from the drill sites targeting primarily tectonic and basement objectives will provide information on the Cenozoic regional environmental development of the Southeast Asia margin. Site U1499 on Ridge A and Site U1500 on Ridge B were drilled during Expedition 367. Expedition 368 was planned to drill at two primary sites (U1501 and U1503) at the OMH and Ridge C, respectively, but based on drilling results from Expedition 367, Expedition 368 chose to insert an alternate site on Ridge A (Site U1502). In addition, Expedition 368 added two more sites on the OMH (Sites U1504 and U1505). Expedition 367/368 completed operations at six of the seven sites (U1499–U1502, U1504, and U1505). Site U1503, however, was not completed beyond casing without coring to 990 m because of mechanical problems with the drilling equipment that prevented the expedition, after 25 May 2017, from operating with a drill string longer than 3400 m. New alternate Site U1504, proposed during Expedition 367, met this condition. Original Site U1505 also met the operational constraints of the 3400 m drill string (total) and was an alternate site for the already-drilled Site U1501. At Site U1499, we cored to 1081.8 m in 22.1 days with 52% recovery and then logged downhole data from 655 to 1020 m. In 31 days at Site U1500, we penetrated to 1529 m, cored a total of 1012.8 m with 37% recovery, and collected log data from 842 to 1133 m. At Site U1501, we cored to 697.1 m in 9.4 days with 78.5% recovery. We also drilled ahead for 433.5 m in Hole U1501D and then logged downhole data from 78.3 to 399.3 m. In 19.3 days at Site U1502, we penetrated 1679.0 m in Holes U1502A (758 m) and U1502B (921 m), set 723.7 m of casing and cored a total of 576.3 m with 53.5% recovery, and collected downhole log data from 785.3 to 875.3 m and seismic data through the 10¾ inch casing. At Site U1503, we penetrated 995.1 m and set 991.5 m of 10¾ inch casing, but no cores were taken because of a mechanical problem with the drawworks. At Site U1504, we took 40 rotary core barrel (RCB) cores over two holes. The cored interval between both holes was 277.3 m with 26.8% recovery. An 88.2 m interval was drilled in Hole U1504B. At Site U1505, we cored 668.0 m with 101.1% recovery. Logging data was collected from 80.1 to 341.2 m. Operations at this site covered 6.1 days. Except for Sites U1503 and U1505, all sites were drilled to acoustic basement. A total of 6.65 days were lost due to mechanical breakdown or waiting on spare supplies for repair of drilling equipment, but drilling options were severely limited from 25 May to the end of the expedition by the defective drawworks limiting deployment of drill string longer than 3400 m. At Site U1499, coring ~200 m into the interpreted acoustic basement sampled sedimentary rocks, possibly including early Miocene chalks underlain by Oligocene polymict breccias and poorly cemented gravels of unknown age comprising sandstone pebbles and cobbles. Preliminary structural and lithologic analysis suggests that the gravels might be early to late synrift sediment. At Site U1500, the main seismic reflector corresponds to the top of a basalt sequence at ~1379.1 m. We cored 149.90 m into this volcanic package and recovered 114.92 m (77%) of sparsely to moderately plagioclase-phyric basalt comprising numerous lava flows, including pillow lavas with glass, chilled margins, altered veins, hyaloclastites, and minor sediment. Preliminary geochemical analyses indicate that the basalt is tholeiitic. Sampling of the Pleistocene to lower Miocene sedimentary section at Sites U1499 and U1500 was not continuous for two reasons. First, there was extremely poor recovery in substantial intervals interpreted to be poorly lithified sands, possibly turbidites. Second, we chose to drill down without coring in some sections at Site U1500 to ensure sufficient time to achieve this site’s high-priority deep drilling objectives. The upper Miocene basin sequence, which consists of interbedded claystone, siltstone, and sandstone can be correlated between the two sites by seismic stratigraphic mapping and biostratigraphy. At Site U1501 on the OMH, coring ~45 m into the acoustic basement sampled prerift(?) deposits comprising sandstone to conglomerate of presumed Mesozoic age. These deposits are overlain by siliciclastic synrift sediments of Eocene to Oligocene age followed by primarily carbonaceous postrift sediments of early Miocene to Pleistocene age. Site U1502 on Ridge A was cased to 723.7 m. No coring was attempted shallower than 380 m to save operational time and because of low expectations for core recovery in the upper Plio–Pleistocene sequence. At this site, we recovered 180 m of hydrothermally altered brecciated basalts comprising sheet and pillow lavas below deep-marine sediments of Oligocene to late Miocene age. At Site U1503 on Ridge C, 991.5 m of casing was installed in preparation for the planned deep drilling to ~1800 m. No coring was performed due to mechanical failures, and the site was abandoned without further activity except for installation of a reentry cone. Coring at Site U1504 on the OMH, located ~45 km east of Site U1501, recovered mostly foliated, greenschist facies metamorphic rocks below late Eocene(?) carbonate rocks (partly reef debris) and early Miocene to Pleistocene sediments. At Site U1505, we cored to 480.15 m through Pleistocene to late Oligocene mainly carbonaceous ooze followed at depth by early Oligocene siliciclastic sediments. Efforts were made at every drill site to correlate the core with the seismic data and seismic stratigraphic unconformities interpreted in the Eocene to Plio–Pleistocene sedimentary sequence prior to drilling. The predrilling interpretation of ages of these unconformities was in general confirmed by drilling results, although some nontrivial corrections can be expected from detailed postexpedition work on integrating seismic stratigraphic interpretations with detailed bio- and lithostratigraphy. As a result of the limited length of drill string that could be deployed during the later part of Expedition 368, the secondary expedition objectives addressing the environmental history of the SCS and Southeast Asia received more focus than originally planned, allowing Site U1505 (alternate to Site U1501) to be included. Despite this change in focus, Expedition 367/368 provided solid evidence for a process of breakup that included vigorous synrift magmatism as opposed to the often-favored interpretation of the SCS margin as a magma-starved margin or a margin possibly overprinted at a much later stage by plume-related magmatism. In this broader perspective, Expedition 367/368 accomplished a fundamental objective of the two-expedition science program. 
    more » « less
  5. We consider the following general network design problem. The input is an asymmetric metric (V, c), root [Formula: see text], monotone submodular function [Formula: see text], and budget B. The goal is to find an r-rooted arborescence T of cost at most B that maximizes f(T). Our main result is a simple quasi-polynomial time [Formula: see text]-approximation algorithm for this problem, in which [Formula: see text] is the number of vertices in an optimal solution. As a consequence, we obtain an [Formula: see text]-approximation algorithm for directed (polymatroid) Steiner tree in quasi-polynomial time. We also extend our main result to a setting with additional length bounds at vertices, which leads to improved [Formula: see text]-approximation algorithms for the single-source buy-at-bulk and priority Steiner tree problems. For the usual directed Steiner tree problem, our result matches the best previous approximation ratio but improves significantly on the running time. For polymatroid Steiner tree and single-source buy-at-bulk, our result improves prior approximation ratios by a logarithmic factor. For directed priority Steiner tree, our result seems to be the first nontrivial approximation ratio. Under certain complexity assumptions, our approximation ratios are the best possible (up to constant factors). 
    more » « less