skip to main content


Title: Fault-Tolerant Consensus with an Abstract MAC Layer
In this paper, we study fault-tolerant distributed consensus in wireless systems. In more detail, we produce two new randomized algorithms that solve this problem in the abstract MAC layer model, which captures the basic interface and communication guarantees provided by most wireless MAC layers. Our algorithms work for any number of failures, require no advance knowledge of the network participants or network size, and guarantee termination with high probability after a number of broadcasts that are polynomial in the network size. Our first algorithm satisfies the standard agreement property, while our second trades a faster termination guarantee in exchange for a looser agreement property in which most nodes agree on the same value. These are the first known fault-tolerant consensus algorithms for this model. In addition to our main upper bound results, we explore the gap between the abstract MAC layer and the standard asynchronous message passing model by proving fault-tolerant consensus is impossible in the latter in the absence of information regarding the network participants, even if we assume no faults, allow randomized solutions, and provide the algorithm a constant-factor approximation of the network size.  more » « less
Award ID(s):
1733842
NSF-PAR ID:
10067143
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the International Symposium on Distributed Computing (DISC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Solving quantum many-body problems, such as finding ground states of quantum systems, has far-reaching consequences for physics, materials science, and chemistry. Classical computers have facilitated many profound advances in science and technology, but they often struggle to solve such problems. Scalable, fault-tolerant quantum computers will be able to solve a broad array of quantum problems but are unlikely to be available for years to come. Meanwhile, how can we best exploit our powerful classical computers to advance our understanding of complex quantum systems? Recently, classical machine learning (ML) techniques have been adapted to investigate problems in quantum many-body physics. So far, these approaches are mostly heuristic, reflecting the general paucity of rigorous theory in ML. Although they have been shown to be effective in some intermediate-size experiments, these methods are generally not backed by convincing theoretical arguments to ensure good performance. RATIONALE A central question is whether classical ML algorithms can provably outperform non-ML algorithms in challenging quantum many-body problems. We provide a concrete answer by devising and analyzing classical ML algorithms for predicting the properties of ground states of quantum systems. We prove that these ML algorithms can efficiently and accurately predict ground-state properties of gapped local Hamiltonians, after learning from data obtained by measuring other ground states in the same quantum phase of matter. Furthermore, under a widely accepted complexity-theoretic conjecture, we prove that no efficient classical algorithm that does not learn from data can achieve the same prediction guarantee. By generalizing from experimental data, ML algorithms can solve quantum many-body problems that could not be solved efficiently without access to experimental data. RESULTS We consider a family of gapped local quantum Hamiltonians, where the Hamiltonian H ( x ) depends smoothly on m parameters (denoted by x ). The ML algorithm learns from a set of training data consisting of sampled values of x , each accompanied by a classical representation of the ground state of H ( x ). These training data could be obtained from either classical simulations or quantum experiments. During the prediction phase, the ML algorithm predicts a classical representation of ground states for Hamiltonians different from those in the training data; ground-state properties can then be estimated using the predicted classical representation. Specifically, our classical ML algorithm predicts expectation values of products of local observables in the ground state, with a small error when averaged over the value of x . The run time of the algorithm and the amount of training data required both scale polynomially in m and linearly in the size of the quantum system. Our proof of this result builds on recent developments in quantum information theory, computational learning theory, and condensed matter theory. Furthermore, under the widely accepted conjecture that nondeterministic polynomial-time (NP)–complete problems cannot be solved in randomized polynomial time, we prove that no polynomial-time classical algorithm that does not learn from data can match the prediction performance achieved by the ML algorithm. In a related contribution using similar proof techniques, we show that classical ML algorithms can efficiently learn how to classify quantum phases of matter. In this scenario, the training data consist of classical representations of quantum states, where each state carries a label indicating whether it belongs to phase A or phase B . The ML algorithm then predicts the phase label for quantum states that were not encountered during training. The classical ML algorithm not only classifies phases accurately, but also constructs an explicit classifying function. Numerical experiments verify that our proposed ML algorithms work well in a variety of scenarios, including Rydberg atom systems, two-dimensional random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases. CONCLUSION We have rigorously established that classical ML algorithms, informed by data collected in physical experiments, can effectively address some quantum many-body problems. These rigorous results boost our hopes that classical ML trained on experimental data can solve practical problems in chemistry and materials science that would be too hard to solve using classical processing alone. Our arguments build on the concept of a succinct classical representation of quantum states derived from randomized Pauli measurements. Although some quantum devices lack the local control needed to perform such measurements, we expect that other classical representations could be exploited by classical ML with similarly powerful results. How can we make use of accessible measurement data to predict properties reliably? Answering such questions will expand the reach of near-term quantum platforms. Classical algorithms for quantum many-body problems. Classical ML algorithms learn from training data, obtained from either classical simulations or quantum experiments. Then, the ML algorithm produces a classical representation for the ground state of a physical system that was not encountered during training. Classical algorithms that do not learn from data may require substantially longer computation time to achieve the same task. 
    more » « less
  2. Scheideler, Christian (Ed.)
    Nakamoto’s consensus protocol works in a permissionless model, where nodes can join and leave without notice. However, it guarantees agreement only probabilistically. Is this weaker guarantee a necessary concession to the severe demands of supporting a permissionless model? This paper shows that, at least in a benign failure model, it is not. It presents Sandglass, the first permissionless con- sensus algorithm that guarantees deterministic agreement and termination with probability 1 under general omission failures. Like Nakamoto, Sandglass adopts a hybrid synchronous communication model, where, at all times, a majority of nodes (though their number is unknown) are correct and synchronously connected, and allows nodes to join and leave at any time. 
    more » « less
  3. In this paper, we study the fundamental problem of gossip in the mobile telephone model: a recently introduced variation of the classical telephone model modified to better describe the local peer-to-peer communication services implemented in many popular smartphone operating systems. In more detail, the mobile telephone model differs from the classical telephone model in three ways: (1) each device can participate in at most one connection per round; (2) the network topology can undergo a parameterized rate of change; and (3) devices can advertise a parameterized number of bits about their state to their neighbors in each round before connection attempts are initiated. We begin by describing and analyzing new randomized gossip algorithms in this model under the harsh assumption of a network topology that can change completely in every round. We prove a significant time complexity gap between the case where nodes can advertise 0 bits to their neighbors in each round, and the case where nodes can advertise 1 bit. For the latter assumption, we present two solutions: the first depends on a shared randomness source, while the second eliminates this assumption using a pseudorandomness generator we prove to exist with a novel generalization of a classical result from the study of two-party communication complexity. We then turn our attention to the easier case where the topology graph is stable, and describe and analyze a new gossip algorithm that provides a substantial performance improvement for many parameters. We conclude by studying a relaxed version of gossip in which it is only necessary for nodes to each learn a specified fraction of the messages in the system. We prove that our existing algorithms for dynamic network topologies and a single advertising bit solve this relaxed version up to a polynomial factor faster (in network size) for many parameters. These are the first known gossip results for the mobile telephone model, and they significantly expand our understanding of how to communicate and coordinate in this increasingly relevant setting. 
    more » « less
  4. Virtual machine (VM) replication is an effective technique in cloud data centers to achieve fault-tolerance, load-balance, and quick-responsiveness to user requests. In this paper we study a new fault-tolerant VM placement problem referred to as FT-VMP. Given that different VM has different fault-tolerance requirement (i.e., different VM requires different number of replica copies) and compatibility requirement (i.e., some VMs and their replicas cannot be placed into some physical machines (PMs) due to software or platform incompatibility), FT-VMP studies how to place VM replica copies inside cloud data centers in order to minimize the number of PMs storing VM replicas, under the constraints that i) for fault-tolerant purpose, replica copies of the same VM cannot be placed inside the same PM and ii) each PM has a limited amount of storage capacity. We first prove that FT-VMP is NP-hard. We then design an integer linear programming (ILP)-based algorithm to solve it optimally. As ILP takes time to compute thus is not suitable for large scale cloud data centers, we design a suite of efficient and scalable heuristic fault-tolerant VM placement algorithms. We show that a) ILP-based algorithm outperforms the state-of-the-art VM replica placement in a wide range of network dynamics and b) that all our fault-tolerant VM placement algorithms are able to turn off significant number of PMs to save energy in cloud data centers. In particular, we show that our algorithms can consolidate (i.e., turn off) around 100 PMs in a small data center of 256 PMs and 700 PMs in a large data center of 1028PMs. 
    more » « less
  5. null (Ed.)
    Recent work has pinned down the existentially optimal size bounds for vertex fault-tolerant spanners: for any positive integer k, every n-node graph has a (2k – 1)-spanner on O(f^{1–1/k} n^{1+1/k}) edges resilient to f vertex faults, and there are examples of input graphs on which this bound cannot be improved. However, these proofs work by analyzing the output spanner of a certain exponential-time greedy algorithm. In this work, we give the first algorithm that produces vertex fault tolerant spanners of optimal size and which runs in polynomial time. Specifically, we give a randomized algorithm which takes Õ(f^{1–1/k} n^{2+1/k} + mf2) time. We also derandomize our algorithm to give a deterministic algorithm with similar bounds. This reflects an exponential improvement in runtime over [Bodwin-Patel PODC '19], the only previously known algorithm for constructing optimal vertex fault-tolerant spanners. 
    more » « less