<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Finding Invariants of Distributed Systems: It’s a Small (Enough) World After All</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>04/12/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10302529</idno>
					<idno type="doi"></idno>
					<title level='j'>Symposium on Networked Systems Design and Implementation - NSDI 2021</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Travis Hance</author><author>Marijn Heule</author><author>Ruben Martins</author><author>Bryan Parno</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Today's distributed systems are increasingly complex, leading to subtle bugs that are difficult to detect with standard testing methods. Formal verification can provably rule out such bugs, but historically it has been excessively labor intensive. For distributed systems, recent work shows that, given a correct inductive invariant, nearly all other proof work can be automated; however, the construction of such invariants is still a difficult manual task.In this paper, we demonstrate a new methodology for automating the construction of inductive invariants, given as input a (formal) description of the distributed system and a desired safety condition. Our system performs an exhaustive search within a given space of candidate invariants in order to find and verify inductive invariants which suffice to prove the safety condition. Central to our ability to search efficiently is our algorithm's ability to learn from counterexamples whenever a candidate fails to be invariant, allowing us to check the remaining candidates more efficiently. We hypothesize that many distributed systems, even complex ones, may have concise invariants that make this approach practical, and in support of this, we show that our system is able to identify and verify inductive invariants for the Paxos protocol, which proved too complex for previous work.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The world increasingly relies on distributed computer systems, but the correctness and reliability of these systems depend on the imperfect coverage of testing. To obtain strong guarantees, developers are starting to turn to formal verification techniques, both in research <ref type="bibr">[3,</ref><ref type="bibr">21,</ref><ref type="bibr">24,</ref><ref type="bibr">31,</ref><ref type="bibr">46,</ref><ref type="bibr">54]</ref> and industry <ref type="bibr">[38,</ref><ref type="bibr">58]</ref>. These techniques can, in theory, prove that all possible execution traces of the system conform to a high-level safety specification (e.g., all nodes agree on the next input value, or no two nodes hold a lock at the same time).</p><p>However, verifying the safety of distributed systems can be extremely labor-intensive, especially when using generalpurpose theorem provers <ref type="bibr">[21,</ref><ref type="bibr">45,</ref><ref type="bibr">54]</ref>. For example, Hawblitzel et al. <ref type="bibr">[21]</ref> report that the effort to build and verify two distributed systems (including their protocols and implementations) required 3.7 person-years, and their safety proofs (over 19K lines of code) account for &#8764;40% of the total codebase.</p><p>To reduce this cost, some work has developed specialized languages that either restrict the kinds of systems that can be encoded <ref type="bibr">[12,</ref><ref type="bibr">13,</ref><ref type="bibr">36,</ref><ref type="bibr">53</ref>] (e.g., the protocol must proceed in synchronous rounds), or restrict the language used to describe the systems' properties <ref type="bibr">[42]</ref>. In exchange for these restrictions, much of the safety proof can be dispatched automatically.</p><p>In all of these systems, the core of the safety proof requires identifying system invariants, and even with tools that can automate all other proof work, finding these invariants still relies on human labor and ingenuity. Anecdotally, this is a challenging task even for researchers <ref type="bibr">[34]</ref>, requiring days for toy systems and months for complex protocols like Paxos <ref type="bibr">[29]</ref>.</p><p>Ideally, we would automate invariant discovery, but theoretical results show that even in languages where checking an invariant is decidable <ref type="bibr">[42]</ref>, finding such an invariant is not decidable <ref type="bibr">[40]</ref>, meaning that no algorithm can guarantee that it will find an invariant for arbitrary protocols, even if they are indeed safe. Hence, we must turn to domain-specific insights to develop a methodology which can apply to the specific kinds of distributed systems developed in practice.</p><p>Recent approaches have been proposed to automatically infer invariants for distributed systems <ref type="bibr">[27,</ref><ref type="bibr">34]</ref>. For instance, the I4 <ref type="bibr">[34]</ref> system observes that the invariants for some distributed systems are scale-invariant. Hence they ask the developer to specify appropriate finite-model parameters and to concretize some of the protocol variables. I4 then invokes a specialized model checker <ref type="bibr">[19]</ref> that can deduce invariants on finite, fixedsize instances. It then employs various heuristics to generalize those invariants to arbitrary instances. Unfortunately, I4 cannot discover invariants that use existential quantifiers. Furthermore, since the model checker is a black box, the developer has no recourse when it fails. More recently, Koenig et al. <ref type="bibr">[27]</ref> extend the IC3/PDR algorithm <ref type="bibr">[6,</ref><ref type="bibr">14]</ref> to infer invariants with existential quantifiers. However, neither they nor I4 can handle complex protocols like Paxos <ref type="bibr">[29]</ref>, for which the simplest known safety proofs require many invariants, including some with existential quantifiers.</p><p>To tackle complex protocols like Paxos, we explore an alternate approach based on our "small world" hypothesis: in many practical systems, the invariants should be relatively concise. After all, these protocols are designed by humans who have some (finite) intuition for why the protocol is correct. They are clearly not beyond human comprehension. If accurate, this hypothesis suggests we are faced with a finite space of possible invariants, which we can potentially search exhaustively.</p><p>Of course, finite is not the same as small. For a protocol like Paxos, there could be over 100 billion candidate invariants with just six terms. Even if it only took a millisecond to test each invariant, a brute force search of the entire space would require over three computation-years.</p><p>Hence, this paper presents the Small World Invariant Search System (SWISS), our system for automatically proving the safety of distributed systems by efficiently searching through the space of succinct invariants. At its core, SWISS learns from counterexample models obtained from failed invariant candidates to speed up future invariant checks. SWISS uses an SMT solver to perform these checks. SWISS also exploits various symmetries and parallelism to reduce search time. Further, for a class of invariants (described formally in &#167;3.4) SWISS is guaranteed to find an invariant when one exists. SWISS also supports user-supplied guidance to make the search more efficient. Moreover, unlike prior approaches, even when SWISS does not produce a safety proof within a given time limit, the user still benefits from partial invariants that SWISS did find, as they can use those invariants as a starting point for finding complete invariants. The net effect is that SWISS is the first approach that can automatically prove the safety of Paxos, and it does so in around 4 hours on an 8-core machine. If the user provides some light guidance ( &#167;5. <ref type="bibr">3.3)</ref>, then the time decreases to 20 minutes.</p><p>We compare SWISS to I4 <ref type="bibr">[34]</ref> and Koenig et al. <ref type="bibr">[27]</ref> on a large variety of protocols ( &#167;5). None of the approaches fully dominates the others, neither in protocols solved nor in runtime. For instance, for protocols that only require invariants without existential quantifiers, I4 is generally fastest. However, for some complex distributed protocols with many invariants that contain existential quantifiers, SWISS outperforms the other tools, and in particular, it is the first to automatically prove the safety of Paxos and two variants of Paxos: Multi-Paxos and Flexible-Paxos <ref type="bibr">[22]</ref>. There are still some variants of Paxos that SWISS cannot fully prove in a reasonable amount of time. However, even in these cases, SWISS still finds many partial invariants that may help the user complete the proof.</p><p>In summary, our key contributions are as follows.</p><p>&#8226; We propose and evaluate the Small World Hypothesis: that many distributed systems we care for can be proven safe using a sequence of concisely specified invariants. &#8226; We present SWISS, a methodology for automatically proving the safety of distributed systems by efficiently searching through the space of candidate invariants, guided by knowledge learned from counterexamples encountered during the search. We report both our successful and unsuccessful optimizations. &#8226; To our knowledge, SWISS is the first approach to identify and verify inductive invariants that prove the safety of Paxos in an automated fashion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background</head><p>We briefly present background on formalizing and verifying distributed systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Proving Safety Conditions Via Inductive Invariants</head><p>We aim to prove safety conditions. A safety condition is a desired property that ought to hold true at any point in a system's execution. For an exclusion lock, for example, the safety condition might be that no two agents hold a lock at the same time. For a consensus protocol such as Paxos, the condition would be that no two machines decide on different results. These safety conditions are in contrast with liveness conditions, which say that the system eventually performs a useful action. We do not currently consider liveness conditions.</p><p>To formalize a distributed system, one must formally describe the internal state of the participating nodes, the state of the network (packets in-flight between nodes), the initial conditions of the system, and the ways the system can evolve. Following standard practice, we formalize the latter as a sequence of atomic actions that update the system state <ref type="bibr">[28]</ref>.</p><p>Given this formal description, our goal then becomes to prove that the safety condition will hold for any reachable state in any possible execution of the system. The typical strategy for such a proof is to find an inductive invariant. An inductive invariant should (i) hold on all possible initial states of the system and (ii) continue to hold when the system transitions to a new state from a state where the invariant held.</p><p>If we could prove that the safety condition was inductive, we would be done; unfortunately, this is rarely the case for nontrivial systems. Instead, we typically must find an invariant which is stronger than the safety condition and then show this invariant is both inductive and implies the safety condition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">A Running Example: Simple Decentralized Lock</head><p>To make the verification process more concrete, we present a toy example: a Simple Decentralized Lock (SDL) protocol. SDL supports a single mutual-exclusion lock that is shared among multiple nodes. A node with the lock can send the lock to another machine via a message over the network, which, in this simple example only, does not permit packet duplication.</p><p>Figure <ref type="figure">1</ref> formally describes the SDL protocol. The state at a snapshot in time is represented by two relations: message and lock. The value message(src, dst) indicates a message in the network from machine src to machine dst, while lock(node) indicates that node believes it holds the lock.</p><p>In the protocol's initial state (the init lines), the network is empty, and only one node (start node) holds the lock.</p><p>The transitions are written in RML <ref type="bibr">[42]</ref>, an abstract language for describing system transitions. SDL has two transition actions: send and recv. In send, a node with the lock can atomically release the lock and send a packet to another machine. In recv, a node can receive a packet (removing it from the network) and accept the lock.</p><p>The safety condition (Figure <ref type="figure">2</ref>) we wish to verify for SDL is </p><p>Figure <ref type="figure">2</ref>: Safety Condition and Invariants for the SDL Figure <ref type="figure">3</ref>: Two models which demonstrate that the safety condition (Figure <ref type="figure">2</ref>) for SDL is not inductive on its own. M and M each have a domain of two nodes, n 1 and n 2 . In M, n 2 holds the lock, while a packet is in-flight from n 2 to n 1 . M can transition to M via the recv action. M does not violate the safety condition, but M does (n 1 and n 2 both hold the lock). that no two nodes ever hold the lock at the same time. Figure <ref type="figure">3</ref> shows that this condition is not inductive. There is a state, M, which satisfies the safety condition (only one machine holds the lock) which can transition to a state, M (via a recv on y) which does not satisfy the safety condition (since two machines hold the lock). Figure <ref type="figure">2</ref> shows a stronger predicate which also rules out the first state. This predicate is both inductive and (trivially) implies the safety condition; thus, it completes our safety proof. It states that (i) no two machines hold the lock, (ii) no machine holds the lock while a message exists, and (iii) no two messages exist at the same time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Formal Notation</head><p>To keep our subsequent discussion precise we introduce some additional notation. Formally, a transition system is a triple T = (&#931;, INIT, T R), where &#931; defines the types and relations representing the state of a system, and INIT and T R are predicates describing the initial state of the system and allowed transitions of the system, respectively. In our SDL example, &#931; contains the type node and the relations lock and message.</p><p>Since T R relates two states, we will use primes to indicate the new state, e.g., T R := (x = x + 1) represents a transition that increments x by one.</p><p>A model M represents a single possible state of the system. It contains an assignment for each variable and each possible evaluation of the system's relations. For a predicate P, we write M |= P if the predicate P evaluates to true on model M. When P is a predicate over two states, we write (M, M ) |= P, e.g., we write (M, M ) |= T R if M can transition to M .</p><p>We say that M is reachable if there exists a sequence M 0 , . . . , M k where M 0 |= INIT , M k = M, and (M i , M i+1 ) |= T R for all i. A formula S is said to be safe if for all reachable M, we have M |= S.</p><p>We say that I (a formula over &#931;) is inductive if we can prove that INIT =&#8658; I and T R &#8743; I =&#8658; I (i.e., if I is true and the system can take a transition to a new state, then I holds there as well). Here, we use I to denote the predicate I evaluated on the second state. It is clear that any inductive invariant I will be safe. Therefore, proving that S is safe amounts to finding an inductive invariant I such that I =&#8658; S; equivalently, to find a formula I such that I &#8743; S is an inductive invariant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Decidability of Inductiveness</head><p>Given a candidate invariant I, we must prove that (i) INIT =&#8658; I, (ii) T R &#8743; I =&#8658; I , and (iii) I =&#8658; S. We call these verification conditions. To check that a verification condition P holds for all possible models, we can show that &#172;P is unsatisfiable; i.e., there is no model M such that M |= &#172;P.</p><p>Checking the validity of arbitrary first-order logic formulas is undecidable <ref type="bibr">[52]</ref>, but prior work <ref type="bibr">[41,</ref><ref type="bibr">42]</ref> shows that many distributed systems, including multiple variants of Paxos, can be encoded in RML <ref type="bibr">[42]</ref>, which can be translated to a restricted class of formulas where satisfiability is decidable. In particular, the class of effectively propositional (EPR) formulas, also known as the Bernays-Sch&#246;nfinkel class, are the class of formulas that can be written in a form with quantifier prefix &#8707; * &#8704; * and no function symbols. Satisfiability for this class of formulas is decidable <ref type="bibr">[33,</ref><ref type="bibr">43]</ref>.</p><p>By ensuring our verification conditions lie within this class, we can always either verify the predicate INIT =&#8658; I or find a satisfying instance for INIT &#8743; &#172;I. Likewise, we can either verify the predicate T R &#8743; I =&#8658; I or find a pair (M, M ) where (M, M ) |= T R &#8743; I &#8743; &#172;I (e.g., the pair in Figure <ref type="figure">3</ref>). In either Figure <ref type="figure">4</ref>: SWISS Overview. The intended workflow for using SWISS to synthesize an invariant and safety proof. A rectangle indicates a machine-readable, human-supplied input. An oval represents a collection of first-order predicates. case, we obtain a concrete counterexample.</p><p>EPR, as stated, is a bit too restrictive; it allows only relations (not general functions), and it does not allow for invariants with quantifier alternation. However, EPR can be extended to include stratified function symbols-functions for which the edges from input types to output types form no cycles-while maintaining its decidability. In the same way, we can allow stratified alternation of universal and existential quantifiers in a fragment called the extended EPR fragment <ref type="bibr">[41]</ref>. To keep our verification conditions within this class, we must impose some restrictions on the quantifier alternations which appear in I; these restrictions are determined by the shape of the protocol under consideration, in particular, the quantifier alternations which appear in INIT and T R.</p><p>3 Overview: The SWISS Algorithm SWISS is an algorithm for inferring inductive invariants of a protocol in order to prove a desired safety condition. The intended usage of SWISS is shown in Figure <ref type="figure">4</ref>. First, the user provides an RML-encoding <ref type="bibr">[42]</ref> of the protocol, a desired safety condition, and a specification of the search space ( &#167;4.1.1). SWISS either succeeds with an invariant that proves the safety condition, or it fails, with only partial invariants generated. In that case, the user may choose how to continue: they might try SWISS again with a different search configuration, or they might continue with other means, e.g., the interactive invariant-finding tool IVy <ref type="bibr">[42]</ref>, using the partial invariants as a starting point.</p><p>For example, suppose the user is interested in the SDL protocol ( &#167;2) and wants to prove the lock-exclusitivity safety condition. They would first encode the SDL protocol into a machine-readable RML specification (Figure <ref type="figure">1</ref>). Protocols are often concise, although in some cases it is challenging to produce a specification where the inductive invariant will be in EPR <ref type="bibr">[41]</ref>. However, we consider this out of scope for SWISS.</p><p>Next, the user would write the exclusitivity condition as a predicate (Figure <ref type="figure">2</ref>). They would also choose the space of predicates to search over ( &#167;4.1.1). If the user knows nothing about the protocol, they would likely choose the most general option, to generate templates automatically. If SWISS succeeds, then they will know that the lock-exclusitivity safety condition is true, and they can use the invariants from SWISS's output to validate that SWISS ran correctly.</p><p>If SWISS does not succeed, there are a few possible paths. For example, it might be that SWISS does not complete in a reasonable amount of time, in which case the user might choose to restrict the search space. For example, they might have some idea of what the invariant should look like because they have worked with similar locking protocols previously. Alternatively, if SWISS completed quickly but did not succeed, the user might choose a broader search space.</p><p>Finally, they might choose to take the incomplete invariants generated by SWISS and attempt to complete them through other means. Even though SWISS did not succeed, the user could learn useful information about the protocol from these incomplete invariants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">High-Level Algorithm</head><p>We begin with a high-level overview of our algorithm for finding the inductive invariants needed to prove safety conditions. Section 4 then explains how we make the algorithm scale to large search spaces. SWISS takes as input (i) a transition system T encoded via RML ( &#167;2.2) (ii) a safety condition S, and (iii) a configuration of the search space ( &#167;4.1.1). In this section, we refer to search spaces with the symbols B and F , which here may be viewed as arbitrary sets of first-order predicates.</p><p>SWISS is designed to exploit our hypothesis that a distributed system designed by humans will have a concise invariant, or a larger invariant composed of concise invariants. After all, the designer presumably has a finite intuition for the correctness of their system, either as a whole or as the conjunction of correct subsystems or subproperties.</p><p>Internally, SWISS uses different algorithms to target these two possible invariant styles. One algorithm, Finisher ( &#167;3.2), tries to directly find one inductive invariant that proves the safety condition. Hence any invariant it finds will necessarily complete the safety proof. Using the safety condition as a target helps Finisher search the invariant space efficiently. However, for complex protocols, searching for the entire system invariant in one shot is infeasible; e.g., a human-derived invariant for Paxos has 10 conjuncts with 34 terms, corresponding to a search space of over 10 75 candidate invariants.</p><p>Hence, SWISS employs a second algorithm, Breadth ( &#167;3.3), that greedily searches for as many protocol invariants as possible within a finite space B, without requiring that they directly prove the safety condition. We run Breadth multiple times so that invariants may build on each other: once Breadth finds an invariant P, the next run can then find an invariant Q that is inductive relative to P, even when Q might not be inductive on its own. More formally, we say that Q is relatively inductive with respect to return Finisher(T , invariants, S, F ) &#8746; invariants</p><p>The Breadth loop ultimately builds an invariant of the form</p><p>In practice ( &#167;5), without the safety condition guiding it, Breadth is slower than Finisher, but it can incrementally construct a larger invariant than Finisher can.</p><p>To benefit from the strengths of both Breadth and Finisher, SWISS's top-level Solve (Algorithm 1) combines them. It takes as input a transition system, T , and a desired safety condition S. It also takes in two spaces of candidate invariants, B and F , for Breadth and Finisher, respectively, to search. In our implementation, these spaces are defined through a combination of the protocol description and (potentially) user input ( &#167;4.1). Solve runs Breadth over B until no new invariants are produced, and then it runs Finisher, if necessary, to find an additional invariant needed to complete the safety proof. Section 3.4 summarizes SWISS's coverage guarantee.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">The Finisher Algorithm</head><p>Finisher aims to find a single invariant P which proves a safety condition S. More formally, it tries to solve: Task 1 (Conjecture-proving task.) Given a transition system T , formulas I 1 , ..., I n , already established (or assumed) to be invariant, and a conjectured safety condition S, find an invariant predicate P such that P &#8743; S is inductive relative to</p><p>Evaluating a candidate invariant P requires checking the </p><p>Since the INIT =&#8658; S condition does not depend on P, we can check it once in advance of evaluating any candidat invariant. Since we consider protocols expressed in RML ( &#167;2.4), checking the validity of these VCs is decidable, and in practice, typically quite efficient with modern SMT solvers. Hence, for a candidate invariant P, we can run a subroutine CheckVCsF to determine either that the VCs above hold, or that a finite counterexample shows they do not. As Algorithm 2 shows, rather than simply check the VCs above for each candidate predicate P in F , Finisher accumulates a collection of counterexamples from failed candidates. As we describe in &#167;4.2, we use these counterexamples to filter subsequent candidates, as our counterexample check is orders of magnitude faster than the VC check.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">The Breadth Algorithm</head><p>In the Breadth algorithm, our goal is simply to find as many invariants as possible. More formally, we wish to solve the following task. Task 2 (Invariant-finding task.) Given a transition system T and formulas I 1 , ..., I n , already established (or assumed) to be invariant, find any invariant predicate P which is inductive relative to</p><p>The corresponding VCs are as follows.</p><p>Naively, we could start with an algorithm similar to Finisher, but use the VCs above instead of Finisher's. However, this would result in a highly inefficient search since as stated, the invariant-finding task permits many tautological solutions, such as true or I i . More generally, if P is any predicate such that I 1 &#8743; &#8226; &#8226; &#8226; &#8743; I n =&#8658; P, then P will be invariant, but we do not actually learn anything about the protocol by finding P: P does not rule out any states which were not ruled out by the I i . We call such a P a redundant invariant with respect to I 1 &#8743; &#8226; &#8226; &#8226; &#8743; I n . For example, &#8704;x. f (x) &#8744; g(x) is redundant with respect to &#8704;x. f (x). As the number of terms in our search space increases, the number of redundant invariants increases exponentially. Furthermore, since any redundant invariant is, in fact, inductive, it will always pass our counterexample filters, guaranteeing an expensive VC check for each.</p><p>We devised two ways to cope with redundant invariants. First, we maintain a set indInv of non-redundant invariants, and whenever we find a new invariant, we explicitly check whether it is redundant with indInv. Second, we also track allInv, i.e., all invariants we find-including the redundant ones-and use these to syntactically filter future candidates by performing quick checks for logical implication using a subroutine FastImplies(I, P) described in &#167;4.4. These checks are vastly cheaper than SMT calls.</p><p>As described thus far, Breadth works without any knowledge of the safety condition S that we aim to prove about our system. However, we observe that we will eventually need S to be an invariant of the system, so we strengthen the second VC above to assume S is true in the initial state.</p><p>Algorithm 3 brings all of this together. This algorithm finds exactly the invariants from the space B which are invariant with respect to the original input invariants. More precisely, if P &#8712; B is invariant with respect to the inputs, then Breadth will output a set of predicates whose conjunction implies P.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">SWISS Coverage</head><p>SWISS is guaranteed to prove a safety condition S as long as the system invariant conforms to the following form.</p><p>Claim 1 Solve(T , S, B,F ) will always succeed at proving the conjectured safety condition S provided there exist invariants I 1 , &#8226; &#8226; &#8226; , I n such that:</p><p>The claim follows from inspection of Algorithms 1-3 and the fact that our counterexample and implication filters will never eliminate a valid invariant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Making Invariant Exploration Scale</head><p>While the algorithms described in &#167;3 would theoretically suffice to find invariants, implemented naively they are impracti-cally slow. Hence, we present crucial steps we take to reduce the search space ( &#167;4.1- &#167;4.4) and optimize the overall process ( &#167;4.5). For the sake of completeness, we also present three optimizations that our evaluation has shown do not improve performance in this domain (Appendix A).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Exploiting User Guidance &amp; Candidate Symmetries</head><p>The Finisher and Breadth algorithms each take as input a space of candidate invariants to explore ( &#167;4.1.1). These spaces often contain many invariants that are identical modulo symmetries, so symmetry pruning is critical ( &#167;4.1.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1">Defining Candidate Spaces</head><p>SWISS searches for invariants based on invariant templates. Intuitively, a template defines the rough shape of a class of invariants (e.g., the number and types of the quantified variables, and a bound on the formula's size).</p><p>By default, SWISS automatically defines invariant templates based on the protocol description and the safety condition, as well as user-supplied upper bounds on the formula size. SWISS then enumerates all template spaces within these constraints, and the search space B or F is defined as the union of these spaces. Finisher prioritizes these templates in order of increasing size. Thus, if a small invariant exists, Finisher will find it without having to search the entire space.</p><p>For many protocols, this fully automatic approach suffices to produce a safety proof. For protocols where this automated search runs slower than desired, the user can specify a particular template T , rather than have SWISS enumerate all templates. As &#167;5.3.3 demonstrates, such user guidance can dramatically speed up the search process, even if the user guesses a few incorrect templates before the right one.</p><p>More formally, every candidate invariant P that we consider is a sequence of quantifiers followed by a quantifier-free expression E. The expression E is a tree of conjunctions and disjunctions of terms C, expressed over values V , which is either a name and a type (e.g., x : t), or a function application.</p><p>V</p><p>The template T is simply a formula P with a wildcard for the quantifier-free expression E. We define TemplateSpace(T, k, d) to be the set of formulas that match T when the wildcard is instantiated with any quantifier-free expression E that has at most k terms and a conjunctiondisjunction tree of depth at most d. For instance, an expression of the form</p><p>Hence, a valid candidate space might be defined by, T = &#8704;r : round, v : value. &#8707;q : quorum. &#8704;n 1 , n 2 : node. * with k = 3, and d = 1, which would contain all invariants with the quantifiers in T and disjunctions of up to 3 terms. SWISS expects the user to specify the maximum values of k, d, m (the maximum total number of quantified variables) and q (the maximum number of existentially quantified variables). The user must also specify a quantifier nesting order for the types defined by the protocols: a fixed nesting order is required so that the resulting verification conditions remain in EPR.</p><p>To enumerate the formulas in TemplateSpace(T, k, d), we first enumerate a set of possible terms, C , and then arrange them in every possible tree shape. While succinct to describe, the size of the space grows exponentially, so we take steps to prune the space as rapidly as possible.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2">Symmetry-Breaking</head><p>As defined above, a candidate space contains many logically equivalent candidates, such as:</p><p>all identical up to term reordering and variable renaming. When enumerating candidate formulas, SWISS aims to only consider one representative from each class of identical candidates. However, checking for such logical equivalence via SMT calls would be prohibitively expensive. Hence, we use a sound set of syntactic constraints to break these symmetries far more efficiently.</p><p>Specifically, SWISS assigns an arbitrary ordering to each possible base term (e.g., leq) and will only produce formulas where, within any conjunction or disjunction, the terms are in increasing order. SWISS also produces formulas such that for any set of quantified variables which are interchangeable (e.g., r 1 and r 2 in the example above), their first appearances are in increasing order of quantifier nesting index.</p><p>These two steps efficiently break symmetries arising from term ordering and variable permutation. In practice, they reduce the size of the search space by over two orders of magnitude ( &#167;5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Filtering Based on Counterexamples</head><p>As explained in &#167;2.4, when a candidate invariant P fails a verification condition, we can extract a counterexample, specifically a concrete model where the condition failed. For example, if INIT =&#8658; P fails to hold, then we can extract a concrete model M such that M |= INIT &#8743; &#172;P; i.e., the initial conditions hold for M but P does not. Since the initial conditions hold on M, if any other formula P fails to hold for M, then it cannot be an invariant either. SWISS exploits this observation by remembering the models from failed VC checks and using those models to quickly rule out future candidates.</p><p>More technically, we define a counterexample filter (cex) as a model or a pair of models which demonstrate that a given candidate P fails to be an inductive invariant. We consider three types of counterexample filters: We construct counterexample filters based on the model(s) returned when a candidate P fails a VC check. The specific type of filter constructed depends on which type of check fails. Our construction guarantees that (i) P does not pass the filter cex, and (ii) any formula P which does pass the same verification condition will pass the filter cex. The three possible constructions are as follows.</p><p>&#8226; A failed verification condition of the form A =&#8658; P yields a model M such that M |= A &#8743; &#172;P, so we produce a counterexample filter True(M). In other words, P does not evaluate to true on M; but any valid invariant should. &#8226; A failed verification condition of the form A &#8743; P =&#8658; B yields models M and M such that (M, M ) |= A &#8743; P &#8743; &#172;B , which gives a counterexample filter False(M) (since the model represents a state M that P failed to reject and that all valid invariants ought to reject). &#8226; A failed verification condition of the form A &#8743; P =&#8658; P yields models M and M such that (M, M ) |= A &#8743; P &#8743; &#172;P , which gives a counterexample filter Transition(M, M ). Essentially, these counterexample filters allow us to learn from each type of failed induction check. Efficient Implementation. Counterexample filtering is only useful if we can apply our filters faster than executing the original VC checks. Hence, whenever we add a new model M to our set of counterexample filters, we precompute the evaluation of each term c &#8712; C on M. C is the set of base terms ( &#167;4.1.1) produced for every possible instantiation of the quantifier variables in the template T within the model M. The evaluations are saved in a bitstring, so when we encounter a new candidate Q, the evaluation of Q on M is computed quickly with bitwise operations. As a result, counterexample filters are orders of magnitude faster than VC checks ( &#167;5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Checking Verification Conditions</head><p>We encode our verification conditions into SMT formulas in a manner similar to prior work <ref type="bibr">[42]</ref>. The encoding ensures that it is decidable to evaluate the validity of the formulas. Our encoding conforms to the standardized smtlib2 format, although for better performance, our implementation includes the SMT solver as a library so that it can directly invoke its API rather than communicating via files.</p><p>In practice, we find that thanks to the community's large effort invested in optimizing SMT solvers, our validity queries are not just decidable, but rapidly decidable, typically in tens of milliseconds. However, our initial experiments found occasional outliers that consumed minutes, or more. Since a SWISS run may perform thousands of SMT calls, these outliers become an issue. After some iteration, we settled on a routine where we retry a problem instance with a different SMT solver if our first attempt exceeds a given time limit ( &#167;5.1). If the SMT instance continues to time out, we skip the invariant in question. While not a perfect solution, we found that this routine satisfactorily avoids stalling SWISS's execution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Filtering Redundant Invariants</head><p>Since the Breadth algorithm searches rather indiscriminately for any possible invariant, it can easily waste time on redundant invariants, i.e., invariants already implied by previously established invariants.</p><p>To efficiently prune many such redundant invariants, SWISS includes a rapid FastImplies filter to see if an existing invariant Q already implies a candidate P. For correctness, FastImplies(Q, P) can return true only when Q =&#8658; P. Our implementation of FastImplies(Q, P) always detects the case where Q and P can be written as,</p><p>where a 1 , . . . , a j is a subsequence of b 1 , . . . , b k up to variable renaming. This condition certainly ensures that Q =&#8658; P, meaning P is redundant if Q is already known to be invariant. Our implementation also has limited support for substituting universally-quantified variables for existential ones. Efficient Implementation. To implement this filter efficiently, we store a set of sequences representing known invariants and query if any sequence in this set is a subsequence of a candidate sequence. We use a trie <ref type="bibr">[10]</ref> for efficient queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Additional Optimizations</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.1">Minimizing Models</head><p>When extracting a counterexample from a failed VC check, we can either take the first model produced by the SMT solver, whatever it may be, or we can try to make the model as small as possible. Smaller models are faster to evaluate with our filters, but come at the cost of additional SMT queries.</p><p>In our minimal-models optimization, we always attempt to find a minimal model that satisfies a given SMT query, i.e., a model where there is no other satisfying model with a smaller domain. To bound the cost of the SMT queries made to check minimality, we apply an aggressive time limit to each query. This is an essential optimization for complex protocols; without it, SWISS takes significantly longer ( &#167;5.3.5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.2">Parallelism</head><p>Since our algorithm is based on exhaustive search, we expect it to be parallelizable. To parallelize across n threads we do the following: for each run of either the Breadth or Finisher algorithm, we split the space (B or F ) into n parts, each randomly permuted. The random permutation attempts to ensure that each thread has a roughly equal amount of work. We then run our algorithm on each partition independently and combine the results. In the future, we plan to explore a more complex approach in which the threads communicate at a finer granularity, e.g., sharing counterexample filters and invariants as they are discovered.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6">Failed Optimizations</head><p>In the interest of full disclosure, and to save others unnecessary work, in Appendix A we briefly summarize three optimizations we implemented and evaluated, only to find that they provide little benefit or actively hurt performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Evaluation</head><p>Our evaluation seeks to answer two key questions:</p><p>&#8226; How does SWISS compare to prior work ( &#167;5.2)?</p><p>&#8226; How effective are SWISS's various optimizations ( &#167;5.3)?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Experimental Setup and Implementation Details</head><p>Benchmark Suite. To evaluate SWISS, we apply it to a test suite of well-established protocols from three sources: (i) the I4 benchmarks <ref type="bibr">[34]</ref> ) for Finisher, However, these choices were not ideal for all benchmarks. In particular, if we found that a benchmark completed in under six hours without proving the safety condition, we enlarged the F space by increasing k, and then increasing q. On the other hand, if a benchmark timed out after six hours without completing the breadth phase, we made the B space smaller by decreasing its parameters. See &#167;5.2.1. Implementation. We implement SWISS in approximately 14,000 lines of C++ code. To check the verification conditions, we use the Z3 SMT solver <ref type="bibr">[11]</ref> version 4.8.8 with default settings. If Z3 times out after 45 seconds, then we use the CVC4 <ref type="bibr">[2]</ref> SMT solver version 1.8-prerelease with the --finite-model-find option, an option tuned for finding finite models <ref type="bibr">[44]</ref>. We found that for some satisfiable instances, Z3 would often take a very long time to return a model, whereas CVC4, with this specific option, was much more efficient. When applying model minimization, we set a time limit of 45 seconds; if the time limit is reached, we use the current partially minimized model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Top-Level Protocol Results</head><p>Table <ref type="table">1</ref> summarizes the results of running SWISS on our benchmark suite. It also compares SWISS's performance with that of the two most closely related systems (see &#167;6 for details), namely the I4 <ref type="bibr">[34]</ref> and FOL <ref type="bibr">[27]</ref> tools. Below, we first discuss the comparative results, followed by SWISS-specific analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Comparative Results</head><p>We analyze our protocol results by bucketing them into coarsegrained categories. Note that I4 can only generate invariants containing universal quantifiers, and hence cannot succeed for benchmarks that require existentially quantified invariants. Paxos Variants. Unlike prior work, SWISS automatically finds all of the invariants for Paxos and Flexible Paxos, which were previously painstakingly constructed by hand <ref type="bibr">[41]</ref>. Furthermore, it succeeds at Multi-Paxos if the user provides the correct templates; however, when we attempted to use the automatically-enumerated templates, we found that Breadth does not complete in time. All three protocols require Finisher to find an invariant with six terms and an existential. Unfortunately, neither I4, FOL, nor SWISS can prove the safety of Fast Paxos <ref type="bibr">[30]</ref>, Stoppable Paxos <ref type="bibr">[35]</ref>, or Vertical Paxos <ref type="bibr">[32]</ref>. Among the known, handwritten invariants, Fast Paxos and Vertical Paxos each have two 8-term invariants; Stoppable Paxos has three 6-term invariants. SWISS cannot currently synthesize invariants this large-indeed, as it stands, it would need several orders of magnitude more compute time to handle even a single 8-term invariant-and we did not observe it finding equivalent, smaller invariants for these protocols. However, Table <ref type="table">1</ref> shows that SWISS is at least able to synthesize partial invariants equivalent to many of the handwritten ones (discussed in more detail below - &#167;5.2.2).</p><p>Finally, we note one interesting occurrence during our preliminary testing: we initially found that SWISS succeeded on Fast Paxos far too quickly and with invariants that looked at a glance to be far too strong. This allowed us to identify a typo in the spec which caused an action to never be enabled.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mutual-Exclusion Protocols.</head><p>Protocols such as sdl, distributed-lock, lock-server-sync, lock-server-async, and sharded-kv have safety conditions asserting mutual-exclusion properties, such as those of locks. The invariants for these protocols tend to be large conjuncts of smaller, mutually inductive invariants. SWISS struggles with these, since it is forced to discover the entire collection at once, as none are inductive individually. For some, Finisher succeeds by inferring all of these mutually inductive invariants as a single large invariant. For sharded-kv and lock-server-async, we needed to increase the maximum value of k from our default configuration. For distributed-lock, the invariant was simply too large for SWISS to find. FOL was also unable to solve this benchmark, although I4 solves it quite handily, as it was able to infer the mutually inductive invariants on a small finite instance of the problem. Finally, for sharded-kv-no-lost-keys we needed to use q = 2 to allow more existentials. Learning Switch. We have two different benchmarks called "learning-switch," one from I4 and one from FOL. We found that unlike I4 and FOL, SWISS succeeds on learning-switchquad quite handily. In learning-switch-ternary, we found that Breadth takes far too long in the default configuration, due in part to a large number of redundant invariants not filtered by our FastImplies test and thus requiring SMT calls to identify as redundant. We reduced the size of B by configuring q = 0 (i.e., searching for universal invariants only) for this one benchmark, allowing SWISS to also be able to complete it efficiently.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2">SWISS-Specific Analysis</head><p>Invariants. In some cases, SWISS uncovered invariants that were simpler than the ones written by the original researchers. For instance, for the ring-election protocol, SWISS's Breadth algorithm identifies three 3-term invariants (9 terms total) similar to the ones from prior work <ref type="bibr">[42]</ref>; however, if we run Finisher on its own rather than the full SWISS algorithm, then it instead generates a single 5-term inductive invariant. For the Paxos protocol, we initially expected that the Breadth algorithm would need to run with k = 5; however, we found that SWISS succeeded even with Breadth at k = 3. This shows that mechanized search can find simpler invariants that may be missed by trained researchers. Partial Progress. When an execution of SWISS exceeds its time limit, it still generates many invariants which may be useful. To measure how successful this "partial progress" is, we computed how many of the handwritten invariants SWISS finds. More precisely, we count how many of these invariants are logical implications of SWISS's invariants. These results are shown in Table <ref type="table">1</ref>. Notice that in many cases, SWISS finds a majority of the invariants; e.g., for our largest benchmark, Vertical Paxos, SWISS finds 79%. Furthermore, for the chain benchmark, which has two safety conditions (one for linearizability and one for atomicity), SWISS solves the latter.</p><p>We do not report any partially solved invariants for the other tools in our comparison. For one, the IC3 approach does not immediately allow extraction of any inductive invariants; a partially-completed IC3 execution would provide a different kind of information: constraints on the possible states of a system after a finite number of transitions. In the case of I4, we did not find any case where it constructed a nontrivial inductive invariant for any benchmark where it did not succeed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Understanding SWISS</head><p>To better understand SWISS, we evaluate the effectiveness of our various design choices.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.1">The Utility of Combining Breadth and Finisher</head><p>In &#167;3, we argue that Breadth and Finisher target different kinds of invariants and hence are more effective when used together. To validate this claim, we re-ran SWISS on our benchmark suite, first using only Breadth, and then using only Finisher. In both cases, we limited their execution time to that taken by the full SWISS algorithm.</p><p>Compared to the 19 benchmarks solved by the full SWISS algorithm, the Breadth algorithm alone solves only 10, while Finisher alone solves only 7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.2">Execution Times of Breadth Versus Finisher</head><p>In Table <ref type="table">1</ref>, Breadth is fairly quick, always less than 15 minutes, compared to Finisher which can vary from quite quick (a few seconds) to several hours. In fact, this is not an accident and follows directly from the fact that the space B is chosen to be small, whereas Finisher is designed to keep going until it finds a solution or times out. Thus, the runtime of Finisher depends on how large that solution turns out to be.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.3">Impact of User Guidance</head><p>While SWISS is designed to automatically search through the space of invariant templates, users can provide more specific guidance via one or more templates. For example, for Paxos, considering the Finisher portion, the user might suggest a template like (1) or (2) in Table <ref type="table">2</ref>. Compared to searching the automatically generated space of templates, this guidance cuts the search space by 75% and as a result, completes in 20 minutes, a 13.4&#215; speedup.</p><p>Of course, the user may erroneously suggest a template not containing any useful invariant. To evaluate the impact of such a mistake, we ran SWISS on a template (3) of comparable size to the "correct" templates, as well as some which are much smaller <ref type="bibr">(4)</ref><ref type="bibr">(5)</ref><ref type="bibr">(6)</ref>. The results suggest the user can afford to make some incorrect guesses and still outperform the full auto mode.</p><p>Finally, template <ref type="bibr">(7)</ref> is the largest template generated automatically with our default configuration. It takes significantly longer, indicating the importance of exploring templates in increasing size order when using auto mode.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.4">Is it a Small World?</head><p>Our Small World Hypothesis holds that many of the protocols we care about can be solved using a sequence of invariants I 1 , . . . , I n , which are individually concise. We analyze this hypothesis by measuring the size of the invariants in each of our benchmark protocols. This, in turn, helps us understand Table <ref type="table">2</ref>: User-Provided Templates. Experiments running Finisher on Paxos with the given template as input. Each experiment starts with many invariants provided as input, simulating the scenario that Breadth has already completed. The 'Inv?' column indicates whether the template contains an invariant that proves the safety condition. Time is given in seconds. Each experiment is run to completion, exploring the entire space, even if an invariant is found.</p><p>which protocols SWISS will likely succeed at by the degree to which they meet the Small World Hypothesis. For a given protocol, we examine an invariant I = I 1 &#8743; &#8226; &#8226; &#8226; &#8743; I n which proves the desired safety condition, chosen so that I 1 &#8743; . . . &#8743; I j is inductive relative to the safety condition (as in Claim 1). The m n column in Table <ref type="table">1</ref> gives the maximum size of any invariant out of I 1 , . . . , I n . Here, the "size" of a predicate is measured as its number of terms.</p><p>We do not claim that our numbers are the minimal possiblewe simply use the smallest out of any I that we know of. These may be from invariants synthesized by SWISS itself; in other cases, we use human-determined invariants.</p><p>We can see that 25 out of our 27 benchmarks have m n &#8804; 8, and 22 of them have even fewer. There were two exceptions, distributed-lock (m n = 12) and vertical-paxos (m n = 16). These two invariants are much bigger than any single invariant we have seen SWISS synthesize.</p><p>However, it may be worth noting that these larger invariants are conjuncts of smaller, mutually inductive invariants. For example, the 16-term invariant of vertical-paxos is actually the conjunct of two 8-term invariants (although these invariants are still on the larger side). Thus, these protocols would score much better on a weaker Small World Hypothesis, one where mutual invariants were counted separately. Other approaches may be able to take advantage of this.</p><p>To more fully understand SWISS's behavior, we also measure m n-1 , the maximum number of terms among I 1 , . . . , I n-1 . This statistic is interesting here because for SWISS to succeed in its current form, these n -1 invariants must be in B.</p><p>To rough approximation, we see that SWISS can succeed approximately when m n &#8804; 6 and m n-1 &#8804; 3. In some cases, SWISS does go beyond: some benchmarks succeed with up to m n = 8, and SWISS can solve multi-paxos (where m n-1 = 4) if the search space is restricted in other ways.  To evaluate filtering efficiency, we measure the total time spent on filtering versus SMT inductivity checks in a Paxos benchmark. We find that Finisher (using Template (1) of Table 2) performs 155 inductivity checks via SMT. The average SMT call takes 96.5 ms, with a median of 7 ms and a 95 th percentile of 55 ms. In contrast, filtering a single candidate takes 74 nanoseconds on average. Notably, both measures are important characteristics of SWISS, as some workloads are dominated by filtering and others by SMT calls (Appendix B.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.5">Impact of Filtering</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.6">Additional Analysis</head><p>See Appendix B for further analysis of SWISS's performance, e.g., the impact of optimizations, parallelism, and SMT calls.</p><p>6 Related Work</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Verifying Distributed Systems</head><p>The research community has long recognized the challenges of designing correct distributed systems. Manually written proofs <ref type="bibr">[25,</ref><ref type="bibr">31]</ref> and model checking <ref type="bibr">[23,</ref><ref type="bibr">26,</ref><ref type="bibr">56]</ref> increase assurance, but struggle with practical distributed systems <ref type="bibr">[4]</ref>.</p><p>Recent work applies general-purpose software-verification tools to the verification of distributed systems <ref type="bibr">[21,</ref><ref type="bibr">46,</ref><ref type="bibr">55]</ref>. These tools offer flexibility at the price of substantial human effort. For example, verifying Raft required over 50,000 lines of Coq proof for the protocol and its 520-line implementation <ref type="bibr">[55]</ref>, and Hawblitzel et al. used 12,000 lines of proof for the safety and liveness proofs of their Paxos protocol <ref type="bibr">[21]</ref>.</p><p>These human costs motivate the search for domain-specific languages (DSLs) and tools that reduce proof effort by re-stricting the class of encodable systems <ref type="bibr">[12,</ref><ref type="bibr">13,</ref><ref type="bibr">36,</ref><ref type="bibr">53]</ref>. For example, tools based on the heard-of model <ref type="bibr">[12,</ref><ref type="bibr">13,</ref><ref type="bibr">36]</ref> need a run-time system to bridge the gap between an asynchronous network and the synchronous semantics assumed by the verification tool, which can lead to performance bottlenecks. Similarly, pretend synchrony <ref type="bibr">[53]</ref> precludes classic optimizations, e.g., request batching in Paxos implementations.</p><p>All the works above, even the DSLs, rely on the developer to intuit invariants, which can be hard even for experienced researchers <ref type="bibr">[34]</ref>. Recent work tries to reduce this cost via a restricted logic (EPR - &#167;2.4) which makes invariant checking decidable; i.e., given a correct invariant, no further human work is needed. Even within EPR, finding the invariant remains undecidable <ref type="bibr">[40]</ref>, so Padon et al.'s IVy tool <ref type="bibr">[42]</ref> interactively aids the developer: IVy iteratively checks if a candidate invariant is inductive. If not, IVy presents a concrete counterexample, and the developer strengthens the candidate to eliminate it. This repeats until she derives an inductive invariant.</p><p>In contrast, Ma et al.'s I4 tool <ref type="bibr">[34]</ref> aims to be fully automatic. I4 first runs a custom model checker <ref type="bibr">[19]</ref> on an artificially small example of the protocol (e.g., with two nodes) to produce an invariant for the small system. I4 then attempts to generalize the invariant to the unbounded setting. When it succeeds, I4 requires no human intervention. In practice, Ma et al. report manually specifying concrete bounds for all of their benchmarks (to avoid exhaustive parameter searches) and concretizations of certain variables for several benchmarks. Ultimately, I4 is limited by the abilities of its model checker, which does not support existential quantifiers (which &#167;5.2 shows rules out a wide swath of protocols), and is unable to scale to more complex protocols like Paxos. In such cases, the developer is left with little recourse. However, I4 is frequently faster than SWISS and able to synthesize larger invariants for protocols where universally quantified invariants exist.</p><p>In recent work, Koenig et al. (FOL) <ref type="bibr">[27]</ref> develop an algorithm capable of synthesizing invariants containing existentials. Their algorithm relies on the IC3/PDR algorithm <ref type="bibr">[6,</ref><ref type="bibr">14]</ref> for constructing invariants incrementally. Like SWISS, it iteratively produces counterexamples, but it uses those counterexamples as constraints in a SAT encoding of predicates to be synthesized. SWISS verifies protocols, like Paxos, that FOL does not and verifies some faster than FOL. The reverse is also true, suggesting some complementarity of the approaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">General-Purpose Invariant Synthesis</head><p>Extensive research <ref type="bibr">[7-9, 15, 17, 18, 20, 37, 39, 49, 57]</ref> studies loop-invariant inference for proving program correctness, but this remains challenging. Most approaches are limited to single-loop programs; only a few handle multiple loops or existential invariants. Approaches include abstract interpretation <ref type="bibr">[8]</ref>, interpolation <ref type="bibr">[37]</ref>, IC3 <ref type="bibr">[16,</ref><ref type="bibr">17]</ref>, templates and constraint solvers <ref type="bibr">[20]</ref>, counterexample-guided invariant generation (CEGIR) <ref type="bibr">[18,</ref><ref type="bibr">39,</ref><ref type="bibr">47]</ref>, trace analysis <ref type="bibr">[9,</ref><ref type="bibr">15]</ref>, and machine learning <ref type="bibr">[48,</ref><ref type="bibr">49,</ref><ref type="bibr">57]</ref>.</p><p>More closely related work uses templates <ref type="bibr">[7,</ref><ref type="bibr">20]</ref> to restrict the search to invariants of a given shape. In contrast to these approaches, we automatically construct a large set of templates and search for invariants of larger sizes. CEGIR approaches <ref type="bibr">[18,</ref><ref type="bibr">39,</ref><ref type="bibr">47]</ref> use enumeration and exploit the fact that guessing a candidate and checking if it is invariant is easier than inferring a loop invariant directly from code. They often employ dynamic analyses to infer candidates from execution traces and use a verifier to check invariant validity. The idea of learning from counterexamples has also been applied to program synthesis in the form of counterexample-guided inductive synthesis (CEGIS) <ref type="bibr">[51]</ref>, where the synthesizer generates a candidate program and the verifier uses the failed cases to prune the search space. SWISS's approach is inspired by techniques from search-based program synthesis <ref type="bibr">[1]</ref> and the CEGIS framework. Although CEGIS is a general framework, it cannot be used as a black-box since it requires a custom synthesizer, verifier, and learner for each domain. SWISS differs from prior approaches in how it uses the counterexamples to prune the search space ( &#167;4.2) and how it applies a CEGIS-style approach to infer more complex invariants than prior work.</p><p>Program sketching <ref type="bibr">[50]</ref> allows a programmer to sketch a program, i.e., write a program with "holes." A synthesizer fills the holes such that a specification is satisfied. In SWISS, the user can similarly provide templates to restrict the search, but even if such a template is not provided, SWISS can automatically generate a set of templates and search all of them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions and Future Work</head><p>We explore the hypothesis that the safety of most distributed systems can be proven via relatively small invariants (or conjunctions thereof), using our system SWISS, which incorporates novel optimizations to efficiently search the space of candidate invariants. We find that in many cases our hypothesis holds, and SWISS is able to automatically prove their safety, including several, such as Paxos, beyond the reach of prior work. Our results leave open the question of inferring large, mutually inductive invariants. They also illustrate that SWISS and its most recent predecessors often have complementary coverage of the benchmarks. Exploring ways to combine the strengths of each is an intriguing direction for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A Failed Optimizations</head><p>In the interest of full disclosure, and to save others unnecessary work, we briefly summarize three optimizations we implemented and evaluated, only to find that they provide little benefit or actively hurt performance.</p><p>A.1 Formula Synthesis SWISS's current implementation explicitly enumerates and evaluates candidate invariants. Initially, however, we adopted a strategy from a prior system, MemSynth <ref type="bibr">[5]</ref>, which synthesizes memory models from a small collection of examples. Instead of explicitly enumerating formulas, MemSynth encodes the desired shape of candidate formulas as constraints on its SMT queries. In our context, this means creating an SMT query that says, "Find a formula that satisfies this template and complies with our accumulated counterexamples." However, our early experiments showed that synthesizing the formula via a solver was considerably slower than our combination of a custom enumerator and counterexample filters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2 Bounded Model Checking</head><p>Prior work on finding invariants for distributed systems <ref type="bibr">[42]</ref>, found some benefit from using bounded model checking to try to quickly rule out candidate invariants. In our context, this means checking not just the condition INIT =&#8658; P, but whether there are any violations of P in states reachable after taking a fixed number of steps from an initial state. If such a violation existed on a model M, we can use the counterexample filter True(M). Likewise, to build False filters, we consider states a fixed number of steps away from violating safety.</p><p>The hypothesis for this optimization is that it would produce more and "higher quality" filters to rule out future candidates. To pay off, these savings must offset the cost of the additional SMT calls that compute the bounded model checks. Sadly, this optimization rarely boosts performance significantly ( &#167;5.3.5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.3 Aggressively Accumulating Invariants</head><p>As it executes, Breadth finds predicates that are invariant with respect to the input invariants I 1 , . . . , I n ; these new invariants are fed into the next iteration of Breadth. This suggests an obvious improvement: treat newly found invariants as input invariants immediately in order to uncover even more invariants than the ones Breadth is guaranteed to find, leading to fewer total loops. We call this variation BreadthAccumulative.</p><p>However, this variant introduces several complications. Most critically, it interferes with the FastImplies optimization. For example, suppose we process predicates f , g, h 1 , . . . , h n in order and (i) g is invariant; (ii) f is invariant with respect to g; and (iii) FastImplies( f , h i ) holds for all i. BreadthAccumulative, would pass over f (not invariant), then find and add g, causing all h i to become invariant. Since the h i are not filtered out by the FastImplies check, BreadthAccumulative's aggressive addition of the h i causes the number of invariants to explode. In contrast, Breadth would only add g at the end; then in the next call to Breadth, f would be added to allInv, excluding the h i via FastImplies.</p><p>To prevent BreadthAccumulative from adding such spurious invariants, we added a strengthening step, where the first h i found would be strengthened to f . However, strengthening comes at a cost, and in the end, we found the BreadthAccumulative optimization to be unhelpful ( </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B Additional Evaluation</head><p>In this section, we provide some additional measurements of the impact of SWISS's design decisions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.1 Impact of Optimizations</head><p>Model Minimization. To evaluate the effectiveness of model minimization ( &#167;A.2), we measure several benchmarks with and without it (Figure <ref type="figure">5</ref>). While it adds some overhead for simple protocols, it helps significantly for more complex protocols; in the best case, we found that it improved the Flexible-Paxos experiment by 1.6&#215;. Bounded Model Checking. For our BMC variant ( &#167;A.2), we again ran several experiments with and without it (Figure <ref type="figure">5</ref>). However, the results show that the cost of the required SMT calls was not sufficiently offset by gains from "higher quality" filters. Hence we disable it in SWISS's default configuration.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.2 Impact of Parallelism</head><p>To evaluate our parallelism strategy ( &#167;4.5.2), we measure runtimes with varying numbers of threads, studying Breadth and Finisher independently. For each run, we break down the running time of the longest-running thread in each iteration to see which components of the algorithm parallelize well. We run our experiments on the Paxos protocol.</p><p>For the purposes of this experiment, unlike in standard runs, we do not terminate the Finisher algorithm when it finds an invariant which proves the safety condition; instead, we let it search the entire search space F . This removes variance Breadth terminates when the last (top) iteration fails to find any invariant, so it takes one less iteration than shown to find all invariants Breadth can find. In the bottom row, the runtimes are broken down into (i) filtering candidates with counterexamples, (ii) computing counterexamples of noninvariants, and (iii) processing candidates that are invariant. from the random ordering of the search space, leading to more controlled experimental data.</p><p>Our results are shown in Figure <ref type="figure">6</ref>. Finisher's runtime is dominated by enumerating and filtering, which splits fairly evenly across threads. Overall, SWISS on a single template is 2.0&#215; faster with 2 threads than with 1 thread, and it is 8.6&#215; faster with 8 threads than with 1.</p><p>Breadth, meanwhile, does not parallelize as well, since its runtime is dominated by time spent constructing counterexample filters via SMT calls, which is essentially a fixed cost per thread. At best, we saw a speedup of 1.7&#215; with 8 threads.</p><p>We also evaluated BreadthAccumulative ( &#167;A.3) while varying the number of threads (Figure <ref type="figure">6</ref>). Our hypothesis that BreadthAccumulative would require fewer iterations was confirmed: each run required only one iteration to find all invariants in B (plus one iteration to confirm no further invariants exist). By contrast, Breadth requires two iterations (plus one) on the same benchmark. However, BreadthAccumulative still performs worse than Breadth due its other costs (e.g., strengthening - &#167;A.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.3 Hard SMT instances</head><p>As described in &#167;4.3, SMT queries are often rapid, but there are occasional outliers that slow down execution. To measure how problematic these outliers are, we measured the prevalence of hard instances, defined as any instance exceeding forty-five seconds and triggering our retry strategy. In particular, we measured the fraction of total computation time spent on these hard instances. Among all protocols that SWISS was able to solve, this fraction was greatest for the paxos benchmark, which spent 10.2% of its computation time on hard instances, which accounted for 0.17% of its SMT instances.</p><p>However, among all protocols that SWISS was not able to solve, this fraction was greatest for the fast-paxos benchmark, which spent 98.7% of its computation time on hard instances, which accounted for 3.7% of its SMT instances. We currently do not have a good understanding of what makes this protocol's SMT instances difficult for SMT solvers, but the numbers suggest that improvement to SWISS's SMT strategy could make it much faster on harder protocols.</p></div></body>
		</text>
</TEI>
