<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>DuoAI: Fast, Automated Inference of Inductive Invariants for Verifying Distributed Protocols</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>07/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10356075</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2022)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Jianan Yao</author><author>Runzhou Tao</author><author>Ronghui Gu</author><author>Jason Nieh</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Distributed systems are complex and difficult to build correctly. Formal verification can provably rule out bugs in such systems, but finding an inductive invariant that implies the safety property of the system is often the hardest part of the proof. We present DuoAI, an automated system that quickly finds inductive invariants for verifying distributed protocols by reducing SMT query costs in checking invariants with existential quantifiers. DuoAI enumerates the strongest candidate invariants that hold on validate states from protocol simulations, then applies two methods in parallel, returning the result from the method that succeeds first. One checks all candidate invariants and weakens them as needed until it finds an inductive invariant that implies the safety property. Another checks invariants without existential quantifiers to find an inductive invariant without the safety property, then adds candidate invariants with existential quantifiers to strengthen it until the safety property holds. Both methods are guaranteed to find an inductive invariant that proves desired safety properties, if one exists, but the first reduces SMT query costs when more candidate invariants with existential quantifiers are needed, while the second reduces SMT query costs when few candidate invariants with existential quantifiers suffice. We show that DuoAI verifies more than two dozen common distributed protocols automatically, including various versions of Paxos, and outperforms alternative methods both in the number of protocols it verifies and the speed at which it does so, including solving Paxos more than two orders of magnitude faster than previous methods.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The world relies on distributed systems, but these systems are increasingly complex and hard to design and implement correctly. To address this problem, developers are starting to turn to formal verification techniques to prove the correctness of distributed systems <ref type="bibr">[11,</ref><ref type="bibr">20,</ref><ref type="bibr">35]</ref>. This involves formally verifying that desired safety properties hold for the distributed protocol. A safety property is an invariant that should hold true at any point in a system's execution. It ensures the protocol does not reach invalid or dangerous states. For example, the safety property for a distributed lock protocol <ref type="bibr">[11]</ref> is that no two nodes in the system hold a lock at the same time. The proof requires finding an invariant that implies the safety property, then proving that it is inductive. An invariant is inductive if it holds for all initial states of the system, and is preserved on all valid transitions so that it holds for any reachable state of the system.</p><p>Unfortunately, finding an inductive invariant is often the hardest part of the proof <ref type="bibr">[21]</ref>. Invariants can be expressed as logical formulas consisting of universal (&#8704;) and existential (&#8707;) quantifiers with a certain number of variables, and a set of logical relations among the variables. Recent work has focused on automating the process of finding an inductive invariant, but has various limitations. I4 <ref type="bibr">[21]</ref> was the first to automate the process, but provides no guarantee that it can find the inductive invariant and does not work for invariants with existential quantifiers. Our previous work DistAI <ref type="bibr">[38]</ref> provides speed advantages over I4 and a guarantee of finding an &#8707;-free inductive invariant if one exists, but also does not work for invariants with existential quantifiers. FOL-IC3 <ref type="bibr">[13]</ref> was the first to handle existential quantifiers, but is inefficient due to its heavy use of expensive SMT queries. It often fails to find invariants for protocols that can be solved by other approaches such as I4 and DistAI. SWISS <ref type="bibr">[10]</ref> can successfully find an inductive invariant for Paxos, but does not work for more complex protocols such as stoppable Paxos <ref type="bibr">[27]</ref>. It fails or is much slower than I4 and DistAI for many protocols without existential quantifiers.</p><p>We present DuoAI, an automated system to quickly find inductive invariants for verifying distributed protocols, with and without existential quantifiers, including complex versions of Paxos. Even though a distributed protocol may be used in very large systems, its invariants are likely to be concise, as protocols need to be designed and understood by humans to be correct. As a result, DuoAI operates in formula space and considers smaller formulas first to enumerate candidate invariants, which are then checked by an SMT solver. Formula size is defined by a maximum number of quantified variables (a variable and its quantifier &#8704; or &#8707;) and relations. If DuoAI does not succeed with smaller formulas, it increases the formula size and repeats the process until an inductive invariant is found. Although the formula space within a given formula size is finite, checking all possible invariants for even a modest size formula is prohibitively expensive, especially since SMT solvers are particularly inefficient at checking invariants with existential quantifiers. It is crucial to avoid too many SMT queries and SMT queries that are too complex. Based on this observation, DuoAI introduces and combines new techniques that avoid the limitations of SMT solvers in checking invariants with existential quantifiers.</p><p>First, DuoAI runs protocol simulations at various instance sizes and logs the reached protocol states, which we call samples. Instance size refers to the size of distributed system (number of nodes, packets, etc.) running the protocol. These simulations are fast to execute. DuoAI directly checks candidate invariants against the samples, pruning those that do not hold to reduce the number of invariants checked by an SMT solver. To do this systematically, DuoAI introduces the minimum implication graph, which for a given invariant, shows all its implied weaker invariants. It then selects the strongest candidate invariants in the graph that hold for the samples.</p><p>Second, DuoAI combines the strongest candidate invariants with the safety property and feeds them to an SMT solver to check if the conjunction is inductive. If the check fails, it monotonically weakens the invariants using the graph and repeats the process until an inductive invariant is found. If the number of candidate invariants is not too large and most are required in the final solution, this method will be effective at reducing the number of SMT queries by feeding all of the candidate invariants to the SMT solver at once. Third, DuoAI feeds the strongest candidate universal invariants, those without existential quantifiers, from the graph to an SMT solver to check if the conjunction, without the safety property, is inductive. If the check fails, it monotonically weakens the invariants using the graph, only considering candidate universal invariants, and repeats the process until the conjunction is inductive. We call this set of inductive &#8704;-only invariants the universal core. It then strengthens the universal core by iteratively adding a small subset of the strongest candidate invariants with existential quantifiers from the graph until the conjunction with the safety property is inductive. If the number of candidate invariants with existential quantifiers is large and most are not in the final solution, this method will be effective at avoiding too complex SMT queries, because it only feeds a few invariants to the SMT solver each time.</p><p>DuoAI runs these two methods for refining candidate invariants in parallel, a top-down refinement that weakens the candidates and a bottom-up refinement that strengthens the candidates, returning the result from the method that succeeds first. We prove that both methods are guaranteed to find the inductive invariant that proves the desired safety property, but they may have very different running times and resource requirements depending on the distributed protocol being verified. Using both methods together provides the best of both worlds in addressing the inefficiencies of SMT solvers. We evaluated DuoAI using 27 widely-used distributed protocols in a head-to-head comparison against other approaches, including I4, DistAI, FOL-IC3, and SWISS. DuoAI outperforms all of the other approaches in terms of both the number of protocols for which it finds an inductive invariant and the speed at which it does so. DuoAI solves Paxos more than two orders of magnitude faster than any other approach, and is the only system that can solve more complex versions of Paxos including multi-Paxos, stoppable Paxos, and fast Paxos.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Overview</head><p>We use a simplified consensus protocol as an example to show how DuoAI works. Figure <ref type="figure">1</ref> shows the protocol written in Ivy <ref type="bibr">[28]</ref>, a language and tool for specifying, modeling, and verifying distributed protocols built on top of the Z3 SMT solver. Each node can vote for another node to be the leader, and when a node receives votes from a quorum of nodes, it can become the leader and decide on a value. The protocol state at any moment is represented by five relations (Lines 5-9). vote(n 1 ,n 2 ) indicates whether node n 1 has voted for node n 2 . voted(n) indicates whether node n has ever casted a vote. leader(n) indicates if n is the leader among nodes. decided(n,v) indicates whether node n has decided on value v. member(n,q) indicates if node n belongs to quorum q, where each quorum is a set of nodes. The axiom (Line 10) dictates a property of the member relation: any two quorums of nodes must have at least one node in common. After initialization (Lines 13-16), the protocol can non-deterministically transition from one state to another as described by the three actions cast_vote, become_leader, and decide (Lines 19-34). For example, cast_vote(n 1 ,n 2 ) lets a node n 1 vote for another node n 2 , under the precondition that n 1 has not voted before (Line 20). Then the protocol will transition to a new state where vote(n 1 ,n 2 ) = true and voted(n 1 ) = true. Finally, the safety property (Line 36) encodes the desired property of correctness of the protocol that the system cannot decide on two different values.</p><p>The safety property is an invariant of the protocol, but is not inductive as taking an action from a state satisfying the safety property may result in a new state that breaks the safety property. To verify the protocol, we need four additional invariants:</p><p>The first invariant says that if a node has voted for another node, then it must be recorded as voted in the protocol. The second says that one node cannot vote for two different nodes. The third says that a leader must be endorsed by a quorum of nodes. More specifically, we can find a quorum Q that every node N 2 in the quorum must have voted for the leader N 1 . The fourth says that only a leader can decide on a value. The conjunction of the four invariants and the safety property is inductive.</p><p>To find this inductive invariant, DuoAI simulates the protocol using different instance sizes and logs the samples. It then builds a minimum implication graph, a small fragment of which is shown in Figure <ref type="figure">2</ref>. The full graph for simplified consensus has over 35K nodes and 170K edges. Nodes represent formulas and edges represent implication between formulas. A stronger formula will have a directed edge to an implied weaker formula. DuoAI enumerates possible candidate invariants following the graph and adds it to the candidate invariant set if it holds on the samples. For example, DuoAI checks the root node in Figure <ref type="figure">2</ref> and it does not hold on the samples. DuoAI then checks its implied weaker formulas, the two nodes in the second layer, iteratively going down the graph. For the simplified consensus protocol, enumeration ends with 19 candidate invariants, including equivalent forms of Eq. ( <ref type="formula">1</ref>), ( <ref type="formula">2</ref>), (3), and (4).</p><p>After enumeration, DuoAI runs top-down and bottom-up refinement in parallel. Top-down refinement feeds all candidate invariants and the safety property to Ivy to see if their conjunction is inductive. For simplified consensus, the conjunction is inductive, so no further weakening is required.</p><p>Bottom-up refinement feeds all &#8704;-only invariants from the initial candidate set to Ivy then weakens them until the set of invariants is itself inductive, but may not imply the safety property. For simplified consensus, this universal core includes three invariants Eq. ( <ref type="formula">1</ref>), (2), and (4). DuoAI then tries to search a small number of &#8707;-included invariants to add to the universal core along with the safety property so that the resulting set is inductive. DuoAI uses counterexamples from Ivy to guide the search for additional invariants and eventually identifies invariant (3) for the simplified consensus protocol, forming an inductive invariant set. For simplified consensus, top-down refinement succeeds more quickly than bottom-up refinement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Minimum Implication Graph</head><p>The backbone of DuoAI is the minimum implication graph, which encodes implication relations among formulas. The graph is used to determine the order of formulas to be enumerated, and how invariants are weakened. We present formulas in prenex normal form, where the quantified variables, called the prefix, appear at the beginning of the formula followed by quantifier-free relations, called the matrix. The matrix is required to be in disjunctive normal form (DNF).</p><p>For simplicity, here we only consider predicate symbols with equality. The methods can be extended to uninterpreted functions in the same manner as DistAI <ref type="bibr">[38]</ref>.</p><p>A formula P is strictly stronger than</p><p>For two formulas P,Q &#8712; S, where S is a finite formula search space, there is a directed edge from P to Q in the minimum implication graph if and only if P is strictly stronger than Q and there is no formula R which is strictly weaker than P while strictly stronger than Q. For example, the fragment of the minimum implication graph in Figure <ref type="figure">2</ref> includes three formulas:</p><p>&#8707;N. vote(N,N)</p><p>Eq. ( <ref type="formula">5</ref>) &#8658; (6) since if vote(N,N) is true for all N, there must exist some N for which it is true. Eq. ( <ref type="formula">6</ref>) &#8658; (7) since if vote(N,N) is true for some N, there must exist some N 1 = N 2 for which vote(N 1 ,N 2 ) is true. Because Eq. (5) &#8658; (6) &#8658; (7), there is an edge from Eq. ( <ref type="formula">5</ref>) to ( <ref type="formula">6</ref>), an edge from Eq. ( <ref type="formula">6</ref>) to <ref type="bibr">(7)</ref>, but no edge from Eq. ( <ref type="formula">5</ref>) to <ref type="bibr">(7)</ref>, because Eq. ( <ref type="formula">6</ref>) is between them.</p><p>DuoAI defines the search space S as all formulas in disjunctive normal form for a given set of quantified variables and formula size. The formula size is defined by four parameters: max_exists sets the maximum number of existentially quantified variables, max_literal sets the upper bound of the total number of literals in the formula, max_and sets the maximum number of literals connected by AND, and max_or sets the maximum number of conjunctions connected by OR.</p><p>The minimum implication graph has two important properties as stated in Lemmas 1 and 2: Lemma 1. The minimum implication graph is a directed acyclic graph (DAG).</p><p>Proof. Suppose there is a cycle P 1 &#8594; P 2 &#8594; ... &#8594; P k &#8594; P 1 . The edges P 1 &#8594; P 2 , ... , P k-1 &#8594; P k imply that P 1 &#8658; P 2 , ... , P k-1 &#8658; P k . From the transitivity of &#8658; we know P 1 &#8658; P k . Since there is an edge from P k to P 1 , we know P 1 &#824; &#8658; P k , a contradiction. Lemma 2. For any P,Q &#8712; S, there is a path from P to Q in the minimum implication graph if and only if P &#8658; Q&#8743;Q &#824; &#8658; P.</p><p>Proof. We first prove the "if" direction by induction on the number of formulas in S that are strictly weaker than P while strictly stronger than Q. For the base case, if there are zero such formulas, then by definition there is an edge from P to Q. Next we prove the induction step. Suppose for any P,Q &#8712; S, if P &#8658; Q &#8743; Q &#824; &#8658; P, and there is no more than n formulas that are strictly weaker than P while strictly stronger than Q, then there is a path from P to Q. Now consider the case that there are n+1 formulas that are strictly weaker than P while strictly stronger than Q. Let R be one of the n+1 formulas. We know P &#8658; R&#8743;R &#824; &#8658; P, and there can be no more than n formulas that are strictly weaker than P while strictly stronger than R. By the induction hypothesis, there is a path from P to R. In the same manner, we can show there is a path from R to Q. Then we concatenate the two paths and get a path from P to Q.</p><p>Next we prove the "only if" direction. If there is a path from P to Q, Let P, F 1 , F 2 , ..., F k , Q be the path. We know P &#8658; F 1 , ... , F k &#8658; Q, so P &#8658; Q. We prove Q &#824; &#8658; P by contradiction. Suppose Q &#8658; P, then P &#8660; Q, so there must be an edge from F k to P. This forms a cycle P,F 1 ,...,F k ,P, a contradiction to Lemma 1.</p><p>To build the minimum implication graph, we need to determine the "root" nodes in the graph, that is, formulas with no predecessors since they cannot be implied by any other formula, and how to find their successors. In DuoAI, a formula P &#8712; S is added to the set of root nodes if it falls into one of two cases:</p><p>1. P has no &#8707;-quantified variable and no logical OR. For example:</p><p>&#8704;N : node. vote(N,N)&#8743;leader(N).</p><p>2. P has unique &#8707;-quantified variables and no logical OR. For example:</p><p>Intuitively, if a formula has an &#8707;, then by changing it to a &#8704;, we can get a stronger formula. If a formula has a logical OR, then by removing the OR and any literals followed by it, we can get a stronger formula. So in general, a root formula should have no &#8707; and no OR, such as Eq. ( <ref type="formula">8</ref>). There is one exception, represented by Eq. ( <ref type="formula">9</ref>). At first sight Eq. ( <ref type="formula">9</ref>) has a predecessor &#8704;N 1 ,N 2 :</p><p>However, this formula is a contradiction because &#8704;N 1 ,N 2 : node. N 1 &#824; = N 2 cannot be true. The minimum implication graph does not include tautologies and contradictions, so Eq. ( <ref type="formula">9</ref>) itself is a root formula.</p><p>Starting from the root nodes, DuoAI incrementally builds the minimum implication graph. For formulas P,Q &#8712; S, DuoAI adds an edge from P to Q if the shapes of P and Q fall into one of five cases: 1. P and Q share the same matrix. Q replaces the &#8704;-quantified variables of one type with &#8707;-quantified variables. For example:</p><p>2. P and Q share the same prefix. Q has one less ANDed literal than P. For example: P = Eq.( <ref type="formula">8</ref>) Q = Eq.(5).</p><p>3. P and Q share the same prefix. Q has one more ORed conjunction than P. For example:</p><p>DuoAI requires that the ORed conjunction be maximal, which means it contains the maximum number of literals for the search space. The conjunction voted(N)&#8743;leader(N) in Q is maximal if max_and = 2 or max_literal = 3. For example, Q &#8242; = &#8704;N : node. vote(N,N) &#8744; voted(N) also adds one more ORed conjunction from P, but DuoAI does not add an edge from P to Q &#8242; , because Q is strictly stronger than Q &#8242; .</p><p>4. Starting from P, Q projects two &#8704;-quantified variables of the same type into one variable. For example:</p><p>5. Starting from Q, P projects two &#8707;-quantified variables of the same type into one variable. For example:</p><p>The graph constructed in this way may differ slightly from the exact minimum implication graph due to equivalent formulas. For example, formulas &#8704;X. p(X) &#8744; (&#172;p(X) &#8743; q(X)) and &#8704;X. p(X)&#8744;q(X) fall into the second case, so there is an edge in the constructed graph. However, the two formulas are equivalent so there is no edge in the exact graph. We call the graph constructed by DuoAI an approximate minimum implication graph, whose properties are formalized in Lemmas 3, 4, and 5: Lemma 3. The approximate minimum implication graph is a directed acyclic graph (DAG).</p><p>Proof. For all of the five cases, we can show that for formulas along any path in the approximate minimum implication graph, there exists one function that is strictly increasing, so there can be no cycle. For example, this is true for the function</p><p>where # denotes "the number of" (e.g., (# &#8744;) is the number of logical OR in a formula). Lemma 4. For any P,Q &#8712; S, there is a path from P to Q in the approximate minimum implication graph only if P &#8658; Q.</p><p>Proof. From the transitivity of &#8658;, we only need to show that if there is an edge from P to Q in the approximate minimum implication graph, then P &#8658; Q. This can be proved by showing P &#8658; Q holds in each of the five cases. The first three cases are trivial. For the fourth case, in general P =</p><p>Similarly, for the fifth case, P = ... &#8707;X 1 ... matrix(X 1 ,X 1 ) and Q</p><p>Lemma 5. For any formula P &#8712; S that is not a tautology or a contradiction, there exists a directed path from a root node Q &#8712; S to P in the approximate minimum implication graph.</p><p>Proof. We prove this by construction. For a &#8707;-free formula P, if it includes no logical OR, then it is a root formula itself. Otherwise, we find the root formula Q by removing all but one ORed conjunctions. Starting from Q, we can iteratively apply the second and third cases to add conjunctions and remove literals until we reach P. For a &#8707;-included formula P, if it includes unique &#8707;-quantified variables then it is a root formula itself. Otherwise, we iteratively find a predecessor by replacing &#8707; with &#8704; for quantified variables of each type, until the formula becomes the &#8707;-free P &#8242; . The first case guarantees that there is a path from P &#8242; to P, and we have already shown for the &#8707;-free P &#8242; , there exists a path from a root node Q. Putting it together, we have a path from Q to P in the approximate minimum implication graph.</p><p>In other words, the approximate minimum implication graph is as useful and complete as the exact graph. DuoAI uses the approximate minimum implication graph, which, for simplicity, we will continue to refer to as the minimum implication graph unless otherwise specified. DuoAI requires that formulas in S must be in a decidable fragment of first-order logic. In general, satisfiability in firstorder logic is undecidable <ref type="bibr">[23]</ref>, so an SMT solver can get stuck in infinite instantiations and never give the sat/unsat answer. DuoAI ensures that the formulas are decidable by enforcing a fixed order of types if there is quantifier alternation (i.e., alternating &#8704; and &#8707;) <ref type="bibr">[1]</ref>. If type A is ordered before type B, then for any formula, if there exists a quantified variable V of type A, any quantified variable of type B can only occur after V if there is quantifier alternation. For example, if type node is ordered before type packet, then &#8704;N : node. &#8707;P : packet and &#8707;N : node. &#8704;P : packet are allowed while &#8704;P : packet. &#8707;N : node and &#8707;P : packet. &#8704;N : node are not. DuoAI tries to infer the order of types from the protocol specification and obtains input from the user when necessary. For example, for the simplified consensus protocol, DuoAI can infer from Line 10 that type quorum must be ordered before type node, then ask the user to place type value in the order. Absent user input, DuoAI will try different possible orders in parallel.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Candidate Invariant Enumeration</head><p>Similar to DistAI <ref type="bibr">[38]</ref>, DuoAI first repeatedly simulates the distributed protocol using various instance sizes, and records the reached states as samples. For example, DuoAI simulates the simplified consensus protocol on concrete instances of different numbers of values, quorums, and nodes. The simulations of different instance sizes are done in parallel and yield samples of different lengths. DuoAI follows the minimum implication graph to enumerate candidate invariants, but rather than feeding all of them to an inefficient SMT solver, it checks them directly on the samples first. A correct invariant must hold on every reachable protocol state and thus on every sample. A key difference between DuoAI and DistAI is that DuoAI keeps the original variable-length samples and uses them in invariant enumeration, while DistAI projects all samples to fixed-length vectors that it calls subsamples. The problem is that DistAI is not exhaustive in its subsampling, so that a formula with existential quantifiers that holds for DistAI's subsamples may not actually hold for the original samples. DuoAI avoids this problem by effectively considering all possible subsamples that can be derived from the original samples.</p><p>Algorithm 1 shows the enumeration algorithm, in which pending is a queue whose elements are formulas that will be checked on the samples, candidates is the set of formulas that hold on all the samples and invalidated is the set of formulas that do not hold on at least one of the samples. Both candidates and invalidated are initially empty (Lines 2-3), and pending initially consists of the root nodes of the minimum implication graph, that is, formulas that cannot be implied by any other formula. In each iteration, a formula f is popped from the pending queue (Line 5). If one of f 's ancestors in the graph has already been added to candidates, DuoAI will not check f on the samples or add f to the candidates invariants (Lines 6-7). Oth- Step 2: The root node A is invalidated by the samples. We add its two successors B and C to the pending queue.</p><p>Step 3: Formula B holds on the samples thus being added to candidates, while formula C is invalidated and its two successors E and F are added to the pending queue.</p><p>Step 4: Formula E has an ancestor B which is already in candidates, so E is simply skipped instead of being checked on the samples. Formula F holds on the samples and is added to candidates. pending.enqueue(next_ f ) 15: return candidates erwise, DuoAI will check f on the samples and if it holds, add it to candidates (Lines 8-9). If f does not hold on at least one sample, DuoAI will add it to invalidated (Line 11), and add its successors, which are formulas weaker than f , to the pending queue if they have not already been added (Lines 12-14). Figure <ref type="figure">3</ref> shows an example of invariant enumeration using the graph in Figure <ref type="figure">2</ref>. DuoAI starts from the root nodes, iteratively goes down the minimum implication graph, and checks formulas against the samples. Because of this design, formulas D and E are never checked against the samples and are not added to the candidates, because their predecessor B, a formula stronger than both D and E, is already a candidate invariant. This design not only saves time checking formulas on samples, but also avoids burdening the SMT solver later with checking the inductiveness of redundant invariants. More importantly, this procedure guarantees that the resulting invariants, formulas B and F in this example, are the strongest candidate invariants that hold on the samples, which is formally stated in the following theorem: Theorem 1. For any correct invariant I &#8712; S held by the protocol P , at the end of invariant enumeration, either 1) I &#8712; candidates, or 2) one of I's ancestors I anc &#8712; candidates.</p><p>Proof. Consider three cases: 1) I has been checked on the samples, 2) I has been added to the pending queue but was not checked on samples, and 3) I has been never added to the pending queue. In the first case, since I is a correct invariant held by the protocol, it must hold on all the samples and will be added to candidates (Lines 8-9), so I &#8712; candidates. In the second case, after I is popped from the pending queue, there must be an ancestor I anc of I already in candidates (Line 6), otherwise I will be checked on the samples, so I anc &#8712; candidates. In the third case, we show that an ancestor I anc &#8712; candidates exists. From Lemma 5, there must be a path from a root node I 0 to I, namely I 0 ,I 1 ,...,I. On Line 3 the root node I 0 is added to the pending queue. Since I 0 is added to the pending queue and I is not, let I k be the last formula on the path I 0 ,I 1 ,...,I that is ever added to the pending queue. After I k is dequeued, there are three possible branches to take: Lines 6-7, Lines 8-9, or Lines 10-14. If it takes Lines 6-7, then there is an ancestor I anc of I k such that I anc &#8712; candidates. If it takes Lines 8-9, then I k will be added to candidates so I k can be the ancestor I anc of I such that I anc &#8712; candidates. If it takes Lines 10-14, its successors will be added to the pending queue unless the branch condition at Line 13 evaluates to false. From our hypothesis, I k is last formula on path I 0 ,...,I k ,I k+1 ,...,I that is ever added to the pending queue. Thus, the branch condition for I k+1 must evaluate to false, so either I k+1 &#8712; candidates or I k+1 &#8712; invalidated. However, I k+1 must be added to the pending queue before it can be added to either candidates or invalidated, a contradiction. its ancestor (a stronger formula) in the candidate invariants. Figure <ref type="figure">4</ref> gives an illustration. A direct corollary is that, the set of candidate invariants after the enumeration is at least as strong as the correct invariant in S.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theorem 1 says that any correct invariant has either itself or</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Top-down Invariant Refinement</head><p>Based on Theorem 1, the candidate invariants can only be too strong, so DuoAI can monotonically weaken the candidate invariants until a correct inductive invariant is reached, which we refer to as top-down invariant refinement. Algorithm 2 shows the top-down refinement algorithm. In each iteration, DuoAI feeds the current candidate invariants to Ivy. Ivy invokes the Z3 SMT solver to check the inductiveness of each candidate invariant and the safety property. Ivy will return which invariants fail the check; if there are none, the correct inductive invariant has been found (Lines 4-5). If the safety property fails, there is no point to weaken it, and the system returns NotProvable (Lines 6-7). If one of the candidate invariants fails, DuoAI moves it from candidates to invalidated (Lines 9-10), then adds its successors (i.e., formulas that can be implied by the failed invariant) to candidates so long as the successor does not have a reachable ancestor in candidates and has not already been invalidated (Lines 12-13).</p><p>An ancestor of a node is reachable if there is a path from the ancestor to the node along which no node is invalidated. Figure <ref type="figure">5</ref> shows an example of top-down refinement. Suppose the current candidate invariants include formulas B and F, and by invoking the Z3 SMT solver, Ivy indicates that B is not inductive. Formulas D and E are not in candidates, because they can be implied by formula B which is already in candidates. After B is invalidated, both D and E will be added to candidates to let Ivy decide their inductiveness in future iterations. Alternatively, if formula F is invalidated by Ivy, no formula will be added to candidates because F has no successor in the minimum implication graph of search space S. candidates := candidates &#8746; {next_inv} 14: return NotProvable By weakening failed invariants based on the minimum implication graph rather than discarding them, DuoAI can guarantee that it never overweakens invariants to bypass the correct invariants in between. In other words, top-down refinement has a theoretical guarantee to eventually find an inductive invariant if one exists in the search space, as stated in the following theorem: Theorem 2. For any protocol P and finite search space S, if there exists an inductive invariant II * &#8834; S that can prove the safety property, then Algorithm 1 followed by Algorithm 2 will output such an inductive invariant II in finite time.</p><p>Proof. The key is to prove that the while loop (Lines 2-13) maintains the following loop invariant: For any invariant I &#8712; II * , either 1) I &#8712; candidates, or 2) there exists a reachable ancestor I anc of I such that I anc &#8712; candidates. The loop invariant says that after any rounds of invariant weakening, the candidate invariants must be still at least as strong as the correct invariants. If Algorithm 2 terminates, it is impossible to have the safety property fail (Line 7). The only possibility is that a correct inductive invariant is returned (Line 5).</p><p>Theorem 1 guarantees that the loop invariant holds before entering the loop. We only need to prove that if this loop invariant holds at the beginning of round k of invariant weakening, it must still hold at the beginning of round k+1. This proof is done by construction for each I &#8712; II * . From the induction hypothesis, at the beginning of round k, either 1) I &#8712; candidates, or 2) a reachable ancestor I anc &#8712; candidates. In the first case, I cannot have been invalidated during round k because I &#8712; II * , so I &#8712; candidates still holds at the beginning of iteration k+1. In the second case, the invalidated invariant must either be on or not on the path from I anc to I. If it is not on the path, I anc remains a reachable ancestor of I and I anc &#8712; candidates still holds at the beginning of iteration k+1. If it is on the path, let I d be the successor of the invalidated invariant on the path. From Lines 11-13, either I d is added to candidates, in which case I d can be the new I anc for iteration k+1, or I d has a reachable ancestor I e &#8712; candidates, in which case we choose I e as the new I anc for iteration k+1. In all cases, we can find either I or a reachable ancestor I anc in the candidate set, therefore the loop invariant holds. Now we only need to prove that Algorithm 2 terminates, which follows from three observations: 1) In each loop iteration a formula is removed from candidates (Line 9); 2) each formula can only be added to candidates once (Lines 10 &amp; 12); and 3) the formula search space S is finite.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Bottom-up Invariant Refinement</head><p>Although top-down refinement provides a strong theoretical guarantee of finding an inductive invariant, it may take too long or run out of memory given limited computing resources if there are too many unnecessary invariants to consider. For the simplified consensus protocol in Figure <ref type="figure">1</ref>, besides the four invariants (1)(2)(3)(4), many other invariants hold for the protocol but are unnecessary to prove the inductiveness of the safety property, for example, &#8704;V : value, Q : quorum. &#8707;N : node. member(N,Q)&#8743;(leader(N)&#8744;&#172;decided(N,V )). <ref type="bibr">(10)</ref> Invariants such as Eq. ( <ref type="formula">10</ref>) do not affect the soundness of DuoAI, but they will significantly slow down the validation of candidate invariants by the SMT solver. If there are m candidate invariants, validating each invariant takes O(m) time in the worst case, since the inductiveness of one invariant can depend on any other invariant, so checking all candidate invariants can take O(m 2 ) time. Adding unnecessary invariants can increase validation time quadratically.</p><p>The key issue though is not just how many unnecessary invariants there are, but whether they have quantifier alternation (i.e., alternating &#8704; and &#8707;), which we observe causes  return result 14: return NotProvable SMT solvers to struggle. For the Paxos protocol, a correct inductive invariant set of size 14 can be validated in less than a second. If we add 10 correct but unnecessary invariants with quantifier alternation, the validation will take 5 minutes. If we add 20 such invariants, the validation will take over 3 hours.In contrast, the chord ring maintenance protocol <ref type="bibr">[21]</ref> with 149 &#8704;-only invariants only takes 8 seconds to validate.</p><p>However, a correct distributed protocol typically has a clear and human-understandable intuition, which leads to concise invariants <ref type="bibr">[10]</ref>. This motivates our bottom-up invariant refinement algorithm shown in Algorithm 3. In essence, the algorithm tries to identify a small set of correct and helpful invariants that can eventually prove the safety property. &#167;8 shows that the combination of bottom-up with top-down refinement provides fast performance for finding inductive invariants across a wide-range of protocols.</p><p>Algorithm 3 consists of two procedures. In Procedure 1, DuoAI first extracts all the &#8704;-only invariants from the candidate invariants (Line 1), which are guaranteed to be the strongest &#8704;only invariants that hold on the samples. Then, DuoAI runs the top-down refinement algorithm (Line 2) using only the universal invariants and the universal portion of the minimum implication graph by removing all nodes representing existentially quantified formulas. The safety property is neglected in this topdown refinement. In this way, the &#8704;-only invariants are monotonically weakened until they become inductive, regardless of whether the safety property can be proved (it probably cannot). Recall that we call the now inductive &#8704;-only invariants the universal inductive core. DuoAI then puts every enumerated candidate invariant that is not in the universal inductive core into noncore (Line 3). noncore mainly consists of formulas with existential quantifiers, but also includes &#8704;-only formulas that are not in the core, whose inductiveness may depend on &#8707;included invariants. For example, for the simplified consensus protocol, the universal inductive core includes five candidate invariants, which are exactly the equivalent forms of Eq. ( <ref type="formula">1</ref>), (2), and (4). There are 14 non-core candidate invariants, 13 of which have quantifier alternation, including Eq. ( <ref type="formula">3</ref>) and <ref type="bibr">(10)</ref>.</p><p>Based on our observation that SMT solvers struggle with quantifier alternation, we expect noncore formulas will have a much higher cost of checking. Procedure 2 aims to identify a small subset of noncore to strengthen the candidate invariants, such that the conjunction of the universal inductive core and the subset (denoted as core&#8746;sub), or their weaker forms, can prove the safety property. Procedure 2 enumerates each subset sub of noncore (Line 5), and runs the monotonic weakening algorithm (Algorithm 2) on core&#8746;sub (Line 8). If Algorithm 2 returns NotProvable (Line 9), DuoAI moves on to consider the next subset. Otherwise, Algorithm 2 outputs a correct inductive invariant (Line 13). The enumeration of subsets is conducted in increasing order of size, starting from the / 0, followed by all single formulas from noncore, then pairs, triples, and so on.</p><p>Whenever Algorithm 2 finds the safety property failed and reports NotProvable, Ivy returns a counterexample of inductiveness s a -&#8594; s &#8242; (Line 10), which means starting from a protocol state s satisfying the safety property and the candidate invariants, and taking an action a, the system reaches a new state s &#8242; where the safety property is violated. <ref type="foot">1</ref>If we view the samples from protocol simulation as positive samples on which the invariants must hold, then we can view these counterexample states s as negative samples which the invariants must exclude. DuoAI needs to identify and include another invariant I that does not hold on s, so that the counterexample s a -&#8594; s &#8242; can be excluded. When enumerating a subset of noncore, Procedure 2 first checks if the subset can exclude all counterexamples seen so far (Line 6). If there exists one counterexample state s on which all invariants in the subset hold, or in other words, the counterexample cannot be excluded, the monotonic weakening algorithm is bound to fail, because if a stronger invariant cannot exclude the counterexample, then its weaker forms cannot either. So Procedure 2 simply moves on to enumerate the next subset (Line 7).</p><p>For the simplified consensus protocol, when sub = / 0, the safety property fails and Ivy gives the counterexample s</p><p>does not hold on s, so it can exclude this counterexample. In contrast, Eq. (10) holds on s, so the counterexample will persist even if Eq. ( <ref type="formula">10</ref>) is added to the candidate set. Therefore, DuoAI will skip Eq. ( <ref type="formula">10</ref>) and try Eq. ( <ref type="formula">3</ref>), and run Algorithm 2 on its conjunction with the universal core, which gives a correct inductive invariant set consisting of Eq. (1)(2)(3)(4).</p><p>Although counterexamples can be used for top-down refinement, DuoAI currently does not because Ivy cannot return counterexamples in batch. When Ivy is configured to return a counterexample, it terminates once it identifies the first broken invariant. This is inefficient for top-down refinement, but for bottom-up refinement, counterexamples are only needed for the safety property, so DuoAI puts the safety property on top of other invariants and Ivy will give the desired counterexample.</p><p>Like top-down refinement, bottom-up refinement has a theoretical guarantee to eventually find an inductive invariant if one exists in the search space, as stated in the following theorem: Theorem 3. For any protocol P and finite search space S, if there exists an inductive invariant II * &#8834; S that can prove the safety property, then Algorithm 1 followed by Algorithm 3 will output such an inductive invariant II in finite time.</p><p>Proof. We first prove that Algorithm 3 terminates in finite time. This directly follows from three facts: 1) powerset(noncore) is a finite set so the for loop (Line 5) has a finite number of iterations; 2) In each loop iteration, there is at most one invocation of Algorithm 2 (Line 8); and 3) From Theorem 2, Algorithm 2 terminates in finite time.</p><p>To prove the soundness of Algorithm 3, we first observe that if Algorithm 3 outputs an invariant, it must be a correct inductive invariant, because the output must come from Algorithm 2, in which the output can only occur when the safety property is proved. Now we prove that there will be an output invariant eventually. Observe that noncore &#8712; powerset(noncore) (Line 5). When sub = noncore, we have CI &#8834; core&#8746;sub, then Line 8 degenerates to Algorithm 2 in &#167;5. From Theorem 2, we know a correct inductive invariant will be outputted.</p><p>For both the top-down and bottom-up refinement, if NotProvable is returned, we know the protocol cannot be verified using invariants in the search space S. DuoAI will try a larger search space by increasing either max_literal, max_or, max_and, or max_exists, or the per-domain number of quantified variables. By default, DuoAI alternates among the five in a round-robin manner. DuoAI sets the initial max_literal = 4, max_or = max_and = 3, and max_exists = 1 unless the safety property already involves k &#8805; 2 existentially quantified variables, in which case DuoAI sets max_exists = k. DuoAI sets the initial number of quantified variables for domain T as the maximum number of variables of type T in any relation. For example, the relation vote(N1:node,N2:node) guarantees type node has at least two variables.</p><p>Because SMT solvers are much less efficient at checking invariants with existential quantifiers, and many distributed protocols are provable by &#8704;-only invariants <ref type="bibr">[21]</ref>, DuoAI runs a &#8704;-only instance (i.e., max_exists = 0) in parallel. The &#8704;-only instance only runs top-down refinement, as bottom-up refinement degenerates to the same top-down refinement (Line 2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Optimizations Based on Mutual Implication</head><p>In using the minimum implication graph, DuoAI introduces several optimizations based on mutual implication relations among formulas. These relations further prune the search space and avoid redundant candidate invariants. DuoAI considers two kinds of mutual implication relations, 1) P 1 &#8743; P 2 &#8743; ... &#8743; P k &#8658; Q, and 2) P 1 &#8743; P 2 &#8743; ... &#8743; P k &#8660; Q. Although the latter is a special case of the former, DuoAI treats them differently. We refer to Q as a conjunction implied formula in the former and an equivalently decomposable formula in the latter. Since checking inductiveness has a quadratic complexity with the number of invariants, these optimizations have a significant improvement on efficiency.</p><p>Conjunction implied formulas. DuoAI identifies conjunction implied formulas to avoid redundant candidate invariants. Given a mutual implication relation P 1 &#8743; P 2 &#8743; ... &#8743; P k &#8658; Q, if all P 1 ,P 2 ,...P k are already in the candidate invariants, DuoAI will mark Q as a conjunction implied formula and not add Q to the candidate invariants. Later during refinement, if one of P 1 ,P 2 ,...P k is invalidated by Ivy, then the conjunction implied invariant Q is no longer redundant and will be added to the candidate invariants.</p><p>For example, suppose we have a disk replication protocol with the following three invariants:</p><p>One can check that among Eq. ( <ref type="formula">11</ref>)(12) <ref type="bibr">(13)</ref>, no formula can imply another. But the conjunction of Eq. ( <ref type="formula">11</ref>) and ( <ref type="formula">12</ref>) can imply Eq. ( <ref type="formula">13</ref>). This is because Eq. ( <ref type="formula">12</ref>) says that for every epoch E, there must be a readable replica R. Then from Eq. ( <ref type="formula">11</ref>), the readable replica R cannot be crashed. Therefore, for every epoch E, there must be a replica R that does not crash, which is expressed by Eq. ( <ref type="formula">13</ref>). If Eq. ( <ref type="formula">11</ref>) and ( <ref type="formula">12</ref>) are already candidate invariants, DuoAI will mark Eq. ( <ref type="formula">13</ref>) as a conjunction implied formula and not add it to the candidate invariants.</p><p>There are many classes of mutual implication relations in first-order logic. DuoAI identifies three classes of conjunction implied formulas to prune candidate invariants; in each class, the first two formulas mutually imply the third:</p><p>1. Replace a literal with a weaker literal, as discussed in the example Eq. ( <ref type="formula">11</ref>)(12)(13):</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.</head><p>Conjunct a literal with a weaker literal:</p><p>(r(X)&#8743;s(X)&#8743;...)&#8744;...</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">"</head><p>Merge" a &#8704; formula and an &#8707; formula:</p><p>In all three classes, r and s can be generalized to conjunctions (e.g., r 1 (X) &#8743; &#172;r 2 (X), &#172;s 1 (X) &#8743; s 2 (X) &#8743; s 3 (X)). A key advantage of this optimization is that given a finite search space, DuoAI can identify conjunction implied formulas based on invariants within that search space, even though the conjunction of invariants is not in that search space.</p><p>Equivalently decomposable formulas. DuoAI also identifies equivalently decomposable formulas to avoid redundant candidate invariants. Given a mutual implication relation P 1 &#8743; P 2 &#8743; ... &#8743; P k &#8660; Q, DuoAI will mark Q as an equivalently decomposable formula and never add Q to the candidate invariants. Later during refinement, if one of P 1 ,P 2 ,...P k is invalidated by Ivy, Q will also be invalidated and therefore there is never any reason to consider Q further as a candidate invariant. For example, suppose the disk replication protocol has invariant:</p><p>There is no need to ever include such an invariant in the candidate set, because it is equivalently decomposable to invariants <ref type="bibr">(15)</ref> and <ref type="bibr">(16)</ref>.</p><p>However, suppose we slightly modify invariant <ref type="bibr">(14)</ref> to one of the following two formulas:</p><p>These invariants are not equivalently decomposable to invariants <ref type="bibr">(15)</ref> and ( <ref type="formula">16</ref>). Take Eq. ( <ref type="formula">17</ref>) as an example. One can verify that (17) &#8658; (15)&#8743;( <ref type="formula">16</ref>), but (15)&#8743;( <ref type="formula">16</ref>) &#824; &#8658; (17), because Eq. ( <ref type="formula">17</ref>) requires the same replica to be both readable and writable, while for Eq. ( <ref type="formula">15</ref>) and ( <ref type="formula">16</ref>), it is possible to have one readable replica and a different writable replica. DuoAI identifies if a formula is equivalently decomposable based on the structure of the formula itself by considering two classes of equivalently decomposable formulas. One class is embodied by the following lemma: Lemma 6. For a formula F in prenex and disjunctive normal form, we build a graph G C for each conjunction C in F. G C has one node for each literal, and an edge between two literals if and only if they share an existentially quantified variable. If for some C in F, the graph G C has k &#8805; 2 connected components, then F is equivalently decomposable into k formulas.</p><p>Proof. For simplicity, here we show the proof for k = 2 (i.e., the literals in C form two connected components). If there are k &gt; 2 connected components, then the same analysis below will show that formula F is equivalently decomposable into k formulas.</p><p>Literals that share &#8707;-variables must be in the same connected component. So we can divide the &#8707;-variables into two sets {Y 1 ,...,Y m } and {Z 1 ,...,Z n }. The first connected component can only include &#8707;-variables Y 1 ,...Y m , while the second can only include Z 1 ,...,Z n . Assume the quantifier prefix has shape &#8704;X</p><p>and Z 1 ... Z n . The proof can be generalized to any alternating &#8704;/&#8707; using skolemization.</p><p>Let f ( &#8407; X, &#8407; Y ) be the disjunction of literals in the first connected component, and g( &#8407; X, &#8407; Z) be the disjunction of literals in the second connected component. Let h( &#8407; X, &#8407; Y , &#8407; Z) be the disjunction of all conjunctions other than C in formula F. Then formula F can be rewritten as:</p><p>We now show that, Eq. ( <ref type="formula">19</ref>) is equivalently decomposable into:</p><p>It is trivial that Eq. ( <ref type="formula">19</ref>) implies both Eq. ( <ref type="formula">20</ref>) and <ref type="bibr">(21)</ref>. We now show the interesting direction -the conjunction of Eq. ( <ref type="formula">20</ref>) and ( <ref type="formula">21</ref>) implies Eq. <ref type="bibr">(19)</ref>. Suppose both Eq. ( <ref type="formula">20</ref>) and ( <ref type="formula">21</ref>) hold. For any &#8407; X, consider two cases:</p><p>, so Eq. ( <ref type="formula">19</ref>) still holds. Putting the two cases together, when both Eq. ( <ref type="formula">20</ref>) and ( <ref type="formula">21</ref>) are true, Eq. ( <ref type="formula">19</ref>) must be true.</p><p>The proof also gives an algorithm to find the k decomposed formulas. Figure <ref type="figure">6</ref> shows how to apply Lemma 6 on the aforementioned formulas. For Eq. ( <ref type="formula">14</ref>), the two literals readable(E, R 1 ) and writable(E, R 2 ) share no &#8707;-quantified variable (E is &#8704;-quantified), so there is no edge between the two literals. The graph has two connected components, so Eq. ( <ref type="formula">14</ref>) is equivalently decomposable into two formulas (Eq. (15)( <ref type="formula">16</ref>)). For Eq. ( <ref type="formula">17</ref>), the two literals share an &#8707;-quantified variable R 1 , so there is an edge between the two literals and the graph is connected, which means Eq. ( <ref type="formula">17</ref>) cannot be decomposed in this way. The same analysis can be applied to Eq. <ref type="bibr">(18)</ref>.</p><p>We note a corollary of Lemma 6. For an &#8707;-free formula, it is equivalently decomposable if it has any conjunction. For example, &#8704;X. p(X) &#8743; q(X) is equivalent with the pair &#8704;X. p(X) and &#8704;X. q(X). This indicates that we do not need to consider any conjunction when enumerating &#8704;-only formulas, a significant reduction in search space.</p><p>The other class of equivalently decomposable formulas that DuoAI identifies is embodied in the following:</p><p>One can check that Eq. ( <ref type="formula">22</ref>) is equivalently decomposable to Eq. ( <ref type="formula">23</ref>) and ( <ref type="formula">24</ref>), and will therefore not be added as a candidate invariant. In general, DuoAI only considers formulas whose leading &#8704;-quantified variables are unique. Similar optimizations have been used in DistAI <ref type="bibr">[38]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Evaluation</head><p>Experimental setup. To demonstrate the performance of DuoAI, we implemented and evaluated DuoAI on 27 distributed protocols from multiple sources <ref type="bibr">[12,</ref><ref type="bibr">13,</ref><ref type="bibr">21,</ref><ref type="bibr">27,</ref><ref type="bibr">28]</ref>, including those that can only be proved by inductive invariants with &#8707;-quantifiers. The DuoAI implementation consists of 6.1K lines of C++ code for invariant enumeration and refinement, compiled by gcc 7.5.0, and 2.3K lines of Python code running with Python 3.8.10 for protocol simulation.</p><p>For comparison, we also ran 6 other invariant inference tools: SWISS <ref type="bibr">[10]</ref>, IC3PO <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>, FOL-IC3 <ref type="bibr">[13]</ref>, DistAI <ref type="bibr">[38]</ref>, UPDR <ref type="bibr">[12]</ref>, and I4 <ref type="bibr">[21]</ref>. All experiments were performed on a Dell Precision 5829 workstation with a 4.3GHz 28-core Intel Xeon W-2175, 62GB RAM, and a 512GB Intel SSD Pro 600p, running Ubuntu 18.04. We configured the alternative invariant inference tools following their best practices. SWISS requires the user to bound the search space by specifying 4 parameters, including the number of existentially quantified variables and the number of literals in a formula. For every protocol, we use the same parameter settings as in SWISS's own evaluation <ref type="bibr">[10]</ref>. IC3PO and I4 require the user to specify a finite instance size for their model checkers to work on. For IC3PO, we only specified the minimum size and the tool itself could determine how to increase the instance size. For I4, we started from the minimum size where the protocol can function and iteratively increased the instance size upon failure (e.g., node = 2,node = 3,...). FOL-IC3 provides a &#8704;-only mode and a default mode. We ran both and report the runtime of whichever succeeded first.</p><p>Results summary. Table <ref type="table">1</ref> shows the running time in seconds for each tool on each protocol. For each protocol, we also report the number of relations and lines of code in its Ivy specification; for example, Figure <ref type="figure">1</ref> is a simplified version of consensus epr. The top portion of the table shows protocols provable with a &#8704;-only inductive invariant, while the bottom portion shows protocols that can only be proved with a &#8707;included inductive invariant. We allowed each tool to spend up to an entire week trying to solve each protocol. For protocols that a tool fails to solve, we report "fail" if the tool terminated without an inductive invariant, "error" if the tool itself returned an error, "Z3 error" if the underlying SMT solver used returned an error, "memout" if the tool ran out of memory and terminated, and "timeout" if the tool did not complete within a week.</p><p>DuoAI dominates all other tools in the number of protocols it solves, solving all but 1 of the 27 protocols. SWISS cannot solve 8 protocols, FOL-IC3 and IC3PO cannot solve 9 protocols, DistAI and UPDR cannot solve 13 protocols, while I4 cannot solve 15 protocols. DuoAI is the only tool that solves all &#8704;-only protocols, is the only tool that solves Paxos as well as all other non-Paxos protocols with &#8707; quantifiers, and is the only tool that solves 3 of the more complex Paxos variants, including multi-Paxos, stoppable Paxos, and fast Paxos. There were no protocols solvable by another tool that were not solved by DuoAI.</p><p>DuoAI also dominates all other tools in how fast it solves the protocols, solving 15 of the protocols faster than any other tool. DuoAI is faster than SWISS on all but 3 of the protocols solved by SWISS, is faster than IC3PO on all but 3 of the protocols solved by IC3PO, is faster than FOL-IC3 on all but 2 of the protocols solved by FOL-IC3, and is faster than UPDR on all of the protocols solved by UPDR. DuoAI is up to 3 orders of magnitude faster than each of these protocols. DuoAI is faster than DistAI on all but 5 of the protocols solved by DistAI, and is faster than I4 on all but 2 of the protocols solved by I4. DuoAI is up to an order of magnitude faster than either DistAI or I4. The speed differences versus DistAI and I4 appear less in part because neither could solve any of the protocols with existential quantifiers. In most cases in which DuoAI is slower than other protocols, it is by at most a few seconds.</p><p>Detailed comparison and discussion. For the protocols provable with &#8704;-only invariants, DuoAI is the only tool that solves all 15 protocols. On &#8704;-only protocols, DuoAI's &#8704;-only instance is similar to DistAI, without subsampling and with mutual implication optimization and parallelism in simulation. DuoAI beats DistAI on 10 protocols. Unlike DuoAI, DistAI times out on ticket lock, which we discovered is due to a bug in the implementation of its protocol simulation. Chord ring maintenance is the only protocol on which DuoAI is much slower than DistAI. DistAI only allows invariants as disjunction of literals, and implements an invariant as a vector&lt;int&gt;. In contrast, DuoAI considers invariants in disjunctive normal form, so an invariant is a disjunction of conjunction of literals, implemented as a set&lt;vector&lt;int&gt;&gt;. This makes invariant operation slower in DuoAI. Chord ring maintenance is the only &#8704;-provable protocol that takes significant time on candidate invariant enumeration so the overhead is exacerbated.</p><p>For the protocols that require invariants with &#8707;-quantifiers to prove, DuoAI solves 11 out of 12 protocols, more than any other tool. DuoAI only fails on vertical paxos, which other tools also fail on. DistAI, I4, and UPDR fail on all of these protocols because they can only generate &#8704;-only invariants. IC3PO solves 4 protocols and fails on 3 protocols, but runs out of memory on 6 protocols, because it requires model checking to infer invariants on a finite instance. For more complex protocols like fast Paxos, the model checker requires too large of an instance size. In contrast, DuoAI searches in formula space and its performance does not depend (exponentially) on instance size. As the only other tool that solves Paxos, it is instructive to compare SWISS with DuoAI. Similar to DuoAI, SWISS also enumerates candidate invariants given a bounded search space and checks their inductiveness using the SMT solver. However, it has two fundamental differences compared with DuoAI. First, SWISS relies exclusively on the SMT solver to tell the correctness of invariants, while DuoAI also uses the samples from protocol simulation to filter out invalid invariants. As we demonstrated in &#167;6, SMT calls can be expensive with quantifier alternation and will negatively affect performance. Second, SWISS struggles to find mutually inductive invariants, i.e., a bundle of invariants that are inductive together but none are inductive individually. This is because SWISS can only build the invariant set by adding one and only one invariant each time and keep the set inductive. In the lock server async and the sharded key-value store protocols, where mutually inductive invariants are required to prove the safety property, SWISS has to manually increase the maximum number of literals from 6 to 9. This allows the mutually inductive invariants to be conjuncted into one big invariant, but results in a much larger search space and long runtimes of 44 and 128 minutes, respectively. DuoAI enumerates candidate invariants following the minimum implication graph and generates the strongest candidate invariants. The mutually inductive invariants (or their stronger forms) are guaranteed to be in the candidate invariants together. DuoAI solved both protocols within 2 seconds.</p><p>For the vertical paxos protocol, the human-expert inductive invariants include an invariant with 8 literals. Even after the optimization based on mutual implication, it still has 7 i The IC3PO authors <ref type="bibr">[8,</ref><ref type="bibr">9]</ref> reported that IC3PO succeeded on client server db ae (17 s), hybrid reliable broadcast (587 s), Paxos (568 s), and flexible Paxos (561 s). However, they retrofitted the protocols and manually provided clauses with quantifier alternation that could appear in the invariants, which is difficult to do without first knowing the ground-truth invariants. The 4 protocols have much simpler inductive invariants when expressed on top of these clauses, with all except the simplest, client server db ae, becoming &#8707;-free. Ivy fails when checking the invariants generated by IC3PO for Paxos and flexible Paxos. The IC3PO authors <ref type="bibr">[9]</ref> imply that the invariants had to be manually checked against the human-expert invariants. literals. Under the minimum per-domain number of quantified variables that can encode the human-expert invariants, there are 60 predicates that can appear in the invariants. Considering their negations, the size of the invariant search space is at the magnitude of 120 7 &#8776; 4e14, well exceeding the computational power of a normal workstation. In comparison, for fast paxos, the largest invariant includes 5 literals, and there are 38 predicates. The size of the search space is at the magnitude of 3e9. For vertical Paxos, DuoAI ends with a universal core and a set of checked non-core invariants when exhausting memory. These invariants are inductive and can be utilized as hints, although they cannot imply the safety property. As explained in &#167;6, DuoAI runs top-down refinement, bottom-up refinement, and a &#8704;-only instance in parallel. Not surprisingly, the &#8704;-only instance generates the inductive invariants first for all 15 protocols that do not require existential quantifiers. Among the 11 protocols solved by DuoAI that require existential quantifiers, the top-down refinement gives the inductive invariants first for the 5 simpler protocols -client server ae, client server db ae, toy consensus epr, consensus epr, and sharded kv no lost keys. The bottom-up refinement also succeeded but took longer. For example, for client server db ae, there are 8 candidate invariants in noncore. A subset of size 3 was sufficient to prove the safety property. However, the bottom-up refinement would first enumerate and fail on all single invariants and pairs before enumerating the correct triple. This takes more than 3 times longer than top-down refinement, in which after a single round of weakening, DuoAI found an inductive invariant.</p><p>For the 6 more complicated protocols with existential quantifiers, including hybrid reliable broadcast and the 5 Paxos-family protocols solved, only the bottom-up refinement generated the inductive invariants. The top-down refinement got stuck at checking the inductiveness of the invariants. For example, for multi-Paxos, after enumeration, DuoAI has a candidate invariant set of size 615, and 581 of them have quantifier alternation. Checking inductiveness of this many formulas is a hopeless task for the Z3 SMT solver. However, to prove the safety property, only 2 of the 581 candidate invariants are needed. In the bottom-up refinement, each time only a small subset of noncore invariants conjuncted with the &#8704;-only core are fed to Ivy, so the Z3 SMT solver could handle the candidate invariants. For Paxos, flexible Paxos, multi-Paxos, and stoppable Paxos, a subset of size 2 were sufficient, while a subset of size 6 was needed for fast Paxos. This also validates our assumption that real-world distributed protocols should have concise invariants, and should not require too many invariants with quantifier alternation to verify.</p><p>Limitations By requiring quantifier alternation to conform to a fixed order of types, DuoAI ensures that the verification condition is in a decidable fragment of first-order logic. However, without decidability, an SMT solver may still succeed. For client server db ae and hybrid reliable broadcast, the invariants written by human experts are not in a decidable fragment, yet they can be efficiently verified by the SMT solver. For both protocols, DuoAI found alternative inductive invariants within the decidable fragment. If a protocol cannot be verified in decidable logic, DuoAI will fail to prove it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">Related Work</head><p>Many studies <ref type="bibr">[11,</ref><ref type="bibr">17,</ref><ref type="bibr">20,</ref><ref type="bibr">31,</ref><ref type="bibr">[34]</ref><ref type="bibr">[35]</ref><ref type="bibr">[36]</ref> verify the correctness of distributed protocols with manually given inductive invariants. Early systems <ref type="bibr">[12,</ref><ref type="bibr">21,</ref><ref type="bibr">29,</ref><ref type="bibr">38]</ref> for automatically inferring inductive invariants do not work for invariants with &#8707;-quantifiers which are required for protocols such as Paxos, though pdH <ref type="bibr">[29]</ref> can find inductive invariants for retrofitted Paxos-family protocols without &#8707;-quantifiers.</p><p>Recent systems consider invariants with &#8707;-quantifiers. FOL-IC3 <ref type="bibr">[13]</ref> generates an invariant candidate that can separate a positive and negative example set, and iteratively adds more examples until the invariant is correct. It has no theoretical guarantee of success. Its heavy use of SMT queries to validate as well as synthesize invariants makes it slow in practice, timing out on even protocols with &#8704;-only inductive invariants. SWISS <ref type="bibr">[10]</ref> iteratively strengthens an invariant by adding small inductive formulas until the invariant is strong enough to prove the safety property. It was the first tool to automatically verify Paxos. Its inefficiency in exploring the search space and inability to infer mutually inductive invariants make it fail on several protocols solved by alternative tools.</p><p>IC3PO <ref type="bibr">[8,</ref><ref type="bibr">9]</ref> applies model checking on a finite instance similar to I4, while adding support to generalize existentially quantified invariants. The model checker functions well on small instances, but frequently exhausts memory on complex protocols that require larger instances, as shown in Table <ref type="table">1</ref>.</p><p>P-FOL-IC3 <ref type="bibr">[14]</ref> is concurrent work that extends FOL-IC3 by exploiting parallelism in formula search, introducing the invariant-friendly k-Term Pseudo-DNF to bound the search space, and randomly guessing some not-yet-inductive formulas to be eventually inductive, forcing their counterexamples to be excluded. P-FOL-IC3 has no theoretical guarantee and is less robust in practice due to its randomized nature; it failed in three out of five trials on stoppable Paxos, and two out of five trials on fast Paxos.</p><p>The tools discussed above, along with DuoAI, only verify safety properties of distributed protocols. Complementary work has explored connecting verification of protocols to their practical implementations <ref type="bibr">[11,</ref><ref type="bibr">31]</ref>, and verifying liveness properties of distributed protocols <ref type="bibr">[26]</ref>.</p><p>AutoML <ref type="bibr">[5,</ref><ref type="bibr">18,</ref><ref type="bibr">33]</ref> searches for machine learning models and hyperparameters, which may appear similar to finding inductive invariants. However, the inductiveness of invariants has strong correlation, which is difficult to capture for AutoML.</p><p>Many automated invariant inference tools have been built for other domains, mostly on learning loop invariants to verify sequential programs <ref type="bibr">[3,</ref><ref type="bibr">4,</ref><ref type="bibr">6,</ref><ref type="bibr">7,</ref><ref type="bibr">15,</ref><ref type="bibr">24,</ref><ref type="bibr">25,</ref><ref type="bibr">30,</ref><ref type="bibr">32,</ref><ref type="bibr">37,</ref><ref type="bibr">39]</ref>. Invariant inference has been used to prove properties on inductive algebraic data types <ref type="bibr">[16,</ref><ref type="bibr">22]</ref>, integer linear dynamical systems <ref type="bibr">[19]</ref>, and deep neural networks <ref type="bibr">[2]</ref>. None of these methods consider nondeterminism in concurrent or distributed settings, thus they cannot be directly applied to distributed protocols.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="10">Conclusions</head><p>DuoAI automatically and efficiently infers inductive invariants for verifying distributed protocols by reducing SMT costs. It introduces the minimum implication graph to define the structure of the invariant search space. This enables efficient enumeration of possible invariants, which are checked on samples from protocol simulation to reduce SMT queries. DuoAI guarantees that the enumerated candidate invariants are at least as strong as any correct invariants. DuoAI then runs top-down and bottom-up refinement in parallel. The former monotonically weakens the candidate invariants following the minimum implication graph. The latter divides the candidate invariants into an SMT-friendly universal inductive core and other noncore invariants, and searches for a small subset of noncore invariants that can be added to the core to prove the safety property of the protocol. Both top-down and bottom-up refinement have strong theoretical guarantees for finding inductive invariants, and their combination is effective at reducing SMT query costs for invariants with existential quantifiers. DuoAI dominates alternative tools in both the number of protocols it verifies and the speed at which it does so, including giving automated proofs for several Paxos variants.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>P = Eq.(6) Q = Eq.(7).</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1"><p>In general, other than showing an invariant is not inductive, a counterexample may also show an invariant does not hold at the protocol initial state. But this cannot happen to the safety property, unless the protocol is wrong.</p></note>
		</body>
		</text>
</TEI>
