skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


Title: Leveraging Textual Specifications for Grammar-based Fuzzing of Network Protocols.
Grammar-based fuzzing is a technique used to find soft- ware vulnerabilities by injecting well-formed inputs generated following rules that encode application semantics. Most grammar-based fuzzers for network protocols rely on human experts to manually specify these rules. In this work we study automated learning of protocol rules from textual specifications (i.e. RFCs). We evaluate the automatically extracted protocol rules by applying them to a state-of-the-art fuzzer for transport protocols and show that it leads to a smaller number of test cases while finding the same attacks as the system that uses manually specified rules.  more » « less
Award ID(s):
1815219
NSF-PAR ID:
10099364
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the ... Innovative Applications of Artificial Intelligence Conference
Volume:
11
Issue:
1
ISSN:
2154-8080
Page Range / eLocation ID:
28-32
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Designing and implementing distributed systems correctly is a very challenging task. Recently, formal verification has been successfully used to prove the correctness of distributed systems. At the heart of formal verification lies a computer-checked proof with an inductive invariant. Finding this inductive invariant, however, is the most difficult part of the proof. Alas, current proof techniques require inductive invariants to be found manually—and painstakingly—by the developer. In this paper, we present a new approach, Incremental Inference of Inductive Invariants (I4), to automatically generate inductive invariants for distributed protocols. The essence of our idea is simple: the inductive invariant of a finite instance of the protocol can be used to infer a general inductive invariant for the infinite distributed protocol. In I4, we create a finite instance of the protocol; use a model checking tool to automatically derive the inductive invariant for this finite instance; and generalize this invariant to an inductive invariant for the infinite protocol. Our experiments show that I4 can prove the correctness of several distributed protocols like Chord, 2PC and Transaction Chains with little to no human effort. 
    more » « less
  2. Hols, Thorsten Holz ; Ristenpart, Thomas (Ed.)
    Automated attack discovery techniques, such as attacker synthesis or model-based fuzzing, provide powerful ways to ensure network protocols operate correctly and securely. Such techniques, in general, require a formal representation of the protocol, often in the form of a finite state machine (FSM). Unfortunately, many protocols are only described in English prose, and implementing even a simple network protocol as an FSM is time-consuming and prone to subtle logical errors. Automatically extracting protocol FSMs from documentation can significantly contribute to increased use of these techniques and result in more robust and secure protocol implementations.In this work we focus on attacker synthesis as a representative technique for protocol security, and on RFCs as a representative format for protocol prose description. Unlike other works that rely on rule-based approaches or use off-the-shelf NLP tools directly, we suggest a data-driven approach for extracting FSMs from RFC documents. Specifically, we use a hybrid approach consisting of three key steps: (1) large-scale word-representation learning for technical language, (2) focused zero-shot learning for mapping protocol text to a protocol-independent information language, and (3) rule-based mapping from protocol-independent information to a specific protocol FSM. We show the generalizability of our FSM extraction by using the RFCs for six different protocols: BGPv4, DCCP, LTP, PPTP, SCTP and TCP. We demonstrate how automated extraction of an FSM from an RFC can be applied to the synthesis of attacks, with TCP and DCCP as case-studies. Our approach shows that it is possible to automate attacker synthesis against protocols by using textual specifications such as RFCs. 
    more » « less
  3. Fuzz testing has been gaining ground recently with substantial efforts devoted to the area. Typically, fuzzers take a set of seed inputs and leverage random mutations to continually improve the inputs with respect to a cost, e.g. program code coverage, to discover vulnerabilities or bugs. Following this methodology, fuzzers are very good at generating unstructured inputs that achieve high coverage. However fuzzers are less effective when the inputs are structured, say they conform to an input grammar. Due to the nature of random mutations, the overwhelming abundance of inputs generated by this common fuzzing practice often adversely hinders the effectiveness and efficiency of fuzzers on grammar-aware applications. The problem of testing becomes even harder, when the goal is not only to achieve increased code coverage, but also to nd complex vulnerabilities related to other cost measures, say high resource consumption in an application. We propose Saffron an adaptive grammar-based fuzzing approach to effectively and efficiently generate inputs that expose expensive executions in programs. Saffron takes as input a user-provided grammar, which describes the input space of the program under analysis, and uses it to generate test inputs. Saffron assumes that the grammar description is approximate since precisely describing the input program space is often difficult as a program may accept unintended inputs due to e.g., errors in parsing. Yet these inputs may reveal worst-case complexity vulnerabilities. The novelty of Saffron is then twofold: (1) Given the user-provided grammar, Saffron attempts to discover whether the program accepts unexpected inputs outside of the provided grammar, and if so, it repairs the grammar via grammar mutations. The repaired grammar serves as a specification of the actual inputs accepted by the application. (2) Based on the refined grammar, it generates concrete test inputs. It starts by treating every production rule in the grammar with equal probability of being used for generating concrete inputs. It then adaptively refines the probabilities along the way by increasing the probabilities for rules that have been used to generate inputs that improve a cost, e.g., code coverage or arbitrary user-defined cost. Evaluation results show that Saffron significantly outperforms state-of-the-art baselines. 
    more » « less
  4. Work on optimal protocols for \emph{Eventual Byzantine Agreement} (EBA)---protocols that, in a precise sense, decide as soon as possible in every run and guarantee that all nonfaulty agents decide on the same value---has focused on full-information protocols} (FIPs), where agents repeatedly send messages that completely describe their past observations to every other agent. While it can be shown that, without loss of generality, we can take an optimal protocol to be an FIP, full information exchange is impractical to implement for many applications due to the required message size. We separate protocols into two parts, the information-exchange protocol and the action protocol, so as to be able to examine the effects of more limited information exchange. We then define a notion of optimality with respect to an information-exchange protocol. Roughly speaking, an action protocol P is optimal with respect to an information-exchange protocol E if, with P, agents decide as soon as possible among action protocols that exchange information according to E. We present a knowledge-based EBA program for omission failures all of whose implementations are guaranteed to be correct and are optimal if the information exchange satisfies a certain safety condition. We then construct concrete programs that implement this knowledge-based program in two settings of interest that are shown to satisfy the safety condition. Finally, we show that a small modification of our program results in an FIP that s both optimal and efficiently implementable, settling an open problem posed by Halpern, Moses, and Waarts (SIAM J. Comput., 2001). 
    more » « less
  5. Verification is often regarded as a one-time procedure undertaken after a protocol is specified but before it is implemented. However, in practice, protocols continually evolve with the addition of new capabilities and performance optimizations. Existing verification tools are ill-suited to “tracking” protocol evolution and programmers are too busy (or too lazy?) to simultaneously co-evolve specifications manually. This means that the correctness guarantees determined at verification time can erode as protocols evolve. Existing software quality techniques such as regression testing and root cause analysis, which naturally support system evolution, are poorly suited to reasoning about fault tolerance properties of a distributed system because these properties require a search of the execution schedule rather than merely replaying inputs. This paper advocates that our community should explore the intersection of testing and verification to better ensure quality for distributed software and presents our experience evolving a data replication protocol at Elastic using a novel bug-finding technology called Lineage Driven Fault Injection (LDFI) as evidence. 
    more » « less