skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Automated Attack Synthesis by Extracting Finite State Machines from Protocol Specification Documents
Automated attack discovery techniques, such as attacker synthesis or model-based fuzzing, provide powerful ways to ensure network protocols operate correctly and securely. Such techniques, in general, require a formal representation of the protocol, often in the form of a finite state machine (FSM). Unfortunately, many protocols are only described in English prose, and implementing even a simple network protocol as an FSM is time-consuming and prone to subtle logical errors. Automatically extracting protocol FSMs from documentation can significantly contribute to increased use of these techniques and result in more robust and secure protocol implementations.In this work we focus on attacker synthesis as a representative technique for protocol security, and on RFCs as a representative format for protocol prose description. Unlike other works that rely on rule-based approaches or use off-the-shelf NLP tools directly, we suggest a data-driven approach for extracting FSMs from RFC documents. Specifically, we use a hybrid approach consisting of three key steps: (1) large-scale word-representation learning for technical language, (2) focused zero-shot learning for mapping protocol text to a protocol-independent information language, and (3) rule-based mapping from protocol-independent information to a specific protocol FSM. We show the generalizability of our FSM extraction by using the RFCs for six different protocols: BGPv4, DCCP, LTP, PPTP, SCTP and TCP. We demonstrate how automated extraction of an FSM from an RFC can be applied to the synthesis of attacks, with TCP and DCCP as case-studies. Our approach shows that it is possible to automate attacker synthesis against protocols by using textual specifications such as RFCs.  more » « less
Award ID(s):
1801546
PAR ID:
10366064
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Hols, Thorsten Holz; Ristenpart, Thomas
Date Published:
Journal Name:
2022 IEEE Symposium on Security and Privacy (SP)
Page Range / eLocation ID:
51 to 68
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Distributed protocols should be robust to both benign malfunction (e.g. packet loss or delay) and attacks (e.g. message replay). In this paper we take a formal approach to the automated synthesis of attackers, i.e. adversarial processes that can cause the protocol to malfunction. Specifically, given a formal threat model capturing the distributed protocol model and network topology, as well as the placement, goals, and interface of potential attackers, we automatically synthesize an attacker. We formalize four attacker synthesis problems - across attackers that always succeed versus those that sometimes fail, and attackers that may attack forever versus those that may not - and we propose algorithmic solutions to two of them. We report on a prototype implementation called KORG and its application to TCP as a case-study. Our experiments show that KORG can automatically generate well-known attacks for TCP within seconds or minutes. 
    more » « less
  2. We develop 5GBaseChecker— an efficient, scalable, and dynamic security analysis framework based on differential testing for analyzing 5G basebands' control plane protocol interactions. 5GBaseChecker first captures basebands' protocol behaviors as a finite state machine (FSM) through black-box automata learning. To facilitate efficient learning and improve scalability, 5GBaseChecker introduces novel hybrid and collaborative learning techniques. 5GBaseChecker then identifies input sequences for which the extracted FSMs provide deviating outputs. Finally, 5GBaseChecker leverages these deviations to efficiently identify the security properties from specifications and use those to triage if the deviations found in 5G basebands violate any properties. We evaluated 5GBaseChecker with 17 commercial 5G basebands and 2 open-source UE implementations and uncovered 22 implementation-level issues, including 13 exploitable vulnerabilities and 2 interoperability issues. 
    more » « less
  3. null (Ed.)
    Automated reasoning tools for security protocols model protocols as non-deterministic processes that communicate through a Dolev-Yao attacker. There are, however, a large class of protocols whose correctness relies on an explicit ability to model and reason about randomness. Although such protocols lie at the heart of many widely adopted systems for anonymous communication, they have so-far eluded automated verification techniques. We propose an algorithm for reasoning about safety properties for randomized protocols. The algorithm is implemented as an extension of Stochastic Protocol ANalyzer (Span), the mechanized tool that reasons about the indistinguishability properties of randomized protocols. Using Span, we conduct the first automated verification on several randomized security protocols and uncover previously unknown design weaknesses in several of the protocols we analyzed. 
    more » « less
  4. Grammar-based fuzzing is a technique used to find soft- ware vulnerabilities by injecting well-formed inputs generated following rules that encode application semantics. Most grammar-based fuzzers for network protocols rely on human experts to manually specify these rules. In this work we study automated learning of protocol rules from textual specifications (i.e. RFCs). We evaluate the automatically extracted protocol rules by applying them to a state-of-the-art fuzzer for transport protocols and show that it leads to a smaller number of test cases while finding the same attacks as the system that uses manually specified rules. 
    more » « less
  5. One main challenge in the full automation of building code compliance checking is in the extraction and transformation of building code requirements into computable representations. Semantic rule-based approach has been taken mainly due to its expected better performance than machine learning-based approach on this particular task. With the recent advancement in deep learning AI, particularly the launch of ChatGPT by OpenAI, there is a potential for this landscape to be shifted given the highly regarded capabilities of ChatGPT in processing (i.e., understanding and generating) natural language texts and computer codes. In this paper, the author preliminarily explored the use of ChatGPT in converting (i.e., extracting and transforming) building code requirements into computer codes, and compared it with the results from cutting-edge semantic rule-based approach. It was found that comparing to the semantic rule-based approach, the conversion results from ChatGPT still has limitations, but there is a great potential for it to help speed up the implementation and scale-up of automated building code compliance checking systems. 
    more » « less