Title: Trustworthy Formal Natural Language Specifications
Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of software with respect to a natural language specification. The translation from informal to formal remains a challenging, time-consuming process that is difficult to audit for correctness. This paper shows that it is possible to build support for specifications written in expressive subsets of natural language, within existing proof assistants, consistent with the principles used to establish trust and auditability in proof assistants themselves. We implement a means to provide specifications in a modularly extensible formal subset of English, and have them automatically translated into formal claims, entirely within the Lean proof assistant. Our approach is extensible (placing no permanent restrictions on grammatical structure), modular (allowing information about new words to be distributed alongside libraries), and produces proof certificates explaining how each word was interpreted and how the sentence's structure was used to compute the meaning. We apply our prototype to the translation of various English descriptions of formal specifications from a popular textbook into Lean formalizations; all can be translated correctly with a modest lexicon, requiring only minor modifications related to lexicon size.
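To make the target of such a translation concrete, here is a minimal Lean sketch, not drawn from the paper's artifact: it pairs an English sentence with the formal Lean proposition that sentence is intended to denote, and a proof checked by Lean's kernel. The example sentence and the theorem name are invented for illustration.

```lean
-- English specification (informal):
--   "Every natural number is less than or equal to its successor."
-- The Lean proposition this sentence is intended to denote, with a proof
-- that the kernel checks (Nat.le_succ is a standard library lemma):
theorem every_nat_le_succ : ∀ n : Nat, n ≤ n.succ :=
  fun n => Nat.le_succ n
```

The paper's contribution is automating and certifying the step from the English sentence to a proposition of this kind inside Lean itself; the proof obligation remains the user's.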
Award ID(s):
2220991
PAR ID:
10469926
Author(s) / Creator(s):
;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400703881
Page Range / eLocation ID:
50 to 70
Subject(s) / Keyword(s):
Formal specification; natural language programming; natural language specification; categorial grammars; proof assistants
Format(s):
Medium: X
Location:
Cascais Portugal
Sponsoring Org:
National Science Foundation
More Like this
  1. Fill, Hans-Georg; Mayo, Francisco; van Sinderen, Marten; Maciaszek, Leszek (Ed.)
    Safety-critical real-time systems require correctness to be validated beyond the design phase. In these systems, response time is as critical as correct functionality. Runtime verification is a promising approach for validating the correctness of system behaviors during runtime using monitors derived from formal system specifications. However, practitioners often lack formal method backgrounds, and no standard notation exists to capture system properties that serve their needs. To encourage the adoption of formal methods in industry, we present GROOT, a runtime monitoring tool for real-time systems that automatically generates efficient monitors from structured English statements. GROOT is designed with two branches, one for functional requirements and one for specifications with metric time constraints, which use appropriate formalisms to synthesize monitors. This paper introduces TIMESPEC, a structured English dialect for specifying timing requirements. Our tool also automates formal analysis to certify the C monitors’ construction. We apply GROOT to timing specifications from an industrial component and a simulated autonomous system in Simulink. 
  2. Communication tools such as email facilitate communication and collaboration between speakers of different languages, who use two primary strategies—English as a common language and machine translation (MT) tools—to help them overcome language barriers. However, each of these communication strategies creates its own challenges for cross-lingual communication. In this paper, we compare how people’s interpretations of an email sender’s social intention, and their evaluation of the email and the senders, differ when using a common language versus MT in email communication. We conducted an online experiment in which monolingual native English speakers read and rated request emails written by native English speakers, emails written by bilingual Chinese speakers in English, and emails written in Chinese then machine-translated into English. We found that participants interpreted the social intentions of the email sender less accurately for machine-translated emails than for emails written by non-native speakers in English. Participants also rated the senders and emails less positively overall for machine-translated emails compared to emails written by non-native speakers in English. Based on these findings, we suggest design possibilities that could better aid multilingual communication. 
  3.
    We propose Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language. Our methodology is to (1) generate training data automatically in the target language by augmenting machine-translated datasets with local entities scraped from public websites, (2) add a few-shot boost of human-translated sentences and train a novel XLMR-LSTM semantic parser, and (3) test the model on natural utterances curated using human translators. We assess the effectiveness of our approach by extending the current capabilities of Schema2QA, a system for English Question Answering (QA) on the open web, to 10 new languages for the restaurants and hotels domains. Our model achieves an overall test accuracy ranging between 61% and 69% for the hotels domain and between 64% and 78% for the restaurants domain, which compares favorably to the 69% and 80% obtained for an English parser trained on gold English data and a few examples from the validation set. We show our approach outperforms the previous state-of-the-art methodology by more than 30% for hotels and 40% for restaurants with localized ontologies for the subset of languages tested. Our methodology enables any software developer to add a new language capability to a QA system for a new domain, leveraging machine translation, in less than 24 hours. Our code is released open-source.
  4. A growing body of work studies how to answer a question or verify a claim by generating a natural language “proof:” a chain of deductive inferences yielding the answer based on a set of premises. However, these methods can only make sound deductions when they follow from evidence that is given. We propose a new system that can handle the underspecified setting where not all premises are stated at the outset; that is, additional assumptions need to be materialized to prove a claim. By using a natural language generation model to abductively infer a premise given another premise and a conclusion, we can impute missing pieces of evidence needed for the conclusion to be true. Our system searches over two fringes in a bidirectional fashion, interleaving deductive (forward-chaining) and abductive (backward-chaining) generation steps. We sample multiple possible outputs for each step to achieve coverage of the search space, while ensuring correctness by filtering low-quality generations with a round-trip validation procedure. Results on a modified version of the EntailmentBank dataset and a new dataset called Everyday Norms: Why Not? show that abductive generation with validation can recover premises across in- and out-of-domain settings.
  5. Formal verification of real-time systems is attractive because these systems often perform critical operations. Unlike non real-time systems, latency and response time guarantees are of critical importance in this setting, as much as functional correctness. Nevertheless, formal verification of real-time OSes usually stops the scheduling analysis at the policy level: they only prove that the scheduler (or its abstract model) satisfies some scheduling policy. In this paper, we go further and connect together Prosa, a verified schedulability analyzer, and RT-CertiKOS, a verified single-core sequential real-time OS kernel. Thus, we get a more general and extensible schedulability analysis proof for RT-CertiKOS, as well as a concrete implementation validating Prosa models. It also showcases that it is realistic to connect two completely independent formal developments in a proof assistant.