Title: Automating Design Requirement Extraction From Text With Deep Learning
Abstract

Nearly every artifact of the modern engineering design process is digitally recorded and stored, resulting in an overwhelming amount of raw data detailing past designs. Analyzing this design knowledge and extracting functional information from sets of digital documents is a difficult and time-consuming task for human designers. In textual documentation, poorly written, superfluous descriptions filled with jargon are especially challenging for junior designers with less domain expertise to read. If the task of reading documents to extract functional requirements could be automated, designers could benefit from the distillation of massive digital repositories of design documentation into valuable information that can inform engineering design. This paper presents a system for automating the extraction of structured functional requirements from textual design documents by applying state-of-the-art Natural Language Processing (NLP) models. A recursive method utilizing Machine Learning-based question-answering is developed to process design texts by first identifying the highest-level functional requirement and then extracting additional requirements contained in the text passage. The efficacy of this system is evaluated by comparing the Machine Learning-based results with a study of 75 human designers performing the same design document analysis task on technical texts from the field of Microelectromechanical Systems (MEMS). The prospect of deploying such a system on the sum of all digital engineering documents suggests a future where design failures are less likely to be repeated and past successes may be consistently used to advance innovation.
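
The recursive question-answering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model, the questions asked, or how the recursion proceeds. The sketch assumes that each extracted answer is removed from the passage before asking again, and it uses a hypothetical regex-based `toy_answerer` in place of a trained QA model. The names `Answerer`, `toy_answerer`, and `extract_requirements` and the question strings are all invented for illustration.

```python
import re
from typing import Callable, List, Optional

# A QA model is abstracted as: (question, context) -> answer span or None.
Answerer = Callable[[str, str], Optional[str]]


def toy_answerer(question: str, context: str) -> Optional[str]:
    """Hypothetical stand-in for a trained QA model (assumption): returns
    the first clause that follows a modal verb such as 'must' or 'shall'.
    It ignores the question text, which a real model would not."""
    match = re.search(r"\b(?:must|shall|should)\s+([^.;,]+)", context)
    return match.group(1).strip() if match else None


def extract_requirements(context: str, answer: Answerer,
                         depth: int = 0, max_depth: int = 10) -> List[str]:
    """Recursively extract requirements: ask once, remove the answer from
    the passage, then recurse on the remaining text (assumed strategy)."""
    if depth >= max_depth:
        return []
    # The first call asks for the top-level requirement; later calls ask
    # for additional ones. Both question strings are illustrative.
    question = ("What is the main function?" if depth == 0
                else "What else must it do?")
    found = answer(question, context)
    if not found:
        return []
    remaining = context.replace(found, "", 1)
    return [found] + extract_requirements(remaining, answer,
                                          depth + 1, max_depth)


passage = ("The accelerometer must measure acceleration along one axis. "
           "The proof mass shall return to rest after a shock; "
           "the package should resist humidity.")
requirements = extract_requirements(passage, toy_answerer)
for index, requirement in enumerate(requirements, start=1):
    print(index, requirement)
```

Because the answerer is passed in as a plain callable, a real extractive QA model could be substituted for `toy_answerer` without changing the recursion, and `max_depth` bounds the number of requirements pulled from one passage.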

 
Award ID(s):
1854833
NSF-PAR ID:
10340893
Journal Name:
ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, 47th Design Automation Conference (DAC)
Volume:
Volume 3B
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Anwer, Nabil (Ed.)
    Design documentation is presumed to contain massive amounts of valuable information and expert knowledge that is useful for learning from the past successes and failures. However, the current practice of documenting design in most industries does not result in big data that can support a true digital transformation of enterprise. Very little information on concepts and decisions in early product design has been digitally captured, and the access and retrieval of them via taxonomy-based knowledge management systems are very challenging because most rule-based classification and search systems cannot concurrently process heterogeneous data (text, figures, tables, references). When experts retire or leave a design unit, industry often cannot benefit from past knowledge for future product design, and is left to reinvent the wheel repeatedly. In this work, we present AI-based Natural Language Processing (NLP) models which are trained for contextually representing technical documents containing texts, figures and tables, to do a semantic search for the retrieval of relevant data across large corpora of documents. By connecting textual and non-textual data through the use of an associative database, the semantic search question-answering system we developed can provide more comprehensive answers in the context of users’ questions. For the demonstration and assessment of this model, the semantic search question-answering system is applied to the Intergovernmental Panel on Climate Change (IPCC) Special Report 2019, which is more than 600 pages long and difficult to read and understand, even by most experts. Users can input custom queries relating to climate change concerns and receive evidence from the report that is contextually meaningful. 
We expect this method can transform current repositories of heterogeneous design documentation into structured knowledge bases that return relevant information efficiently and can evolve to embody manageable big data for the true digital transformation of design.
  2. Tang, P. ; Grau, D. ; El Asmar, M. (Ed.)
    Existing automated code checking (ACC) systems require the extraction of requirements from regulatory textual documents into computer-processable rule representations. The information extraction processes in those ACC systems are based on either human interpretation, manual annotation, or predefined automated information extraction rules. Despite the high performance they showed, rule-based information extraction approaches, by nature, lack sufficient scalability—the rules typically need some level of adaptation if the characteristics of the text change. Machine learning-based methods, instead of relying on hand-crafted rules, automatically capture the underlying patterns of the existing training text and have a great capability of generalizing to a variety of texts. A more scalable, machine learning-based approach is thus needed to achieve a more robust performance across different types of codes/documents for automatically generating semantically-enriched building-code sentences for the purpose of ACC. To address this need, this paper proposes a machine learning-based approach for generating semantically-enriched building-code sentences, which are annotated syntactically and semantically, for supporting IE. For improved robustness and scalability, the proposed approach uses transfer learning strategies to train deep neural network models on both general-domain and domain-specific data. The proposed approach consists of four steps: (1) data preparation and preprocessing; (2) development of a base deep neural network model for generating semantically-enriched building-code sentences; (3) model training using transfer learning strategies; and (4) model evaluation. The proposed approach was evaluated on a corpus of sentences from the 2009 International Building Code (IBC) and the Champaign 2015 IBC Amendments. 
The preliminary results show that the proposed approach achieved an optimal precision of 88%, recall of 86%, and F1-measure of 87%, indicating good performance. 
  3. Image texture, the relative spatial arrangement of intensity values in an image, encodes valuable information about the scene. As it stands, much of this potential information remains untapped. Understanding how to decipher textural details would afford another method of extracting knowledge of the physical world from images. In this work, we attempt to bridge the gap in research between quantitative texture analysis and the visual perception of textures. The impact of changes in image texture on a human observer's ability to perform signal detection and localization tasks in complex digital images is not understood. We examine this critical question by studying task-based human observer performance in detecting and localizing signals in tomographic breast images. We have also investigated how these changes impact the formation of second-order image texture. We used digital breast tomosynthesis (DBT), an FDA-approved tomographic X-ray breast imaging method, as the modality of choice to show our preliminary results. Our human observer studies involve localization ROC (LROC) studies for low-contrast mass detection in DBT. Simulated images are used as they offer the benefit of known ground truth. Our results prove that changes in system geometry or processing lead to changes in image texture magnitudes. We show that the variations in several well-known texture features estimated in digital images correlate with human observer detection–localization performance for signals embedded in them. This insight can allow efficient and practical techniques to identify the best imaging system design and algorithms or filtering tools by examining the changes in these texture features. This concept linking texture feature estimates and task-based image quality assessment can be extended to several other imaging modalities and applications as well. It can also offer feedback in system and algorithm designs with a goal to improve perceptual benefits.
The broader impact spans a wide array of areas, including imaging system design, image processing, data science, machine learning, computer vision, and perceptual and vision science. Our results also point to the caution that must be exercised in using these texture features as image-based radiomic features or as predictive markers for risk assessment, as they are sensitive to system or image processing changes.

     
  4. Axiomatic Design (AD) provides a powerful thinking framework for solving complex engineering problems through the concept of design domains and diligent mapping and decomposition between functional and physical domains. Despite this utility, AD is yet to be implemented for widespread use by design practitioners solving real world problems in industry and exists primarily in the realm of academia. This is due, in part, to a high level of design expertise and familiarity with its methodology required to apply the AD approach effectively. It is difficult to correctly identify, extract, and abstract top-level functional requirements (FRs) based on early-stage design research. Furthermore, guiding early-stage design by striving to maintain functional independence, the first Axiom, is difficult at a systems level without explicit methods of quantifying the relationship between high-level FRs and design parameters (DPs). To address these challenges, Artificial Intelligence (AI) methods, specifically in deep learning (DL) assisted Natural Language Processing (NLP), have been applied to represent design knowledge for machines to understand, and, following AD principles, support the practice of human designers. NLP-based question-answering is demonstrated to automate early-stage identification of FRs and to assist design decomposition by recursively mapping and traversing down along the FR-DP hierarchical structure. Functional coupling analysis could then be conducted with vectorized FRs and DPs from NLP-based language embeddings. This paper presents a framework for how AI can be applied to design based on the principles of AD, which will enable a virtual design assistant system based on both human and machine intelligence.
  5. Hols, Thorsten Holz ; Ristenpart, Thomas (Ed.)
    Automated attack discovery techniques, such as attacker synthesis or model-based fuzzing, provide powerful ways to ensure network protocols operate correctly and securely. Such techniques, in general, require a formal representation of the protocol, often in the form of a finite state machine (FSM). Unfortunately, many protocols are only described in English prose, and implementing even a simple network protocol as an FSM is time-consuming and prone to subtle logical errors. Automatically extracting protocol FSMs from documentation can significantly contribute to increased use of these techniques and result in more robust and secure protocol implementations. In this work we focus on attacker synthesis as a representative technique for protocol security, and on RFCs as a representative format for protocol prose description. Unlike other works that rely on rule-based approaches or use off-the-shelf NLP tools directly, we suggest a data-driven approach for extracting FSMs from RFC documents. Specifically, we use a hybrid approach consisting of three key steps: (1) large-scale word-representation learning for technical language, (2) focused zero-shot learning for mapping protocol text to a protocol-independent information language, and (3) rule-based mapping from protocol-independent information to a specific protocol FSM. We show the generalizability of our FSM extraction by using the RFCs for six different protocols: BGPv4, DCCP, LTP, PPTP, SCTP and TCP. We demonstrate how automated extraction of an FSM from an RFC can be applied to the synthesis of attacks, with TCP and DCCP as case studies. Our approach shows that it is possible to automate attacker synthesis against protocols by using textual specifications such as RFCs.