Title: Detecting Vulnerability in Hardware Description Languages: Opcode Language Processing
Detecting vulnerable code blocks has become a highly popular topic in computer-aided design, especially with the advancement of natural language processing (NLP). Analyzing hardware description languages (HDLs), such as Verilog, involves dealing with lengthy code. This letter introduces an innovative way to identify attack-vulnerable hardware through opcode processing. Because opcodes are architecturally defined and every operation appears at the beginning of its code line, the word-processing problem is efficiently transformed into an opcode-processing one. This research converts a benchmark dataset into an intermediate code stack and then classifies secure and fragile code using NLP techniques. The results reveal a framework that achieves up to 94% accuracy when employing a sophisticated convolutional neural network (CNN) architecture with extra embedding layers. The framework thus lets users quickly verify the vulnerability of their HDL code by querying a supervised learning model trained on predefined vulnerabilities. Analysis of the outcomes of a model trained on the HDL dataset further supports the superior efficacy of opcode-based processing for Trojan detection.
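
As a rough, hypothetical illustration of the approach, the Python/Keras sketch below maps the leading opcode of each intermediate-code line to an integer id, embeds the resulting sequence, and classifies it with a small 1-D CNN carrying an extra embedding layer. The vocabulary size, sequence length, layer sizes, and the opcode_ids helper are illustrative assumptions, not the letter's actual configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 64   # assumed upper bound on distinct opcodes
MAX_LINES = 256   # assumed maximum opcode-sequence length

def opcode_ids(intermediate_code: str) -> list[int]:
    """Map the leading token (the opcode) of each line to an integer id."""
    table: dict[str, int] = {}  # opcode -> id, grown on the fly in this sketch
    ids = []
    for line in intermediate_code.splitlines():
        tokens = line.split()
        if tokens:
            ids.append(table.setdefault(tokens[0], len(table)))
    return ids  # pad/truncate to MAX_LINES before feeding the model

model = models.Sequential([
    layers.Input(shape=(MAX_LINES,)),
    layers.Embedding(VOCAB_SIZE, 32),         # extra embedding layer
    layers.Conv1D(64, 5, activation="relu"),  # local opcode n-gram patterns
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),    # secure (0) vs. vulnerable (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

Trained on labeled secure/fragile opcode sequences, a model of this shape is what a user would query to flag a suspect HDL block.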
Award ID(s):
2019511
PAR ID:
10529832
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Embedded Systems Letters
Volume:
16
Issue:
2
ISSN:
1943-0663
Page Range / eLocation ID:
222 to 226
Subject(s) / Keyword(s):
Convolutional neural networks (CNNs), deep learning, hardware description language (HDL), hardware security, natural language processing (NLP), opcode
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we evaluate the use of a trained Long Short-Term Memory (LSTM) network as a surrogate for a Euler–Bernoulli beam model, and then we describe and characterize an FPGA-based deployment of the model for use in real-time structural health monitoring applications. The focus of our efforts is the DROPBEAR (Dynamic Reproduction of Projectiles in Ballistic Environments for Advanced Research) dataset, which was generated as a benchmark for the study of real-time structural modeling applications. The purpose of DROPBEAR is to evaluate models that take vibration data as input and give, as output, the initial conditions of the cantilever beam on which the measurements were taken. DROPBEAR is meant to serve as an exemplar for emerging high-rate “active structures” that can be actively controlled with feedback latencies of less than one microsecond. Although the Euler–Bernoulli beam model is a well-known solution to this modeling problem, its computational cost is prohibitive for the time scales of interest. It has been previously shown that a properly structured LSTM network can achieve comparable accuracy with less workload, but achieving sub-microsecond model latency remains a challenge. Our approach is to deploy an LSTM optimized specifically for latency on an FPGA. We designed the model using both high-level synthesis (HLS) and hardware description language (HDL). The lowest latency of 1.42 µs and the highest throughput of 7.87 Gops/s were achieved on the Alveo U55C platform with the HDL design.
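
A minimal sketch, in Keras, of the kind of latency-oriented LSTM surrogate described above: a fixed window of vibration samples in, a regressed boundary-condition (pin-location) estimate out. The window length and layer sizes are assumptions; a trained model of this shape would then be translated to the FPGA via HLS or hand-written HDL.

import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW = 128  # assumed number of acceleration samples per inference

model = models.Sequential([
    layers.Input(shape=(WINDOW, 1)),  # univariate vibration signal
    layers.LSTM(32),                  # small recurrent state keeps latency low
    layers.Dense(1),                  # estimated pin location on the beam
])
model.compile(optimizer="adam", loss="mse")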
  2. Hardware Description Language (HDL) is a common entry point for designing digital circuits. Differences in HDL coding styles and design choices may lead to considerably different design quality and performance-power tradeoffs. In general, the impact of HDL coding is not clear until logic synthesis, or even layout, is completed. However, running synthesis merely as feedback for HDL code is not computationally economical, especially in early design phases when the code must be modified frequently. Furthermore, in late stages of design convergence burdened with high-impact engineering change orders (ECOs), design iterations become prohibitively expensive. To this end, we propose a machine learning approach to Verilog-based Register-Transfer Level (RTL) design assessment that does not go through the synthesis process. It allows designers to quickly evaluate the performance-power tradeoff among different RTL design options. Experimental results show that our proposed technique achieves an average of 95% prediction accuracy with respect to post-placement analysis and is six orders of magnitude faster than evaluation by running logic synthesis and placement.
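
As a hedged sketch of what such a synthesis-free assessor could look like: derive cheap lexical features from the Verilog source and fit a regressor against post-placement metrics collected offline. The feature set, learner, and names below are illustrative assumptions, not the paper's actual pipeline.

import re
from sklearn.ensemble import GradientBoostingRegressor

def rtl_features(verilog_src: str) -> list[float]:
    """Count constructs that roughly track post-placement cost."""
    return [
        len(re.findall(r"\balways\b", verilog_src)),   # process blocks
        len(re.findall(r"[+\-*/]", verilog_src)),      # arithmetic operators
        len(re.findall(r"\bcase\b|\?", verilog_src)),  # mux-like constructs
        float(verilog_src.count("[")),                 # bus/width references
    ]

# X: features per RTL variant; y: post-placement power/timing measured once
model = GradientBoostingRegressor()
# model.fit([rtl_features(src) for src in training_sources], measured_metric)

Because prediction reduces to one feature extraction plus one model evaluation, it sidesteps the hours-long synthesis-and-placement loop.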
  3. Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another. While cross-lingual transfer techniques are powerful, they carry gender bias from the source to the target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways of quantifying bias in multilingual representations from both intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces, and that the alignment direction can also influence the bias in transfer learning. We further provide recommendations for using multilingual word representations in downstream tasks.
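
One common intrinsic measure, sketched below under the assumption that the embeddings are already aligned into a shared space: score a word's bias as the difference between its mean cosine similarity to masculine anchor words and to feminine ones. The anchor choice and scoring form are assumptions, not necessarily the paper's exact metrics.

import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_bias(word_vec, masc_vecs, fem_vecs) -> float:
    """Positive: closer to masculine anchors; negative: closer to feminine."""
    masc = np.mean([cosine(word_vec, m) for m in masc_vecs])
    fem = np.mean([cosine(word_vec, f) for f in fem_vecs])
    return float(masc - fem)

Computing this score for the same words before and after alignment to different target spaces is one way to observe the shifts the abstract describes.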
  4. In recent years, there has been growing interest in the automatic detection of deceptive behavior. This attention is justified by the wide range of applications that deception detection can have, especially in fields such as criminology. This study aims to contribute to the field by capturing transcribed data, analyzing the textual data with Natural Language Processing (NLP) techniques, and comparing the performance of conventional models built on linguistic features with that of Large Language Models (LLMs). In addition, the significance of the applied linguistic features was examined using different feature selection techniques. Through extensive experiments, we evaluated the effectiveness of both conventional and deep NLP models in detecting deception from speech. Among the models applied to the Real-Life Trial dataset, a single-layer Bidirectional Long Short-Term Memory (BiLSTM) network tuned with early stopping outperformed the others, achieving an accuracy of 93.57% and an F1 score of 94.48%.
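
A minimal sketch of the best-performing setup described above: a single bidirectional LSTM layer over token embeddings, trained with early stopping. Vocabulary size, sequence length, and layer widths are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

VOCAB, MAXLEN = 20000, 200  # assumed tokenizer settings

model = models.Sequential([
    layers.Input(shape=(MAXLEN,)),
    layers.Embedding(VOCAB, 100),
    layers.Bidirectional(layers.LSTM(64)),  # the single BiLSTM layer
    layers.Dense(1, activation="sigmoid"),  # deceptive vs. truthful
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.1, callbacks=[early_stop])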