Abstract Spectre class of transient execution security attacks on modern microprocessors rely on speculative execution. Software Controlled Speculation (SCS) was proposed as a microarchitecture‐level defense in the original Spectre paper and has also been adopted by ARM. The idea with SCS is to allow a mode in the microarchitecture, where instructions that read from memory are not allowed to execute speculatively. Errors or malicious fault‐injections in the implementation of SCS can still render the microarchitecture vulnerable to Spectre attacks. A formal verification method is proposed that can check the correctness of the implementation of SCS and detect any faults. The method has been demonstrated to be very efficient on two sets of benchmarks and provides accurate detection of implementation faults in SCS.
more »
« less
Synthesizing Formal Models of Hardware from RTL for Efficient Verification of Memory Model Implementations
Modern hardware complexity makes it challenging to determine if a given microarchitecture adheres to a particular memory consistency model (MCM). This observation inspired the Check tools, which formally check that a specific microarchitecture correctly implements an MCM with respect to a suite of litmus test pro-grams. Unfortunately, despite their effectiveness and efficiency, theCheck tools must be supplied a microarchitecture in the guise of a manually constructed axiomatic specification, called a 𝜇spec model. To facilitate MCM verification—and enable the Check tools to consume processor RTL directly—we introduce a methodology and associated tool, rtl2𝜇spec, for automatically synthesizing 𝜇spec models from processor designs written in Verilog or SystemVerilog, with the help of modest user-provided design metadata. As a case study, we use rtl2𝜇spec to facilitate the Check-based verification of the four-core RISC-V V-scale (multi-V-scale) processor’s MCM implementation. We show that rtl2𝜇spec can synthesize a complete, and proven correct by construction, 𝜇spec model from the SystemVerilog design of the multi-V-scale processor in 6.84 minutes. Subsequent Check-based MCM verification of the synthesized 𝜇spec model takes less than one second per litmus test.
more »
« less
- Award ID(s):
- 2017863
- PAR ID:
- 10299361
- Date Published:
- Journal Name:
- IEEE/ACM International Symposium on Microarchitecture
- Page Range / eLocation ID:
- 679 to 694
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Energy efficiency has emerged as a key concern for modern processor design, especially when it comes to embedded and mobile devices. It is vital to accurately quantify the power consumption of different micro-architectural components in a CPU. Traditional RTL or gate-level power estimation is too slow for early design-space exploration studies. By contrast, existing architecture-level power models suffer from large inaccuracies. Recently, advanced machine learning techniques have been proposed for accurate power modeling. However, existing approaches still require slow RTL simulations, have large training overheads or have only been demonstrated for fixed-function accelerators and simple in-order cores with predictable behavior. In this work, we present a novel machine learning-based approach for microarchitecture-level power modeling of complex CPUs. Our approach requires only high-level activity traces obtained from microarchitecture simulations. We extract representative features and develop low-complexity learning formulations for different types of CPU-internal structures. Cycle-accurate models at the sub-component level are trained from a small number of gate-level simulations and hierarchically composed to build power models for complete CPUs. We apply our approach to both in-order and out-of-order RISC-V cores. Cross-validation results show that our models predict cycle-by-cycle power consumption to within 3% of a gate-level power estimation on average. In addition, our power model for the Berkeley Out-of-Order (BOOM) core trained on micro-benchmarks can predict the cycle-by-cycle power of real-world applications with less than 3.6% mean absolute error.more » « less
-
Attacks which combine software vulnerabilities and hardware vulnerabilities are emerging security problems. Although the runtime verification or remote attestation can determine the correctness of a system, existing methods suffer from inflexible security policy setup and high performance overheads. Meanwhile, they rarely focus on addressing the threat in the RISC-V architecture, which provides an open Instruction Set Architecture (ISA) of the processsor. In this paper, we propose a comprehensive software and hardware co-verification method to protect the entire RISC-V system in the runtime. The proposed method adopts the Dynamic Information Flow Tracking (DIFT) framework to implement a new Verifier and Prover security architecture for supporting runtime software and hardware coverification. We realize a FPGA prototype on the Rocket-Chip, an RISC-V open-source processor core. The framework is implemented as a co-processor which do not change the architecture of main processor core and the new security architecture can be integrated with other RISC-V processors.more » « less
-
Implementation of a new instruction set architecture (ISA) is a non-trivial task which involves significant modifications to the system software, such as the compiler, the assembler, and the linker. This task also includes modifying and verifying functional and cycle accurate simulators to facilitate correct simulation and performance evaluation of programs under the new ISA. Isolating errors in these software components becomes extremely challenging and demands automated and semi-automated mechanisms since neither the compilation infrastructure nor the simulation infrastructure can be trusted as both parties have been heavily modified. Bootstrapping a new ISA is very common in embedded systems since there is a greater variety of embedded ISAs due to often not having a need to support backward compatibility of executables. In this paper, we present the tools and the verification mechanisms we have implemented to support the development of a number of related, but distinct ISAs. These ISAs are similar in complexity to the RISC-V ISA, and range from simple pipelined and superscalar processor ISAs, to a complete VLIW ISA. Our work in developing the system software and simulators for these ISAs demonstrate that a step-by-step semi-automated approach which relies on simple invariants can facilitate effective bootstrapping of the complete system software and the simulator infrastructure.more » « less
-
Deep-Learning has become a dominant computing paradigm across a broad range of application domains. Different architectures of Deep-Networks like CNN, MLP, and RNN have emerged as the prominent machine-learning approaches for today’s application domains. These architectures are heavily data-dependent, requiring frequent access to memory. As a result, these applications suffer the most from the memory bottleneck of the von Neumann architectures. There is an imminent need for memory-centric architectures for deep-learning and big-data analytic applications that are memory intensive. Modern Field Programmable Gate Arrays (FPGAs) are ideal programmable substrates for creating customized Processor in/near Memory (PIM) accelerators. Modern FPGAs contain 100s of Mbits of dual-ported SRAM in the form of disaggregated, configurable Block RAMs (BRAMs). These BRAMs contain TB/s of available internal bandwidth. Unfortunately, developing FPGA-based accelerators for deep learning is not a simple task and demands the utilization of specialized tools provided by the FPGA vendors. It requires expertise in low-level hardware microarchitecture design. These are often not available to most researchers in the field of deep learning. Even with the ongoing improvements in High-Level Synthesis (HLS) tools, the requirement for hardware-specific design knowledge cannot be completely eliminated. This research developed a new reconfigurable memory-centric architecture and design approach that opens the advantages of FPGAs and Processor-in-Memory architecture to memory-intensive applications. Due to its high-performance and scalable memory-centric design, this architecture can deliver the highest speed and the lowest latency achievable from an FPGA overcoming the memory bottleneck.more » « less
An official website of the United States government

