skip to main content

Title: Lightweight Implementation of the LowMC Block Cipher Protected Against Side-Channel Attacks
LowMC is a parameterizable block cipher developed for use in Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE). In these applications, linear operations are much less expensive in terms of resource utilization compared to the non-linear operations due to their low multiplicative complexity. In this work, we implemented two versions of LowMC -- unrolled and lightweight. Both implementations are realized using RTL VHDL. To the best of our knowledge, we report the first lightweight implementation of LowMC and the first implementation protected against side-channel analysis (SCA). For the SCA protection, we used a hybrid 2/3 shares Threshold Implementation (TI) approach, and for the evaluation, the Test Vector Leakage Assessment (TVLA) method, also known as the T-test. Our unprotected implementations show information leakage at 10K traces, and after protection, they could successfully pass the T-test for 1 million traces. The Xilinx Vivado is used for the synthesis, implementation, functional verification, timing analysis, and programming of the FPGA. The target FPGA family is Artix-7, selected due to its widespread use in multiple applications. Based on our results, the numbers of LUTs are 867 and 3,328 for the lightweight and the unrolled architecture with unrolling factor U = 16, respectively. It takes 14.21 more » μs for the lightweight architecture and 1.29 μs for the unrolled design with U = 16 to generate one 128-bit block of the ciphertext. The fully unrolled architecture beats the best previous implementation by Kales et al. in terms of the number of LUTs by a factor of 4.5. However, this advantage comes at the cost of having 2.9 higher latency. « less
Authors:
; ; ; ; ;
Editors:
Katzenbeisser, Stefan; Schaumont, Patrick
Award ID(s):
1801512 1718434
Publication Date:
NSF-PAR ID:
10281332
Journal Name:
ASHES'20: Proceedings of the 4th ACM Workshop on Attacks and Solutions in Hardware Security
Page Range or eLocation-ID:
45 to 56
Sponsoring Org:
National Science Foundation
More Like this
  1. Authenticated ciphers are vulnerable to side-channel attacks, including differential power analysis (DPA). Test Vector Leakage Assessment (TVLA) using Welch's t-test has been used to verify improved resistance of block ciphers to DPA after application of countermeasures. However, extension of this methodology to authenticated ciphers is non-trivial, since this requires additional input and output conditions, complex interfaces, and long test vectors interlaced with protocol necessary to describe authenticated cipher operations. In this research we augment an existing side-channel analysis architecture (FOBOS) with TVLA for authenticated ciphers. We use this capability to show that implementations in the Spartan-6 FPGA of the CAESAR Round 3 candidates ACORN, ASCON, CLOC (AES and TWINE), SILC (AES, PRESENT, and LED), JAMBU (AES and SIMON), and Ketje Jr., as well as AES-GCM, are potentially vulnerable to 1st order DPA. We then implement versions of the above ciphers, protected against 1st order DPA, using threshold implementations. TVLA is used to verify improved resistance to 1st order DPA of the protected cipher implementations. Finally, we benchmark unprotected and protected cipher implementations in the Spartan-6 FPGA, and compare the costs of 1st order DPA protection in terms of area, frequency, throughput, throughput-to-area (TP/A) ratio, power, and energy per bit. Ourmore »results show that ACORN is the most energy efficient, has the lowest area (in LUTs), and has the highest TP/A ratio of DPA-resistant implementations. However, Ketje Jr. has the highest throughput.« less
  2. Lightweight cryptography offers viable security solutions for resource constrained Internet of Things (IoT) devices. However, IoT devices have implementation vulnerabilities such as side channel attacks (SCA), where observation of physical phenomena associated with device operations can reveal sensitive internal contents. The U.S. National Institute of Standards and Technology has called for lightweight cryptographic solutions to process authenticated encryption with associated data (AEAD), and is evaluating candidates for suitability in a Lightweight Cryptography (LWC) Standardization Process. Two Round 2 candidate variants, COMET-CHAM and SCHWAEMM, use Addition-Rotation-XOR (ARX) primitives. However, ARX ciphers are known to be costly to protect against certain SCA. In this work we implement side channel protected versions of COMET-CHAM and SCHWAEMM using register transfer level design. Identical protection schemes consisting of a threshold implementation (TI)-protected Kogge-Stone adder are adopted. Resistance to power side channel analysis is verified on an Artix-7 FPGA target device. Implementations comply with the Hardware API for Lightweight Cryptography, and use a custom-designed extension of the Development Package for the Hardware API for Lightweight Cryptography which enables test and evaluation of side channel resistant designs. We compare side channel protection costs of the two candidates against each other, against their unprotected counterparts, and against previousmore »side channel protected AEAD implementations. COMET-CHAM is shown to consume less area and power, while SCHWAEMM has higher throughput and throughput to area ratio, and is more energy efficient. On average, the costs of protecting these ciphers against SCA are 32% more in area and 38% more in power, compared to the average protection costs for a large selection of previously-evaluated ciphers of similar implementation. Our results highlight the costs involved in implementing side channel protected ARX-ciphers, and help to inform NIST LWC late round and final portfolio selections.« less
  3. Authenticated ciphers, which combine the cryptographic services of confidentiality, integrity, and authentication into one algorithmic construct, can potentially provide improved security and efficiencies in the processing of sensitive data. However, they are vulnerable to side-channel attacks such as differential power analysis (DPA). Although the Test Vector Leakage Assessment (TVLA) methodology has been used to confirm improved resistance of block ciphers to DPA after application of countermeasures, extension of TVLA to authenticated ciphers is non-trivial, since authenticated ciphers have expanded input and output requirements, complex interfaces, and long test vectors which include protocol necessary to describe authenticated cipher operations. In this research, we upgrade the FOBOS test architecture with capability to perform TVLA on authenticated ciphers. We show that FPGA implementations of the CAESAR Round 3 candidates ACORN, Ascon, CLOC (with AES and TWINE primitives), SILC (with AES, PRESENT, and LED primitives), JAMBU (with AES and SIMON primitives), and Ketje Jr.; as well as AES-GCM, are vulnerable to 1st order DPA. We then use threshold implementations to protect the above cipher implementations against 1st order DPA, and verify the effectiveness of countermeasures using the TVLA methodology. Finally, we compare the unprotected and protected cipher implementations in terms of area, performance (maximummore »frequency and throughput), throughput-to-area (TP/A) ratio, power, and energy per bit (E/bit). Our results show that ACORN consumes the lowest number of resources, has the highest TP/A ratio, and is the most energy-efficient of all DPA-resistant implementations. However, Ketje Jr. has the highest throughput.« less
  4. Adhikari, Avishek ; Küsters, Ralf ; Preneel, Bart (Ed.)
    The field of post-quantum cryptography aims to develop and analyze algorithms that can withstand classical and quantum cryptanalysis. The NIST PQC standardization process, now in its third round, specifies ease of protection against side-channel analysis as an important selection criterion. In this work, we develop and validate a masked hardware implementation of Saber key encapsulation mechanism, a third-round NIST PQC finalist. We first design a baseline lightweight hardware architecture of Saber and then apply side-channel countermeasures. Our protected hardware implementation is significantly faster than previously reported protected software and software/hardware co-design implementations. Additionally, applying side-channel countermeasures to our baseline design incurs approximately 2.9x and 1.4x penalty in terms of the number of LUTs and latency, respectively, in modern FPGAs.
  5. Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which affects performance and throughput; 2) the increased training complexity; and 3) the lack of rigirous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes the Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (both in inference and training) from O(n2) to O(n log n) and the storage complexity from O(n2) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: the DNNs based on CirCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented in various hardware/software platforms with configurable network architecture (e.g., layer type, size, scales,more »etc.). In CirCNN architecture: 1) Due to the recursive property, FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations. 2) The compressed but regular network structure avoids the pitfalls of the network pruning and facilitates high performance and throughput with highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN in FPGA, ASIC and embedded processors. Our results show that CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6 - 102X energy efficiency improvements compared with the best state-of-the-art results.« less