Title: Highly optimized Curve448 and Ed448 design in wolfSSL and side-channel evaluation on Cortex-M4
The compact key sizes and low computational latency of Elliptic Curve Cryptography (ECC) have sparked high interest in its integration into network protocols. The recently proposed Curve448, which provides 224-bit security, is an attractive choice for cryptographic libraries in light of a recent study on backdoors that compromise the security of other ECC instances, and it has been integrated into the TLS1.3 protocol. Curve448 and its birationally equivalent untwisted Edwards curve Ed448, used for key exchange and authentication, respectively, are well suited to low-end embedded cryptographic libraries because of their minimal memory requirements. In this work, we deploy an optimized Montgomery ladder point multiplication into the widely used IoT-focused cryptographic library wolfSSL and present side-channel-robust and efficient ECDH and EdDSA based on Curve448 and Ed448. We evaluate the performance of the newly integrated architectures on the NIST-recommended ARM Cortex-M4 platform STM32F407-DK. We perform a thorough side-channel evaluation of the proposed Montgomery ladder implementation via TVLA, which reveals DPA data leakage. We integrate countermeasures to protect our design, evaluate their effectiveness, and analyze the latency overhead. We achieve SCA-robust Curve448 and Ed448 at a cost of around 1.2 MCC (1.36× the execution time). Finally, we report the performance of our fully SCA-protected Curve448 and Ed448 as part of TLS1.3 in wolfSSL, reporting 1.04× performance compared to the original wolfSSL code.
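For readers unfamiliar with the Montgomery ladder named in the abstract, the following is a minimal Python sketch of the X448 function as specified in RFC 7748. It is not the wolfSSL or paper implementation: the function names are illustrative, Python big integers stand in for the optimized field arithmetic, and Python integer operations are not constant-time, so the sketch only shows the ladder structure (one conditional swap and one combined double-and-add step per scalar bit).

```python
# Illustrative X448 Montgomery ladder following RFC 7748.
# Assumption: names are hypothetical; this is NOT side-channel safe,
# because Python integer arithmetic is not constant-time.

P = 2**448 - 2**224 - 1   # Curve448 field prime
A24 = 39081               # (156326 - 2) / 4 for Curve448

def decode_scalar(k: bytes) -> int:
    """Clamp a 56-byte scalar as RFC 7748 prescribes for X448."""
    b = bytearray(k)
    b[0] &= 252
    b[55] |= 128
    return int.from_bytes(b, "little")

def cswap(swap: int, a: int, b: int):
    """Swap a and b when swap == 1, using a mask instead of a branch."""
    mask = -swap                  # 0 or all ones
    t = mask & (a ^ b)
    return a ^ t, b ^ t

def x448(k: bytes, u: bytes) -> bytes:
    scalar = decode_scalar(k)
    x1 = int.from_bytes(u, "little") % P
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(448)):
        kt = (scalar >> t) & 1
        swap ^= kt
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = kt
        # Combined differential addition and doubling in x-only coordinates.
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = pow((da + cb) % P, 2, P)
        z3 = x1 * pow((da - cb) % P, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    # Convert from projective (X : Z) to affine u = X / Z.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(56, "little")

BASE_POINT = (5).to_bytes(56, "little")   # u = 5 is the X448 base point
```

As a usage example, two parties with private keys `a` and `b` (56 random bytes each) compute public keys `x448(a, BASE_POINT)` and `x448(b, BASE_POINT)`; each then applies its own private key to the other's public key and both arrive at the same shared secret.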
Award ID(s):
2147196
PAR ID:
10507260
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Conference on Dependable and Secure Computing (DSC)
Format(s):
Medium: X
Location:
Tampa, Florida
Sponsoring Org:
National Science Foundation
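The TVLA evaluation mentioned in the abstract is commonly carried out as a fixed-versus-random Welch t-test over power traces, flagging any sample point where |t| exceeds 4.5. The sketch below illustrates that general procedure only; the function names and trace layout (one list of samples per trace) are assumptions, and the record does not state the exact test configuration the authors used.

```python
# Illustrative fixed-vs-random TVLA check using Welch's t-test.
# Assumption: traces are equal-length lists of power samples; names are hypothetical.
import math
import statistics

TVLA_THRESHOLD = 4.5   # conventional pass/fail bound for |t|

def welch_t(group_a, group_b) -> float:
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    denom = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if denom == 0.0:
        # No variance in either group: equal means give 0, otherwise unbounded.
        return 0.0 if mean_a == mean_b else math.inf
    return (mean_a - mean_b) / denom

def leaky_points(fixed_traces, random_traces):
    """Return the sample indices whose |t| exceeds the TVLA threshold."""
    leaks = []
    columns = zip(zip(*fixed_traces), zip(*random_traces))
    for index, (fixed_col, random_col) in enumerate(columns):
        if abs(welch_t(fixed_col, random_col)) > TVLA_THRESHOLD:
            leaks.append(index)
    return leaks
```

An empty result from `leaky_points` means no sample point crossed the threshold for the given trace sets; it does not by itself prove the absence of leakage.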
More Like this
  1. The elliptic curve family of schemes has the lowest computational latency, memory use, energy consumption, and bandwidth requirements, making it the most preferred public key method for adoption into network protocols. Being suitable for embedded devices and applicable for key exchange and authentication, ECC is assuming a prominent position in the field of IoT cryptography. The attractive properties of the relatively new curve Curve448 contribute to its inclusion in the TLS1.3 protocol and pique the interest of academics and engineers aiming at studying and optimizing the schemes. When addressing low-end IoT devices, however, the literature indicates little work on these curves. In this paper, we present an efficient design for both protocols based on the Montgomery curve Curve448 and its birationally equivalent Edwards curve Ed448 used for key agreement and digital signature algorithm, specifically the X448 function and the Ed448 DSA, relying on efficient low-level arithmetic operations targeting the ARM-based Cortex-M4 platform. Our design performs point multiplication, the base of the Elliptic Curve Diffie-Hellman (ECDH), in 3,2KCCs, resulting in more than 48% improvement compared to the best previous work based on Curve448, and performs sign and verify, the main operations of the Edwards-curve Digital Signature Algorithm (EdDSA), in 6,038KCCs and 7,404KCCs, showing a speedup of around 11% compared to the counterparts. We present novel modular multiplication and squaring architectures reaching ∼25% and ∼35% faster runtime than the previous best-reported results, respectively, based on Curve448 key exchange counterparts, and ∼13% and ∼25% better latency results than the Ed448-based digital signature counterparts targeting the Cortex-M4 platform.
  2. The elliptic curve family of schemes has the lowest computational latency, memory use, energy consumption, and bandwidth requirements, making it the most preferred public key method for adoption into network protocols. Being suitable for embedded devices and applicable for key exchange and authentication, ECC is assuming a prominent position in the field of IoT cryptography. The attractive properties of the relatively new curve Curve448 contribute to its inclusion in the TLS1.3 protocol and pique the interest of academics and engineers aiming at studying and optimizing the schemes. When addressing low-end IoT devices, however, the literature indicates little work on these curves. In this paper, we present an efficient design for both protocols based on the Montgomery curve Curve448 and its birationally equivalent Edwards curve Ed448 used for key agreement and digital signature algorithm, specifically the X448 function and the Ed448 DSA, relying on efficient low-level arithmetic operations targeting the ARM-based Cortex-M4 platform. Our design performs point multiplication, the base of the Elliptic Curve Diffie-Hellman (ECDH), in 3,2KCCs, resulting in more than 48% improvement compared to the best previous work based on Curve448, and performs sign and verify, the main operations of the Edwards-curve Digital Signature Algorithm (EdDSA), in 6,038KCCs and 7,404KCCs, showing a speedup of around 11% compared to the counterparts. We present novel modular multiplication and squaring architectures reaching ∼25% and ∼35% faster runtime than the previous best-reported results, respectively, based on Curve448 key exchange counterparts, and ∼13% and ∼25% better latency results than the Ed448-based digital signature counterparts targeting the Cortex-M4 platform.
  3. The demand for classical cryptography schemes continues to increase due to the exhaustive studies on their security. Thus, constant improvement of timing, power consumption, and memory requirements is needed for the most widely used classical Elliptic Curve Cryptography (ECC) primitives, suiting high- as well as low-end devices. In this work, we present the first implementation of the Edwards Curve Digital Signature Algorithm (EdDSA) based on Ed448 targeting the ARM Cortex-M4-based STM32F407VG microcontroller, which forms a large part of the Internet of Things (IoT) world. We report timing and memory consumption results based on portable C and target-specific hand-crafted assembly code implementations of the low-level finite field arithmetic. We optimize the high-level group operations by implementing the efficient scalar multiplication over the Ed448 isogenous map to reduce the computation complexity. Furthermore, we provide a side-channel analysis (SCA) and fault attack protected design by developing point randomization, scalar blinding techniques, and repeated signature, and evaluate the performance. Our optimized architecture performs a signature and verification in 39.88 ms and 51.54 ms, respectively, where SCA protection can be achieved at less than 6.4% performance overhead.
  4. To provide safe communication across an unprotected medium such as the internet, network protocols are being established. These protocols employ public key techniques to perform key exchange and authentication. Transport Layer Security (TLS) is a widely used network protocol that enables secure communication between a server and a client. TLS is employed in billions of transactions per second. Contemporary protocols depend on traditional methods that rely on the computational hardness of the factorization or (elliptic curve) discrete logarithm problems. The ongoing advancement in the processing power of classical computers requires an ongoing increase in the security level of the underlying cryptographic algorithms. This study focuses on the analysis of Curve448 and the Edwards curve Ed448, which offer a 224-bit level of security, as part of the TLSv1.3 protocol. The rapid advancement of quantum computers, however, presents a substantial threat to secure network communication that depends on classical crypto schemes, irrespective of their degree of security. Quantum computers have the capability to solve these problems within a feasible timeframe. In order to successfully transition to post-quantum secure network protocols, it is imperative to concurrently deploy both classical and post-quantum algorithms. This is done to fulfill the requirements of both enterprises and governments, while also instilling more assurance in the reliability of the post-quantum systems. This paper presents a detailed hybrid implementation architecture of the TLSv1.3 network protocol. We showcase the first deployment of Curve448 and Crystals-Kyber for key exchange, and Ed448 and Crystals-Dilithium for verifying the authenticity of entities and for X.509 Public Key Infrastructure (PKI). We rely upon the widely used OpenSSL library and the wolfSSL library, which targets embedded devices, to provide our results for server and client applications.
  5. The highly secure Curve448 cryptographic algorithm has been recently recommended by NIST. While this algorithm provides 224-bit security over elliptic curve cryptography, its implementation may still be vulnerable to physical side-channel attacks. In this paper, we present a speed-optimized implementation on a 32-bit ARM Cortex-M4 platform achieving more than 40% improvement compared to the best previous work. Our design can perform 43 scalar multiplications per second on an STM32F4 working at 168 MHz. At 24 MHz, our proposed implementation takes only 3,740k clock cycles. On the other hand, the security of Curve448 is thoroughly evaluated to have a trade-off between performance and required protection. We apply different effective countermeasures to prevent a subset of side-channel and fault injection attacks at the cost of 8%-22% overhead.
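The figures in item 5 can be related with simple arithmetic: a throughput at a given clock frequency implies a cycle count per operation, and a cycle count at a given clock implies a latency. The short Python sketch below works through the two numbers quoted there. The helper names are illustrative, and the remark about why the two cycle counts differ is an assumption, not something the abstract states.

```python
# Worked arithmetic for the throughput and cycle figures quoted in item 5.
# Assumption: helper names are hypothetical and used only for illustration.

def cycles_per_operation(clock_hz: float, ops_per_second: float) -> float:
    """Cycle budget per operation implied by a throughput at a given clock."""
    return clock_hz / ops_per_second

def latency_ms(cycles: float, clock_hz: float) -> float:
    """Wall-clock time in milliseconds for a cycle count at a given clock."""
    return 1000.0 * cycles / clock_hz

# 43 scalar multiplications per second at 168 MHz: roughly 3.9 million cycles each.
cycles_at_168mhz = cycles_per_operation(168_000_000, 43)

# 3,740k cycles at 24 MHz: roughly 156 ms per scalar multiplication.
latency_at_24mhz = latency_ms(3_740_000, 24_000_000)
```

The implied count at 168 MHz is a few percent higher than the 3,740k cycles reported at 24 MHz. One plausible explanation, offered here only as an assumption, is the extra flash wait states typically needed at higher clock frequencies on this class of microcontroller.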