skip to main content


Title: Let a Non-barking Watchdog Bite: Cliptographic Signatures with an Offline Watchdog
We study how to construct secure digital signature schemes in the presence of kleptographic attacks. Our work utilizes an offline watchdog to clip the power of subversions via only one-time black-box testing of the implementation. Previous results essentially rely on an online watchdog which requires the collection of all communicating transcripts (or active re-randomization of messages). We first give a simple but generic construction, without random oracles, in the partial-subversion model in which key generation and signing algorithms can be subverted. Then, we give the first digital signature scheme in the complete-subversion model in which all cryptographic algorithms can be subverted. This construction is based on the full-domain hash. Along the way, we enhance the recent result of Russell et al. (CRYPTO 2018) about correcting a subverted random oracle.  more » « less
Award ID(s):
1801487 1801492 1801470
NSF-PAR ID:
10113877
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
IACR International Workshop on Public Key Cryptography (PKC 2019)
Volume:
11442
Page Range / eLocation ID:
221-251
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We initiate the study of public-key encryption (PKE) schemes and key-encapsulation mechanisms (KEMs) that retain security even when public parameters (primes, curves) they use may be untrusted and subverted. We define a strong security goal that we call ciphertext pseudo-randomness under parameter subversion attack (CPR-PSA). We also define indistinguishability (of ciphertexts for PKE, and of encapsulated keys from random ones for KEMs) and public-key hiding (also called anonymity) under parameter subversion attack, and show they are implied by CPR-PSA, for both PKE and KEMs. We show that hybrid encryption continues to work in the parameter subversion setting to reduce the design of CPR-PSA PKE to CPR-PSA KEMs and an appropriate form of symmetric encryption. To obtain efficient, elliptic-curve-based KEMs achieving CPR-PSA, we introduce efficiently-embeddable group families and give several constructions from elliptic-curves. 
    more » « less
  2. Memory-hard functions (MHFs) are a key cryptographic primitive underlying the design of moderately expensive password hashing algorithms and egalitarian proofs of work. Over the past few years several increasingly stringent goals for an MHF have been proposed including the requirement that the MHF have high sequential space-time (ST) complexity, parallel space-time complexity, amortized area-time (aAT) complexity and sustained space complexity. Data-Independent Memory Hard Functions (iMHFs) are of special interest in the context of password hashing as they naturally resist side-channel attacks. iMHFs can be specified using a directed acyclic graph (DAG) $G$ with $N=2^n$ nodes and low indegree and the complexity of the iMHF can be analyzed using a pebbling game. Recently, Alwen et al. [CCS'17] constructed an DAG called DRSample which has aAT complexity at least $\Omega\left( N^2/\log N\right)$. Asymptotically DRSample outperformed all prior iMHF constructions including Argon2i, winner of the password hashing competition (aAT cost $\mathcal{O}\left(N^{1.767}\right)$), though the constants in these bounds are poorly understood. We show that the the greedy pebbling strategy of Boneh et al. [ASIACRYPT'16] is particularly effective against DRSample e.g., the aAT cost is $\mathcal{O}\left( N^2/\log N\right)$. In fact, our empirical analysis {\em reverses} the prior conclusion of Alwen et al. that DRSample provides stronger resistance to known pebbling attacks for practical values of $N \leq 2^{24}$. We construct a new iMHF candidate (DRSample+BRG) by using the bit-reversal graph to extend DRSample. We then prove that the construction is asymptotically optimal under every MHF criteria, and we empirically demonstrate that our iMHF provides the best resistance to {\em known} pebbling attacks. For example, we show that any parallel pebbling attack either has aAT cost $\omega(N^2)$ or requires at least $\Omega(N)$ steps with $\Omega(N/\log N)$ pebbles on the DAG. This makes our construction the first practical iMHF with a strong sustained space-complexity guarantee and immediately implies that any parallel pebbling has aAT complexity $\Omega(N^2/\log N)$. We also prove that any sequential pebbling (including the greedy pebbling attack) has aAT cost $\Omega\left( N^2\right)$ and, if a plausible conjecture holds, any parallel pebbling has aAT cost $\Omega(N^2 \log \log N/\log N)$ --- the best possible bound for an iMHF. We implement our new iMHF and demonstrate that it is just as fast as Argon2. Along the way we propose a simple modification to the Argon2 round function which increases an attacker's aAT cost by nearly an order of magnitude without increasing running time on a CPU. Finally, we give a pebbling reduction which proves that in the parallel random oracle model (PROM) the cost of evaluating an iMHF like Argon2i or DRSample+BRG is given by the pebbling cost of the underlying DAG. Prior pebbling reductions assumed that the iMHF round function concatenates input labels before hashing and did not apply to practical iMHFs such as Argon2i, DRSample or DRSample+BRG where input labels are instead XORed together. 
    more » « less
  3. Authentication is vital for the Internet of Things (IoT) applications involving sensitive data (e.g., medical and financial systems). Digital signatures offer scalable authentication with non-repudiation and public verifiability, which are necessary for auditing and dispute resolution in such IoT applications. However, digital signatures have been shown to be highly costly for low-end IoT devices, especially when embedded devices (e.g., medical implants) must operate without a battery replacement for a long time. We propose an Energy-aware Signature for Embedded Medical devices (ESEM) that achieves near-optimal signer efficiency. ESEM signature generation does not require any costly operations (e.g., elliptic curve (EC) scalar multiplication/addition), but only a small constant-number of pseudo-random function calls, additions, and a single modular multiplication. ESEM has the smallest signature size among its EC-based counterparts with an identical private key size. We achieve this by eliminating the use of the ephemeral public key (i.e, commitment) in Schnorrtype signatures from the signing via a distributed construction at the verifier without interaction with the signer while permitting a constant-size public key. We proved that ESEM is secure (in random oracle model), and fully implemented it on an 8-bit AVR microcontroller that is commonly used in medical devices. Our experiments showed that ESEM achieves 8.4× higher energy efficiency over its closest counterpart while offering a smaller signature and code size. Hence, ESEM can be suitable for deployment on resource-limited embedded devices in IoT. We 
    more » « less
  4. Obeid, I. ; Selesnik, I. ; Picone, J. (Ed.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining an historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations. The overall impact of all of these issues described above is significant as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but that is expensive since you need to do multiple runs over the data, which further taxes a computing infrastructure already running at max capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, it adds an element of randomness into the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA’s cuDNN implementation provides algorithms that increase the performance and help the model train quicker, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it gives us a hint of how our model is performing per experiment, and if the changes we make are efficient. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is seeding all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment such as training using GPUs, allowing multi-threading on CPUs, using certain layers, etc. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data from the last experiment and to make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments and introduce new ways to manage such computational issues. 
    more » « less
  5. We consider the problem of efficiently simulating random quantum states and random unitary operators, in a manner which is convincing to unbounded adversaries with black-box oracle access. This problem has previously only been considered for restricted adversaries. Against adversaries with an a priori bound on the number of queries, it is well-known that t-designs suffice. Against polynomial-time adversaries, one can use pseudorandom states (PRS) and pseudorandom unitaries (PRU), as defined in a recent work of Ji, Liu, and Song; unfortunately, no provably secure construction is known for PRUs. In our setting, we are concerned with unbounded adversaries. Nonetheless, we are able to give stateful quantum algorithms which simulate the ideal object in both settings of interest. In the case of Haar-random states, our simulator is polynomial-time, has negligible error, and can also simulate verification and reflection through the simulated state. This yields an immediate application to quantum money: a money scheme which is information-theoretically unforgeable and untraceable. In the case of Haar-random unitaries, our simulator takes polynomial space, but simulates both forward and inverse access with zero error. These results can be seen as the first significant steps in developing a theory of lazy sampling for random quantum objects. 
    more » « less