In this paper, we investigate whether decision trees can be used to interpret a black-box classifier without knowing the learning algorithm and the training data. Decision trees are known for their transparency and high expressivity. However, they are also notorious for their instability and tendency to grow excessively large. We present a classifier reverse engineering model that outputs a decision tree to interpret the black-box classifier. There are two major challenges. One is to build such a decision tree with controlled stability and size, and the other is that probing the black-box classifier is limited for security and economic reasons. Our model addresses the two issues by simultaneously minimizing sampling cost and classifier complexity. We present our empirical results on four real datasets, and demonstrate that our reverse engineering learning model can effectively approximate and simplify the black box classifier.
more »
« less
Attacks and Defenses for Single-Stage Residue Number System PRNGs
This paper explores the security of a single-stage residue number system (RNS) pseudorandom number generator (PRNG), which has previously been shown to provide extremely high-quality outputs when evaluated through available RNG statistical test suites or in using Shannon and single-stage Kolmogorov entropy metrics. In contrast, rather than blindly performing statistical analyses on the outputs of the single-stage RNS PRNG, this paper provides both white box and black box analyses that facilitate reverse engineering of the underlying RNS number generation algorithm to obtain the residues, or equivalently key, of the RNS algorithm. We develop and demonstrate a conditional entropy analysis that permits extraction of the key given a priori knowledge of state transitions as well as reverse engineering of the RNS PRNG algorithm and parameters (but not the key) in problems where the multiplicative RNS characteristic is too large to obtain a priori state transitions. We then discuss multiple defenses and perturbations for the RNS system that fool the original attack algorithm, including deliberate noise injection and code hopping. We present a modification to the algorithm that accounts for deliberate noise, but rapidly increases the search space and complexity. Lastly, we discuss memory requirements and time required for the attacker and defender to maintain these defenses.
more »
« less
- Award ID(s):
- 1946493
- PAR ID:
- 10324940
- Date Published:
- Journal Name:
- IoT
- Volume:
- 2
- Issue:
- 3
- ISSN:
- 2624-831X
- Page Range / eLocation ID:
- 375 to 400
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Side-channel attacks on elliptic curve cryptography (ECC) often assume a white-box attacker who has detailed knowledge of the implementation choices taken by the target implementation. Due to the complex and layered nature of ECC, there are many choices that a developer makes to obtain a functional and interoperable implementation. These include the curve model, coordinate system, addition formulas, and the scalar multiplier, or lower-level details such as the finite-field multiplication algorithm. This creates a gap between the attack requirements and a real-world attacker that often only has black-box access to the target – i.e., has no access to the source code nor knowledge of specific implementation choices made. Yet, when the gap is closed, even real-world implementations of ECC succumb to side-channel attacks, as evidenced by attacks such as TPM-Fail, Minerva, the Side Journey to Titan, or TPMScan [MSE+20; JSS+20; RLM+21; SDB+24].We study this gap by first analyzing open-source ECC libraries for insight into realworld implementation choices. We then examine the space of all ECC implementations combinatorially. Finally, we present a set of novel methods for automated reverse engineering of black-box ECC implementations and release a documented and usable open-source toolkit for side-channel analysis of ECC called pyecsca.Our methods turn attacks around: instead of attempting to recover the private key, they attempt to recover the implementation configuration given control over the private and public inputs. We evaluate them on two simulation levels and study the effect of noise on their performance. Our methods are able to 1) reverse-engineer the scalar multiplication algorithm completely and 2) infer significant information about the coordinate system and addition formulas used in a target implementation. Furthermore, they can bypass coordinate and curve randomization countermeasures.more » « less
-
Dachman-Soled, Dana (Ed.)Pseudorandom number generators with input (PRNGs) are cryptographic algorithms that generate pseudorandom bits from accumulated entropic inputs (e.g., keystrokes, interrupt timings, etc.). This paper studies in particular PRNGs that are secure against premature next attacks (Kelsey et al., FSE '98), a class of attacks leveraging the fact that a PRNG may produce an output (which could be seen by an adversary!) before enough entropy has been accumulated. Practical designs adopt either unsound entropy-estimation methods to prevent such attacks (as in Linux’s /dev/random) or sophisticated pool-based approaches as in Yarrow (MacOS/FreeBSD) and Fortuna (Windows). The only prior theoretical study of premature next attacks (Dodis et al., Algorithmica '17) considers either a seeded setting or assumes constant entropy rate, and thus falls short of providing and validating practical designs. Assuming the availability of random seed is particularly problematic, first because this requires us to somehow generate a random seed without using our PRNG, but also because we must ensure that the entropy inputs to the PRNG remain independent of the seed. Indeed, all practical designs are seedless. However, prior works on seedless PRNGs (Coretti et al., CRYPTO '19; Dodis et al., ITC '21, CRYPTO'21) do not consider premature next attacks. The main goal of this paper is to investigate the feasibility of theoretically sound seedless PRNGs that are secure against premature next attacks. To this end, we make the following contributions: 1) We prove that it is impossible to achieve seedless PRNGs that are secure against premature-next attacks, even in a rather weak model. Namely, the impossibility holds even when the entropic inputs to the PRNG are independent. In particular, our impossibility result holds in settings where seedless PRNGs are otherwise possible. 2) Given the above impossibility result, we investigate whether existing seedless pool-based approaches meant to overcome premature next attacks in practical designs provide meaningful guarantees in certain settings. Specifically, we show the following. 3) We introduce a natural condition on the entropic input and prove that it implies security of the round-robin entropy accumulation PRNG used by Windows 10, called Fortuna. Intuitively, our condition requires the input entropy "not to vary too wildly" within a given round-robin round. 4) We prove that the "root pool" approach (also used in Windows 10) is secure for general entropy inputs, provided that the system’s state is not compromised after system startup.more » « less
-
Sample entropy, an approximation of the Kolmogorov entropy, was proposed to characterize complexity of a time series, which is essentially defined as −log(B/A), where B denotes the number of matched template pairs with length m and A denotes the number of matched template pairs with m+1, for a predetermined positive integer m. It has been widely used to analyze physiological signals. As computing sample entropy is time consuming, the box-assisted, bucket-assisted, x-sort, assisted sliding box, and kd-tree-based algorithms were proposed to accelerate its computation. These algorithms require O(N2) or O(N2−1m+1) computational complexity, where N is the length of the time series analyzed. When N is big, the computational costs of these algorithms are large. We propose a super fast algorithm to estimate sample entropy based on Monte Carlo, with computational costs independent of N (the length of the time series) and the estimation converging to the exact sample entropy as the number of repeating experiments becomes large. The convergence rate of the algorithm is also established. Numerical experiments are performed for electrocardiogram time series, electroencephalogram time series, cardiac inter-beat time series, mechanical vibration signals (MVS), meteorological data (MD), and 1/f noise. Numerical results show that the proposed algorithm can gain 100–1000 times speedup compared to the kd-tree and assisted sliding box algorithms while providing satisfactory approximate accuracy.more » « less
-
Abstract—A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary’s knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Imore » « less
An official website of the United States government

