High quality Machine Learning (ML) models are often considered valuable intellectual property by companies. Model Stealing (MS) attacks allow an adversary with black-box access to a ML model to replicate its functionality by training a clone model using the predictions of the target model for different inputs. However, best available existing MS attacks fail to produce a high-accuracy clone without access to the target dataset or a representative dataset necessary to query the target model. In this paper, we show that preventing access to the target dataset is not an adequate defense to protect a model. We propose MAZE -- a data-free model stealing attack using zeroth-order gradient estimation that produces high-accuracy clones. In contrast to prior works, MAZE uses only synthetic data created using a generative model to perform MS. Our evaluation with four image classification models shows that MAZE provides a normalized clone accuracy in the range of 0.90x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and on surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy 0.97x to 1.0x) and reduces the query budget required for the attack by 2x-24x.
more »
« less
TPUXtract: An Exhaustive Hyperparameter Extraction Framework
Model stealing attacks on AI/ML devices undermine intellectual property rights, compromise the competitive advantage of the original model developers, and potentially expose sensitive data embedded in the model’s behavior to unauthorized parties. While previous research works have demonstrated successful side-channelbased model recovery in embedded microcontrollers and FPGA-based accelerators, the exploration of attacks on commercial ML accelerators remains largely unexplored. Moreover, prior side-channel attacks fail when they encounter previously unknown models. This paper demonstrates the first successful model extraction attack on the Google Edge Tensor Processing Unit (TPU), an off-the-shelf ML accelerator. Specifically, we show a hyperparameter stealing attack that can extract all layer configurations including the layer type, number of nodes, kernel/filter sizes, number of filters, strides, padding, and activation function. Most notably, our attack is the first comprehensive attack that can extract previously unseen models. This is achieved through an online template-building approach instead of a pre-trained ML-based approach used in prior works. Our results on a black-box Google Edge TPU evaluation show that, through obtained electromagnetic traces, our proposed framework can achieve 99.91% accuracy, making it the most accurate one to date. Our findings indicate that attackers can successfully extract various types of models on a black-box commercial TPU with utmost detail and call for countermeasures.
more »
« less
- Award ID(s):
- 1943245
- PAR ID:
- 10634234
- Publisher / Repository:
- IACR Transactions on Cryptographic Hardware and Embedded Systems, 2025
- Date Published:
- Journal Name:
- IACR Transactions on Cryptographic Hardware and Embedded Systems
- Volume:
- 2025
- Issue:
- 1
- ISSN:
- 2569-2925
- Page Range / eLocation ID:
- 78 to 103
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Web privacy is challenged by pixel-stealing attacks, which allow attackers to extract content from embedded iframes and to detect visited links. To protect against multiple pixelstealing attacks that exploited timing variations in SVG filters, browser vendors repeatedly adapted their implementations to eliminate timing variations. In this work we demonstrate that past efforts are still not sufficient. We show how web-based attackers can mount cache-based side-channel attacks to monitor data-dependent memory accesses in filter rendering functions. We identify conditions under which browsers elect the non-default CPU implementation of SVG filters, and develop techniques for achieving access to the high-resolution timers required for cache attacks. We then develop efficient techniques to use the pixel-stealing attack for text recovery from embedded pages and to achieve high-speed history sniffing. To the best of our knowledge, our attack is the first to leak multiple bits per screen refresh, achieving an overall rate of 267 bits per second.more » « less
-
In a black-box setting, the adversary only has API access to the target model and each query is expensive. Prior work on black-box adversarial examples follows one of two main strategies: (1) transfer attacks use white-box attacks on local models to find candidate adversarial examples that transfer to the target model, and (2) optimization-based attacks use queries to the target model and apply optimization techniques to search for adversarial examples. We propose hybrid attacks that combine both strategies, using candidate adversarial examples from local models as starting points for optimization-based attacks and using labels learned in optimization-based attacks to tune local models for finding transfer candidates. We empirically demonstrate on the MNIST, CIFAR10, and ImageNet datasets that our hybrid attack strategy reduces cost and improves success rates, and in combination with our seed prioritization strategy, enables batch attacks that can efficiently find adversarial examples with only a handful of queries.more » « less
-
Machine learning (ML) is successful in achieving human-level performance in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While recent efforts on explainable ML has received significant attention, the existing solutions are not applicable in real-time systems since they map interpretability as an optimization problem, which leads to numerous iterations of time-consuming complex computations. To make matters worse, existing implementations are not amenable for hardware-based acceleration. In this paper, we propose an efficient framework to enable acceleration of explainable ML procedure with hardware accelerators. We explore the effectiveness of both Tensor Processing Unit (TPU) and Graphics Processing Unit (GPU) based architectures in accelerating explainable ML. Specifically, this paper makes three important contributions. (1) To the best of our knowledge, our proposed work is the first attempt in enabling hardware acceleration of explainable ML. (2) Our proposed solution exploits the synergy between matrix convolution and Fourier transform, and therefore, it takes full advantage of TPU's inherent ability in accelerating matrix computations. (3) Our proposed approach can lead to real-time outcome interpretation. Extensive experimental evaluation demonstrates that proposed approach deployed on TPU can provide drastic improvement in interpretation time (39x on average) as well as energy efficiency (69x on average) compared to existing acceleration techniques.more » « less
-
This paper investigates an adversary's ease of attack in generating adversarial examples for real-world scenarios. We address three key requirements for practical attacks for the real-world: 1) automatically constraining the size and shape of the attack so it can be applied with stickers, 2) transform-robustness, i.e., robustness of a attack to environmental physical variations such as viewpoint and lighting changes, and 3) supporting attacks in not only white-box, but also black-box hard-label scenarios, so that the adversary can attack proprietary models. In this work, we propose GRAPHITE, an efficient and general framework for generating attacks that satisfy the above three key requirements. GRAPHITE takes advantage of transform-robustness, a metric based on expectation over transforms (EoT), to automatically generate small masks and optimize with gradient-free optimization. GRAPHITE is also flexible as it can easily trade-off transform-robustness, perturbation size, and query count in black-box settings. On a GTSRB model in a hard-label black-box setting, we are able to find attacks on all possible 1,806 victim-target class pairs with averages of 77.8% transform-robustness, perturbation size of 16.63% of the victim images, and 126K queries per pair. For digital-only attacks where achieving transform-robustness is not a requirement, GRAPHITE is able to find successful small-patch attacks with an average of only 566 queries for 92.2% of victim-target pairs. GRAPHITE is also able to find successful attacks using perturbations that modify small areas of the input image against PatchGuard, a recently proposed defense against patch-based attacks.more » « less
An official website of the United States government

