Title: Scalable Binary Neural Network applications in Oblivious Inference
Binary neural networks (BNNs) deliver increased compute intensity and reduced memory/data requirements for computation. A scalable BNN can perform inference within a limited time budget under varying constraints. This paper explores the application of scalable BNNs to oblivious inference, a service a server provides to mistrusting clients: using this service, a client obtains the inference result on his/her data from a trained model held by the server, without disclosing the data or learning the model parameters. The paper makes two contributions: 1) we devise lightweight cryptographic protocols explicitly designed to exploit the unique characteristics of BNNs; 2) we present an advanced dynamic exploration of the runtime-accuracy tradeoff of scalable BNNs in a single-shot training process. Whereas previous works trained multiple BNNs with different computational complexities (which is cumbersome due to the slow convergence of BNNs), we train a single BNN that can perform inference under various computational budgets. Compared to CryptFlow2, the state-of-the-art technique for oblivious inference of non-binary DNNs, our approach reaches 3× faster inference at the same accuracy. Compared to XONN, the state-of-the-art technique for oblivious inference of binary networks, we achieve 2× to 12× faster inference at higher accuracy.
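The protocols themselves are not reproduced in this abstract, but the arithmetic they exploit is simple to state: a binary layer reduces each dot product to XNOR plus popcount. Below is a minimal plaintext sketch of that computation (not the paper's protocol; the {0,1}-bit encoding of {-1,+1} values is an assumption for illustration):

```python
import numpy as np

def binary_dense(x_bits: np.ndarray, w_bits: np.ndarray) -> np.ndarray:
    """Plaintext XNOR/popcount form of a binary dense layer.

    x_bits: (n,) activations in {0, 1}, encoding value v in {-1, +1} as (v + 1) / 2.
    w_bits: (m, n) weights with the same encoding.
    Returns the (m,) integer pre-activations of the {-1, +1} dot products.
    """
    n = x_bits.shape[0]
    # XNOR marks agreeing bit positions; popcount tallies them per output neuron.
    agree = (~np.logical_xor(w_bits.astype(bool), x_bits.astype(bool))).sum(axis=1)
    # dot product over {-1, +1} = (#agreements) - (#disagreements) = 2*agree - n
    return 2 * agree - n

def sign_activation(pre: np.ndarray) -> np.ndarray:
    """Binarize pre-activations back to bits for the next layer."""
    return (pre >= 0).astype(np.uint8)
```

Because every operation here is a Boolean gate or a counter, the computation maps naturally onto Boolean secure-computation back ends such as garbled circuits, which is the structural property that protocols for oblivious BNN inference build on.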
Award ID(s):
2016737
NSF-PAR ID:
10479947
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Embedded Computing Systems
ISSN:
1539-9087
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A binary neural network (BNN) is a compact form of neural network. Both the weights and activations in a BNN can be binary values, which leads to a significant reduction in both parameter size and computational complexity compared to full-precision counterparts. These reductions translate directly into a smaller memory footprint and lower computation cost in hardware, making BNNs highly suitable for a wide range of hardware accelerators. However, it is unclear whether and how a BNN can be further pruned for ultimate compactness. As both 0s and 1s are non-trivial in BNNs, it is not appropriate to adopt existing pruning methods for full-precision networks, which interpret 0s as trivial. In this paper, we present a pruning method tailored to BNNs and show that BNNs can be further pruned by using weight flipping frequency as an indicator of sensitivity to accuracy. Experiments on the binary versions of a 9-layer Network-in-Network (NIN) and AlexNet with the CIFAR-10 dataset show that the proposed BNN-pruning method achieves a 20-40% reduction in binary operations with a 0.5-1.0% accuracy drop, which leads to a 15-40% run-time speedup on a TitanX GPU.
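As a rough illustration of the flip-frequency idea above (not the paper's exact criterion; the checkpoint logging format and the pruning direction are assumptions here), one can record weight signs at training checkpoints and rank weights by how often they change:

```python
import numpy as np

def flip_frequency(sign_history: np.ndarray) -> np.ndarray:
    """sign_history: (checkpoints, n_weights) array of recorded signs in {-1, +1}.
    Returns, per weight, the fraction of checkpoint transitions at which it flipped."""
    flips = sign_history[1:] != sign_history[:-1]
    return flips.mean(axis=0)

def prune_mask(freq: np.ndarray, ratio: float) -> np.ndarray:
    """Keep-mask that prunes the `ratio` fraction of weights flipping most often,
    on the (assumed) reading that frequently flipping weights are least settled."""
    k = int(ratio * freq.size)
    if k == 0:
        return np.ones_like(freq, dtype=bool)
    cutoff = np.partition(freq, -k)[-k]  # k-th largest flip frequency
    return freq < cutoff
```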
  2. Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, doing so has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, allowing fast computation of distances. The framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset of 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster, with potentially lower memory usage, while retaining comparable or better comparison quality.
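The payoff of binary codes is that distance computation collapses to bitwise XOR plus popcount. A small sketch of that comparison step (the code length and byte packing via NumPy's packbits are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

# Per-byte popcount lookup table: number of set bits in each value 0..255.
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)

def pack(codes: np.ndarray) -> np.ndarray:
    """Pack (n, code_len) arrays of 0/1 values into bytes, 8 bits per byte."""
    return np.packbits(codes.astype(np.uint8), axis=1)

def hamming_all_pairs(a_packed: np.ndarray, b_packed: np.ndarray) -> np.ndarray:
    """All-pairs Hamming distances between two packed sets of binary codes."""
    xor = a_packed[:, None, :] ^ b_packed[None, :, :]  # differing bits per pair
    return _POPCOUNT[xor].sum(axis=2)                  # count them
```

Compared with computing geometric distances between raw diagrams, this kind of bit-level comparison is what makes clustering on the order of 10k diagrams tractable.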
  3. Binary neural networks (BNNs) substitute complex arithmetic operations with simple bit-wise operations. The binarized weights and activations in BNNs can drastically reduce memory requirements and energy consumption, making them attractive for edge ML applications with limited resources. However, the severe memory-capacity and energy constraints of low-power edge devices call for further reduction of BNN models beyond binarization. Weight pruning is a proven solution for reducing the size of many neural network (NN) models, but the binary nature of BNN weights makes it difficult to identify insignificant weights to remove. In this paper, we present a pruning method based on latent weights with layer-level pruning sensitivity analysis, which reduces the over-parameterization of BNNs, allowing for accuracy gains while drastically reducing the model size. Our method advocates a heuristic that distinguishes weights by their latent weights, a real-valued vector used to compute the pseudogradient during backpropagation. It is tested using three different convolutional NNs on the MNIST, CIFAR-10, and Imagenette datasets, with results indicating a 33%-46% reduction in operation count, with no accuracy loss, improving upon previous works in accuracy, model size, and total operation count.
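A minimal sketch of the latent-magnitude heuristic described above (the per-layer pruning ratio from the sensitivity analysis is taken as a given input, and the magnitude-thresholding rule is an assumption about the heuristic's form):

```python
import numpy as np

def latent_prune_mask(latent: np.ndarray, ratio: float) -> np.ndarray:
    """Keep-mask over one layer's weights: drop the `ratio` fraction whose
    real-valued latent weights have the smallest magnitude, i.e. whose
    binarized signs are the most weakly determined."""
    k = int(ratio * latent.size)
    if k == 0:
        return np.ones(latent.shape, dtype=bool)
    flat = np.abs(latent).ravel()
    cutoff = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.abs(latent) > cutoff

# A layer-level sensitivity analysis would assign a different `ratio` per layer,
# pruning aggressively only where accuracy is insensitive to the removal.
```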
  4. The ability to query and update over encrypted data is an essential feature to enable breach-resilient cyber-infrastructures. Statistical attacks on searchable encryption (SE) have demonstrated the importance of sealing information leaks in access patterns. In response to such attacks, the community has proposed the Oblivious Random Access Machine (ORAM). However, due to the logarithmic communication overhead of ORAM, the composition of ORAM and SE is known to be costly in the conventional client-server model, which poses a critical barrier toward its practical adaptations. In this paper, we propose a novel hardware-supported privacy-enhancing platform called Practical Oblivious Search and Update Platform (POSUP), which enables oblivious keyword search and update operations on large datasets with high efficiency. We harness Intel SGX to realize efficient oblivious data structures for oblivious search/update purposes. We implemented POSUP and evaluated its performance on a Wikipedia dataset containing ≥2^29 keyword-file pairs. Our implementation is highly efficient, taking only 1 ms to access a 3 KB block with Circuit-ORAM. Our experiments have shown that POSUP offers up to 70× less end-to-end delay with 100× reduced network bandwidth consumption compared with the traditional ORAM-SE composition without secure hardware. POSUP is also at least 4.5× faster for up to 99.5% of keywords that can be searched compared with state-of-the-art Intel SGX-assisted search platforms.
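POSUP's oblivious data structures are beyond a snippet, but the access-pattern-hiding principle that ORAM enforces can be shown with a toy linear scan (a deliberately naive O(n)-per-access baseline, not POSUP's Circuit-ORAM construction):

```python
def oblivious_read(blocks: list[bytes], wanted: int) -> bytes:
    """Toy oblivious read: touch every block so the sequence of memory
    accesses is identical no matter which index was requested. Inside an
    enclave, the per-byte select below would also need to be constant time."""
    out = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        mask = 0xFF if i == wanted else 0x00  # only the wanted block contributes
        for j, b in enumerate(block):
            out[j] |= b & mask
    return bytes(out)
```

Tree-based constructions such as Circuit-ORAM reduce this full scan to a polylogarithmic number of touched blocks per access, which is what makes the 1 ms figure above achievable.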
  5. Oblivious Random Access Machine (ORAM) allows a client to hide the access pattern when accessing sensitive data on a remote server. It is known that there exists a logarithmic communication lower bound on any passive ORAM construction, in which the server acts only as a storage service. This overhead, however, has been shown to be costly for some applications. Several active ORAM schemes with server computation have been proposed to overcome this limitation. However, they mostly rely on costly homomorphic encryption, whose performance is worse than that of passive ORAM. In this article, we propose S3ORAM, a new multi-server ORAM framework, which features O(1) client bandwidth blowup and low client storage without relying on costly cryptographic primitives. Our key idea is to harness Shamir Secret Sharing and a multi-party multiplication protocol on applicable binary tree-ORAM paradigms. This strategy allows the client to instruct the server(s) to perform secure and efficient computation on his/her behalf with low intervention, thereby achieving a constant client bandwidth blowup and low server computational overhead. Our framework can also work atop a general k-ary tree ORAM structure (k ≥ 2). We fully implemented our framework and rigorously evaluated its performance on a commodity cloud platform (Amazon EC2). Our comprehensive experiments confirmed the efficiency of the S3ORAM framework: it is approximately 10× faster than the most efficient passive ORAM (i.e., Path-ORAM) for a moderate network bandwidth, while being three orders of magnitude faster than active ORAM with O(1) bandwidth blowup (i.e., Onion-ORAM). We have open-sourced the implementation of our framework for public testing and adaptation.
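The Shamir component is standard and easy to sketch. Below is a minimal (t, n) secret sharing over a prime field (the modulus choice and integer block encoding are illustrative assumptions; S3ORAM's multiplication protocol and tree layout are not shown):

```python
import random

PRIME = 2**61 - 1  # Mersenne prime field modulus (an assumed parameter choice)

def share(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares using a random degree-(t-1) polynomial;
    any t shares reconstruct the secret, fewer reveal nothing."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def eval_at(x: int) -> int:
        return sum(c * pow(x, j, PRIME) for j, c in enumerate(coeffs)) % PRIME
    return [(x, eval_at(x)) for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

For example, reconstruct(share(42, n=5, t=3)[:3]) returns 42. Shares are individually uniform, and local additions of shares (plus one interactive multiplication protocol among servers) compose into the oblivious tree operations the abstract describes.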