Title: TEE-Graph: efficient privacy and ownership protection for cloud-based graph spectral analysis
Introduction

Big graphs, such as social network interaction graphs and customer rating matrices, require significant computing resources to maintain, so data owners increasingly turn to public clouds for storage and computing elasticity. However, existing solutions do not fully address the privacy and ownership protection needs of the key parties involved: data contributors and the data owner who collects data from those contributors.

Methods

We propose a Trusted Execution Environment (TEE) based solution, TEE-Graph, for spectral analysis of outsourced graphs in the cloud. TEEs are recent CPU features that enable much more efficient confidential computing than traditional software-based cryptographic approaches. Our approach makes several unique contributions compared to existing confidential graph analysis approaches. (1) It utilizes unique TEE properties to meet contributors' new privacy needs, e.g., the right to revoke shared data. (2) It implements efficient access-pattern protection with a differentially private data encoding method. (3) It implements TEE-based spectral analysis algorithms, the Lanczos and Nyström methods, for efficiently handling big graphs while protecting confidentiality from compromised cloud providers.

Results

The TEE-Graph approach is much more efficient than software-based cryptographic approaches and is also immune to access-pattern-based attacks. Compared with PrivateGraph, the best-known software crypto approach for graph spectral analysis, TEE-Graph has 10³–10⁵ times lower computation, storage, and communication costs. Furthermore, the proposed access-pattern protection method adds only about 10%–25% to the overall computation cost.

Discussion

Our experimentation showed that TEE-Graph performs significantly better and has lower costs than typical software approaches. It also addresses the unique ownership and access-pattern issues that other TEE-related graph analytics approaches have not sufficiently studied. The proposed approach can be extended to other graph analytics problems with strong ownership and access-pattern protection.

 
Award ID(s):
2232824
PAR ID:
10479980
Publisher / Repository:
Frontiers
Journal Name:
Frontiers in Big Data
Volume:
6
ISSN:
2624-909X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Confidential computing has gained prominence due to the escalating volume of data-driven applications (e.g., machine learning and big data) and the acute desire for secure processing of sensitive data, particularly across distributed environments, such as the edge-to-cloud continuum.

    Objective

    Because work in this emerging area is scattered across various research fields, this paper surveys the fundamental concepts and cutting-edge software and hardware solutions developed for confidential computing using trusted execution environments, homomorphic encryption, and secure enclaves.

    Methods

    We underscore the significance of building trust at both the hardware and software levels and delve into their applications, particularly for regular and advanced machine learning (ML) (e.g., large language models (LLMs), computer vision) applications.

    Results

    While substantial progress has been made, some barely explored areas need extra attention from researchers and practitioners in the community to improve confidentiality guarantees, develop more robust attestation mechanisms, and address vulnerabilities in existing trusted execution environments.

    Conclusion

    Providing a comprehensive taxonomy of the confidential computing landscape, this survey enables researchers to advance this field to ultimately ensure the secure processing of users' sensitive data across a multitude of applications and computing tiers.

     
  2. Confidential computing aims to secure the code and data in use by providing a Trusted Execution Environment (TEE) for applications using hardware features such as Intel SGX. Timing and cache side-channel attacks, however, are often outside the scope of the threat model, although once exploited they can break all the default security guarantees enforced by hardware. Unfortunately, tools that detect potential side-channel vulnerabilities within applications are limited and usually ignore the strong attack model and the unique programming model imposed by Intel SGX. This paper proposes a precise side-channel analysis tool, ENCIDER, detecting both timing and cache side-channel vulnerabilities within SGX applications by inferring potential timing observation points and incorporating the SGX programming model into the analysis. ENCIDER uses dynamic symbolic execution to decompose the side-channel requirement based on the bounded non-interference property and implements byte-level information flow tracking via API modeling. We have applied ENCIDER to 4 real-world SGX applications, 2 SGX crypto libraries, and 3 widely used crypto libraries, and found 29 timing side channels and 73 code and data cache side channels. We have reported our findings to the corresponding parties, e.g., Intel and ARM, who have confirmed most of the vulnerabilities detected.
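To illustrate the class of bug such timing-side-channel tools look for (this example is ours, not from the ENCIDER paper): an early-exit byte comparison leaks, through its running time, how many leading bytes of a guess match the secret, while a constant-time comparison removes that secret-dependent timing observation point.

```python
import hmac

def leaky_compare(secret: bytes, guess: bytes) -> bool:
    # Early-exit comparison: the loop stops at the first mismatch,
    # so running time depends on how many leading bytes match the
    # secret -- the classic timing side channel.
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:
            return False
    return True

def constant_time_compare(secret: bytes, guess: bytes) -> bool:
    # hmac.compare_digest examines the whole input regardless of where
    # mismatches occur, so timing no longer depends on secret data.
    return hmac.compare_digest(secret, guess)
```

Both functions return the same answers; only their timing behavior differs, which is exactly why purely functional testing misses this vulnerability class.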
  3. Objective

    Emerging technologies (eg, wearable devices) have made it possible to collect data directly from individuals (eg, time-series), providing new insights on the health and well-being of individual patients. Broadening the access to these data would facilitate the integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared to traditional health data, these data are collected directly from individuals, are highly unique and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining the usability for data analytics.

    Methods and materials

    We propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages dimensional reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis.
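The two ingredients named above (dimensionality reduction plus random sampling) can be sketched as follows. This is an illustrative NumPy toy, not the paper's exact mechanism or its privacy proof; the function name and parameters are ours:

```python
import numpy as np

def sanitize(records, k, m, seed=0):
    """Illustrative sketch: (1) dimensionality reduction -- keep only
    the top-k principal components of each time series, discarding the
    fine-grained detail that makes records highly unique; (2) random
    sampling -- release only m of the n records."""
    rng = np.random.default_rng(seed)
    X = np.asarray(records, dtype=float)        # n records x T samples
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]                       # rank-k projector
    reduced = (X - mu) @ P + mu                 # rank-k approximation
    keep = rng.choice(len(X), size=m, replace=False)
    return reduced[keep]
```

Aggregate statistics (means, cluster structure) computed on the released subset remain close to those of the full data, which is the usability property the paper evaluates.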

    Results

    We conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization while the data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering).

    Discussion

    Our study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors.

    Conclusion

    The results suggest our proposed privacy-protection method provides strong privacy protections while preserving the usefulness of the data.

     
  4. Motivation

    The size of a genome graph—the space required to store the nodes, node labels and edges—affects the efficiency of operations performed on it. For example, the time complexity to align a sequence to a graph without a graph index depends on the total number of characters in the node labels and the number of edges in the graph. This raises the need for approaches to construct space-efficient genome graphs.

    Results

    We point out similarities in the string encoding mechanisms of genome graphs and the external pointer macro (EPM) compression model. We present a pair of linear-time algorithms that transform between genome graphs and EPM-compressed forms. The algorithms result in an upper bound on the size of the genome graph constructed in terms of an optimal EPM compression. To further reduce the size of the genome graph, we propose the source assignment problem that optimizes over the equivalent choices during compression and introduce an ILP formulation that solves that problem optimally. As a proof-of-concept, we introduce RLZ-Graph, a genome graph constructed based on the relative Lempel–Ziv algorithm. Using RLZ-Graph, across all human chromosomes, we are able to reduce the disk space to store a genome graph on average by 40.7% compared to colored compacted de Bruijn graphs constructed by Bifrost under the default settings. The RLZ-Graph scales well in terms of running time and graph sizes with an increasing number of human genome sequences compared to Bifrost and variation graphs produced by VGtoolkit.
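The EPM-style encoding underlying relative Lempel-Ziv can be sketched as a greedy parse of a target string into phrases copied from a reference, plus literals. This is a naive quadratic illustration of the parsing idea only (real RLZ implementations use a suffix index, and RLZ-Graph adds the graph construction on top):

```python
def rlz_parse(target: str, reference: str):
    """Greedy relative Lempel-Ziv parse: encode `target` as (start, length)
    phrases copied from `reference`, falling back to single-character
    literals (start=None) when no match exists. Each copied phrase is the
    kind of unit a genome graph can represent as a node pointing back
    into the reference."""
    phrases, i = [], 0
    while i < len(target):
        best_start, best_len = None, 0
        # Naive O(n^2) longest-match search; real RLZ uses a suffix index.
        for s in range(len(reference)):
            l = 0
            while (s + l < len(reference) and i + l < len(target)
                   and reference[s + l] == target[i + l]):
                l += 1
            if l > best_len:
                best_start, best_len = s, l
        if best_len == 0:
            phrases.append((None, target[i]))
            i += 1
        else:
            phrases.append((best_start, best_len))
            i += best_len
    return phrases

def rlz_decode(phrases, reference: str) -> str:
    """Invert the parse: copy each phrase back out of the reference."""
    return "".join(b if a is None else reference[a:a + b]
                   for a, b in phrases)
```

The fewer and longer the phrases, the smaller the encoding, which is why the number of phrases bounds the size of the constructed genome graph.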

    Availability

    The RLZ-Graph software is available at: https://github.com/Kingsford-Group/rlzgraph.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. We describe GraFBoost, a flash-based architecture with hardware acceleration for external analytics of multi-terabyte graphs. We compare the performance of GraFBoost with 1 GB of DRAM against various state-of-the-art graph analytics software, including FlashGraph, running on a 32-thread Xeon server with 128 GB of DRAM. We demonstrate that despite the relatively small amount of DRAM, GraFBoost achieves high performance with very large graphs that no other system can handle, and rivals the performance of the fastest software platforms on graph sizes that existing platforms can handle. Unlike in-memory and semi-external systems, GraFBoost uses a constant amount of memory for all problems, and its performance decreases very slowly as graph sizes increase, allowing GraFBoost to scale to much larger problems than possible with existing systems while using far fewer resources on a single-node system. The key component of GraFBoost is the sort-reduce accelerator, which implements a novel method to sequentialize fine-grained random accesses to flash storage. The sort-reduce accelerator logs random update requests and then uses hardware-accelerated external sorting with interleaved reduction functions. GraFBoost also lazily stores newly updated vertex values generated in each superstep of the algorithm alongside the old vertex values to further reduce I/O traffic. We evaluate the performance of GraFBoost for PageRank, breadth-first search, and betweenness centrality on our FPGA-based prototype (Xilinx VC707 with 1 GB DRAM and 1 TB flash) and compare it to other graph processing systems, including a pure software implementation of GraFBoost.
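The sort-reduce idea can be sketched in software: rather than applying fine-grained random updates directly, log them, sort by vertex id, and merge runs with the reduction function so the final pass over storage is sequential. This is an in-memory toy of the concept (GraFBoost does it with hardware-accelerated external sorting over flash):

```python
from itertools import groupby
from operator import itemgetter

def sort_reduce(updates, reduce_fn):
    """Sketch of sort-reduce: `updates` is a log of (vertex_id, value)
    pairs in arbitrary order. Sorting groups all updates to the same
    vertex together, and the reduction function collapses each group,
    so the final write pass visits vertex ids in sequential order."""
    logged = sorted(updates, key=itemgetter(0))
    merged = []
    for vid, group in groupby(logged, key=itemgetter(0)):
        vals = [v for _, v in group]
        acc = vals[0]
        for v in vals[1:]:
            acc = reduce_fn(acc, v)   # e.g., addition for PageRank sums
        merged.append((vid, acc))
    return merged
```

For example, PageRank contributions arriving in random vertex order become one sorted, reduced stream of per-vertex sums, which is exactly the access pattern flash storage handles well.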