skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Mycelium: Large-Scale Distributed Graph Queries with Differential Privacy
This paper introduces Mycelium, the first system to process differentially private queries over large graphs that are distributed across millions of user devices. Such graphs occur, for instance, when tracking the spread of diseases or malware. Today, the only practical way to query such graphs is to upload them to a central aggregator, which requires a great deal of trust from users and rules out certain types of studies entirely. With Mycelium, users' private data never leaves their personal devices unencrypted, and each user receives strong privacy guarantees. Mycelium does require the help of a central aggregator with access to a data center, but the aggregator merely facilitates the computation by providing bandwidth and computation power; it never learns the topology of the graph or the underlying data. Mycelium accomplishes this with a combination of homomorphic encryption, a verifiable secret redistribution scheme, and a mix network based on telescoping circuits. Our evaluation shows that Mycelium can answer a range of different questions from the medical literature with millions of devices.  more » « less
Award ID(s):
2045861 2107147 2124184
PAR ID:
10312007
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
ACM SIGOPS Symposium on Operating Systems Principles (SOSP)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present and analyze UDM, a new protocol for user discovery in anonymous communication systems that minimizes the information disclosed to the system and users. Unlike existing systems, including those based on private set intersection, UDM learns nothing about the contact lists and social graphs of the users, is not vulnerable to off-line dictionary attacks that expose contact lists, does not reveal platform identifiers to users without the owner’s explicit permission, and enjoys low computation and communication complexity. UDM solves the following user-discovery problem. User Alice wishes to communicate with Bob over an anonymous communication system, such as cMix or Tor. Initially, each party knows each other’s public contact identifier (e.g., email address or phone number), but neither knows the other’s private platform identifier in the communication system. If both parties wish to communicate with each other, UDM enables them to establish a shared key and learn each other’s private platform identifier. UDM uses an untrusted user-discovery system, which processes and stores only public information, hashed values, or values encrypted with keys it does not know. Therefore, UDM cannot learn any information about the social graphs of its users. Using the anonymous communication system, each pair of users who wish to communicate with each other uploads to the user-discovery system their private platform identifier, encrypted with their shared key. Indexing their request by a truncated cryptographic hash of their shared key, each user can then download each other’s encrypted private platform identifier. 
    more » « less
  2. When collecting information, local differential privacy (LDP) alleviates privacy concerns of users because their private information is randomized before being sent it to the central aggregator. LDP imposes large amount of noise as each user executes the randomization independently. To address this issue, recent work introduced an intermediate server with the assumption that this intermediate server does not collude with the aggregator. Under this assumption, less noise can be added to achieve the same privacy guarantee as LDP, thus improving utility for the data collection task. This paper investigates this multiple-party setting of LDP. We analyze the system model and identify potential adversaries. We then make two improvements: a new algorithm that achieves a better privacy-utility tradeoff; and a novel protocol that provides better protection against various attacks. Finally, we perform experiments to compare different methods and demonstrate the benefits of using our proposed method. 
    more » « less
  3. null (Ed.)
    Federated learning (FL) is a highly pursued machine learning technique that can train a model centrally while keeping data distributed. Distributed computation makes FL attractive for bandwidth limited applications especially in wireless communications. There can be a large number of distributed edge devices connected to a central parameter server (PS) and iteratively download/upload data from/to the PS. Due to limited bandwidth, only a subset of connected devices can be scheduled in each round. There are usually millions of parameters in the state-of-art machine learning models such as deep learning, resulting in a high computation complexity as well as a high communication burden on collecting/distributing data for training. To improve communication efficiency and make the training model converge faster, we propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate under practical constraints during the entire learning process. NOMA allows multiple users to transmit on the same channel simultaneously. The user scheduling problem is transformed into a maximum-weight independent set problem that can be solved using graph theory. Simulation results show that the proposed scheduling and power allocation scheme can help achieve a higher FL testing accuracy in NOMA based wireless networks than other existing schemes within the same learning time. 
    more » « less
  4. Abstract Millions of consumers depend on smart camera systems to remotely monitor their homes and businesses. However, the architecture and design of popular commercial systems require users to relinquish control of their data to untrusted third parties, such as service providers (e.g., the cloud). Third parties therefore can (and in some instances have) access the video footage without the users’ knowledge or consent—violating the core tenet of user privacy. In this paper, we present CaCTUs , a privacy-preserving smart Camera system Controlled Totally by Users. CaCTUs returns control to the user ; the root of trust begins with the user and is maintained through a series of cryptographic protocols, designed to support popular features, such as sharing, deleting, and viewing videos live. We show that the system can support live streaming with a latency of 2 s at a frame rate of 10 fps and a resolution of 480 p. In so doing, we demonstrate that it is feasible to implement a performant smart-camera system that leverages the convenience of a cloud-based model while retaining the ability to control access to (private) data. 
    more » « less
  5. Abstract We present the design, implementation and evaluation of a system, called MATRIX, developed to protect the privacy of mobile device users from location inference and sensor side-channel attacks. MATRIX gives users control and visibility over location and sensor (e.g., Accelerometers and Gyroscopes) accesses by mobile apps. It implements a PrivoScope service that audits all location and sensor accesses by apps on the device and generates real-time notifications and graphs for visualizing these accesses; and a Synthetic Location service to enable users to provide obfuscated or synthetic location trajectories or sensor traces to apps they find useful, but do not trust with their private information. The services are designed to be extensible and easy for users, hiding all of the underlying complexity from them. MATRIX also implements a Location Provider component that generates realistic privacy-preserving synthetic identities and trajectories for users by incorporating traffic information using historical data from Google Maps Directions API, and accelerations using statistical information from user driving experiments. These mobility patterns are generated by modeling/solving user schedule using a randomized linear program and modeling/solving for user driving behavior using a quadratic program. We extensively evaluated MATRIX using user studies, popular location-driven apps and machine learning techniques, and demonstrate that it is portable to most Android devices globally, is reliable, has low-overhead, and generates synthetic trajectories that are difficult to differentiate from real mobility trajectories by an adversary. 
    more » « less