skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Private Analytics via Streaming, Sketching, and Silently Verifiable Proofs
We present Whisper, a system for privacy-preserving collection of aggregate statistics. Like prior systems, a Whisper deployment consists of a small set of non-colluding servers; these servers compute aggregate statistics over data from a large number of users without learning the data of any individual user. Whisper's main contribution is that its server-to-server communication cost and its server-side storage costs scale sublinearly with the total number of users. In particular, prior systems required the servers to exchange a few bits of information to verify the well-formedness of each client submission. In contrast, Whisper uses silently verifiable proofs, a new type of proof system on secret-shared data that allows the servers to verify an arbitrarily large batch of proofs by exchanging a single 128-bit string. This improvement comes with increased client-to-server communication, which, in cloud computing, is typically cheaper (or even free) than the cost of egress for server-to-server communication. To reduce server storage, Whisper approximates certain statistics using small-space sketching data structures. Applying randomized sketches in an environment with adversarial clients requires a careful and novel security analysis. In a deployment with two servers and 100,000 clients of which 1% are malicious, Whisper can improve server-to-server communication for vector sum by three orders of magnitude while each client's communication increases by only 10%.  more » « less
Award ID(s):
2054869
PAR ID:
10530540
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE Security and Privacy 2024
Date Published:
ISSN:
2375-1207
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Fast networks and the desire for high resource utilization in data centers and the cloud have driven disaggregation. Application compute is separated from storage, but this leads to high overheads when data must move over the network for simple operations on it. Alternatively, systems could allow applications to run application logic within storage via user-defined functions. Unfortunately, this ties provisioning and utilization of storage and compute resources together again. We present a new approach to executing storage-level functions in an in-memory key-value store that avoids this problem by dynamically deciding where to execute functions over data. Users write storage functions that are logically decoupled from storage, but storage servers choose where to run invocations of these functions physically. By using a server-internal cost model and observing function execution, servers choose to directly run inexpensive functions, while preferring to execute functions with high CPU-cost at client machines. We show that with this approach storage servers can reduce network request processing costs, avoid server compute bottlenecks, and improve aggregate storage system throughput. We realize our approach on an in-memory key-value store that executes 3.2 million strict serializable user-defined storage functions per second with 100 us response times. When running a mix of logic from different applications, it provides throughput better than running that logic purely at storage servers (85% more) or purely at clients (10% more). For our workloads, it also reduces latency (up to 2x) and transactional aborts (up to 33%) over pure client-side execution. 
    more » « less
  2. Federated learning (FL) is an increasingly popular approach for machine learning (ML) when the training dataset is highly distributed. Clients perform local training on their datasets and the updates are then aggregated into the global model. Existing protocols for aggregation are either inefficient or don’t consider the case of malicious actors in the system. This is a major barrier to making FL an ideal solution for privacy-sensitive ML applications. In this talk, I will present ELSA, a secure aggregation protocol for FL that breaks this barrier - it is efficient and addresses the existence of malicious actors (clients + servers) at the core of its design. Similar to prior work Prio and Prio+, ELSA provides a novel secure aggregation protocol built out of distributed trust across two servers that keeps individual client updates private as long as one server is honest, defends against malicious clients, and is efficient end-to-end. Compared to prior works, the distinguishing theme in ELSA is that instead of the servers generating cryptographic correlations interactively, the clients act as untrusted dealers of these correlations without compromising the protocol’s security. This leads to a much faster protocol while also achieving stronger security at that efficiency compared to prior work. We introduce new techniques that retain privacy even when a server is malicious at a small added cost of 7-25% in runtime with a negligible increase in communication over the case of a semi-honest server. ELSA improves end-to-end runtime over prior work with similar security guarantees by big margins - single-aggregator RoFL by up to 305x (for the models we consider), and distributed-trust Prio by up to 8x (with up to 16x faster server-side protocol). Additionally, ELSA can be run in a bandwidth-saver mode for clients who are geographically bandwidth-constrained - an important property that is missing from prior works. 
    more » « less
  3. It has been long observed that communication between a client and a content server using overlay detours may result in substantially better performance than a native path offered by IP routing. Yet the use of detours has been limited to distributed platforms such as Akamai. This paper poses a question - how can clients practically take advantage of overlay detours without modification to content servers (which are obviously outside clients' control)? We have posited elsewhere that the emergence of gigabit-to-the-home access networks would precipitate a new home network appliance, which would maintain permanent presence on the Internet for the users and have general computing and storage capabilities. Given such an appliance, our vision is that Internet users may form cooperatives in which members agree to serve as waypoints points to each other to improve each other's Internet experience. To make detours transparent to the server, we leverage MPTCP, which normally allows a device to communicate with the server on several network interfaces in parallel but we use it to communicate through external waypoint hosts. The waypoints then mimic MPTCP's subflows to the server, making the server oblivious to the overlay detours as long as it supports MPTCP. 
    more » « less
  4. Sherr, Micah; Shafiq, Zubair (Ed.)
    Private heavy-hitters is a data-collection task where multiple clients possess private bit strings, and data-collection servers aim to identify the most popular strings without learning anything about the clients' inputs. In this work, we introduce PLASMA: a private analytics framework in the three-server setting that protects the privacy of honest clients and the correctness of the protocol against a coalition of malicious clients and a malicious server. Our core primitives are a verifiable incremental distributed point function (VIDPF) and a batched consistency check, which are of independent interest. Our VIDPF introduces new methods to validate client inputs based on hashing. Meanwhile, our batched consistency check uses Merkle trees to validate multiple client sessions together in a batch. This drastically reduces server communication across multiple client sessions, resulting in significantly less communication compared to related works. Finally, we compare PLASMA with the recent works of Asharov et al. (CCS'22) and Poplar (S&P'21) and compare in terms of monetary cost for different input sizes. 
    more » « less
  5. In many networking scenarios, long-lived flows can be rerouted to free up resources and accommodate new flows, but doing so comes at a cost in terms of disruption. An archetypical example is the transmission of live streams in a content delivery network: audio and video encoders (clients) generate live streams and connect to a server which rebroadcasts their stream to the rest of the network. Reconnecting a client to a different server mid-stream is very disruptive. We abstract these scenarios in the setting of a capacitated network where clients arrive one by one and request to send a unit of flow to a designated set of servers subject to edge/vertex capacity constraints. An online algorithm maintains a sequence of flows that route the clients present so far to the set of servers. The cost of a sequence of flows is defined as the net switching cost, i.e. total length of all augmenting paths used to transform each flow into its successor. We prove that for unit-vertex-capacitated networks, the algorithm that successively updates the flow using the shortest augmenting path from the new client to a free server incurs a total switching cost of O(n log2 n), where n is the number of vertices in the network. This result is obtained by reducing to the online bipartite matching problem studied in prior work and applying their result. Finally, we identify a slightly more general class of networks for which essentially the same reduction idea can be applied to get the same bound. 
    more » « less