Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Bos, Joppe W; Celi, Sofia; Kannwischer, Matthias J (Ed.)Privacy-Preserving Federated Learning (PPFL) emphasizes the security and privacy of contributors' data in scenarios such as healthcare, smart grids, and the Internet of Things. However, ensuring the security and privacy throughout PPFL can be challenging, given the complexities of maintaining relationships with many users across multiple epochs. Additionally, under a threat model in which the aggregating server and corrupted users are colluding adversaries, honest users' inputs and output data must be protected at all stages. Two common tools for enforcing privacy in federated learning are Private Stream Aggregation (PSA) and Trusted Execution Environments (TEE). However, PSA-only approaches still expose the raw aggregate to the server (and thus to colluding parties). TEE-only aggregation typically incurs non-negligible per-client per-epoch overhead at scale because the TEE must handle per-client communication and maintain per-client state/key material. This paper presents SCALE-FL, a novel solution for PPFL that maintains security while achieving near-plaintext performance using a state-of-the-art PSA protocol to collect user information and a TEE to hide information about the raw aggregate. By using a PSA protocol for aggregation, we can maintain the privacy of information on the untrusted server without requiring per-user key storage or use by the TEE. Then, the aggregate is securely processed by the TEE in plaintext, without the heavy encryption required on an untrusted server. Finally, we ensure the security of user inputs in the federated learning output by using Differential Privacy (DP). The additional overhead introduced by SCALE-FL is 1% of the overhead of the plain FL executions.more » « less
-
Okhravi, Hamed; Martinovic, Ivan (Ed.)Private Set Intersection (PSI) protocols allow a querier to determine whether an item exists in a dataset without revealing the query or exposing non-matching records. It has many applications in fraud detection, compliance monitoring, healthcare analytics, and secure collaboration across distributed data sources. In these cases, the results obtained through PSI can be sensitive and even require some kind of downstream computation on the associated data before the outcome is revealed to the querier, computation that may involve floating-point arithmetic, such as the inference of a machine learning model. Although many such protocols have been proposed, and some of them even enable secure queries over distributed encrypted sets, they fail to address the aforementioned real-world complexities. In this work, we present the first encrypted label selection and analytics protocol construction, which allows the querier to securely retrieve not just the results of intersections among identifiers but also the outcomes of downstream functions on the data/label associated with the intersected identifiers. To achieve this, we construct a novel protocol based on an approximate CKKSfully homomorphic encryption that supports efficient label retrieval and downstream computations over real-valued data. In addition, we introduce several techniques to handle identifiers in large domains, e.g., 64 or 128 bits, while ensuring high precision for accurate downstream computations. Finally, we implement and benchmark our protocol, compare it against state-of-the-art methods, and perform evaluation over real-world fraud datasets, demonstrating its scalability and efficiency in large-scale use case scenarios. Our results show up to 1.4× to 6.8× speedup over prior approaches and select and analyze encrypted labels over real-world datasets in under 65 sec., making our protocol practical for real-world deployments.more » « less
-
Bos, Joppe W; Celi, Sofia; Kannwischer, Matthias J (Ed.)Retrieval Augmented Generation (RAG) can enhance the performance of Large Language Models (LLMs) when used in conjunction with a comprehensive knowledge database. However, the space required to store the necessary information can be taxing when RAG is used locally. As such, the concept of RAG-as-a-Service (RaaS) has emerged, in which third-party servers can be used to process client queries via an external database. Unfortunately, using such a service would expose the client's query to a third party, making the product unsuitable for processing sensitive queries. Our scheme ensures that throughout the entire RAG processing, neither the query, any distances, nor retrieval information is known to the database hosting server. Using a two-pronged approach, we employ Fully Homomorphic Encryption (FHE) and Private Information Retrieval (PIR) to ensure complete security during RAG processing. FHE is used to maintain privacy during initial query processing, during which the query embedding is encrypted and sent to the server for k-means centroid scoring to obtain a similarity ranking. Then, a series of PIR queries is used to privately retrieve the centroid-associated embeddings and the top-ranked documents. A first-of-its-kind, lightweight, fully secure RAG protocol, RAGtime-PIANO, enables efficient secure RAG.more » « less
-
Mukhopadhyay, Debdeep; Paterson, Kenny (Ed.)Finding intersections across sensitive data is a core operation in many real-world data-driven applications, such as healthcare, anti-money laundering, financial fraud, or watchlist applications. These applications often require large-scale collaboration across thousands or more independent sources, such as hospitals, financial institutions, or identity bureaus, where all records must remain encrypted during storage and computation, and are typically outsourced to dedicated/cloud servers. Such a highly distributed, large-scale, and encrypted setting makes it very challenging to apply existing solutions, e.g., (multi-party) private set intersection (PSI) or private membership test (PMT). In this paper, we present Distributed and Outsourced PSI (DO-PSI), an efficient and scalable PSI protocol over outsourced, encrypted, and highly distributed datasets. Our key technique lies in a generic threshold fully homomorphic encryption (FHE) based framework that aggregates equality results additively, which ensures high scalability to a large number of data sources. In addition, we propose a novel technique called \textit{nonzero-preserving mapping}, which maps a zero vector to zero and preserves nonzero values. This allows homomorphic equality tests over a smaller base field, substantially reducing computation while enabling higher-precision representations. We implement DO-PSI and conduct extensive experiments, showing that ours substantially outperforms existing methods in both computation and communication overheads. Our protocol handles a billion-scale set distributed and outsourced to a thousand data owners within one minute, directly reflecting large-scale deployment scenarios, and achieves up to an 11.16 improvement in end-to-end latency over prior state-of-the-art methods.more » « less
-
Shafiq, Zubair; Jansen, Rob (Ed.)Secure facial matching systems play a crucial role in privacy preserving biometric authentication, particularly in domains such as law enforcement, border control, and healthcare. Traditional facial matching systems require direct access to biometric data, raising significant privacy concerns. This paper presents HyDia, a novel protocol for scalable FHE-based facial matching with high computation and communication efficiencies, enabling secure one-to-many facial matching without exposing biometric data in plaintext. Our protocol adapts diagonalized matrix multiplication techniques to accommodate highly imbalanced matrix computations, enabling our novel non-rotational inner product algorithm that substantially reduces the homomorphic computation overhead compared to prior works. We further propose a hybrid approximation method for homomorphic thresholding, which achieves better approximation than the state-of-the-art approach (Chebyshev approximation) at the same multiplicative depths. More importantly, our design does not reveal exact similarity scores to the querier; instead, it provides only a threshold-based match decision or matching sources, strengthening privacy by withholding granular database information. We implement HyDia and competing approaches and provide both formal security proof and extensive experimental validation. Our results show that HyDia achieves practical query times at scale, significantly outperforming existing HE-based solutions in both computation and communication overhead. Notably, HyDia is the only viable FHE-based approach in common bandwidth settings (2Mbps & 1Gbps), outperforming the state-of-the-art approaches by 5.2x-227.4x in end-to-end latency under different settings. Finally, our experiments on real-face datasets show that HyDia incurs negligible accuracy loss, by achieving the same F1 score of 0.9968 as the corresponding plaintext facial matching baselines. This work advances the feasibility of privacy-preserving biometric identification, offering a scalable, bandwidth-efficient, and accurate solution for real-world deployments.more » « less
An official website of the United States government

Full Text Available