NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

On the Price of Differential Privacy for Hierarchical Clustering

Deng, Chengyuan; Gao, Jie; Upadhyay, Jalaj; Wang, Chen; Zhou, Samson (April 2025, International Conference on Representation Learning 2025 (ICLR 2025))

Free, publicly-accessible full text available April 28, 2026
A Strong Separation for Adversarially Robust $$\ell_0$$ Estimation for Linear Sketches

https://doi.org/10.1109/FOCS61266.2024.00136

Gribelyuk, Elena; Lin, Honghao; Woodruff, David P; Yu, Huacheng; Zhou, Samson (October 2024, IEEE)

Full Text Available
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Asi, Hilal; Feldman, Vitaly; Nelson, Jelani; Nguyen, Huy; Talwar, Kunal; Zhou, Samson (September 2024, OpenReview.net)

Full Text Available
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Asi, Hilal; Feldman, Vitaly; Nelson, Jelani; Nguyen, Huy; Talwar, Kunal; Zhou, Samson (July 2024, Proceedings of the 41st International Conference on Machine Learning (ICML))

Full Text Available
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Asi, Hilal; Feldman, Vitaly; Nelson, Jelani; Nguyen, Huy L; Talwar, Kunal; Zhou, Samson (July 2024, Proceedings of Machine Learning Research)

We study the problem of private vector mean estimation in the shuffle model of privacy where n users each have a unit vector v^{(i)} in R^d. We propose a new multi-message protocol that achieves the optimal error using O~(min(n*epsilon^2, d)) messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send Omega(min(n*epsilon^2,d)/log(n)) messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error O(dn^{d/(d+2)} * epsilon^{-4/(d+2)}). Moreover, we show that any single-message protocol must incur mean squared error Omega(dn^{d/(d+2)}), showing that our protocol is optimal in the standard setting where epsilon = Theta(1). Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.
more » « less
Full Text Available
Streaming Algorithms with Few State Changes

https://doi.org/10.1145/3651145

Jayaram, Rajesh; Woodruff, David P; Zhou, Samson (May 2024, Proceedings of the ACM on Management of Data)

In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these metrics fail to capture the asymmetric costs, inherent in modern hardware and database systems, of reading versus writing to memory. In fact, most streaming algorithms write to their memory on every update, which is undesirable when writing is significantly more expensive than reading. This raises the question of whether streaming algorithms with small space and number of memory writes are possible. We first demonstrate that, for the fundamental F_pmoment estimation problem with p ≥ 1, any streaming algorithm that achieves a constant factor approximation must make Ω(n^1-1/p) internal state changes, regardless of how much space it uses. Perhaps surprisingly, we show that this lower bound can be matched by an algorithm which also has near-optimal space complexity. Specifically, we give a (1+ε)-approximation algorithm for F_pmoment estimation that use a near-optimal ~O_ε(n^1-1/p) number of state changes, while simultaneously achieving near-optimal space, i.e., for p∈[1,2), our algorithm uses poly(log n,1/ε) bits of space for, while for p>2, the algorithm uses ~O_ε(n^1-1/p) space. We similarly design streaming algorithms that are simultaneously near-optimal in both space complexity and the number of state changes for the heavy-hitters problem, sparse support recovery, and entropy estimation. Our results demonstrate that an optimal number of state changes can be achieved without sacrificing space complexity.
more » « less
Full Text Available
Bandwidth-Hard Functions: Reductions and Lower Bounds

https://doi.org/10.1007/s00145-024-09497-3

Blocki, Jeremiah; Liu, Peiyuan; Ren, Ling; Zhou, Samson (April 2024, Journal of Cryptology)

Memory Hard Functions (MHFs) have been proposed as an answer to the growing inequality between the computational speed of general purpose CPUs and ASICs. MHFs have seen widespread applications including password hashing, key stretching and proofs of work. Several metrics have been proposed to quantify the memory hardness of a function. Cumulative memory complexity (CMC) quantifies the cost to acquire/build the hardware to evaluate the function repeatedly at a given rate. By contrast, bandwidth hardness quantifies the energy costs of evaluating this function. Ideally, a good MHF would be both bandwidth hard and have high CMC. While the CMC of leading MHF candidates is well understood, little is known about the bandwidth hardness of many prominent MHF candidates. Our contributions are as follows: First, we provide the first reduction proving that, in the parallel random oracle model (pROM), the bandwidth hardness of a data-independent MHF (iMHF) is described by the red-blue pebbling cost of the directed acyclic graph associated with that iMHF. Second, we show that the goals of designing an MHF with high CMC/bandwidth hardness are well aligned. Any function (data-independent or not) with high CMC also has relatively high bandwidth costs. Third, we prove that in the pROM the prominent iMHF candidates such as Argon2i, aATSample and DRSample are maximally bandwidth hard. Fourth, we prove the first unconditional tight lower bound on the bandwidth hardness of a prominent data-dependent MHF called Scrypt in the pROM. Finally, we show the problem of finding the minimum cost red–blue pebbling of a directed acyclic graph is NP-hard.
more » « less
Full Text Available
Differentially Private L2-Heavy Hitters in the Sliding Window Model

Blocki, Jeremiah; Lee, Seunghoon; Mukherjee, Tamalika; Zhou, Samson (February 2023, Eleventh International Conference on Learning Representations (ICLR 2023))

The data management of large companies often prioritize more recent data, as a source of higher accuracy prediction than outdated data. For example, the Facebook data policy retains user search histories for months while the Google data retention policy states that browser information may be stored for up to months. These policies are captured by the sliding window model, in which only the most recent statistics form the underlying dataset. In this paper, we consider the problem of privately releasing the L2-heavy hitters in the sliding window model, which include Lp-heavy hitters for p<=2 and in some sense are the strongest possible guarantees that can be achieved using polylogarithmic space, but cannot be handled by existing techniques due to the sub-additivity of the L2 norm. Moreover, existing non-private sliding window algorithms use the smooth histogram framework, which has high sensitivity. To overcome these barriers, we introduce the first differentially private algorithm for L2-heavy hitters in the sliding window model by initiating a number of L2-heavy hitter algorithms across the stream with significantly lower threshold. Similarly, we augment the algorithms with an approximate frequency tracking algorithm with significantly higher accuracy. We then use smooth sensitivity and statistical distance arguments to show that we can add noise proportional to an estimation of the norm. To the best of our knowledge, our techniques are the first to privately release statistics that are related to a sub-additive function in the sliding window model, and may be of independent interest to future differentially private algorithmic design in the sliding window model.
more » « less
Full Text Available
Robust Algorithms on Adaptive Inputs from Bounded Adversaries

Cherapanamjeri, Yeshwanth; Silwal, Sandeep; Woodruff, David P; Zhang, Fred; Zhang, Qiuyi; Zhou, Samson (May 2023, International Conference on Learning Representations)

We study dynamic algorithms robust to adaptive input generated from sources with bounded capabilities, such as sparsity or limited interaction. For example, we consider robust linear algebraic algorithms when the updates to the input are sparse but given by an adversary with access to a query oracle. We also study robust algorithms in the standard centralized setting, where an adversary queries an algorithm in an adaptive manner, but the number of interactions between the adversary and the algorithm is bounded. We first recall a unified framework of [HKM+20, BKM+22, ACSS23] which is roughly a quadratic improvement over the na ̈ıve implementation, and only incurs a logarithmic overhead in query time. Although the general framework has diverse applications in machine learning and data science, such as adaptive distance estimation, kernel density estimation, linear regression, range queries, and point queries and serves as a preliminary benchmark, we demonstrate even better algorithmic improvements for (1) reducing the pre-processing time for adaptive distance estimation and (2) permitting an unlimited number of adaptive queries for kernel density estimation. Finally, we complement our theoretical results with additional empirical evaluations.
more » « less
Full Text Available
On Differential Privacy and Adaptive Data Analysis with Bounded Space

https://doi.org/10.1007/978-3-031-30620-4_2

Dinur, Itai; Stemmer, Uri; Woodruff, David P.; Zhou, Samson (January 2023, Eurocrypt)

Full Text Available

« Prev Next »

Search for: All records