skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Generalized PTR: User-Friendly Recipes for Data-Adaptive Algorithms with Differential Privacy
The ''Propose-Test-Release'' (PTR) framework is a classic recipe for designing differentially private (DP) algorithms that are data-adaptive, i.e. those that add less noise when the input dataset is nice. We extend PTR to a more general setting by privately testing data-dependent privacy losses rather than local sensitivity, hence making it applicable beyond the standard noise-adding mechanisms, e.g. to queries with unbounded or undefined sensitivity. We demonstrate the versatility of generalized PTR using private linear regression as a case study. Additionally, we apply our algorithm to solve an open problem from ''Private Aggregation of Teacher Ensembles (PATE)'' -- privately releasing the entire model with a delicate data-dependent analysis.  more » « less
Award ID(s):
2048091
PAR ID:
10397829
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
ISSN:
2640-3498
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We provide an end-to-end Renyi DP based-framework for differentially private top-๐‘˜ selection. Unlike previous approaches, which require a data-independent choice on ๐‘˜, we propose to privately release a data-dependent choice of ๐‘˜ such that the gap between ๐‘˜-th and the (๐‘˜+1)st โ€œqualityโ€ is large. This is achieved by an extension of the Report-Noisy-Max algorithm with a more concentrated Gaussian noise. Not only does this eliminates one hyperparameter, the adaptive choice of ๐‘˜ also certifies the stability of the top-๐‘˜ indices in the unordered set so we can release them using a combination of the propose-test-release (PTR) framework and the Distance-to-Stability mechanism. We show that our construction improves the privacy-utility trade-offs compared to the previous top-๐‘˜ selection algorithms theoretically and empirically. Additionally, we apply our algorithm to โ€œPrivate Aggregation of Teacher Ensembles (PATE)โ€ in multi-label classification tasks with a large number of labels and show that it leads to significant performance gains. 
    more » « less
  2. One of the most common problems studied in the context of differential privacy for graph data is counting the number of non-induced embeddings of a subgraph in a given graph. These counts have very high global sensitivity. Therefore, adding noise based on powerful alternative techniques, such as smooth sensitivity and higher-order local sensitivity have been shown to give significantly better accuracy. However, all these alternatives to global sensitivity become computationally very expensive, and to date efficient polynomial time algorithms are known only for few selected subgraphs, such as triangles, k-triangles, and k-stars. In this paper, we show that good approximations to these sensitivity metrics can be still used to get private algorithms. Using this approach, we much faster algorithms for privately counting the number of triangles in real-world social networks, which can be easily parallelized. We also give a private polynomial time algorithm for counting any constant size subgraph using less noise than the global sensitivity; we show this can be improved significantly for counting paths in special classes of graphs. 
    more » « less
  3. The data management of large companies often prioritize more recent data, as a source of higher accuracy prediction than outdated data. For example, the Facebook data policy retains user search histories for months while the Google data retention policy states that browser information may be stored for up to months. These policies are captured by the sliding window model, in which only the most recent statistics form the underlying dataset. In this paper, we consider the problem of privately releasing the L2-heavy hitters in the sliding window model, which include Lp-heavy hitters for p<=2 and in some sense are the strongest possible guarantees that can be achieved using polylogarithmic space, but cannot be handled by existing techniques due to the sub-additivity of the L2 norm. Moreover, existing non-private sliding window algorithms use the smooth histogram framework, which has high sensitivity. To overcome these barriers, we introduce the first differentially private algorithm for L2-heavy hitters in the sliding window model by initiating a number of L2-heavy hitter algorithms across the stream with significantly lower threshold. Similarly, we augment the algorithms with an approximate frequency tracking algorithm with significantly higher accuracy. We then use smooth sensitivity and statistical distance arguments to show that we can add noise proportional to an estimation of the norm. To the best of our knowledge, our techniques are the first to privately release statistics that are related to a sub-additive function in the sliding window model, and may be of independent interest to future differentially private algorithmic design in the sliding window model. 
    more » « less
  4. We study distributed estimation and learning problems in a networked environment where agents exchange information to estimate unknown statistical properties of random variables from their privately observed samples. The agents can collectively estimate the unknown quantities by exchanging information about their private observations, but they also face privacy risks. Our novel algorithms extend the existing distributed estimation literature and enable the participating agents to estimate a complete sufficient statistic from private signals acquired offline or online over time and to preserve the privacy of their signals and network neighborhoods. This is achieved through linear aggregation schemes with adjusted randomization schemes that add noise to the exchanged estimates subject to differential privacy (DP) constraints, both in an offline and online manner. We provide convergence rate analysis and tight finite-time convergence bounds. We show that the noise that minimizes the convergence time to the best estimates is the Laplace noise, with parameters corresponding to each agentโ€™s sensitivity to their signal and network characteristics. Our algorithms are amenable to dynamic topologies and balancing privacy and accuracy trade-offs. Finally, to supplement and validate our theoretical results, we run experiments on real-world data from the US Power Grid Network and electric consumption data from German Households to estimate the average power consumption of power stations and households under all privacy regimes and show that our method outperforms existing first-order privacy-aware distributed optimization methods. 
    more » « less
  5. Nepalโ€™s forest cover nearly doubled over the last three decades. While Community Forest (CF) management and agricultural abandonment are primary drivers of forest cover expansion, the contribution of afforestation on privately managed land is not well documented. We mapped forest cover change from 1988 through 2016 in 40 privately managed sites that transitioned from agriculture to forest and assessed how agricultural abandonment influenced private land management and afforestation. We used a mixed method analysis to integrate our 29- year Landsat satellite image-based record of annual forest cover with interview data on historical land cover and land use dynamics from 65 land managers in Bagmati Province. We find that privately managed land accounted for 37% of local forest cover gain, with mean forest area within private forests growing from 9% to 59%. Land managers identified two factors driving these gains on private land: implementation of CF man- agement in adjacent government forests and out-migration. These previously undocumented linkages between forest cover gain on private land and CF management merits further research in community forests and calls for greater policy and technical support for small-scale timber growers and rural households who rely on private forests for income generation. 
    more » « less