<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Privacy-Aware Rejection Sampling</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2023 February</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10393949</idno>
					<idno type="doi"></idno>
					<title level='j'>Journal of machine learning research</title>
<idno>1532-4435</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Jordan Awan</author><author>Vinayak Rao</author><author>Adrian Weller</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[While differential privacy (DP) offers strong theoretical privacy guarantees, implementations of DP mechanisms may be vulnerable to side-channel attacks, such as timing attacks. When sampling methods such as MCMC or rejection sampling are used to implement a privacy mechanism, the runtime can leak private information. We characterize the additional privacy cost due to the runtime of a rejection sampler in terms of both (, δ)-DP as well as f -DP. We also show that unless the acceptance probability is constant across databases, the runtime of a rejection sampler does not satisfy -DP for any . We show that there is a similar breakdown in privacy with adaptive rejection samplers. We propose three modifications to the rejection sampling algorithm, with varying assumptions, to protect against timing attacks by making the runtime independent of the data. The modification with the weakest assumptions is an approximate sampler, introducing a small increase in the privacy cost, whereas the other modifications give perfect samplers. We also use our techniques to develop an adaptive rejection sampler for log-H ̈older densities, which also hasdata-independent runtime. We give several examples of DP mechanisms that fit the assumptions of our methods and can thus be implemented using our samplers.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>As more data is collected, analyzed, and published by researchers, companies, and government agencies, concerns about the privacy of the participating individuals have become more prominent <ref type="bibr">(Lane et al., 2014)</ref>. While there have been many methods of statistical disclosure control to combat this problem <ref type="bibr">(Hundepool et al., 2012)</ref>, differential privacy (DP) <ref type="bibr">(Dwork et al., 2006)</ref> has arisen as the state-of-the-art framework for privacy protection, and is currently being implemented by Google <ref type="bibr">(Erlingsson et al., 2014)</ref>, Apple <ref type="bibr">(Tang et al., 2017)</ref>, <ref type="bibr">Microsoft (Ding et al., 2017)</ref>, and the US Census <ref type="bibr">(Abowd, 2018)</ref>. Differential privacy is based on a notion of plausible deniability, and requires the introduction of additional noise, beyond sampling, into the analysis procedure. Given the output of a DP mechanism, an adversary cannot determine with high probability whether any particular individual participated in the dataset <ref type="bibr">(Wasserman and Zhou, 2010)</ref>.</p><p>Because of the formal nature of DP, implementations of the mechanisms must be very careful to prevent unintentional privacy leaks through side-channels. Side-channel attacks have been a longstanding problem in computer systems, and may consist of the execution time, power consumption, or memory usage of the system, to name a few <ref type="bibr">(Joy Persial et al., 2011;</ref><ref type="bibr">Nilizadeh et al., 2019)</ref>. With differential privacy, the system can be made black-box to remove some of these side-channels, but may still be susceptible to timing attacks. Such a side-channel may be present if the DP mechanism is part of a query-response framework, where users submit queries and the curator replies with a DP response; in this model, the adversary may measure the time between submitting the query and receiving the answer, and use this information as part of their attack. PINQ <ref type="bibr">(McSherry, 2009)</ref> and Airavat <ref type="bibr">(Roy et al., 2010)</ref> were two of the earliest DP implementations, but were shown by <ref type="bibr">Haeberlen et al. (2011)</ref> to be vulnerable to timing attacks. FUZZ <ref type="bibr">(Haeberlen et al., 2011)</ref> and GUPT <ref type="bibr">(Mohan et al., 2012)</ref> avoid timing attacks by working with simple queries for which the worst-case computational time can be determined. This solution works for simple DP tasks, but is nontrivial for complex DP mechanisms.</p><p>One of the most common and powerful DP mechanisms is the exponential mechanism <ref type="bibr">(McSherry and Talwar, 2007)</ref> which results in an unnormalized density of the form exp(g D (x)) that must be sampled from, where g D is some function that depends on the database D. The exponential mechanism has been widely used to tackle problems such as principal component analysis <ref type="bibr">(Chaudhuri et al., 2013;</ref><ref type="bibr">Kapralov and Talwar, 2013;</ref><ref type="bibr">Awan et al., 2019)</ref>, K-means clustering, <ref type="bibr">(Feldman et al., 2009)</ref>, convex optimization <ref type="bibr">(Bassily et al., 2014a,b)</ref>, robust regression <ref type="bibr">(Asi and Duchi, 2020b)</ref>, linear and quantile regression <ref type="bibr">(Reimherr and Awan, 2019)</ref>, synthetic data <ref type="bibr">(Snoke and Slavkovi&#263;, 2018)</ref>, and Bayesian data analysis <ref type="bibr">(Wang et al., 2015;</ref><ref type="bibr">Minami et al., 2016;</ref><ref type="bibr">Zhang et al., 2016;</ref><ref type="bibr">Dimitrakakis et al., 2017)</ref> to name a few.</p><p>A challenge however is that for functions g D (x) encountered in practice, the unnormalized density exp(g D (x)) is often difficult to sample from. In statistics and machine learning, there are many computational techniques to produce either exact or approximate samples from such distributions, including Markov chain Monte Carlo (MCMC), rejection sampling, and approximate Bayesian computing. However, there are two sources of privacy leaks when using these computational sampling methods: 1) when using approximate samplers, the resulting sample does not exactly follow the target distribution, with the error in the approximation resulting in an increased privacy risk, 2) with either an approximate or exact sampler, if the runtime of the algorithm depends on the database, then this side-channel may leak private information <ref type="bibr">(Haeberlen et al., 2011)</ref>.</p><p>We will consider the runtime of the algorithm as an additional output accessible to an adversary, and we will require that both the official output and the runtime jointly satisfy differential privacy. As <ref type="bibr">Haeberlen et al. (2011)</ref> point out, the simplest solution is to make the runtime independent of the dataset. In this paper we propose different modifications, under different assumptions, which produce rejection samplers with data-independent runtime, and are thus immune to timing attacks.</p><p>Contributions First, we quantify the privacy risk of rejection and adaptive rejection sampling without any privacy-preserving modifications. As a properly implemented rejection sampler results in samples with distribution equal to the target, the only privacy concern is the runtime, which varies for different databases. We characterize the privacy risk due to the runtime of a simple rejection sampler in terms of both (&#491;, &#948;)-DP and f -DP <ref type="bibr">(Dong et al., 2022)</ref>. We also show that the runtime of a simple rejection sampler does not satisfy &#491;-DP for any finite &#491; unless the acceptance rate is constant across databases. We similarly show that the runtime of an adaptive rejection sampler does not satisfy &#491;-DP unless acceptance probabilities across databases converge in terms of a certain series.</p><p>Given the increased privacy risk due to the runtime, we propose several modifications to rejection samplers, which make the runtime independent of the database: 1) choose the number of iterations to run the sampler ahead of time, based on a lower bound on the acceptance probability, 2) introduce an additive wait-time based on a worst-case dataset, 3) use squeeze functions to add an implicit wait-time. We also propose an adaptive rejection sampler with data-independent runtime, which can be applied to any log-H&#246;lder density. The adaptive sampler is a modification of the (nearly) minimax optimal sampler from <ref type="bibr">Achddou et al. (2019)</ref>, using the technique of squeeze functions. Finally, we give examples of the exponential mechanism which satisfy the assumptions of our methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related work</head><p>Often side-channels are handled using more relaxed metrics than DP, such as minentropy <ref type="bibr">(Smith, 2009)</ref>. However, the point of view of this paper is that if the dataset in question is judged to require the protection of differential privacy, then we must ensure that all channels are protected in the DP framework. Thus, while for other applications it may be appropriate to use a weaker protection for side channels, in DP applications, the runtime must also satisfy DP. See <ref type="bibr">Haeberlen et al. (2011)</ref> for a similar discussion.</p><p>Besides timing side-channels, there are other notable side-channel attacks that have been effective against DP implementations. <ref type="bibr">Haeberlen et al. (2011)</ref> showed that when the privacy budget is chosen based on the database, that the budget is another side-channel. <ref type="bibr">Wagh et al. (2018)</ref> consider the privacy cost of RAM access, and propose a differential privacy regime to formally protect the RAM access. <ref type="bibr">Dodis et al. (2012)</ref> and Garfinkel and Leclerc (2020) explore the concerns of using pseudo-random number generators in the implementation of DP systems. <ref type="bibr">Mironov (2012)</ref> showed that when implementing DP mechanisms with floating point arithmetic, privacy can be arbitrarily compromised by the artifacts in the least significant bit. <ref type="bibr">Ilvento (2020)</ref> provide an implementation of the exponential mechanism on finite state spaces that is immune to the floating point attacks, but which is admitted to be susceptible to timing attacks.</p><p>A different approach to sampling the exponential mechanism is using MCMC techniques, and there have been some prior works characterizing the additional privacy cost of these approximate samplers. Usually convergence of MCMC methods is characterized in terms of total variation distance, and <ref type="bibr">Minami et al. (2016)</ref> showed that these guarantees can be imported to produce approximate DP samples with an increased 'delta' in a fixed number of iterations. <ref type="bibr">Ganesh and Talwar (2020)</ref> expanded upon the results of <ref type="bibr">Vempala and Wibisono (2019)</ref> to show that Langevin MCMC converges in R&#233;nyi divergence, which allows for the quantification of the privacy loss by sampling in terms of R&#233;nyi DP. R&#233;nyi divergences are much stronger than total variation, and have been used in various definitions of DP <ref type="bibr">(Mironov, 2017;</ref><ref type="bibr">Bun and Steinke, 2016;</ref><ref type="bibr">Bun et al., 2018)</ref>. <ref type="bibr">Minami et al. (2016)</ref> also study Langevin MCMC, but characterize the privacy cost in terms of (&#491;, &#948;)-DP. <ref type="bibr">Seeman et al. (2021)</ref> develop an exact sampler for the exponential mechanism based on an MCMC procedure with artificial atoms, however, they acknowledge that their approach does not protect against timing side-channels. To our knowledge, there has been no prior work quantifying the privacy risk of rejection sampling, or proposing rejection samplers with data-independent runtime.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and notation</head><p>In this section, we review the necessary background on differential privacy and rejection sampling. We also set the notation for the rest of the paper.</p><p>Let X and Y be random variables on a measurable space (Y , F ), with corresponding probability measures &#181; X and &#181;</p><p>For a distribution M , we typically write &#960;(x) for an unnormalized density of M , &#960;(x) = &#960;(x)/ &#960;(x) dx, and g = log( &#960;) (equivalently, &#960;(x) = exp(g(x)). We write U (x) to denote a density that upper bounds &#960; as &#960;(x) &#8804; c U U (x) for some constant c U . Similarly, we write L(x) for a density that lower bounds &#960; as c L L(x) &#8804; &#960;(x) for a constant c L . In rejection sampling, U is called the proposal distribution, and L is the squeeze function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Differential privacy</head><p>Differential privacy (DP), introduced in <ref type="bibr">Dwork et al. (2006)</ref>, is a framework to characterize the privacy risk of a given algorithm, and offers techniques to design mechanisms which limit privacy loss. DP methods require the introduction of additional randomness, beyond sampling, in order to offer a notion of plausible deniability. Given the output of a DP mechanism, it is difficult for an adversary to determine whether a particular individual participated in the dataset or not. While an idealized algorithm may be proven to be differentially private, to characterize the actual privacy cost of a given implementation, one must consider all side-channels such as the runtime as part of the DP output <ref type="bibr">(Haeberlen et al., 2011)</ref>.</p><p>Definition 1 (Privacy mechanism) Given a metric space (D, d), which represents the set of possible databases, a set of probability measures {M D | D &#8712; D} on a common space Y is called a privacy mechanism.</p><p>The space D represents the space of possible databases, and it is common to take D = X n for some set X , with X representing the possible contributions of one individual in the database. In that case, the metric d is often chosen to be the Hamming distance, so that d(D, D &#8242; ) &#8804; 1 represents that D and D &#8242; are adjacent databases, differing in only one individual's contribution.</p><p>When implementing a privacy mechanism, we publish one sample from M D , which satisfies some form of privacy.</p><p>Definition 2 ((&#491;, &#948;)-DP: <ref type="bibr">Dwork et al., 2006)</ref> Given a metric space (D, d), &#491; &#8805; 0 and &#948; &#8712; [0, 1], a privacy mechanism {M D } on the space Y satisfies (&#491;, &#948;)-differential privacy if for all measurable sets B &#8712; Y and all d(D, D &#8242; ) &#8804; 1,</p><p>The values &#491; and &#948; are called the privacy parameters, which capture the privacy risk for the given mechanism. Smaller values of &#491; and &#948; give stronger privacy guarantees. Typically, &#491; is chosen to be a small constant such as 1 or 0.1, whereas usually &#948; &#8810; 1/n. In the case where &#948; = 0, we call (&#491;, 0)-DP "pure differential privacy," and write &#491;-DP. A privacy mechanism satisfying &#491;-DP is equivalent to requiring that</p><p>While we phrase most of our results in terms of (&#491;, &#948;)-DP, another useful formulation of DP is f -DP <ref type="bibr">(Dong et al., 2022)</ref>, which is expressed in terms of hypothesis tests. f -DP is based on bounding the receiver-operator curve (ROC) or tradeoff function when testing between two adjacent databases, given the output of a privacy mechanism. For two probability distributions P and Q, the tradeoff function is the smallest type-II error as a function of the type-I error. Formally, the tradeoff function for P and   </p><p>for all D, D &#8242; &#8712; D such that d(D, D &#8242; ) &#8804; 1.</p><p>See Figure <ref type="figure">1a</ref> for examples of tradeoff functions which do and do not satisfy f -DP for a particular f . Without loss of generality we can assume that f is symmetric: <ref type="bibr">Dong et al. (2022, Proposition 2)</ref>, which states that for a given f and a mechanism M that is f -DP, there exists a symmetric f * &#8805; f such that M is f * -DP.</p><p>It turns out that (&#491;, &#948;)-DP is a special case of f -DP, where f is taken to be a particular piecewise linear function. Specifically, let &#491; &#8805; 0 and &#948; &#8712; [0, 1], and define f &#491;,&#948; (&#945;) = max{0, 1 -&#948;exp(&#491;)&#945;, exp(-&#491;)(1 -&#948; -&#945;)}. Then a privacy mechanism M satisfies (&#491;, &#948;)-DP if and only if it satisfies f &#491;,&#948; -DP <ref type="bibr">(Dong et al., 2022, Proposition 3)</ref>. The following proposition, based on <ref type="bibr">Dong et al. (2022, Propositions 5 and 6)</ref>, gives a simple conversion between f -DP and (&#491;, &#948;)-DP, by determining the linear functions which lower bound f . Proposition 4 Let f be a symmetric tradeoff function. If a privacy mechanism satisfies f -DP, then it satisfies (&#491;, &#948;)-DP provided that (1 -&#948;) -exp(&#491;)&#945; &#8804; f (&#945;) for all &#945; &#8712; [0, 1].</p><p>Proof We need to show that f &#491;,&#948; (&#945;) &#8804; f (&#945;) for all &#945; &#8712; [0, 1]. By symmetry of f and f &#491;,&#948; , the condition stated is sufficient.</p><p>If the tradeoff function f makes the inequalities of Definition 3 tight, then by Proposition 4 the tightest (&#491;, &#948;)-DP guarantee takes a tangent line of f and sets (1 -&#948;) to be the y-intercept and -exp(&#491;) to be its slope. This approach gives a precise conversion from f -DP to (&#491;, &#948;)-DP, which we use in Theorem 10. In fact, there is a stronger duality between f -DP and a family of (&#491;, &#948;(&#491;))-DP characterizations, described in <ref type="bibr">Dong et al. (2022, Propositions 5 and 6)</ref>. Figure <ref type="figure">1b</ref> illustrates the conversion from f -DP to (&#491;, &#948;)-DP.</p><p>An important property of both (&#491;, &#948;)-DP and f -DP is that it is robust to post-processing. That is, if a privacy mechanism satisfies DP, then applying any deterministic or randomized algorithm to the output cannot degrade the DP guarantee. This property is related to data processing inequalities.</p><p>Proposition 5 (Post-processing: <ref type="bibr">Dwork et al., 2014;</ref><ref type="bibr">Dong et al., 2022)</ref> Let M be a privacy mechanism with output space Y , f a tradeoff function, &#491; &#8805; 0, and &#948; &#8712; [0, 1]. Let Proc be a potentially randomized mapping from Y to Z . Then</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Exponential mechanism</head><p>Having established the definitions of both (&#491;, &#948;)-DP and f -DP, there remains the question of how to construct a privacy mechanism for a given statistical task. A general and powerful technique, and one that will be the focus of this paper, is the exponential mechanism <ref type="bibr">(McSherry and Talwar, 2007)</ref>. Given a utility function g D , where large values of g D indicate higher utility, the exponential mechanism samples from the unnormalized density &#960; D (x) = exp(g D (x))&#960; 0 (x), where &#960; 0 is a base measure. This mechanism satisfies (2/&#8710;, 0)-DP where &#8710; is the sensitivity of g D :</p><p>Often &#960; 0 is chosen to be Lebesgue measure, but it can also be chosen to be a probability measure similar to a prior <ref type="bibr">(Wang et al., 2015;</ref><ref type="bibr">Minami et al., 2016;</ref><ref type="bibr">Dimitrakakis et al., 2017)</ref>. In infinitedimensional function spaces, there is no translation-invariant measure, so a nontrivial base measure must be used <ref type="bibr">(Awan et al., 2019)</ref>. Many statistical tasks can be expressed as finding the solution to a minimization or maximization problem of some objective function (e.g., log-likelihood function, sum of squared error, or a general empirical risk function). For these tasks, it is natural to choose the utility function in the exponential mechanism to be some transformation of such an objective function. For example, <ref type="bibr">Reimherr and Awan (2019)</ref> show that when an objective function &#958; D (x) is strongly convex, sampling from the exponential mechanism with utility function g(x) = -&#8711;&#958; D (x) results in an estimator which satisfies x * = arg min x &#958; D (x) + O p (n -1 ). Though the exponential mechanism was designed with (&#491;, 0)-DP in mind, it has been shown that when the utility function satisfies additional assumptions such as concavity, Lipschitz continuity, or strong concavity, the exponential mechanism may satisfy (&#491;, &#948;)-DP <ref type="bibr">(Minami et al., 2016;</ref><ref type="bibr">Dimitrakakis et al., 2017)</ref>, even when the sensitivity &#8710; is infinite.</p><p>While the exponential mechanism is very flexible and offers high utility guarantees, sampling exp(g D (x)) exactly is generally very challenging. While specific implementations of the exponential mechanism sometimes have efficient sampling schemes (e.g., <ref type="bibr">Bassily et al., 2014a,b;</ref><ref type="bibr">Asi and Duchi, 2020a,b)</ref>, in general, more sophisticated computational sampling techniques are needed. For example, <ref type="bibr">Chaudhuri et al. (2012</ref><ref type="bibr">Chaudhuri et al. ( , 2013) )</ref> and <ref type="bibr">Awan et al. (2019)</ref> use a Gibbs sampler to implement the exponential mechanism in the application of principal component analysis, using heuristics to argue convergence. <ref type="bibr">Reimherr and Awan (2019)</ref> use MCMC implementations of their proposed K-norm gradient (KNG) mechanism, but leave considerations of the cost of the implementation for future work. <ref type="bibr">Snoke and Slavkovi&#263; (2018)</ref> propose an instance of the exponential mechanism for synthetic data, which they sample using the Metropolis algorithm, without considering the privacy cost of the sampler.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Rejection sampling</head><p>Given the structure of the unnormalized density, sampling from exp(g D (x)) is often well suited to rejection sampling. Given an unnormalized target density &#960;(x) &#8733; &#960;(x) = exp(g(x)), which is difficult to sample from, and a simpler proposal density U (x) which satisfies &#960;(x) &#8804; cU (x) for some c and all x, a rejection sampler draws X &#8764; U (x) and accepts the sample with probability &#960;(X)/(cU (X)). This process is repeated until a sample is accepted, and it is easy to show that the accepted sample is distributed as X &#8764; &#960;(x). The requirements to implement a rejection sampler are that we can evaluate &#960;(x), and determine U (x) and c which satisfy the above inequality. We will call these samplers simple rejection samplers when we need to distinguish these from adaptive rejection samplers, which we introduce later in this section. See <ref type="bibr">Martino (2018)</ref> for an extensive introduction to rejection samplers.</p><p>The marginal probability of accepting a sample at any particular iteration from a simple rejection sampler is p = c -1 &#960;(x) dx, so that the number of iterations T needed to obtain an accepted sample follows a geometric distribution: T &#8764; Geom(p). In this paper we assume that the geometric distribution has support 1, 2, 3, . . ., with pmf</p><p>While rejection samplers allow exact samples to be drawn from an intractable target distribution, the acceptance probability p typically decays exponentially with dimension, making them suitable only for low-dimensional problems. Adaptive rejection samplers attempt to address this shortcoming, and proceed by producing a sequence of upper bounds U n (x) and constants c n such that &#960;(x) &#8804; c n U n (x) and such that the acceptance probability increases with n. Just like a simple rejection sampler, conditional on acceptance, adaptive rejection samplers produce samples X &#8764; &#960;(x). Typically, the upper bounds are updated stochastically, using the information from the previously rejected samples. While this minimizes the number of evaluations of &#960;, the acceptance probabilities update in a manner depending on the target &#960;, making the runtime difficult to analyze. Alternatively, the upper bound can be updated in a deterministic manner such as in <ref type="bibr">Leydold et al. (2002)</ref>, which makes understanding the runtime much simpler. While deterministic updates require more evaluations of &#960;, they can potentially result in upper bounds that converge to &#960; much faster, resulting in a tradeoff.</p><p>With adaptive rejection sampling, the marginal probability of accepting a sample at iteration n is p n = 1 cn &#960;(x) dx. However, as the acceptance probability changes over time, the runtime T to accept one sample is no longer geometric, but has pmf</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Privacy risk of rejection sampling</head><p>In this section, we characterize the privacy cost of a rejection sampler when we allow the adversary to have access to both the accepted sample as well as the runtime. Recall that if a rejection sampler is run until acceptance, then the accepted sample is an exact sample from the target distribution. Thus, the only increased privacy risk from using this algorithm is due to the runtime. We will measure the privacy risk of this side-channel in terms of &#491;-DP, (&#491;, &#948;)-DP, and f -DP. We show that for the exponential mechanism, the privacy cost of a rejection sampler's runtime is non-negligible.</p><p>Assumption 6 For a rejection sampler, we assume that along with the published accepted sample, the runtime is also available to an attacker. We assume that for all databases D and for all x in the domain, the evaluations g D (x) take the same time to evaluate. As such, the runtime is proportional to the number of iterations in the sampler. Thus for the rest of the paper, the runtime will simply refer to the number of iterations in the sampler.</p><p>Note that while the proposal distribution U D (x), target exp(g D (x)), and threshold c D may all depend on D, none are directly available to the attacker.</p><p>Remark 7 Many utility functions used in the exponential mechanism can be expressed as empirical risks <ref type="bibr">(Bassily et al., 2014a,b;</ref><ref type="bibr">Reimherr and Awan, 2019;</ref><ref type="bibr">Wang et al., 2019)</ref>. In this case, assuming that the database size n is fixed, ensuring that the time to evaluate g D (x) is constant is equivalent to ensuring that the contributions to the empirical risk from each individual take constant time. This is in line with the techniques used in <ref type="bibr">Haeberlen et al. (2011)</ref> who split each query into sub-queries which are evaluated on each member of the dataset.</p><p>First we will study the privacy cost of the rejection sampling runtime in terms of &#491;-DP. Proposition 9 states that the runtime of a rejection sampler violates &#491;-DP unless the probability of acceptance is constant across databases. To prove this, recall that &#491;-DP is measured by the maxdivergence. Lemma 8 shows that the symmetric max-divergence between two geometric random variables is unbounded whenever the parameters differ, and the proposition follows easily from this. Lemma 8 Let p, q &#8712; (0, 1) and let X &#8764; Geom(p) and Y &#8764; Geom(q). Then</p><p>Proof As all geometric random variables, with parameter in (0, 1), are equivalent measures on the positive integers, it suffices to determine an upper bound on log P (X=k) P (Y =k) for k &#8712; {1, 2, . . .}. This quantity can be expressed as</p><p>We see that this quantity is linear in k. The slope is non-positive if and only if p &#8805; q, in which case the maximum value is achieved at k = 1, giving the value log(p/q). When p &lt; q, the slope is positive, and as k &#8594; &#8734;, the quantity is unbounded. T(Geom(p),Geom(q)) T(Geom(q),Geom(p)) f_R</p><p>Figure <ref type="figure">2</ref>: The tradeoff functions of T (Geom(p), Geom(q)) and T (Geom(q), Geom(p)), along with f R from Theorem 10. We fix R = 2. In the left plot q = .1 and in the right q = .6. p = 1-(1-q) R &gt; q.</p><p>We see that the approximation f R is more accurate for smaller q.</p><p>variables with the tradeoff function for exponential variables, which allows for a simpler formula. This bound is tighter for small acceptance probabilities. We use the likelihood ratio property of the exponential distribution along with some properties of convex functions to get the formula in equation ( <ref type="formula">1</ref>). We then use Proposition 5 to convert the f -DP guarantee to (&#491;, &#948;)-DP guarantees.</p><p>Theorem 10 Let (D, d) be a metric space of databases, and let T D be the runtime of a rejection sampler for database D which has acceptance probability p D . Note that</p><p>. The mechanism that releases the runtime</p><p>(1)</p><p>Proof We begin by establishing the form of f R , and then use Proposition 4 to produce (&#491;, &#948;)-DP guarantees. We first show that by bounding the tradeoff function of the exponential distribution, we get bounds for geometric variables as well. Call</p><p>, where T (&#8226;, &#8226;) represents the tradeoff function.</p><p>Next, we will derive the tradeoff function</p><p>) is an increasing function of x. By the Neyman Pearson Lemma, the most powerful test has a rejection region of the form x &#8805; &#964; . The type I error is &#945; = exp(-&#955; D &#964; ) and type II is</p><p>To get a single bound on both T (X D , X D &#8242; ) and T (X D &#8242; , X D ), we take the convex hull of min{1&#945; 1/R , (1 -&#945;) R }, which we claim has the form f R (&#945;) as stated in 1. To this end, we first verify that R) , so that over this range of &#945;, the convex hull just equals 1 -&#945; 1/R as required by the first line of Equation (1). To establish the first inequality of Equation ( <ref type="formula">2</ref>), note that f (&#945;) = 1 -&#945; 1/R is a convex function; this can be seen either by differentiating it twice, or from the fact that it is a tradeoff function. The straight line 1 -R&#945; intersects this curve at &#945; = 0 and &#945; = R R/1-R , and for intermediate values of &#945;, forms a chord segment. From convexity, this chord lies above the curve. For the second inequality of Equation ( <ref type="formula">2</ref>), observe that (1 -&#945;) R is also convex. We can easily verify that the line 1 -R&#945; is the tangent at &#945; = 0. The second inequality then follows from the fact that a convex function is lower bounded by its tangent. This justifies the first line of f R (&#945;) in Equation ( <ref type="formula">1</ref>). By symmetry, we also have that the third line is correct.</p><p>For the middle inequality, we note that the curves 1 - R) intersects the two curves at these two points, and has slope -1. It is thus tangent to both curves, and from convexity, lies below both of them. Altogether, we conclude that f R (&#945;) is the appropriate convex hull.</p><p>To get the formulas in 2. and 3., recall that the mechanism satisfies (&#491;, &#948;)-DP if the line (1-&#948;)exp(&#491;)&#945; is a lower bound for the tradeoff function f R (&#945;). To get the tightest (&#491;, &#948;)-DP guarantees, we characterize the supporting linear functions. By symmetry, it suffices to determine the tangent lines of 1 -&#945; 1/R for values 0 &#8804; &#945; &#8804; R R/(1-R) . We calculate the derivative as</p><p>Eliminating &#945; from the equations &#491; = log(1/R) + (1/R -1) log &#945; and &#948; = &#945; 1/R (1 -1/R) gives the expressions in parts 2. and 3. in the theorem statement. Note that &#948;(0) = (R -1)R R/(1-R) , so for any &#948; &gt; &#948;(0), the mechanism satisfies (0, &#948;)-DP, but this is a strictly weaker guarantee than (0, &#948;(0))-DP.</p><p>The approximation in Theorem 10 improves for smaller probabilities of acceptance, as seen in Figure <ref type="figure">2</ref>. Intuitively, this is because the approximation of a geometric variable as an exponential is more accurate for smaller probabilities of acceptance. As rejection samplers typically have small rejection probabilities, the privacy guarantees of Theorem 10 are quite accurate for rejection samplers of interest. In Table <ref type="table">1</ref>, we give a few examples converting the quantity R to (&#491;, &#948;)-DP guarantees. We see that even with a small R of 1.1, there is a nontrivial privacy cost. </p><p>. Then the privacy cost of M D along with the runtime is f R &#8855; f , where f R is defined in Theorem 10 and &#8855; is the tensor product of two tradeoff functions <ref type="bibr">(Dong et al., 2022, Definition 5)</ref>.</p><p>In Corollary 11, the tensor product f &#8855; g, where f = T (P, Q) and</p><p>, where P &#215; P &#8242; is the product distribution <ref type="bibr">(Dong et al., 2022, Definition 5)</ref>. In general, it is challenging to derive a closed form of f &#8855; g.</p><p>Remark 12 (Rejection sampling is trivial for location-scale) For some distributions, it is easy to build a rejection sampler, with constant acceptance probability. For example, suppose that the mechanism {M D | D &#8712; D} is location-scale (e.g., K-norm mechanisms: <ref type="bibr">Hardt and Talwar, 2010;</ref><ref type="bibr">Awan and Slavkovi&#263;, 2020)</ref>. In this case, we build a rejection sampler for a default distribution in the family, and transform after sampling. Then we have a rejection sampler where the acceptance rate is independent of the dataset.</p><p>While Theorem 10 describes the privacy cost of a rejection sampler's runtime, it is phrased in terms of the quantity R, which may be unintuitive. In the following example, we show that for a generic exponential mechanism, with an arbitrary set of proposal distributions, R is lower bounded by exp(&#491;), and may even be infinite.</p><p>Example 13 (Exponential mechanism) As shown in <ref type="bibr">McSherry and Talwar (2007)</ref>, the exponential mechanism results in a target distribution of the form &#960; D = exp(g D (x)), which usually satisfies exp(-&#491;/2) &#8804; &#960; D &#8242; (x) &#960; D (x) &#8804; exp(&#491;/2) for adjacent databases D and D &#8242; (the integrating constants may also differ by a factor of at most exp(&#177;&#491;/2)). Let U be a family of proposal distributions, and for each database D, let c D and U D be the optimal proposal distribution from U such that &#960; D &#8804; c D U D (x), where by optimal, we mean that the acceptance probability p D = &#960; D (x) dx c D is maximized; or equivalently c D is minimized. Then, from the following inequality,</p><p>we see that exp(&#491;/2)c D and U D offer a (potentially inferior) proposal distribution for &#960; D &#8242; . Using this relationship between the proposal distributions of D and D &#8242; , we can give a bound for the acceptance probability for D &#8242; based on the acceptance probability for D:</p><p>Call p * the highest acceptance probability over all possible databases D. Then the quantity R that appears in Theorem 10 can be expressed as</p><p>Note that as p * &#8594; 1 in Equation (3), R diverges to infinity. We can also get a lower bound on R:</p><p>where</p><p>= indicates the use of L'H&#244;pital's rule, and we used the fact that log(1-p)/ log(1-exp(-&#491;)p) is increasing in p for all p &#8712; (0, 1) and &#491; &gt; 0; to see this, we compute the derivative with respect to p:</p><p>(</p><p>We see in (5) that the denominator is positive so long as 0 &lt; p &lt; 1. The numerator of (5) can be expressed as</p><p>which we can see is positive and finite for all &#491; &gt; 0 and p &#8712; (0, 1).</p><p>Remark 14 (Parallelization and batching) Suppose that we have a simple rejection sampler targeting &#960; D with acceptance probability p D . We could consider a parallelized implementation as follows: run the sampler on k nodes; when the first sample is accepted, return the sample and the runtime and abort the other instances of the sampler. In this scheme, the runtime is distributed as</p><p>, where G i are independent Geom(p D ) random variables. Now, suppose for two adjacent databases D and D &#8242; that</p><p>which is the quantity in Theorem 10 that governs the privacy cost of the runtime. Then, in the parallelized scheme, we have</p><p>We see that parallelization does not affect the privacy cost of the runtime.</p><p>Similarly, one could decide to run the rejection sampler for a fixed number of iterations, say m before checking if one of the samples is accepted, and then repeating if necessary. This may be useful in combination with parallelization, since communication between the nodes could be a bottleneck. With batching, the runtime until a sample is accepted is Geom(1 -(1 -p D ) m ), which is the same runtime as in the parallelization. By the same reasoning, batching also does not affect the privacy cost of the runtime.</p><p>In Section 4, we develop samplers with data-independent runtime. As such, parallelizing or batching the samplers in the manner described above maintains the property that the runtime is data-independent, while potentially giving a significant speed up.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Privacy risk of adaptive rejection sampling</head><p>In this section, we analyze the privacy risk of adaptive rejection samplers. Often adaptive rejection samplers update the proposal in a stochastic manner, based on the target value at previously rejected samples. In this section, we consider the setting where the proposal is updated in a deterministic manner, such as in <ref type="bibr">Leydold et al. (2002)</ref>. We show that unless the acceptance probabilities converge in a strong sense, an adaptive rejection sampler will not satisfy &#491;-DP for any finite &#491;.</p><p>Proposition 15 Let D be the space of databases and {M D | D &#8712; D} a privacy mechanism which satisfies &#491;-DP. Let (p D i ) &#8734; i=1 be the weakly-increasing sequence of acceptance probabilities for an adaptive rejection sampler for M D . Call T D the runtime of the adaptive sampler for M D , which has pmf</p><p>Then releasing a sample from M D as well as the runtime T D satisfies (&#491; + &#491; T )-DP, where</p><p>for all t &#8805; 1 and all d(D, D &#8242; ) &#8804; 1. If there exists a constant c such that p D 1 &#8805; c &gt; 0 for all D, then the value &#491; T is finite if and only if the sequence</p><p>Proof For readability, we set p i := p D i and q i := p D &#8242; i . We require that log P (T D =t) P (T D &#8242; =t) &#8804; &#491; T for all d(D, D &#8242; ) &#8804; 1 and all t = 1, 2, . . .. A little algebra gives the expression for &#491; T .</p><p>Next, &#491; T is finite if and only if log P (T D =t) P (T D &#8242; =t) is bounded above and below for all d(D, D &#8242; ) &#8804; 1. Equivalently, this requires log</p><p>1-p i 1-q i be universally bounded above and below for all t. Since pt qt is bounded below by c and above by 1/c, the previous quantity is bounded if and only if log</p><p>bounded for all t. Relabelling t -1 to t gives the final result.</p><p>Proposition 15 shows that unless the acceptance probabilities are very closely related, it is not guaranteed that an adaptive rejection sampler will satisfy &#491;-DP for any finite &#491;. In the following example, we explore a few special cases to highlight when we can or cannot expect the condition in Proposition 15 to hold.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 16</head><p>&#8226; If there exists i such that p i = 1 whereas q i &lt; 1 or vice versa, then &#491; T = &#8734;.</p><p>&#8226; Suppose that (1 -q i ) = &#945;(1 -p i ) where &#945; &#8712; (0, 1). Then the above series is t i=1 log 1-p i 1-q i = t i=1 log &#945; &#8594; &#8734;.</p><p>&#8226; To see that it is not sufficient for lim i&#8594;&#8734; 1-p i 1-q i = 1, consider the following: let (1 -p i ) be any decreasing sequence with values in (0, 1). Set (1 -q i ) = exp(-1/i)(1 -p i ). Then log 1-p i 1-q i = 1/i and so 1-p i 1-q i &#8594; 1. However, the sequence of partial sums t i=1 log 1-p i 1-q i = t i=1 1/i diverges, and so the max-divergence is infinite.</p><p>Remark 17 In Proposition 15, convergence of the series &#8734; i=1 log 1-p i 1-q i is sufficient but not necessary. It is possible that the sequence of partial sums is bounded but does not converge.</p><p>Note that for most adaptive rejection samplers, it is difficult to derive expressions for p i , so it may not even be possible to verify whether the condition in Proposition 15 holds or not. The takeaway is that in general, an adaptive rejection sampler is not guaranteed to preserve privacy unless it is carefully designed to do so.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Rejection samplers with data-independent runtime</head><p>The previous section showed that a rejection sampler (either simple or adaptive) can result in an arbitrary amount of privacy loss through the runtime. The most direct way to avoid this is to ensure that the runtime does not depend on the dataset. <ref type="bibr">Haeberlen et al. (2011)</ref> propose making the runtime a constant, though this is not strictly necessary. Rather, when the runtime is a random variable (as with rejection sampling), we simply need that its distribution does not depend on the dataset.</p><p>In this section we propose three modifications of the rejection sampling algorithm to ensure data-independent runtime. The first method, which requires the weakest assumptions, fixes the number of iterations independent of the dataset, based on a worst-case acceptance probability. This method has a constant runtime, but there is a small probability that a sample is not accepted, and we quantify the additional privacy cost. The second method is based on the memoryless property of the geometric distribution, and introduces an additive random wait-time. This approach however requires the integrating constant of the target distribution corresponding to the current database, as well as the acceptance probability of a worst-case database, which is often not realistic. The third method avoids this by using instead upper and lower bounds for the target densities of all databases, chosen so that the ratio of the area for the upper and lower bounds is constant across databases. Finally, we propose an adaptive rejection sampler with data-independent runtime, which is a modification of the (nearly) minimax optimal sampler of <ref type="bibr">Achddou et al. (2019)</ref>. Our adaptive sampler is entirely automated, and only requires that the family of target densities is log-H&#246;lder with fixed and known parameters</p><p>We show in Section 5 that many commonly studied privacy mechanisms satisfy the assumptions of our methods allowing for our privacy-preserving rejection samplers to be applied.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Constant runtime, truncated rejection sampling</head><p>One clear way to remove the privacy leak due to the runtime is to choose a number of iterations independent of the database, based on a worst-case estimate of the acceptance probability across all databases. We then run the sampler for that many iterations, and publish one of the accepted samples. In this case, the runtime is fixed, and does not leak any privacy. However, it is not guaranteed that an accepted sample is found within the pre-determined number of iterations, and the probability of this event does depend on the database. This probability can be reduced by increasing the number of iterations, but this also increases the runtime of the algorithm.</p><p>Of the methods we propose, the algorithm in Proposition 18 requires the weakest assumptions in that the only knowledge we require is a lower bound on the acceptance probability across the databases. However, there is a small probability that no samples are accepted in the prescribed number of iterations, which negatively impacts both the privacy and the utility of the mechanism. Proposition 18 characterizes the increased cost to privacy of the truncated sampler in terms of (&#491;, &#948;)-DP.</p><p>Proposition 18 Let {M D | D &#8712; D} be a family of privacy mechanisms satisfying (&#491; 0 , &#948; 0 )-DP and (U D , c D ) be such that &#960; D &#8804; c D U D where &#960; D is an unnormalized density for M D . Assume that &#945; 0 &#8804; 1/c D &#960; D (x) dx for all D, that is, &#945; 0 is a lower bound on the acceptance probability in the rejection sampler across all databases. Given &#948; &gt; 0, run the sampler for N = log(1/&#948;) log(1/(1-&#945; 0 )) iterations. If there is an accepted proposal, publish the first one; if not, publish an arbitrary output (such as one more draw from the proposal). Releasing the output as well as the runtime of this algorithm satisfies (&#491; 0 , &#948; 0 + &#948;)-DP.</p><p>Proof First note that the runtime is constant for all D, so there is no privacy leak there. Next, note that conditional on the event that an accepted proposal is found, there is no additional privacy leak. So, we need to determine the probability that an accepted proposal is not found:</p><p>By itself, simply publishing whether a sample is accepted or not satisfies (0, &#948;)-DP. By postprocessing (Proposition 5), we can upper bound the privacy cost by instead considering if we observe both an output from M D as well as whether the algorithm has accepted or rejected a sample. This is a composition of an (&#491; 0 , &#948; 0 )-DP mechanism with a (0, &#948;)-DP mechanism. By composition <ref type="bibr">(Dwork et al., 2014, Theorem 3.16</ref>) the result satisfies (&#491; 0 , &#948; 0 + &#948;)-DP.</p><p>A benefit of the algorithm in Proposition 18 is that it can be vectorized and is easily parallelized. Another benefit is that N grows only in the log of 1/&#948;. By increasing the number of iterations N , the increased &#948; can be reduced exponentially. The two major downsides are that the algorithm must be run much longer than a simple rejection sampler, and that it is not guaranteed that an accepted sample is found, which reduces both the privacy and utility. If no samples are accepted, then the output does not follow the correct distribution, introducing error in the sampling approximation. We see that we are able to remove the runtime side-channel, but at the cost of a small "delta" and loss of utility. In the next two subsections, we show that with slightly stronger assumptions, we are able to obtain both perfect sampling as well as data-independent runtime.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Additive geometric wait-time</head><p>In this section, we use the memoryless property of the geometric distribution to introduce an additive wait time based on a lower bound on the acceptance probability. The result is that the runtime of the algorithm is geometric with acceptance rate equal to the worst-case dataset (or a lower bound on the acceptance probability).</p><p>The benefit of this method over the truncated rejection sampler is that a sample from the correct distribution is guaranteed, and the runtime is still independent of the database. The downside is that the acceptance probability (or equivalently the integrating constant) for the present database is required as well as a bound on the worst-case acceptance probability. Typically, rejection samplers do not assume that the integrating constant is known, however for low dimensional problems (e.g., &#8804; 3), it may be possible to numerically evaluate the integral.</p><p>Lemma 19 illustrates the memoryless property of the geometric distribution. Given a simple rejection sampler with acceptance probability q, we can add a random wait time to result in a total runtime that is distributed as Geom(p) for p &#8804; q. So, across databases, we can make all of the runtimes equal in distribution, calibrated to a worst-case acceptance probability.</p><p>Lemma 19 Let 0 &lt; p &#8804; q &lt; 1. Given X 2 &#8764; Geom(q), set X 1 = X 2 with probability p/q and otherwise X 1 = X 2 + &#8710;, where &#8710; &#8764; Geom(p). Then X 1 &#8764; Geom(p).</p><p>Proof Let t &#8712; {1, 2, . . . , &#8734;}. Then</p><p>which is the pmf of Geom(p), as desired. To achieve the last line in the equations, we used the partial sum formula for a geometric series, and simplified the result.</p><p>Theorem 20 Let D &#8712; D be a database and {&#960; D | D} be the normalized target densities. Assume that for each &#960; D , we have normalized densities U D (x) as well as constants c D such that for all x, &#960; D (x) &#8804; c D U D (x). Suppose we know a constant c satisfying c &#8805; sup D c D . Consider the following scheme:</p><p>1. Run a rejection sampler, proposing from U D (x) and targeting &#960; D (x) until acceptance 2. Call the accepted sample X. Also draw Y &#8764; Unif(0, 1).</p><p>3. If Y &lt; c D /c, publish X, else wait for Geom(1/c) cycles before publishing X.</p><p>Then X &#8764; &#960; D , and the wait time follows Geom(1/c), which does not depend on D.</p><p>As compared to the truncated rejection sampler of Section 4.1, Theorem 20 offers a perfect sampler with data independent runtime. This is ideal as there is no loss to either privacy or utility through either approximate samples or a runtime side-channel. However, the downside of this method is that the acceptance probability for the current database must be known. Assuming that the proposal is normalized, this is equivalent to knowing the integrating constant for the target . While this may not be too cumbersome for low-dimensional settings, it becomes computationally intractable for high-dimensional distributions. In the next section, we give an alternative set of assumptions to remove the requirement of the integrating constant.</p><p>Remark 21 A similar alternative to Theorem 20 is as follows: during each step of the rejection sampler, if a sample is accepted, then with probability c D /c report the sample, and with probability 1 -c D /c do not report the sample. This results in the same runtime as Theorem 20.</p><p>We remark that this alternative algorithm has a similar flavor to the randomized response mechanism, one of the oldest privacy mechanisms <ref type="bibr">(Warner, 1965)</ref>. While beyond the scope of this paper, it may be worth investigating whether there is any deeper connection between this privacy-aware rejection sampler and randomized response. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Implicit wait-time via squeeze function</head><p>In this section, we propose another method of producing an exact rejection sampler with dataindependent runtime. Our method, described in Algorithm 1 and Theorem 22, avoids the need for the normalizing constant as in Theorem 20 by instead using a carefully tailored squeeze function. In the rejection sampling literature, a squeeze function is a lower bound on the target density which is assumed to be easy to evaluate, and which is used to speed up the computational time by avoiding evaluations of the target density when a proposed sample lies under the squeeze function (i.e. is rejected by the squeeze function, see Example 25). However, in this section, we will use the squeeze function not to speed up the computational time, but to slow it down; this will enable us to make the runtime equally distributed as in a worst-case setting.</p><p>For this method, we assume that for each unnormalized target density &#960; D , we have normalized densities U D (x) and L D (x) as well as constants c L,D and c U,D such that for all</p><p>and such that the ratio c L,D /c U,D does not depend on D. Note that the latter condition is easy to enforce: if c L,D and c U,D are two valid constants, then so are c * L,D &lt; c L,D and c * U,D &gt; c U,D . We then choose the value X s that we publish based on the rejection sampler that targets &#960; D from U D , but do not publish the sample until a value is accepted from L D (i.e. the proposal lies under the squeeze function c L,D L D : see Example 25). Because of this modification, the runtime is determined only by the ratio c L /c U , which is assumed to be constant across databases. Thus, there is no additional privacy cost to using this sampler, since we get an exact sample with runtime independent of D. This method is similar to that of Section 4.2 in that there is an additive wait-time, but Algorithm 1 is able to do this implicitly, without knowing the acceptance probability for the current database. In Proposition 23, we show that the assumptions of Theorem 22 are strictly weaker than those of Proposition 20. Algorithm 1 Privacy-aware rejection sampling via squeeze functions X) and anyAccepted==FALSE then 5:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>9:</head><p>Publish Xs 10: else 11:</p><p>Go to 2.</p><p>12: end if X) , where X &#8764; U D (x) and Y &#8764; Unif(0, 1). This is a simple rejection sampler, and so conditional on acceptance, X &#8764; &#960; D . However, a sample is not published until X) . This is a rejection sampler targeting L D (X), using the proposal U D (X) and threshold c U,D /c L,D . As such, the number of iterations is Geom(c L,D /c U,D ), which by assumption does not depend on D.</p><p>While the assumption of the squeeze functions in Theorem 22 may seem unintuitive, it is in fact strictly weaker than knowing the integrating constant for &#960; D , as was required in Section 4.2, as shown in Proposition 23. In Section 4.4 and 5 we show that there are several natural instances of the exponential mechanism where the assumptions of Theorem 22 are satisfied. </p><p>Then, the runtime of Algorithm 1 is geometric with parameter (c L,D /c U,D ) = 1/c, and the output of Algorithm 1 has the appropriate distribution as argued in the proof of Theorem 22.</p><p>In fact, the application of Algorithm 1 described in Proposition 23 is very similar to the variation of Theorem 20 described in Remark 21.</p><p>Remark 24 Proposition 23 showed that the assumptions for the squeeze functions in Theorem 22 are actually strictly weaker than the assumptions needed in Section 4.2. Furthermore, it can be seen that the assumptions of Theorem 22 (assuming that we can evaluate the constant c L,D /c U,D ) are strictly stronger than knowing the worst-case acceptance probability, which is needed for the truncated sampler of Section 4.1 -this is because the ratio c L,D /c U,D is itself a lower bound on the worst-case acceptance probability.</p><p>Example 25 Figure <ref type="figure">3</ref> is an illustration of how Algorithm 1 works. We see an example of a target &#960;, which satisfies c L L(x) &#8804; &#960;(x) &#8804; c U U (x) for constants c L , c U , a proposal function U and squeeze function L. The points (x i , y i ) are sequentially drawn uniformly within the area under c U U ; equivalently, x i &#8764; U (x) and y i = u i &#8226; c U U (x), where u i iid &#8764; Unif(0, 1). Algorithm 1 processes these samples as follows: For the first pair, y 1 &gt; &#960;(x 1 ) so the sample is rejected. The second sample satisfies y 2 &#8804; &#960;(x 2 ) so it is accepted (set X s = x 2 ), but because y 2 &gt; c L L(x 2 ) it is not published yet. The third sample is rejected since y 3 &gt; &#960;(x 3 ). The fourth sample satisfies y 4 &#8804; &#960;(x 4 ), but since we already accepted x 2 , we do not update X s . Since y 4 &gt; c L L(x 4 ) we still do not publish anything yet. Finally, y 5 &#8804; c L L(x 5 ) so we publish X s = x 2 .</p><p>As noted in Theorem 22, the procedure results in X s &#8764; &#960;, but the runtime is distributed as Geom(c L /c U ), which does not directly depend on &#960;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Adaptive rejection sampler for log-H&#246;lder densities</head><p>The previous three subsections proposed modifications to simple rejection samplers in order to remove the runtime side-channel. In this section, we use the squeeze method of Section 4.3 to develop an adaptive rejection sampler with data-independent runtime for log-H&#246;lder densities. Our method, outlined in Algorithm 2, is entirely black box, requiring only H&#246;lder parameters (s, H) that hold for every database, and is a modification of the (nearly) minimax optimal sampler of <ref type="bibr">Achddou et al. (2019)</ref>. Let &#960; D (x) &#8733; exp(g D (x)) be an unnormalized target density on a bounded convex set C, where g D is (s, H)-H&#246;lder for all datasets D: |g D (x) -g D (y)| &#8804; H x -y s for all D and for all x, y &#8712; C. This setup differs from <ref type="bibr">Achddou et al. (2019)</ref>, who assume that the target itself is H&#246;lder, rather than the log-target. This difference is important in order to derive upper and lower bounds that satisfy a property similar to Theorem 22. We point out in Remark 29 that the log-H&#246;lder assumption, with the same s and H across all datasets, is natural for many privacy mechanisms, and many instances of the exponential mechanism in the literature satisfy this assumption.</p><p>At a high-level, given evaluations of g D (x) at a finite set of locations, the log-H&#246;lder assumption allows us to construct piecewise-constant upper and lower bounds on g D (x) and therefore the target density. Importantly, these bounds can be constructed so that the ratio of their associated normalization constants is independent of the database D. Then, in the fashion of Algorithm 1, by proposing from the upper bound, and stopping only on accepting from the lower bound, we can have a database-independent runtime. Following each proposal, we add a new location to our set of evaluations of g D (x), tightening the lower and upper bounds, and ensuring the acceptance probability increases each iteration. We describe these steps in detail in Algorithm 2.</p><p>Theorem 26 Let D be a space of databases and { &#960; D = exp(g D ) | D} be the unnormalized target densities, which have support on a bounded convex set C. Suppose that for all D, g D is (s, H)-H&#246;lder with norm &#8226; on C. Then Algorithm 2 results in N i.i.d. samples from &#960; and has runtime between published samples which does not depend on D. If the mapping P T and the update procedure to generate Z are chosen in a way that sup x&#8712;C x -P T (x) &#8594; 0, then the probability of publishing an accepted sample in a given iteration converges to 1.</p><p>Proof The quantity r is an upper bound on the maximum difference between g D (x) and &#285;(x), by the H&#246;lder assumption. So, at any point in the algorithm, since g D is H&#246;lder, and by the definition of r, we have that exp(&#285;(x) -r) &#8804; exp(g D (x)) &#8804; exp(&#285;(x) + r). Define &#285;(x) = g(P T (x)) for all x &#8712; C (note that this only requires evaluations of g from S)</p><p>5:</p><p>Set r &#8805; sup x&#8712;C H x -P T (x) s</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>6:</head><p>Sample X &#8764; exp(&#285;(x))/( C exp(&#285;(x)) dx)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>7:</head><p>Sample Y &#8764; Unif(0, 1)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>8:</head><p>if Y &#8804; exp(g(X))/ exp(&#285;(X) + r) and anyAccepted=FALSE then</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>9:</head><p>Set Xs = X 10:</p><p>Set anyAccepted=TRUE 11:</p><p>end if</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>12:</head><p>if Y &#8804; exp(-2r) then</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>13:</head><p>Publish Xs and append Xs to publishedSamples 14:</p><p>Increment numSamples by 1</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>15:</head><p>Set anyAccepted=FALSE 16:</p><p>end if</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>17:</head><p>Choose Z &#8712; C \ T either randomly or deterministically based on only T , H and s</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>18:</head><p>Append (Z, g(Z)) to S</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>19:</head><p>Append Z to T 20: end while OUTPUT: publishedSamples, which can be published in a stream</p><p>The probability of publishing a sample is exp(-2r). So, as long as x -P T (x) decreases as more samples Z are appended to T , we have r &#8594; 0 and thus the probability of publishing an accepted sample converges to 1.</p><p>Because the update step and the rejection step are separated, we can think about the best way to update the proposal function. Our goal should be to reduce r as quickly as possible. A simple, but naive solution would be to sample Z uniformly on C. Another approach would be to choose a sequence of (x i ) &#8734; i=1 such that for any N , the subset (x 1 ) N i=1 consists of approximately equally spaced points in C. This could be done intelligently using sequential space-filling experimental designs (e.g., <ref type="bibr">Crombecq and Dhaene, 2010;</ref><ref type="bibr">Pronzato and M&#252;ller, 2012)</ref>. For example, a greedy maximin solution would be to choose z = arg sup z&#8712;C H z -P T (z) s <ref type="bibr">(Pronzato and M&#252;ller, 2012)</ref>, which maximizes the publishing probability for the next iteration. Computing the maximin solution may be possible in low-dimensions, but becomes expensive in high dimensional spaces.</p><p>As in <ref type="bibr">Achddou et al. (2019)</ref>, we can make the adaptive sampler much easier to implement by considering the following special case of Algorithm 2: 1) use the &#8467; &#8734; norm in the H&#246;lder definition, 2) set C = [0, 1] d , 3) approximate the nearest neighbor calculation P T (y) on a grid, as described in <ref type="bibr">Achddou et al. (2019, Definition 4)</ref>. These modifications make the construction, evaluation, and sampling of the proposal exp(&#285;) computationally efficient, even in high dimensions. The acceptreject steps (lines 6-16) and the update steps (lines 17-19) can be done in batches to avoid updating the function &#285; too often, when it will not significantly improve the acceptance probability.</p><p>Remark 27 (Relative runtime) We consider how the runtime of Algorithm 2 compares to a similar sampler without the privacy constraint. Recall that the acceptance probability of our sampler is exp(-2r). If we use the same proposal distribution, but base the acceptance criteria solely on the target, then the acceptance probability depends on the target. For a typical target density, we expect that the acceptance probability is approximately exp(-r). If this is the case, then as r &#8594; 0, the ratio of the rejection probabilities is</p><p>where we use a series expansion of exp(-x) about zero to evaluate the limit. This suggests that the cost of privacy is that the rejection probability is about double that of the non-private sampler.</p><p>Another cost of the privacy-preserving adaptive sampler is the decoupling of the rejection and update steps. Roughly, we will need to evaluate g D twice as often-one for the update and one for the accept/reject step-as compared to non-private adaptive samplers, such as in <ref type="bibr">Achddou et al. (2019)</ref>. This additional cost is somewhat mitigated by the fact that the update points can be chosen in a more intelligent manner, potentially improving the rate of convergence of the proposal.</p><p>Example 28 We illustrate Algorithm 2 applied to the target &#960; = exp(g(x)), where g is the 7-Lipschitz function g(x) = -3|x -1/2| + (1/5) sin(20x), so H = 7 and s = 1. The update and sampling steps of Algorithm 2 are run in batches. First, 5 equally spaces points are used to approximate exp(g(x)) by a piece-wise linear function exp(&#285;(x)). The upper bound is exp(&#285;(x) + r) and the lower bound is exp(&#285;(x) -r). The top left plot of Figure <ref type="figure">4</ref>, illustrates each of these functions. Next, five points (x i , y i ) for i = 1, . . . , 5 are sampled uniformly within the area under exp(&#285;(x) + r), as seen in the top right plot of Figure <ref type="figure">4</ref>. The first value y 1 is below exp(g(x 1 )), but not below exp(&#285;(x 1 ) -r), so we set X s = x 1 and set anyAccepted = TRUE, but do not publish X s yet. We reject x 2 . Then as y 3 &#8804; exp(&#285;(x 3 ) -r), we publish X s = x 1 , and set anyAccepted = FALSE. We reject both x 4 and x 5 .</p><p>After this, we update the approximation &#285; using 15 equally spaced points. This grid is a superset of the 5-point grid, so we can reuse the previous evaluations. The new approximation and bounds are shown in the bottom left plot of Figure <ref type="figure">4</ref>. Then (x i , y i ) for i = 6, . . . , 10 are sequentially sampled uniformly from the area under exp(&#285;(x) + r), illustrated in the bottom right plot of Figure <ref type="figure">4</ref>. As exp(&#285;(x 6 ) -r) &lt; y 6 &#8804; exp(g(x 6 )), we set X s = x 6 and anyAccepted = TRUE, but do not publish X s . Since y 7 &#8804; exp(&#285;(x 7 ) -r), we publish X s = x 6 at this time and set anyAccepted = FALSE. Then since y 8 &#8804; exp(&#285;(x 8 ) -r), we immediately publish x 8 . We reject x 9 . Last, as y 10 &#8804; exp(&#285;(x 10 ) -r), we also publish x 10 .</p><p>Note that by using equally spaced points, r converges to zero rapidly, illustrating the benefit of using deterministically chosen points in the construction of &#285;.</p><p>Remark 29 There are several prior DP works on the exponential mechanism, where the utility function is assumed to be Lipschitz (a special case of H&#246;lder), and where Algorithm 2 can be applied. <ref type="bibr">Minami et al. (2016)</ref> assume Lipschitz and concave utility functions. <ref type="bibr">Bassily et al. (2014a)</ref> and <ref type="bibr">Bassily et al. (2014b)</ref> derive optimal DP mechanisms under the assumption of Lipschitz and convex empirical risk objective functions, as well as a bounded domain, which result in implementations of the exponential mechanism. In part of their work, Ganesh and Talwar (2020) assume Lipschitz and L-smooth utility functions in the exponential mechanism.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Application to exponential mechanism sampling</head><p>In this section, we explore some instances of the exponential mechanism that satisfy the assumptions of the rejection samplers proposed in Section 4, allowing for a privacy-preserving implementation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Strongly concave and L-smooth log-density</head><p>We first consider instances of the exponential mechanism where the utility function g D is both strongly concave and L-smooth. These are the same properties that Ganesh and Talwar (2020) assume. Both <ref type="bibr">Awan et al. (2019)</ref> and <ref type="bibr">Minami et al. (2016)</ref> assume strongly concave utility functions in the exponential mechanism. Other private empirical risk minimization works, while not working directly with the exponential mechanism, also commonly assume L-smooth and strong concavity <ref type="bibr">(Kifer et al., 2012;</ref><ref type="bibr">Bassily et al., 2014a,b)</ref>.</p><p>Under the strongly concave and L-smooth assumptions, we are able to derive upper and lower bounds for the target, which satisfy the requirements of Theorem 22.</p><p>Lemma 30 Let &#960; D (x) &#8733; exp(g D (x)) be the (unnormalized) target density, where g D : R d &#8594; R is twice-differentiable, &#945;-strongly concave, and L-smooth. Call x * D := arg max x g D (x). Using &#966; d (x; &#181;, &#931;) to denote the pdf of N d (&#181;, &#931;), we have for all x, Proof By strong concavity, we have that</p><p>Including the integrating constant for a multivariate normal distribution gives the upper bound. Next, since g D is L-smooth, we have that</p><p>where x is between x * D and x, we used the fact that &#8711;g D (x * D ) = 0, and that the eigenvalues of &#8711; 2 g D are upper bounded by L. This implies that exp</p><p>giving the lower bound.</p><p>Given the bounds in Lemma 30, we can now implement the squeeze-function rejection sampler of Section 4.3, since c L,D /c U,D does not depend on D. As discussed in Proposition 23, generating these bounds is strictly easier than computing the integrating constant for the target, which is not needed in Lemma 30.</p><p>We could also implement the truncated sampler of Section 4.1, by using the bound (&#945;/L) d/2 on the worst-case acceptance probability. However, since Theorem 22 is applicable, the truncated sampler is strictly worse as it incurs a price both in privacy as well as utility, whereas the squeeze sampler produces perfect samples.</p><p>There are many natural problem settings that fit the assumptions of Lemma 30, particularly in empirical risk minimization.</p><p>Example 31 (Strongly convex empirical risk minimization) Suppose that the database can be written as a vector D = (d 1 , . . . , d n ), where d i is the contribution of individual i. Take as our utility function g D (x) = -( n i=1 &#8467;(x; d i ) + r(x)), where &#8467;(x; d) is a twice-differentiable convex function which is L-smooth and satisfies sup d,d &#8242; sup x |&#8467;(x; d) -&#8467;(x; d &#8242; )| &#8804; &#8710;, and r(x) is an &#945;strongly convex regularizer, which does not depend on the database D. For instance, we could take r(x) = &#945; 2 x 2 2 . Then the exponential mechanism samples from &#960; D (x) &#8733; exp( &#491; 2&#8710; g D (x)) and satisfies &#491;-DP.</p><p>Note that g D is nL-smooth and &#945;-strongly concave for all D, so it fits the framework of Lemma 30. Such a setup is common in private empirical risk minimization <ref type="bibr">(Kifer et al., 2012;</ref><ref type="bibr">Bassily et al., 2014a,b)</ref>, and in particular for private regression problems <ref type="bibr">(Kifer et al., 2012;</ref><ref type="bibr">Reimherr and Awan, 2019;</ref><ref type="bibr">Awan and Slavkovi&#263;, 2020)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">K-norm gradient mechanism</head><p>An alternative to simply applying the exponential mechanism to a strongly concave utility function is the K-norm gradient mechanism (KNG), proposed in <ref type="bibr">Reimherr and Awan (2019)</ref>, also known as the gradient mechanism <ref type="bibr">(Asi and Duchi, 2020a)</ref>. KNG has been applied to applications such as geometric median estimation, and linear and quantile regression <ref type="bibr">(Reimherr and Awan, 2019;</ref><ref type="bibr">Asi and Duchi, 2020a)</ref>. Given an objective function g D (x), KNG samples from &#960; D (x) &#8733; exp(-&#491; 2&#8710; &#8711;g D (x) K ), where &#8710; &#8805; sup d(D,D &#8242; )&#8804;1 sup x &#8711;g D (x) -&#8711;g D &#8242; (x) K , and where &#8226; K is a chosen norm.</p><p>While the exponential mechanism with a strongly concave utility is naturally approximated by a Gaussian distribution <ref type="bibr">(Awan et al., 2019)</ref>, KNG is closely related to the K-norm distributions <ref type="bibr">(Reimherr and Awan, 2019)</ref>. The K-norm mechanism was introduced in <ref type="bibr">Hardt and Talwar (2010)</ref>, and were also studied in <ref type="bibr">Awan and Slavkovi&#263; (2020)</ref>. where we used the fact that &#8711;g D (x * D ) = 0 and Cauchy-Schwartz inequality. This implies that &#8711;g D (x) 2 &#8805; &#945; x -x * D 2 , which gives the upper bound. Next, as g D is L-smooth, we have that</p><p>which gives the lower bound.</p><p>Lemma 33 provides bounds that can be used to implement the sampler of Section 4.3. The acceptance probability when targeting the lower bound is (&#945;/L) d , which is independent of D, as required. While we could implement the sampler of Section 4.1, as (&#945;/L) d is a bound on the worstcase acceptance probability, this method is strictly worse than the squeeze sampler, as discussed in Section 5.1. The empirical risk problems of Example 31 are also applicable to KNG, and offer several natural instances that satisfy the assumptions of Lemma 33.</p><p>Finally, note that for the KNG mechanism, if the underlying utility is L-smooth (not necessarily strongly concave), then the log-density is L-Lipschitz. As such, we can apply the adaptive rejection sampler of Section 4.4. If multiple i.i.d. samples are required, this can provide a very computationally efficient sampling method, while keeping the runtime data-independent.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>In this paper, we first characterized the privacy cost due to the runtime of both simple and adaptive rejection samplers in terms of &#491;-DP, (&#491;, &#948;)-DP, and f -DP. We found that the runtime of standard rejection samplers can result in an arbitrary increase in the privacy cost, motivating the need for privacy-aware samplers. We then proposed three novel modifications to simple rejection samplers with varying assumptions, which all resulted in data-independent runtime. We also developed a privacy-aware adaptive rejection sampler for log-H&#246;lder densities.</p><p>There are three factors that influence the practicality of our algorithms, (1) the scalability of rejection sampling: Typically, the acceptance probability of a rejection samplers decays exponentially with data dimension, making them impractical for very high dimensional problems. However, imposing additional structure like log-concavity or log-H&#246;lder on the target density, adaptive rejection samplers (like our proposed one) can be applicable to higher-dimensional problems. Such structural assumptions, as well as low-to moderate-dimensional problems are common in differential privacy applications. (2) the additional cost of our differentially-private modifications of rejection sampling: Our algorithms typically result in a reduction in the acceptance probability to match the worst-case dataset. This is unavoidable. However, our adaptive rejection sampler does not require any knowledge of this worst-case database. As such, the sample complexity of the runtime is the same as for a regular rejection sampler, but where the acceptance probability is the worst case. (3) The additional book-keeping overhead in implementing our differentially private rejection samplers: All of our algorithms are minor modifications of existing simple or adaptive rejection samplers, and as such, this overhead is minimal.</p><p>Of our proposed modifications to the rejection sampler, the squeeze method of Section 4.3 is the most powerful. We showed in Section 5 that for many instances of the exponential mechanism, appropriate upper and lower bounds can be generated. Furthermore, our adaptive sampling scheme is also built on the squeeze sampler. In a way, Algorithm 1 can be viewed as a coupling of the sampler applied to the present database and a worst-case database. It is an open question whether similar couplings could be developed with even weaker assumptions.</p><p>An alternative to rejection sampling is coupling from the past (CFP) <ref type="bibr">(Propp and Wilson, 1998)</ref>, a modified MCMC approach. The benefit of CFP is that it is another perfect sampler, and could be a useful technique in designing privacy-aware samplers. The techniques used in this paper may be useful for determining the privacy cost of timing channel attacks on CFP and developing CFP algorithms with data-independent runtime. A variation on CFP is perfect tempering, which also results in perfect samplers <ref type="bibr">(M&#248;ller and Nicholls, 1999;</ref><ref type="bibr">Daghofer and von der Linden, 2004;</ref><ref type="bibr">Brooks et al., 2006)</ref>, and may be an another approach to developing privacy-aware samplers.</p><p>While in this paper we developed samplers whose runtime does not depend on the dataset, one could instead ask for the runtime to be differentially private by itself. Theorem 10 shows that a naive rejection sampler does have an inherent privacy cost, but one could also imagine altering the runtime to give a stronger privacy guarantee. A significant challenge with this approach is that we can only increase, but not reduce, the runtime. Due to this constraint, many existing DP techniques are not applicable. We leave it for future research to investigate mechanisms to privatize the runtime.</p></div></body>
		</text>
</TEI>
