<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>On the Rényi Differential Privacy of the Shuffle Model</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>11/12/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10313175</idno>
					<idno type="doi">10.1145/3460120.3484794</idno>
					<title level='j'>ACM Symposium on Computer and Communication Security (CCS)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Antonious M. Girgis</author><author>Deepesh Data</author><author>Suhas Diggavi</author><author>Ananda Theertha Suresh</author><author>Peter Kairouz</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The central question studied in this paper is Rényi Differential Privacy (RDP) guarantees for general discrete local randomizers in the shuffle privacy model. In the shuffle model, each of the 𝑛 clients randomizes its response using a local differentially private (LDP) mechanism and the untrusted server only receives a random permutation (shuffle) of the client responses without association to each client. The principal result in this paper is the first direct RDP bounds for general discrete local randomization in the shuffle pri- vacy model, and we develop new analysis techniques for deriving our results which could be of independent interest. In applications, such an RDP guarantee is most useful when we use it for composing several private interactions. We numerically demonstrate that, for important regimes, with composition our bound yields an improve- ment in privacy guarantee by a factor of 8× over the state-of-the-art approximate Differential Privacy (DP) guarantee (with standard composition) for shuffle models. Moreover, combining with Pois- son subsampling, our result leads to at least 10× improvement over subsampled approximate DP with standard composition.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>an algorithm to enable such privacy. Originally DP was studied in the centralized context, where the privacy from queries to a trusted server holding the data was the objective <ref type="bibr">[16]</ref>. However, in distributed applications, such as federated learning <ref type="bibr">[30]</ref>, two significant aspects need to be accommodated: (i) data is held locally at clients and needs to be used for computation with an untrusted server; and (ii) to build good learning models, one might need repeated interactions (e.g., through distributed gradient descent).</p><p>To accommodate privacy of locally held data, a more appropriate notion is that of local differential privacy (LDP) <ref type="bibr">[15,</ref><ref type="bibr">34]</ref>. In the LDP framework, each (distributed) client holding local data, individually randomizes its interactions with the (untrusted) server. <ref type="foot">1</ref>Recently, such LDP mechanisms have been deployed by companies such as Google <ref type="bibr">[23]</ref>, Apple <ref type="bibr">[29]</ref>, and Microsoft <ref type="bibr">[14]</ref>. However, LDP mechanisms suffer from poor performance in comparison with the centralized DP mechanisms, making their applicability limited <ref type="bibr">[15,</ref><ref type="bibr">31,</ref><ref type="bibr">34]</ref>. To address this, a new privacy framework using anonymization has been proposed in the so-called shuffle model <ref type="bibr">[7,</ref><ref type="bibr">13,</ref><ref type="bibr">22]</ref>, where each client sends her (randomized) interaction message to a secure shuffler that randomly permutes all the received messages before forwarding them to the server. Such a shuffling can be enabled through anonymization techniques <ref type="bibr">[10,</ref><ref type="bibr">20,</ref><ref type="bibr">22]</ref>. This model enables significantly better privacy-utility performance by amplifying LDP through this mechanism.</p><p>For the second aspect, where there are repeated interactions (e.g., through distributed gradient descent), one needs privacy composition <ref type="bibr">[9]</ref>. In other words, we want to compute the overall privacy budget under the composition of multiple iterations. Clearly, from an optimization viewpoint, we might need to run these interactions longer for better models, but these also result in privacy leakage. Though the privacy leakage can be quantified using advanced composition theorems for DP (e.g., <ref type="bibr">[19,</ref><ref type="bibr">33]</ref>), these might be loose. To address this, Abadi et al. <ref type="bibr">[1]</ref> developed a "moments accountant" framework, which enabled a much tighter composition. This is enabled by providing the composition privacy guarantee in terms of R&#233;nyi Differential Privacy <ref type="bibr">[36]</ref>, and then mapping it back to the DP guarantee <ref type="bibr">[37]</ref>. It is known <ref type="bibr">[1]</ref> that the moments accountant provides a significant saving in the total privacy budget in comparison with using the strong composition theorems <ref type="bibr">[19,</ref><ref type="bibr">33]</ref>. Therefore, developing the RDP privacy guarantee can enable stronger composition privacy results. Analyzing the RDP of the shuffle model could have several applications such as private statistics using interactive schemes for heavy hitters, mean estimation, federated learning, and distributed differentially private stochastic gradient descent (DPSGD). This leads us to the central question addressed in this paper:</p><p>Can we develop strong RDP privacy guarantees for general local mechanisms in the shuffle privacy model? The principal result in this paper is the first direct RDP guarantee for general discrete local randomization mechanisms in the shuffle privacy model. In particular, given an arbitrary discrete local mechanism with 0 -LDP guarantee, we provide an RDP guarantee for the shuffle model, as a function of 0 and the number of users ; see Theorem 3.1. This can be seen as an amplification by shuffling result for amplifying pure LDP guarantee to RDP guarantee via shuffling. In contrast, the existing amplification by shuffling results <ref type="bibr">[7,</ref><ref type="bibr">22,</ref><ref type="bibr">24]</ref> amplify pure LDP guarantee to approximate DP guarantee.</p><p>When numerically evaluating our bound, we save a factor of 8&#215; compared to the state-of-the-art approximate DP guarantee for shuffle models in <ref type="bibr">[24]</ref> <ref type="foot">foot_1</ref> combined with strong composition, with the number of iterations = 10 5 , LDP parameter 0 = 0.5, and number of clients = 10 6 ; see Figure <ref type="figure">4a</ref> in Section 4 for such example regimes. Furthermore, characterizing the RDP of the shuffle model enables us to compute the RDP of shuffling with Poisson sub-sampling by using the results in <ref type="bibr">[43]</ref>. We numerically show that this approach can lead to at least 10&#215; improvement in privacy guarantee. This is for = 10 4 , 0 = 3, and = 10 6 . The comparison is with applying the strong composition theorem <ref type="bibr">[33]</ref> after getting the state-of-the-art approximate DP of the shuffle model given in <ref type="bibr">[24]</ref> with Poisson sub-sampling <ref type="bibr">[35]</ref> (see Figure <ref type="figure">5a</ref> in Section 4 for more such regimes). This in turn implies that we can accommodate at least 10&#215; more interactions for the same privacy budget in these cases. Moreover, our upper bounds also give several orders of magnitude improvement over the simple RDP bound stated in <ref type="bibr">[21,</ref> Remark 1] (also stated in <ref type="bibr">(9)</ref>) in several regimes (see, for example, Figure <ref type="figure">2e</ref> in <ref type="bibr">Section 4)</ref>. We also develop a lower bound for the RDP for the shuffle model and numerically demonstrate that the gap is small for many parameter regimes of interest.</p><p>In order to obtain our upper bound result, we develop new analysis techniques which could be of independent interest. In particular, we develop a novel RDP analysis for neighboring datasets with a special structure (see Theorem 3.7), in which one of the datasets has all the data points to be the same (see the definition in <ref type="bibr">(12)</ref>). A key technical result is then to relate the RDP of general neighboring datasets to those with special structure (see <ref type="bibr">Theorem 3.6)</ref>.</p><p>&#8226; For the RDP analysis of neighboring datasets with the abovementioned special structure, we first observe that the output distribution of the shuffling mechanism is the multinomial distribution. Using this observation, then we show that the ratio of the distributions of the mechanism on special structure neighboring datasets is a sub-Gaussian random variable (r.v.), and we can write the R&#233;nyi divergence of the shuffle mechanism in terms of the moments of this r.v. Bounding the moments of this r.v. then gives an upper bound on the RDP for the special neighboring datasets. See the proof-sketch of Theorem 3.7 in Section 3.3.2 and its complete proof in Section 6. &#8226; We next connect the above analysis to the RDP computation for general neighboring datasets D = ( 1 , . . . , ) and</p><p>To do so, a crucial observation is to write the output distribution of the local randomizer R on the 'th client's data point (for any &#8712; [ -1]) as a mixture distribution = -0 &#8242; + (1 --0 ) &#732; for some &#732; . <ref type="foot">3</ref> So, the number of clients that sample according to &#8242; is concentrated around -0 . Therefore, if we restrict the dataset to these clients only, the resulting datasets will have the special structure, and the size of that dataset will be concentrated around -0 . Finally, in order to be able to reduce the problem to the special case, we remove the effect of the clients that do not sample according to &#8242; without affecting the R&#233;nyi divergence. See the proof-sketch of Theorem 3.6 in Section 3.3.1 and its complete proof in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related Work</head><p>We give the most relevant work related to the paper and put our contributions in the context of these works.</p><p>shuffle privacy model: As mentioned, the shuffle model of privacy has been of significant recent interest <ref type="bibr">[5-8, 13, 22, 25, 26]</ref>. However, all the existing works in literature <ref type="bibr">[7,</ref><ref type="bibr">22,</ref><ref type="bibr">24]</ref> only characterize the approximate DP of the shuffle model -among these, <ref type="bibr">[24]</ref> is the state-of-the-art, but as we show in our experiments, it yields weaker results when combined with composition. To the best of our knowledge, there is no bound on RDP of the shuffle model in the literature except for the one mentioned briefly in a remark in <ref type="bibr">[22,</ref>  <ref type="bibr">Remark 1]</ref> (which is obtained by the standard conversion results from DP to RDP) and we state it in (9) for comparison. However, this bound is loose (e.g., see Figure <ref type="figure">2e</ref>) and not useful for conversion to approximate DP (e.g., see Figures <ref type="figure">3a,</ref><ref type="figure">3c</ref>), as well as for composition (e.g., see Figure <ref type="figure">4e</ref>). Thus, our work makes progress on this important open question of analyzing the RDP of the shuffle model. Both <ref type="bibr">[20]</ref> and <ref type="bibr">[27]</ref> used advanced composition to analyze privacy of shuffle models in federated learning; our results could be adapted to enhance their privacy guarantees.</p><p>R&#233;nyi differential privacy: The work of Abadi et al. <ref type="bibr">[1]</ref> provided a methodology to get stronger composition results. Inherently, this used R&#233;nyi divergence, and was later formalized in <ref type="bibr">[36]</ref> which defined R&#233;nyi differential privacy (RDP). RDP presents a unified definition for several kinds of privacy notions including pure differential privacy ( -DP), approximate differential privacy (( , )-DP), and concentrated differential privacy (CDP) <ref type="bibr">[11,</ref><ref type="bibr">18]</ref>. As mentioned earlier, RDP enables a stronger result for composition, through the "moment accounting" idea. Similarly, several works <ref type="bibr">[37,</ref><ref type="bibr">40,</ref><ref type="bibr">43]</ref> have shown that analyzing the RDP of subsampled mechanisms provides a tighter bound on the total privacy loss than the bound that can be obtained using the standard strong composition theorems. However, to the best of our knowledge, RDP analysis of the shuffle model and its use for composition in the shuffle model has not been studied. In this paper, we analyze the RDP of the shuffle model, where we can bound the approximate DP of a sequence of shuffle models using the transformation from RDP to approximate DP <ref type="bibr">[1,</ref><ref type="bibr">2,</ref><ref type="bibr">12,</ref><ref type="bibr">40]</ref>. We show that our RDP analysis provides a better bound on the total privacy loss of composition than that can be obtained using the standard strong composition theorems (see <ref type="bibr">Section 4)</ref>.</p><p>Discrete mechanisms: Many of the works in DP use specific randomization mechanisms, adding noise using the Laplace or Gaussian distributions. However, in many situations the data is inherently discrete (e.g., see <ref type="bibr">[12]</ref> and references therein) or compression causes it to be so (e.g., see <ref type="bibr">[27,</ref><ref type="bibr">32]</ref> and references therein). It is therefore of interest to directly analyze privacy of discrete randomization mechanisms. Such discrete mechanisms have been studied extensively in shuffle models <ref type="bibr">[5,</ref><ref type="bibr">25]</ref>, but for approximate DP. To the best of our knowledge, RDP for general discrete mechanisms in the shuffle privacy framework is new to our work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Paper Organization</head><p>The paper is organized as follows. In Section 2, we give some preliminary definitions and results from the literature and also formulate our problem. In Section 3, we present our main results (two upper bounds and one lower bound on RDP), along with a proof sketch of the first upper bound. We also describe the two main ingredients in its proof -first is the reduction of computing RDP for the arbitrary pairs of neighboring datasets to computing RDP for the special pairs of neighboring datasets, and the second is computing RDP for the special pairs of neighboring datasets. In Section 4, we present several numerical results to demonstrate the advantages of our bounds compared to the state-of-the-art. The rest of the sections are devoted to the full proofs of our main results: Section 5 shows the reduction of our general problem to the special case; Section 6 proves the RDP for the special case; Section 7 proves both our upper bounds; and Section 8 proves our lower bound. In Section 9, we conclude with a short discussion. Omitted details from the proofs are provided in the appendices.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">PRELIMINARIES AND PROBLEM FORMULATION</head><p>We give different privacy definitions that we use in Section 2.1, some existing results on RDP to DP conversion and RDP composition in Section 2.2, and give our problem formulation in Section 2.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Privacy Definitions</head><p>In this subsection, we define different privacy notions that we will use in this paper: local differential privacy (LDP), central differential privacy (DP), and R&#233;nyi differential privacy (RDP). Definition 1 (Local Differential Privacy -LDP <ref type="bibr">[34]</ref>). For 0 &#8805; 0, a randomized mechanism R : X &#8594; Y is said to be 0 -local differentially private (in short, 0 -LDP), if for every pair of inputs , &#8242; &#8712; X, we have</p><p>Let D = { 1 , . . . , } denote a dataset comprising points from X. We say that two datasets D = { 1 , . . . , } and D &#8242; = { &#8242; 1 , . . . , &#8242; } are neighboring (and denoted by D &#8764; D &#8242; ) if they differ in one data point, i.e., there exists an &#8712; [ ] such that &#8800; &#8242; and for every &#8712; [ ], &#8800; , we have = &#8242; . Definition 2 (Central Differential Privacy -DP <ref type="bibr">[16,</ref><ref type="bibr">17]</ref>). For , &#8805; 0, a randomized mechanism M : X &#8594; Y is said to be ( , )differentially private (in short, ( , )-DP), if for all neighboring datasets D &#8764; D &#8242; &#8712; X and every subset S &#8838; Y, we have Pr [M (D) &#8712; S] &#8804; 0 Pr M (D &#8242; ) &#8712; S + .</p><p>(</p><p>Definition 3 (R&#233;nyi Differential Privacy -RDP <ref type="bibr">[36]</ref>). A randomized mechanism M : X &#8594; Y is said to have ( )-R&#233;nyi differential privacy of order &#8712; (1, &#8734;) (in short, ( , ( ))-RDP), if for any neighboring datasets D &#8764; D &#8242; &#8712; X , the R&#233;nyi divergence between M (D) and M (D &#8242; ) is upper-bounded by ( ), i.e.,</p><p>where M (D)( ) denotes the probability that M on input D generates the output . For convenience, instead of ( ) being an upper bound, we define it as</p><p>Our objective in this paper is to characterize the R&#233;nyi differential privacy of a shuffling mechanism M (see Section 2.3 for details) for different values of order .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">RDP to DP Conversion and RDP Composition</head><p>In this subsection, we state some preliminary results from the literature that we will use. Though our main objective in this paper is to derive RDP guarantees of a shuffling mechanism, we also give the central privacy guarantees of that mechanism. For that purpose, we use the following result for converting the RDP guarantees of a mechanism to its DP guarantees. To the best of our knowledge, this result gives the best conversion. <ref type="foot">4</ref>L 2.1 (F RDP DP <ref type="bibr">[4,</ref><ref type="bibr">12]</ref>). Suppose for any &gt; 1, a mechanism M is ( , ( ))-RDP. Then, the mechanism M is ( , )-DP, where , are define below: For a given &#8712; (0, 1) :</p><p>As mentioned in Section 1, the main strength of RDP in comparison to other privacy notions comes from composition. The following result states that if we adaptively compose two RDP mechanisms with the same order, their privacy parameters add up in the resulting mechanism.  A set of all possible histograms with bins and elements (see (4))</p><p>The shuffle mechanism M on the dataset D &#8712; X ; M (D) is a distribution over A (see ( <ref type="formula">3</ref>)) (P)</p><p>Distribution over A when client maps its data point according to the distribution (see ( <ref type="formula">17</ref>)) Table <ref type="table">1</ref>: Notation used throughout the paper using convexity of the function ( -1) ( ) as follows. From [39, Corollary 2], the function ( -1) (P||Q) is convex in for any given two distributions P and Q. Thus, for any real order &gt; 1, we can bound the RDP of the shuffle model by</p><p>where = &#8968; &#8969; -, since = &#8970; &#8971; + (1 -) &#8968; &#8969; for any real . Here, &#8970; &#8971; and &#8968; &#8969; respectively denote the largest integer smaller than or equal to and the smallest integer bigger than or equal to .</p><p>In the following theorem, we also present another bound on RDP that readily holds for all &#8805; 1.</p><p>. For any &#8712; N, 0 &#8805; 0, and any &#8805; 1 (including the non-integral ), the RDP of the shuffle model is upper-bounded by</p><p>where = &#8970; -1 2 0 &#8971; + 1. We prove Theorem 3.3 in Section 7.2.</p><p>Remark 3 (Improved Upper Bounds -Saving a Factor of 2). The exponential term 0 --1 8 0 in both the upper bounds stated in ( <ref type="formula">5</ref>) and ( <ref type="formula">8</ref>) comes from the Chernoff bound, where we naively choose the factor = 1/2 instead of optimizing it; see the proof of Theorem 3.1 in Section 7.1. If we instead had optimized and chosen it to be, for example, = 2 0 0 &#8730; log( ) (which goes to 0 when, say, 0 &#8804; 1 4 log( )), we would have asymptotically saved a multiplicative factor of 2 in the leading term in both upper bounds, because in this case we have</p><p>We chose to evaluate our bound with = 1/2 because of two reasons: first, it gives a simpler expression to compute; and second, the evaluated bound does not give good results (as compared to the ones with = 1/2) for the parameter ranges of interest.</p><p>Remark 4 (Difference in Upper Bounds). Since the quadratic term in inside the log in <ref type="bibr">(8)</ref> has an extra multiplicative factor of 0 in comparison with the corresponding term in <ref type="bibr">(5)</ref>, our first upper bound presented in Theorem 3.1 is better than our second upper bound presented in Theorem 3.3 for all parameter ranges of interest; see also Figure <ref type="figure">2</ref> in Section 4. However, the expression in ( <ref type="formula">8</ref>) is much cleaner to state as well as to compute as compared to that in <ref type="bibr">(5)</ref>. As we will see later, the techniques required to prove both upper bounds are different.</p><p>Remark 5 (Potentially Better Upper Bounds for Specific Mechanisms). Since both our upper bounds are worse-case bounds that hold for all 0 -LDP mechanisms, it is possible that for specific mechanisms, we may be able to exploit their structure for potentially better bounds. See Remark 8 on this just after <ref type="bibr">(32)</ref>.</p><p>The upper bounds on the RDP of the shuffle model presented in <ref type="bibr">(5)</ref> and ( <ref type="formula">8</ref>) are general and hold for any discrete 0 -LDP mechanism. Furthermore, these bounds are in closed form expressions that can be easily implemented. To the best of our knowledge, there is no bound on RDP of the shuffle model in literature except for the one given in [21, Remark 1], which we provide below<ref type="foot">foot_4</ref> in <ref type="bibr">(9)</ref>. For the LDP parameter 0 and number of clients , they showed that for any &gt; 1, the shuffle mechanism M is ( , ( ))-RDP, where</p><p>In Section 4, we evaluate numerically the performance of both our bounds (from Theorems 3.1 and 3.3) against the above bound in <ref type="bibr">(9)</ref>. We demonstrate that both our bounds outperform the above bound in all cases; and in particular, the gap is significant when 0 &gt; 1 -note that the bound in <ref type="bibr">[22]</ref> is worse than our simplified bound given in Corollary 3.2 by a multiplicative factor of 4 0 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Lower Bound</head><p>In this subsection, we provide a lower bound on the RDP for any integer order satisfying &#8805; 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>T 3.4 (L B</head><p>). For any &#8712; N, 0 &#8805; 0, and any integer &#8805; 2, the RDP of the shuffle model is lower-bounded by:</p><p>where expectation is taken w.r.t. the binomial random variable &#8764; Bin ( , ) with parameter = 1 0 +1 . We give a proof-sketch of Theorem 3.4 in Section 8 and provide its complete proof in Appendix E.</p><p>When is an even integer, then the expectation term in ( <ref type="formula">10</ref>) is positive. When &#8805; 3 is an odd integer, then using the convexity of function ( ) = , it follows from the Jensen's inequality (i.e., E ( ) &#8805; (E )) and</p><p>Using these observations, we can safely ignore the summation term from <ref type="bibr">(10)</ref> and obtain the following simplified lower bound.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C 3.5 (S L B</head><p>). For any &#8712; N, 0 &#8805; 0, and integer &#8805; 2, the RDP of the shuffle model is lowerbounded by:</p><p>Remark 6 (Upper and Lower Bound Proofs). Both our upper bounds stated in Theorems 3.1 and 3.3 hold for any 0 -LDP mechanism. In other words, they are the worst case privacy bounds, in the sense that there is no 0 -LDP mechanism for which the associated shuffle model gives a higher RDP parameter than those stated in ( <ref type="formula">5</ref>) and <ref type="bibr">(8)</ref>. Therefore, the lower bound that we derive should serve as the lower bound on the RDP privacy parameter of the mechanism that achieves the largest privacy bound (i.e., worst privacy). We prove our lower bound result (stated in Theorem 3.4) by showing that a specific mechanism (in particular, the binary Randomized response (RR)) on a specific pair of neighboring datasets yields the RDP privacy parameter stated in the right hand side (RHS) of <ref type="bibr">(10)</ref>. This implies that RDP privacy bound (which is the supremum over all neighboring datasets) of binary RR for the shuffle model is at least the bound stated in <ref type="bibr">(10)</ref>, which in turn implies that the lower bound (which is the tightest bound for any 0 -LDP mechanism) is also at least that.</p><p>Remark 7 (Gap in Upper and Lower Bounds). When comparing our simplified upper and lower bounds from Corollaries 3.2 and 3.5, respectively, we observe that when 4 5 0 &lt; 9 , our upper and lower bounds differ by a multiplicative factor of 4 0 . In our generic upper bound <ref type="bibr">(5)</ref>, note that when is large, only the term corresponding to 2 matters, and with our improved upper bound (which saves a factor of 2 in that term asymptotically -see Remark 3), the upper and lower bounds are away by the factor of 0 , which tends to 1 as 0 &#8594; 0. Thus, in the regime of large and small 0 , our upper and lower bounds coincide. Without any constraints on , 0 , we believe that our lower bound is tight. Closing this gap by showing a tighter upper bound is an interesting and important open problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Proof Sketch of Theorem 3.1</head><p>The proof has two main steps. In the first step, we reduce the problem of deriving RDP for arbitrary neighboring datasets to the problem of deriving RDP for specific neighboring datasets, D, D &#8242; , where all elements in D are the same and D &#8242; differs from D in one entry. In the second step, we derive RDP for the special neighboring datasets. Details follow:</p><p>The specific neighboring datasets to which we reduce our general problem have the following form:  and &#8764; Bin ( -1, ) be a binomial random variable. We have:</p><p>We give a proof-sketch of Theorem 3.6 in Section 3.3.1 and provide its complete proof in Section 5.</p><p>We know (by Chernoff bound) that the binomial random variable is concentrated around its mean, which implies that the terms in the RHS of (13) that correspond to &lt; (1 -) ( -1) (we will take = 1/2) will contribute in a negligible amount. Then we show in Lemma D.1 (on page 20) that</p><p>is a non-increasing function of . These observations together imply that the RHS in ( <ref type="formula">13</ref>) is approximately upper bounded by</p><p>Since is precisely what is required to bound the RDP for the specific neighboring datasets, we have reduced the problem of computing RDP for arbitrary neighboring datasets to the problem of computing RDP for specific neighboring datasets. The second step of the proof bounds (1-) ( -1) , which follows from the result below that holds for any &#8712; N. T 3.7 (RDP S C ). Let &#8712; N be arbitrary. For any integer &#8805; 2, we have</p><p>We give a proof-sketch of Theorem 3.7 in Section 3.3.2 and provide its complete proof in Section 6.</p><p>Substituting = (1 -) ( -1) + 1 in ( <ref type="formula">14</ref>) yields the bound in Theorem 3.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3.3.1</head><p>Proof Sketch of Theorem 3.6. For &#8712; [ ], let denote the distribution of the 0 -LDP mechanism R when the input data point is , and &#8242; denote the distribution of R when the input data point is &#8242; . The main idea of the proof is the observation that each distribution can be written as the following mixture distribution:</p><p>, where &#732; is a certain distribution associated with . So, instead of client &#8712; [ -1] mapping its data point according to , we can view it as the client maps according to &#8242; with probability 1 0 and according to &#732; with probability (1 -1 0 ). Thus the number of clients that sample from the distribution &#8242; follows a binomial distribution Bin( -1, ) with parameter = 1 0 . This allows us to write the distribution of M when clients map their data points according to 1 , . . . , , &#8242; as a convex combination of the distribution of M when clients map their data points according to &#732; 1 , . . . , &#732; -1 , , &#8242; ; see Lemma 5.1. Then using a joint convexity argument (see Lemma 5.2), we write the R&#233;nyi divergence between the original pair of distributions of M in terms of the same convex combination of the R&#233;nyi divergence between the resulting pairs of distributions of M as in Lemma 5.1. Using a monotonicity argument (see Lemma 5.3), we can remove the effect of clients that do not sample from the distribution &#8242; without decreasing the R&#233;nyi divergence. By this chain of arguments, we have reduced the problem to the one involving the computation of R&#233;nyi divergence only for the special form of neighboring datasets, which proves Theorem 3.6. Details can be found in Section 5. </p><p>Let : A &#8594; R denote a random variable (r.v.) associated with the distribution M (D ), and for every &#8712; A , is defined as</p><p>With this, we can rewrite <ref type="bibr">(15)</ref> in terms of the moments of . Then we show that is a sub-Gaussian r.v. that has zero-mean and bounded variance. Using the sub-Gaussianity of , we bound its higher moments (see Lemma 6.1). Substituting these bounds in <ref type="bibr">(15)</ref> proves Theorem 3.7. Details can be found in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">NUMERICAL RESULTS</head><p>In this section, we present numerical experiments to show the performance of our bounds on the RDP of the shuffle model and its usage for getting approximate DP and composition results.</p><p>RDP of the shuffle model: In Figure <ref type="figure">2</ref>, we plot several bounds on the RDP of the shuffle model in different regimes. In particular, we compare between the first upper bound on the RDP given in Theorem 3.1, the second upper bound on the RDP given in Theorem 3.3, the lower bound on the RDP given in Theorem 3.4, and the upper bound on the RDP given in <ref type="bibr">[22,</ref> Remark 1] and stated in (9). <ref type="foot">6</ref> It is clear that our first upper bound <ref type="bibr">(5)</ref> gives a tighter bound on the RDP in comparison with the second bound <ref type="bibr">(8)</ref> and the upper bound given in <ref type="bibr">[22]</ref>. Furthermore, the first upper bound is close to the lower bound for small values of the LDP parameter 0 and for high orders . In addition, the gap between our proposed bound in Theorem 3.1 and the bound given in <ref type="bibr">[22]</ref> increases as the LDP parameter 0 increases. We also observe that the curves of the lower and upper bounds on the RDP of the shuffle model saturate close to 0 when the order approaches to infinity. This indicates that the pure DP of the shuffle model is bounded below by 0 , an observation made in literature <ref type="bibr">[3,</ref><ref type="bibr">22]</ref>. As can be seen in Figures <ref type="figure">2d</ref> and<ref type="figure">2e</ref>, the RDP obtained by standard approximate DP to RDP conversion in <ref type="bibr">[22,</ref> Remark 1], can be several orders of magnitude loose in comparison to our analysis.</p><p>Approximate DP of the shuffle model: Analyzing RDP of the shuffle model provides a bound on the approximate DP of the shuffle model from the relation between the RDP and approximate DP as shown in Lemma 2.1. In Figure <ref type="figure">3</ref>, we plot several bounds on the approximate ( , )-DP of the shuffle model for fixed = 10 -6 . In Figures <ref type="figure">3d</ref> and<ref type="figure">3b</ref>, we do not plot the results given in <ref type="bibr">[22]</ref>, since their bounds are quite loose and are far from the plotted range when 0 &gt; 1. We can see that our analysis of the RDP of the shuffle model provides a tighter bound on the approximate DP of the shuffle model in comparison with the bound given in <ref type="bibr">[7]</ref> in some regimes. However, our RDP analysis performs worse than the best known bound given in <ref type="bibr">[24]</ref>, when used without composition. This might be due to the gap between our upper and lower bound on the RDP of the shuffle model as the lower bound provides better performance than the bound given in <ref type="bibr">[24]</ref> for all values of LDP parameter 0 . Note that the main use case for converting our RDP analysis to approximate DP is after composition rather than in the single-shot conversion illustrated in Figure <ref type="figure">3</ref>.</p><p>Composition of a sequence of shuffle models: We now numerically evaluate the privacy parameters of the approximate ( , )-DP for a composition of mechanisms (M 1 , . . . , M ), where M is a shuffle mechanism for all &#8712; [ ]. In Figure <ref type="figure">4</ref>, we plot three different bounds on the overall privacy parameter for fixed = 10 -8 for a composition of identical shuffle models. The first bound on Now, using Lemma 5.1, in the following lemma we show that the R&#233;nyi divergence between (P) and (P &#8242; ) can be upper-bounded by a convex combination of the R&#233;nyi divergence between (P C ) and</p><p>). For any &gt; 1, the function</p><p>is jointly convex in ( (P), (P &#8242; )), i.e.,</p><p>We prove Lemma 5.2 in Appendix B.2. For any </p><p>same , if we remove the effect of distributions in P [ -1]\C in the RHS of ( <ref type="formula">24</ref>), we would be able to bound the RHS of (24) using the RDP for the special neighboring datasets in</p><p>same . This is precisely what we will do in the following lemma and the subsequent corollary, where we will eliminate the distributions in P [ -1]\C in the RHS <ref type="bibr">(24)</ref>.</p><p>The following lemma holds for arbitrary pairs (P, P &#8242; ) of neighboring distributions P = { 1 , . . . , } and P &#8242; = { 1 , . . . , -1 , &#8242; }, where we show that E &#8764; ( P &#8242; ) ( P) ( ) ( P &#8242; ) ( ) does not decrease when we eliminate a distribution (i.e., remove the data point from the datasets) for any &#8712; [ -1]. We need this general statement as it will be required in the proof of Theorem 3.1 later.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>L 5.3 (M ).</head><p>For any &#8712; [ -1], we have</p><p>where, for &#8712; [ -1], P -= P \ { } and P &#8242; -= P &#8242; \ { }. Note that in the left hand side (LHS) of (25), (P), (P &#8242; ) are distributions over A , whereas, in the RHS, (P -), (P &#8242; -) for any &#8712; [ -1] are distributions over A -1 .</p><p>We prove Lemma 5.3 in Appendix B.3. Note that Lemma 5.3 is a general statement that holds for arbitrary pairs (P, P &#8242; ) of neighboring distributions. For our purpose, we apply Lemma 5.3 with (P C , P &#8242; C ) for any C &#8838; [ -1] and then eliminate the distributions in P [ -1]\C one by one. The result is stated in the following corollary. </p><p>We prove Corollary 5.4 in Appendix B.4. Substituting from ( <ref type="formula">26</ref>) into <ref type="bibr">(24)</ref> and noting that for every &#8712; A , (P)( ) and (P &#8242; )( ) are distributionally equal to M (D)( ) and M (D &#8242; )( ), respectively, we get</p><p>The inequality (a) is the same as <ref type="bibr">(24)</ref>, just writing it differently. In (b) we used <ref type="bibr">(26)</ref> and in (c) we used the fact that number of -sized subsets of [ -1] is equal to -1 . This completes the proof of Theorem 3.6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">PROOF OF RDP FOR THE SPECIAL FORM</head><p>Fix an arbitrary &#8712; N and consider any pair of neighboring datasets (D , D &#8242; ) &#8712; D same . Let D = ( , . . . , ) &#8712; X and D &#8242; = ( , . . . , , &#8242; ) &#8712; X . Let = ( 1 , . . . , ) and &#8242; = ( &#8242; 1 , . . . , &#8242; ) be the probability distributions of the discrete 0 -LDP mechanism R : X &#8594; Y = [ ] when its inputs are and &#8242; , respectively, where</p><p>Since M is a shuffle mechanism, it induces a distribution on A for any input dataset. So, for any &#8712; A , M (D )( ) and M (D &#8242; )( ) are equal to the probabilities of seeing when the inputs to M are D and D &#8242; , respectively. Thus, for a given histogram = (&#8462; 1 , . . . , &#8462; ) &#8712; A with elements and bins, we have</p><p>where MN ( , , ) denotes the Multinomial distribution with</p><p>Note that (28) can be obtained as a special case of the general distribution in ( <ref type="formula">17</ref>) by putting = for each client . For M (D &#8242; ), note that the last client (independent of the other clients) maps its input data point &#8242; to the 'th bin with probability &#8242; , and the remaining ( -1) clients' mappings induce a distribution on A -1 . Thus, M (D &#8242; )( ) for any &#8712; A can be written as</p><p>where = &#8462; 1 , . . . , &#8462; -1 , &#8462; -1, &#8462; +1 , . . . , &#8462; &#8712; A -1 . We implicitly assume that if &#8462; = 0 for some &#8712; [ ], then MN -1, , = 0 as one of the elements is negative. Note that similar to ( <ref type="formula">28</ref>), <ref type="bibr">(29)</ref> can also be obtained from <ref type="bibr">(17)</ref> as a special case. Using the polynomial expansion (1 + ) = =0</p><p>(with = M ( D &#8242; ) ( ) M ( D ) ( ) -1 in the following), we have:</p><p>Let : A &#8594; R be a random variable associated with the distribution M (D ) on A , and for any &#8712; A , define ( ) :=</p><p>Substituting this in <ref type="bibr">(30)</ref> gives:</p><p>The RHS of ( <ref type="formula">31</ref>) is in terms of the moments of , which we bound in the following lemma. Before that, first we simplify the expression for ( ) by computing the ratio</p><p>Thus, we get ( )</p><p>Remark 8. As mentioned in Remark 5, we could tighten our upper bounds for specific mechanisms. As shown in <ref type="bibr">(31)</ref> above, the R&#233;nyi divergence of a mechanism between two neighboring datasets can be written in terms of the moments of a r.v. , which is defined as the ratio of distributions of the mechanism on these two neighboring datasets. However, since our goal is to bound RDP for all 0 -LDP mechanisms, we prove the worse-case bound on the moments of that holds for all mechanisms; see (34) in Lemma 6.1 for bound on the &#8805; 3'rd moments of and (38) in Lemma 6.2 for bound on the variance of . L 6.1. The random variable has the following properties:</p><p>(1) has zero mean, i.e., E &#8764;M ( D ) [ ( )] = 0.</p><p>(2) The variance of is equal to</p><p>(3) For &#8805; 3, the 'th moment of is bounded by</p><p>is the Gamma function.</p><p>A proof of Lemma 6.1 is presented in Appendix C.1. Substituting the bounds from Lemma 6.1 into (31), we get</p><p>Note that 1 , . . . , , &#8242; 1 , . . . , &#8242; are defined for the fixed pair of datasets (D , D &#8242; ) &#8712; D same that we started with. So, the term con-</p><p>and that is the only term in <ref type="bibr">(35)</ref> that depends on (D , D &#8242; ). Since Theorem 3.7 requires us to bound <ref type="bibr">(35)</ref> for any pair of neighboring datasets (D , D &#8242; ) &#8712; D same , so, in order to prove Theorem 3.7, we need to compute sup ( D ,D &#8242; ) &#8712;D same =1 &#8242;2 -1 . We bound this in the following.</p><p>Define a set T 0 consisting of all pairs of -dimensional probability vectors satisfying the 0 -LDP constraints as follows:</p><p>Note that T 0 contains all pairs of the output probability distributions ( , &#8242; ) of all 0 -LDP mechanisms R on all neighboring data points , &#8242; &#8712; X. Since any (D , D &#8242; ) &#8712; D same generates a pair of probability distributions ( , &#8242; ) &#8712; T 0 (because D = ( , . . . , ) and D &#8242; = ( , . . . , , &#8242; ) together contain only two distinct data points , &#8242; ), we have sup</p><p>In the following lemma, we bounds the RHS of (37). L 6.2. We have the following bound:</p><p>We prove Lemma 6.2 in Appendix C.2. Taking supremum over (D , D &#8242; ) &#8712; D same in <ref type="bibr">(35)</ref> and then using <ref type="bibr">(37)</ref> and <ref type="bibr">(38)</ref>, we get the bound in Theorem 3.7.</p><p>In this section, we will prove our upper bounds stated in Theorems 3.1 and 3.3 in Sections 7.1 and 7.2, respectively. same . Recall from Theorem 3.6, we have</p><p>where := -1 (1 -) --1 . For simplicity of notation, for any &#8712; {0, 1, . . . , -1}, define</p><p>We show in Appendix D.1 that is a non-increasing function of . Using this and concentration properties of the Binomial r.v., we get (details are in Appendix D.1):</p><p>where &gt; 0 is arbitrary, and expectation is taken w.r.t. &#8764; M (D &#8242; ).</p><p>Note that we have already bounded for all in Theorem 3.7. By setting = 1 2 and = &#8970;(1 -) ( -1)&#8971; + 1 = &#8970; -1 2 0 &#8971; + 1, we get from Theorem 3.7, that:</p><p>Since the above bound holds for arbitrary pairs of neighboring datasets D and D &#8242; , this completes the proof of Theorem 3.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Proof of Theorem 3.3</head><p>The proof of Theorem 3.3 follows the same steps as that of the proof of Theorem 3.1 that we outlined in Section 3.3 and also gave formally in Section 7.1, except for the following change. Instead of using Theorem 3.7 for bounding the RDP for specific neighboring datasets, we will use the following theorem. </p><p>We prove Theorem 7.1 in Appendix D.2. Note that Theorem 7.1 implies that -1 &#8804; exp 2 ( 0 -1) 2 holds for every integer &#8805; 2.</p><p>Substituting this in <ref type="bibr">(41)</ref> (by putting = = &#8970; -1 2 0 &#8971; + 1), we get</p><p>This proves Theorem 3.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">PROOF SKETCH OF THE LOWER BOUND</head><p>Consider the binary case, where each data point can take a value from X = {0, 1}. Let the local randomizer R be the binary randomized response (2RR) mechanism, where Pr</p><p>for &#8712; X. It is easy to verify that R is an 0 -LDP mechanism.</p><p>For simplicity, let = 1 0 +1 . Consider two neighboring datasets D, D &#8242; &#8712; {0, 1} , where D = (0, . . . , 0, 0) and D &#8242; = (0, . . . , 0, 1). Let &#8712; {0, . . . , } denote the number of ones in the output of the shuffler. As argued in Section 2.3 on page 4, since the output of the shuffle mechanism M can be thought of as the distribution of the number of ones in the output, we have that &#8764; M (D) is distributed as a Binomial random variable Bin( , ). The proof uses some properties of the Binomial r.v., which are provided in Appendix E.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">CONCLUSION</head><p>The analysis of the RDP for the shuffle model presented in this paper was based on some new analysis techniques that may be of independent interest. The utility of these bounds were also demonstrated numerically, where we saw that in important regimes of interest, we get 8&#215; improvement over the state-of-the-art without sampling and at least 10&#215; improvement with sampling (see Section 4 for more details).</p><p>A simple extension of the results would be to work with local approximate DP guarantees instead of pure LDP. This can be seen by using the tight conversion between approximate DP and pure DP given in <ref type="bibr">[24]</ref>. However, there are several open problems of interest. Our upper bounds hold for general discrete local mechanisms. The extension to continuous distributions requires careful technical analysis as the histogram used for RDP analysis would need to approximate continuous distributions via discretization. We leave the analysis of continuous distributions as a future work. Perhaps the most important one is mentioned in Remark 7. There is a multiplicative gap of the order 0 in our upper and lower bounds, and closing this gap is an important open problem. We believe that our lower bound is tight (at least for the first order term) and the upper bound is loose. Showing this or getting a tighter upper bound may require new proof techniques. A second question could be how to get an overall RDP guarantee if we are given local RDP guarantees instead of local LDP guarantees.</p><p>For the LHS and the RHS, we respectively have</p><p>Therefore, in order to show (49), it suffices to show 3<ref type="foot">foot_6</ref> 4 0 &#8804; &#8730; 3 0 , which is equivalent to 4 5 0 &lt; 9 . Thus, we have shown that 4 5 0 &lt; 9 implies (48). Now we show that when 4 5 0 &lt; 9 , (46) and (47) are automatically satisfied:</p><p>(1) Proof of (46):</p><p>In the second inequality we used &#8805; 1 and in the penultimate inequality we used 2 0 &#8805; from (51). (2) Proof of (47): For this, first we upper-bound the LHS and lower-bound the RHS, and then note that the upper-bound is smaller than the lower-bound. For the upper-bound on exp( 0 --1 8 0 ), note that 0 &#8804; 5 0 /4 = 5 0 4 1/4 &lt;</p><p><ref type="foot">foot_7</ref>/4 . Substituting these bounds in the exponent of exp( 0 --1 8 0 ), we get:</p><p>where &#8242; &gt; 0 is a constant even for small values of . For example, for = 100, we get &#8242; &#8805; 1 20 . For the lower-bound on</p><p>follows from 5 0 &#8804; 4 5 0 &lt; 9 . Now we show the lower bound:</p><p>Note that the upper-bound on exp( 0 --1 8 0 ) is exponentially small in 3/4 , whereas, the lower-bound on</p><p>is inverse-polynomial in . So, for sufficiently large , (47) will be satisfied. This completes the proof of Corollary 3.2 Claim 1 (An Inequality for the Gamma Function). For any &#8712; N and &#8805; 3, we have &#915;( /2) &#8804; .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>P</head><p>. Note that for any &#8712; N and &#8804; , we have = ( -1) ( -2)...( -+1) ! . We show the claim separately for the cases when is an even integer or not.</p><p>(1) When is an even integer: Since for any integer &#8712; N, &#915;( ) = ( -1)!, so when is an even integer, we have</p><p>&#8804; .</p><p>(2) When is an odd integer: Note that for any integer &#8712; N, we have &#915; + 1 2 =</p><p>(2 )! 4 ! &#8730; ; see <ref type="bibr">[42]</ref>. Let = 2 + 1. Then</p><p>&#8804; where (a) follows because</p><p>. (P) and (P &#8242; ) can be written as the following convex combinations:</p><p>where P C , P &#8242; C are defined in (19)- <ref type="bibr">(21)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>P</head><p>. We only show (54); (55) can be shown similarly. For convenience, for any C &#8838; [ -1], define</p><p>With these notations, we can write</p><p>For any &#8712; [ -1], define the following random variable :</p><p>For any subset C &#8838; [ -1], define an event</p><p>Consider an arbitrary &#8712; A . Define a random variable (P) over A whose distribution is equal to (P).</p><p>where, P &#8242; | C |, and P [ -1]\C in the RHS of (e) are defined in the statement of the claim.</p><p>Since the above calculation holds for every &#8712; A , we have proved (54).</p><p>B.2 Proof of Lemma 5.2</p><p>2). For any &gt; 1, the function</p><p>is jointly convex in ( (P), (P &#8242; )), i.e.,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>P</head><p>. For simplicity of notation, let = (P) and = (P &#8242; ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Note that E</head><p>, which is also called the Hellinger integral. In order to prove the lemma, it suffices to show that &#8747; 1is jointly convex in ( , ), i.e., if</p><p>Proof of (58) is implicit in the proof of <ref type="bibr">[39,</ref><ref type="bibr">Theorem 13]</ref>. However, for completeness, we prove (58) in Lemma B.1 below. Since = (P) and = (P &#8242; ) are convex combinations of C = (P C ) and C = (P &#8242; C ), respectively, with same coefficients, repeated application of (58) implies (57).</p><p>P . Let ( ) = . It is easy to show that for any &#8805; 1, ( ) is a convex function when &gt; 0. This implies that for any point &#8712; &#937; in the sample space, we have</p><p>where the last inequality follows from the convexity of ( ). By multiplying both sides with ( ) and substituting the definition of ( ) = , we get</p><p>By integrating this equality, we get (59).</p><p>B.3 Proof of Lemma 5.3</p><p>we have</p><p>, where, for &#8712; [ -1], P -= P \ { } and P &#8242; -= P &#8242; \ { }. Note that in the LHS, (P), (P &#8242; ) are distributions over A , whereas, in the RHS, (P -), (P &#8242; -) for any &#8712; [ -1] are distributions over A -1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>P</head><p>. First we show that E &#8764; ( P &#8242; )</p><p>Note that due to the independence of R on different data points, for any = (&#8462; 1 , . . . , &#8462; ) &#8712; A , we can recursively write the distributions (P)( ) and (P &#8242; )( ) (which are defined in <ref type="bibr">(17)</ref>) as follows:</p><p>where = (&#8462; 1 , . . . , &#8462; -1 , &#8462; -1, &#8462; +1 , . . . , &#8462; ) for any &#8712; [ ]. Here, (P -), (P &#8242; -) are distributions over A -1 . 8 Fix any &#8712; [ -1] and also fix arbitrary 1 , . . . , -1 , +1 , . . . , , &#8242; . Take any &#8712; [0, 1], and consider = 0 + (1 -) 1 .</p><p>Let P = ( 1 , . . . , , . . . , ), P 0 = ( 1 , . . . , 0 , . . . , ), and P 1 = ( 1 , . . . , 1 , . . . , ). Similarly, let P &#8242; = ( 1 , . . . , , . . . , &#8242; ), P &#8242; 0 = ( 1 , . . . , 0 , . . . , &#8242; ), and P &#8242; 1 = ( 1 , . . . , 1 , . . . , &#8242; ). With these definitions, we have P = P 0 + (1 -)P 1 . Note that (P ) -= (P 0 ) -= (P 1 ) -.</p><p>Then, from the recursive definitions of (P) and (P &#8242; ) (given in (60) and (61), respectively), for any &#8712; A , we get (P )( ) = =1 ((P ) -) ( ) 8 We assume that ( P -) ( ) = 0 and ( P &#8242; -) ( ) = 0 if &#8462; -1 &lt; 0. Similarly, we can show that (P &#8242; )( ) = (P &#8242; 0 )( )+(1-) (P &#8242; 1 )( ). Thus we have shown that</p><p>From Lemma B.1, we have that E &#8764; ( P &#8242; )</p><p>is jointly convex in (P) and (P &#8242; ). As a result, we get The LDP constraints put some restrictions on the set of values that the distribution can take; however, the maximum value that E &#8764; ( P &#8242; ) ( P) ( ) ( P &#8242; ) ( ) takes can only increase when we remove those constraints. We instead maximize it w.r.t. over the simplex &#916; := {( 1 , . . . ,</p><p>) : &#8805; 0 for &#8712; [ ] and =1 = 1}. This implies</p><p>Substituting from (60) and ( <ref type="formula">61</ref>) into (63), we get</p><p>Since maximizing a convex function over a polyhedron attains its maximum value at one of its vertices, and there are vertices in the simplex &#916; , which are of the form * = 1 for some * &#8712; [ ] and = 0 for all &#8800; * , we have max</p><p>Since the 'th data point deterministically maps to the * 'th output by the mechanism R, the expectation term in the RHS of (a) has no dependence on the 'th data point, so we can safely remove that, which gives (b). This proves Lemma 5.  </p><p>In the last equality, we used that P &#8242; (1) has zero mean, i.e., E &#8764;M ( D ) [ ( )] = 0.</p><p>(2) The variance of is equal to</p><p>(3) For &#8805; 3, the th moment of is bounded by</p><p>is the Gamma function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>P</head><p>. For simplicity of notation, let 0 , 1 denote the distributions M (D ), M (D &#8242; ), respectively. As shown in <ref type="bibr">(32)</ref>, for any &#8712; A , we have</p><p>]. Now we show the three properties.</p><p>(1) The mean of the random variable is given by</p><p>(2) The variance of the random variable is given by</p><p>Here, step (b) uses properties of multinomial distribution:</p><p>, and (see <ref type="bibr">[38,</ref><ref type="bibr">Lemma 1.8]</ref>)), we have that is a sub-Gaussian r.v.</p><p>with parameter 2 =</p><p>It follows that (h) = =1 is also a sub-Gaussian random variable with parameter 2 . The remaining steps are similar to bound the moments of a sub-Gaussian random variable. We write them here for completeness. From Chernoff bound we get</p><p>where (b) follows by setting = 2 . Similarly, we can bound the term Pr [-&#8805; ]. Thus, we get</p><p>Hence, the 'th moment of the random variable can be bounded by</p><p>where step (b) follows by setting = 2 2 2 (change of variables). In the last step, &#915;</p><p>denotes the Gamma function. Thus, we conclude that for every &#8805; 3, we</p><p>.</p><p>This completes the proof of Lemma 6.1.</p><p>C.2 Proof of Lemma 6.2</p><p>2). We have the following bound:</p><p>Since the function ( , ) = 2 is convex in ( , ) for &gt; 0, it implies that the objective function ( , &#8242; ) is also convex in ( , &#8242; ).</p><p>It is easy to verify that T 0 is a polytope. Since we maximize a convex function ( , &#8242; ) over a polytope T 0 , the optimal solution is one of the vertices of the polytope. Note that any vertex ( , &#8242; ) of the polytope in dimensions satisfies all the LDP constraints (i.e., -0 &#8804; &#8242; &#8804; 0 , = 1, . . . , ) with equality. Without loss of generality, assume that the optimal solution ( &#732; , &#732; &#8242; ) is a vertex such that &#732; &#8242; &#732; = 0 for = 1, . . . , and &#732; &#8242; &#732; = -0 for = + 1, . . . , , for some &#8712; [ ]. Thus, we have</p><p>Rearranging the above gives =1 &#732; = 1 0 +1 . This implies =1 &#732; &#8242; = 0 0 +1 , which in turn implies = +1 &#732; &#8242; = 1 0 +1 . Now the result follows from the following set of equalities:</p><p>where the last equality uses the identity 3 + 1 = ( + 1)( 2 -+ 1). This completes the proof of Lemma 6.2.</p><p>D OMITTED DETAILS FROM SECTION 6 D.1 Omitted Details from Section 7.1</p><p>Before proving <ref type="bibr">(40)</ref>, first we show an important property of that we will use in the proof. Here, steps (a) and (d) follow from the fact that is a non-increasing function of (see Lemma D.1). Step (b) follows from the Chernoff bound. In step (c), we used that M ( ) = R ( ) and M ( &#8242; ) = R ( &#8242; ), which together imply that</p><p>where the inequality follows because R is an 0 -LDP mechanism. (67)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>P</head><p>. Fix an arbitrary &#8712; N. Let (D , D &#8242; ) &#8712; D same and = ( 1 , . . . , ), &#8242; = ( &#8242; 1 , . . . , &#8242; ) be the same as defined in the proof of Theorem 3.7 in Section 6.</p><p>where the first equality uses <ref type="bibr">(32)</ref> and the last inequality follows from 1 + &#8804; . In (68), is distributed according to M (D ) = H (R ( ), . . . , R ( )), where H denotes the shuffling operation on elements and range of R is equal to [ ]. Since all the data points are identical, and all clients use independent randomness for computing R ( ), we can assume, w.l.o.g., that M (D ) is a collection of i.i.d. random variables 1 , . . . ,</p><p>, where Pr [ = ] = for &#8712; [ ]. Thus, we have (in the following, note that = (&#8462; 1 , . . . , &#8462; ) is a r.v.)</p><p>where 1 {&#8226; } denotes the indicator r.v. Substituting from (69) into (68), we get</p><p>where = [ 1 , . . . , ]. From Taylor expansion of = 1+ &#8734; =1 ! , we get</p><p>where the inequality follows from .</p><p>(since 1 --&#8804; )</p><p>This completes the proof of Theorem 7.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E OMITTED DETAILS FROM SECTION 8</head><p>In this section, we provide a complete proof of Theorem 3.4. Consider the binary case, where each data point can take a value from X = {0, 1}. Let the local randomizer R be the binary randomized response (2RR) mechanism, where Pr [R ( ) = ] = 0 0 +1 for &#8712; X. It is easy to verify that R is an 0 -LDP mechanism. For simplicity, let = 1 0 +1 . Consider two neighboring datasets D, D &#8242; &#8712; {0, 1} , where D = (0, . . . , 0, 0) and D &#8242; = (0, . . . , 0, 1). Let &#8712; {0, . . . , } denote the number of ones in the output of the shuffler. As argued in Section 2.3 on page 4, since the output of the shuffle mechanism M can be thought of as the distribution of the number of ones in the output, we have that &#8764; M (D) is distributed as a Binomial random variable Bin( , ). Thus, we have</p><p>(1 -) --1 .</p><p>It will be useful to compute </p><p>Thus, we have that In view of Remark 6, this completes the proof of Theorem 3.4.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The mechanisms used have a long history including Randomized Response<ref type="bibr">[41]</ref>, but were recently studied through the lens of local differential privacy (LDP).</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>We used the open source implementation for the privacy analysis in<ref type="bibr">[24]</ref> available from https://github.com/apple/ml-shuffling-amplification.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>The idea of writing the distribution of the output of an LDP mechanism as a mixture distribution was previously proposed in<ref type="bibr">[7,</ref><ref type="bibr">24]</ref>. However, the way these mixture distributions are used in our RDP analysis is different from these works, including what mixtures we create and how we use them.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>An optimal conversion from RDP to approximate DP was studied in<ref type="bibr">[2]</ref>; however, we observed numerically, that it does not give better performance as compared to the conversion presented above.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4"><p>As mentioned in Section 1, this was obtained by the standard conversion results from DP to RDP, which could be loose.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5"><p>The results in<ref type="bibr">[24]</ref> are for approximate DP (not for RDP), that is why we did not compare with them in Figure2.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_6"><p>2 0 -1</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_7"><p/></note>
		</body>
		</text>
</TEI>
