<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Computing Epidemic Metrics with Edge Differential Privacy</title></titleStmt>
			<publicationStmt>
				<publisher>PMLR</publisher>
				<date>05/02/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10517657</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of Machine Learning Research</title>
<idno type="issn">2640-3498</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>George Li</author><author>Dung Nguyen</author><author>Anil Vullikanti</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Metrics such as the outbreak size in an epidemic process on a network are fundamental quantities used in public health analyses. The datasets used in such models used in practice, e.g., the contact network and disease states, are sensitive in many settings. We study the complexity of computing epidemic outbreak size within a given time horizon, under edge differential privacy. These quantities have high sensitivity, and we show that giving algorithms with good utility guarantees is impossible for general graphs. To address these hardness results, we consider a smaller class of graphs with similar properties as social networks (called expander graphs) and give a polynomial-time algorithm with strong utility guarantees. Our results are the first to give any non-trivial guarantees for differentially private infection size estimation.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Epidemic models, especially network-based ones, are commonly used in public health analyses, e.g., <ref type="bibr">(Marathe and Vullikanti, 2013;</ref><ref type="bibr">Adiga et al., 2020)</ref>. Such a model is defined on a contact network G = (V, E), where V denotes the population and E denotes the set of edges along which infections can spread. In the Susceptible-Infected-Recovered (SIR) model of disease spread, an infected node v &#8712; V spreads the infection to each susceptible neighbor, independently, with some probability. The simplest metrics of interest in epidemic analyses are the expected number of infections and the peak size (which are population-level quantities), and the individual risk of infection (which is an individual-level quantity). There has been a lot of previous work on forecasting such metrics, e.g., <ref type="bibr">(Mamidi et al., 2021;</ref><ref type="bibr">Adiga et al., 2022)</ref>. One limitation of previous approaches is the lack of data privacy guarantees. These methods use diverse kinds of datasets as inputs, including mobility traces, contacts, and infection status. Some individual risk prediction models are based on electronic medical record data, e.g., <ref type="bibr">(Mamidi et al., 2021)</ref>, and contact tracing apps relied on contacts inferred through phones, e.g., <ref type="bibr">(Akinbi et al., 2021)</ref>. Many of these are very sensitive datasets. For instance, many individuals might prefer to keep their contacts and disease states private; indeed, privacy concerns are considered to be one of the reasons contact tracing apps were not adopted by a large fraction of the population <ref type="bibr">(Akinbi et al., 2021)</ref>. Our work attempts to address this limitation via the framework of differential privacy <ref type="bibr">(Dwork et al., 2006)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Our Contributions</head><p>We initiate the study of differentially private estimation of the expected number of infections in a graph G resulting from s source infections, denoted by f (G). If the initial infection set is given, we show that any differentially private algorithm must incur &#920;(n) additive error in general (see Section 6). Therefore, we focus on the setting where the s source infections are selected randomly. For this setting, we show that we can beat the &#8486;(n) lower bound, in contrast to the hardness results in the fixed-source setting. We show that the global sensitivity of f (G) in the random initial infection setting is &#920;(n/s) (see Section 3). Combined with the Laplace mechanism <ref type="bibr">(Dwork et al., 2014)</ref>, this implies a differentially private algorithm with &#920;(n/s) expected additive error. While this is significantly better than the naive bound of &#8486;(n), it can still lead to poor utility in general.</p><p>This motivates alternative approaches to global sensitivity, such as various local sensitivity-based methods. We first show that the smooth sensitivity technique <ref type="bibr">(Nissim et al., 2007)</ref> can be applied to our problem and can greatly improve utility over global sensitivity-based methods. Since the smooth sensitivity is difficult to compute, we show that computing a multiplicative approximation of the smooth sensitivity suffices for privacy. We then give a quasi-polynomial time algorithm for computing an approximate smooth sensitivity for our problem.</p><p>To address the slow running time, we give a generalization of the classical propose-test-release approach, and use it to give a polynomial-time algorithm with strong utility guarantees for a class of sparse but well-connected graphs called expanders. Since many social networks are believed to satisfy approximate expansion properties, this gives the first positive result for estimating infection size with differential privacy on social network graphs. Additionally, we believe that our generalized propose-test-release framework will be of independent interest, since it is the only approach that improves on the quasi-polynomial runtime.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Related Work</head><p>There is no prior work on the problem we study here, and there has been little work on private computation of epidemic metrics. The most closely related work is on the Individual Risk Prediction problem <ref type="bibr">(Harrison et al., 2023)</ref>, which involves privately predicting the probability of infection for a node in the next &#8710; time steps. Recent work by <ref type="bibr">Liu and Smith (2023)</ref> develops a federated learning method for this problem while guaranteeing node differential privacy. While this method could be used to determine the expected outbreak size by summing the probabilities, the accuracy would be much worse than directly estimating the size. Specifically, the performance of their algorithm degrades very rapidly for large &#8710;, which would be needed for estimating the full outbreak size. This work is orthogonal to ours. More generally, there has been a lot of work on private algorithms for computing a variety of graph statistics (e.g., degree distributions and counts of subgraphs) in the node, edge, and attribute privacy models <ref type="bibr">(Kasiviswanathan et al., 2013;</ref><ref type="bibr">M&#252;lle et al., 2015;</ref><ref type="bibr">Imola et al., 2021;</ref><ref type="bibr">Blocki et al., 2013;</ref><ref type="bibr">Ji et al., 2019;</ref><ref type="bibr">Hay et al., 2009;</ref><ref type="bibr">Zhang et al., 2015;</ref><ref type="bibr">Dhulipala et al., 2022;</ref><ref type="bibr">Dhulipala et al., 2023)</ref>. Since graph statistics generally have high global sensitivity (e.g., the number of triangles in a graph has sensitivity n &#8722; 2 under edge privacy), using the Laplace mechanism can be very inaccurate. Consequently, more advanced techniques such as smooth sensitivity <ref type="bibr">(Nissim et al., 2007;</ref><ref type="bibr">Karwa et al., 2014)</ref>, ladder functions <ref type="bibr">(Zhang et al., 2015)</ref>, propose-test-release <ref type="bibr">(Dwork et al., 2014)</ref>, and inverse sensitivity <ref type="bibr">(Asi and Duchi, 2020)</ref> have been developed to provide much more accurate counts for some problems, but they are computationally expensive (see <ref type="bibr">(Li et al., 2023)</ref> for a recent survey on private graph algorithms). Our work contributes to and generalizes some approaches in this line of work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Independent Cascades Model</head><p>We consider the Independent Cascades (IC) model, which is the simplest instance of the more general Susceptible-Infected-Recovered (SIR) model, on a contact graph G = (V, E) <ref type="bibr">(Marathe and Vullikanti, 2013)</ref>. In this model, each node v is in one of the Susceptible (S), Infectious (I), or Recovered (R) states. In the beginning (t = 0), we have a subset I 0 &#8838; V of source nodes in the Infected state, with s := |I 0 |, and all remaining nodes are in the Susceptible state. Let I t denote the set of nodes which are infected at time t. At each time step t, an infected node u can infect each susceptible neighbor v with probability p, independently of the other neighbors of v. An infected node v is assumed to recover after one time step, so all nodes in I t are Recovered in the next time step. The expected total number of infections f (G) = E[|&#8746; t I t |] is one of the most basic metrics in epidemic analyses, and it is the quantity we will try to estimate given an input graph and transmission probability p.</p></div>
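To make the IC dynamics above concrete, the process can be simulated directly and f(G) estimated by Monte Carlo over uniformly random source sets. This is an illustrative sketch only; the adjacency-list representation and all function names are ours, not the paper's:

```python
import random

def simulate_ic(adj, sources, p, rng):
    # One run of the Independent Cascades process: each newly infected node
    # gets a single chance to infect each susceptible neighbor with
    # probability p, then recovers. Returns the total number ever infected.
    infected = set(sources)
    frontier = list(sources)
    while frontier:
        newly = []
        for u in frontier:
            for v in adj[u]:
                if v not in infected and p > rng.random():
                    infected.add(v)
                    newly.append(v)
        frontier = newly
    return len(infected)

def estimate_f(adj, s, p, runs=4000, seed=0):
    # Monte Carlo estimate of the expected outbreak size f(G)
    # with s source nodes chosen uniformly at random.
    rng = random.Random(seed)
    nodes = list(adj)
    total = 0
    for _ in range(runs):
        total += simulate_ic(adj, rng.sample(nodes, s), p, rng)
    return total / runs
```

With p = 1 and one source, the estimate converges to the expected size of the component containing a uniformly random node.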
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Differential Privacy Model</head><p>We use the notion of differential privacy <ref type="bibr">(Dwork et al., 2014)</ref>, which is one of the most widely used standards of privacy, and has been extended to graph data <ref type="bibr">(Blocki et al., 2013;</ref><ref type="bibr">Kasiviswanathan et al., 2013)</ref>. We focus on the edge privacy model <ref type="bibr">(Blocki et al., 2013)</ref>, which guarantees differential privacy for the contact data between two individuals. Formally, edge privacy is defined as follows.</p><p>Definition 2.1. We say that two graphs G, G&#8242; are edge-neighbors, i.e., G &#8764; G&#8242;, if they differ in exactly one edge, i.e., |E(G)&#8710;E(G&#8242;)| = 1, where E(G) denotes the set of edges of graph G.</p><p>Definition 2.2. Let G denote the set of all undirected graphs. A (randomized) algorithm M : G &#8594; R is (&#1013;, &#948;)-edge differentially private if for all subsets S &#8834; R of its output space, and for all edge-neighbors G &#8764; G&#8242;, we have Pr[M(G) &#8712; S] &#8804; e^&#1013; &#183; Pr[M(G&#8242;) &#8712; S] + &#948;.</p><p>One of the most common mechanisms for guaranteeing differential privacy is the Laplace mechanism <ref type="bibr">(Dwork et al., 2014)</ref>, which adds suitably scaled Laplace noise to a statistic to guarantee privacy. We now state the mechanism formally in the context of graph algorithms.</p><p>Definition 2.3. Let h : G &#8594; R be any graph statistic. The Laplace mechanism M h is defined as M h (G) = h(G) + Lap(GS h /&#1013;), where GS h denotes the global sensitivity of h.</p></div>
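As an illustration, the Laplace mechanism can be sketched in a few lines. The inverse-CDF sampler and the function names below are our own; the release is eps-differentially private only when the supplied sensitivity value genuinely upper bounds the statistic's global sensitivity:

```python
import math
import random

def sample_laplace(scale, rng):
    # Inverse-CDF sampling of a centered Laplace variable with the given scale.
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(h_value, global_sensitivity, eps, rng):
    # Releases h(G) + Lap(GS_h / eps): the standard eps-DP Laplace mechanism,
    # assuming global_sensitivity bounds |h(G) - h(G')| over edge-neighbors.
    return h_value + sample_laplace(global_sensitivity / eps, rng)
```

The noise is unbiased, so repeated releases average back to the true statistic (at a privacy cost per release).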
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Problem Definition</head><p>In our problem, we are given a contact graph G = (V, E) and a transmission probability p. We are also given a parameter s, indicating the number of starting infections I 0 . Our goal is to estimate the expected number of infections in the graph G when the s source nodes are chosen uniformly at random. We denote this quantity by f s (G). Formally, our goal is to obtain an (&#945;, &#946;)-approximation for computing f s (G).</p><p>Definition 2.5. We say f&#771;(G) is an (&#945;, &#946;)-approximation of f (G) if f (G)/&#945; &#8722; &#946; &#8804; f&#771;(G) &#8804; &#945; &#183; f (G) + &#946;.</p><p>We make some remarks on our problem definition. First, we note that our model has random sources instead of a fixed set of sources given as input. This is justified in Section 6, where we show strong lower bounds for the fixed-sources model: &#8486;(n) additive error is necessary even for expander graphs. Second, we note that our notion of approximation allows both multiplicative and additive error. This is standard in differential privacy, since additive noise is needed to guarantee privacy, while multiplicative approximation is necessary even in the non-private setting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Analysis of Global Sensitivity</head><p>In this section, we show that the global sensitivity of f s (G) is &#920;(n/s) when p = 1, where s is the number of random source nodes. We will show that this implies a differentially private algorithm for estimating the expected number of infections with multiplicative error 1 + &#951; and additive error &#920;(n/s), for any given &#951; &gt; 0. We first give a proof of the global sensitivity bound.</p><p>Lemma 3.1. For p = 1, the global sensitivity of f s (G) is &#920;(n/s) for each s.</p><p>Proof. Consider the vector x(G) &#8712; [0, 1]^V where x v (G) for each v &#8712; V is the probability that v is infected under the random choice of starting infections. Let C 1 , . . . , C r denote the connected components of G and let C(v) denote the connected component that node v &#8712; V lies in. We can calculate f s (G) = &#8721; v x v (G), where x v (G) = 1 &#8722; (n&#8722;|C(v)| choose s)/(n choose s), since v is infected exactly when its component contains at least one source.</p><p>Now, we calculate the global sensitivity of the statistic f s (G) under the addition or removal of an edge. As mentioned above, this is exactly the &#8467; 1 -sensitivity of the vector x(G). Since we are considering the global sensitivity, we can without loss of generality consider only edge additions; say an edge (u, v) &#8713; E is added to the graph G, and suppose C(u) &#8800; C(v). Now, consider the changes in the vector x(G). For w &#8712; C(u), the probability of infection increases from 1 &#8722; (n&#8722;|C(u)| choose s)/(n choose s) to 1 &#8722; (n&#8722;|C(u)|&#8722;|C(v)| choose s)/(n choose s), and symmetrically for w &#8712; C(v).</p><p>Since the remaining probabilities do not change, the &#8467; 1 -sensitivity of x is A 1 + A 2 , where the terms A 1 and A 2 account for the probability increases over C(u) and C(v), respectively.</p><p>The global sensitivity is the maximum of A 1 + A 2 over the possible component sizes |C(u)| and |C(v)|. This can clearly be computed in polynomial time, so we now have an efficient and private algorithm for computing the expected infection size.</p><p>We will now upper bound this expression to obtain accuracy guarantees. The expression above can be upper bounded by using 1 &#8722; x &#8804; e^(&#8722;x) for all x &#8712; R. Since the two resulting terms are independent with respect to |C(u)| and |C(v)|, we can optimize them separately. Simple calculus then gives us that the first term is maximized when |C(u)| = n/s, so we have an upper bound of n/(es). A similar argument shows that the second term is also upper bounded by n/(es), so the entire expression is upper bounded by 2n/(es), and hence the global sensitivity is O(n/s). Taking |C(u)| = |C(v)| = n/s in the expressions for A 1 and A 2 also shows that the global sensitivity is &#8486;(n/s), proving Lemma 3.1.</p><p>Next, we use our bound on the global sensitivity in the deterministic case to give an algorithm with nontrivial guarantees in the general probabilistic case.</p><p>Theorem 3.2. Given any constant &#951; &gt; 0, there exists a polynomial-time &#1013;-edge differentially private algorithm which computes a (1 + &#951;, O(n/(s&#1013;)))-approximation for f s (G) in expectation.</p><p>Proof. Observe that running an SIR process on the graph with transmission probability p is equivalent to retaining each edge independently with probability p and then running an SIR process on the resulting graph with transmission probability 1. Given this observation, our algorithm proceeds as follows. We sample N graphs G 1 , . . . , G N , where each G i is obtained from G by retaining each edge independently with probability p.</p></div>
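For the deterministic case p = 1 analyzed above, f_s(G) can be computed exactly from the connected component sizes, since a node is infected precisely when its component contains at least one of the s random sources. A minimal sketch (the naming is ours):

```python
from math import comb

def component_sizes(n, edges):
    # Connected component sizes via union-find on vertices 0..n-1.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return list(sizes.values())

def expected_infections_det(n, edges, s):
    # f_s(G) for p = 1: sum over components C of
    # |C| * Pr[C holds a source] = |C| * (1 - (n-|C| choose s)/(n choose s)).
    return sum(c * (1 - comb(n - c, s) / comb(n, s))
               for c in component_sizes(n, edges))
```

For a connected graph this returns n for any s, matching the intuition that a single source infects the whole component when p = 1.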
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Then we compute the average fs</head><p>so the global sensitivity of the average fs (G) must also be O(n/s). Our algorithm concludes by outputting fs (G) using the Laplace mechanism, which guarantees edge-differential privacy by Lemma 2.4. By a standard argument via Chernoff-Hoeffding bounds, the average is within a 1 &#177; &#951; multiplicative factor of f s (G) when N = &#8486;(n 2 ), giving us the multiplicative approximation guarantee. The additive approximation guarantee follows directly from combining the bound on the sensitivity with the utility of the Laplace mechanism (Lemma 2.4).</p></div>
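The sampling-plus-Laplace algorithm of Theorem 3.2 can be sketched end to end as follows. This is an illustrative reading of the proof with our own names, using the 2n/(es) bound from Lemma 3.1 as the noise scale:

```python
import math
import random
from math import comb

def det_infections(n, edges, s):
    # f_s on a fixed subgraph with p = 1, via component sizes (union-find).
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for v in range(n):
        sizes[find(v)] = sizes.get(find(v), 0) + 1
    return sum(c * (1 - comb(n - c, s) / comb(n, s)) for c in sizes.values())

def private_infection_estimate(n, edges, s, p, eps, N=400, seed=0):
    # Average f_s over N subgraphs in which each edge is retained
    # independently with probability p, then add Laplace noise scaled
    # to the 2n/(e*s) global-sensitivity bound.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        kept = [e for e in edges if p > rng.random()]
        total += det_infections(n, kept, s)
    avg = total / N
    scale = (2 * n / (math.e * s)) / eps
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return avg + noise
```

The paper takes N on the order of n squared for the concentration bound; a smaller N is used here only to keep the sketch fast.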
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Smooth Sensitivity and Its Difficulties</head><p>In this section, we explore a local sensitivity-based approach to the problem, called smooth sensitivity. We show that one can compute a good approximation of the smooth sensitivity, which we show suffices for guaranteeing differential privacy. The drawback of this approach is that it is difficult to obtain polynomial run-time, and all algorithms here require quasi-polynomial time to implement. This illustrates the difficulties in applying local sensitivity-based approaches, which we overcome in Section 5.1. We note that a similar problem arises when using an approximate version of inverse sensitivity to estimate the infection size, but we omit the details here as it is unrelated to our main results.</p><p>First, let us recall the definition of smooth sensitivity from <ref type="bibr">Nissim et al. (2007)</ref>. Definition 4.1. For &#946; &gt; 0, the &#946;-Smooth Sensitivity of f is S * f,&#946; (x) = max y LS f (y) &#183; e^(&#8722;&#946; d(x,y)), where LS f (y) is the local sensitivity of f at y.</p><p><ref type="bibr">Nissim et al. (2007)</ref> show that for some &#946; = &#946;(&#1013;, &#948;), adding Laplace noise calibrated by the &#946;-Smooth Sensitivity to the statistic f (x) suffices to guarantee (&#1013;, &#948;)-differential privacy. However, this mechanism requires the exact calculation of the &#946;-Smooth Sensitivity, which is intractable in our setting. We demonstrate here that, instead of the exact calculation, we may use an approximation of the Smooth Sensitivity to calibrate the noise. This slightly generalizes recent work by <ref type="bibr">Nguyen et al. (2023)</ref>, which used approximate smooth sensitivity for faster subgraph counting.</p><p>To state and prove our results, we first define our notion of approximating the smooth sensitivity and recall the definition of an admissible noise distribution from <ref type="bibr">Nissim et al. (2007)</ref>. Definition 4.2. We say that S&#771; f,&#946; is a (&#947;, &#964;, &#948;&#8242;)-approximation of S * f,&#946; if, with probability at least 1 &#8722; &#948;&#8242;, we have S * f,&#946; (x) &#8804; S&#771; f,&#946; (x) &#8804; e^&#947; &#183; S * f,&#946; (x) + &#964; for all datasets x. We may drop the parameter &#964; in the notation when &#964; = 0.</p><p>The admissibility definition of <ref type="bibr">Nissim et al. (2007)</ref> requires a sliding property and, for all measurable S &#8834; R, a dilation property (3). In particular, the Lap(1) distribution is admissible.</p><p>Now, we are ready to state the privacy guarantee of approximate smooth sensitivity mechanisms.</p><p>Theorem 4.4. Let S&#771; f,&#946; be a (&#947;, &#964;, &#948;&#8242;)-approximation of S * f,&#946; , assume we have a lower bound &#961; on the smallest value of the smooth sensitivity min x S * f,&#946; (x), and let Z be sampled from some (&#945;, &#947;+&#946;+&#964;/(&#961;e^&#947;))-admissible distribution. Then the mechanism M f (x) = f (x) + (S&#771; f,&#946; (x)/&#945;) &#183; Z is differentially private.</p><p>Proof. Fix a pair of neighboring datasets x &#8764; y. Let E be the event that S * f,&#946; (x) &#8804; S&#771; f,&#946; (x) &#8804; e^&#947; S * f,&#946; (x) + &#964; and E&#8242; be the event that S * f,&#946; (y) &#8804; S&#771; f,&#946; (y) &#8804; e^&#947; S * f,&#946; (y) + &#964;. From the definition of our notion of approximation, we have Pr[E and E&#8242;] &#8805; 1 &#8722; 2&#948;&#8242; by the union bound. We will prove that, conditioned on the events E and E&#8242;, M f (x) is (&#1013;, ((e^(&#1013;/2) + 1)/2)&#948;)-differentially private. Since E and E&#8242; both occur with probability at least 1 &#8722; 2&#948;&#8242;, this implies the desired result directly by a standard characterization of differential privacy.</p><p>For the remainder of the proof, condition on the events that E and E&#8242; occur. Fix some measurable subset S &#8838; R of the co-domain of f . 
We have that Pr[M f (x) &#8712; S] &#8804; e^&#1013; &#183; Pr[M f (y) &#8712; S] + ((e^(&#1013;/2) + 1)/2)&#948;.</p><p>In inequality (a), we used the Sliding Property with sliding parameter log(e^(&#947;+&#946;)) + log(1 + &#964;/(e^&#947; &#183; min x S * (x))) (5), which is at most &#947; + &#946; + &#964;/(&#961;e^&#947;) by the lower bound &#961; and log(1 + z) &#8804; z.</p><p>Finally, the two equalities follow by definition of the mechanism M f (x), completing the proof.</p><p>We now show how to approximately compute the smooth sensitivity for our problem in quasi-polynomial time. Specifically, fix some constant &#951; &gt; 0 and take f to be the statistic defined above. We will show how to approximately compute the smooth sensitivity of f (G). Specifically, assume &#1013; = O(1) and &#948; = 1/poly(n); we will show how to compute the &#946;-smooth sensitivity of f for &#946; = 1/O(log n).</p><p>Recall the definition of the &#946;-smooth sensitivity of f : S * f,&#946; (G) = max G&#8242; LS f (G&#8242;) &#183; e^(&#8722;&#946; d(G,G&#8242;)) = max 0&#8804;k&#8804;n(n&#8722;1)/2 e^(&#8722;&#946;k) &#183; max d(G,G&#8242;)&#8804;k LS f (G&#8242;). In the above equations, the second equality follows since d(G, G&#8242;) lies between 0 and n(n &#8722; 1)/2. Since LS f (G&#8242;) &#8804; n for all graphs G&#8242;, observe that whenever k = &#8486;(log&#178; n), we have LS f (G&#8242;) &#183; e^(&#8722;&#946;k) &#8804; 1/poly(n). Hence, it suffices to compute the following approximation, which only considers k &lt; &#920;(log&#178; n); this gives a (&#947;, &#964;, &#948;&#8242;)-approximation for &#947; = 0, &#964; = 1/poly(n), and &#948;&#8242; = 1/poly(n) for arbitrarily large polynomials in n. Formally, our approximation is S&#771; f,&#946; (G) = max k&lt;&#920;(log&#178; n) e^(&#8722;&#946;k) &#183; max d(G,G&#8242;)&#8804;k LS f (G&#8242;).</p><p>This approximation can be computed in n^(O(log&#178; n)) time by trying all possible G&#8242;, since computing the local sensitivity of f can be done in polynomial time.</p><p>In order to apply Theorem 4.4, we need a lower bound &#961; on the smooth sensitivity of any input. To do this, we simply augment any input graph G with an additional isolated node before inputting the graph into the algorithm for computing the approximate smooth sensitivity. This way, the algorithm may assume that all input graphs have at least one isolated node, so that the local sensitivity is always at least &#961; = 1. 
Using this algorithm for computing approximate smooth sensitivity together with the results of Theorem 4.4 gives a new private algorithm<ref type="foot">foot_0</ref> for estimating the expected infection size. Unfortunately, it is unclear how to obtain an approximation of the smooth sensitivity faster than by trying all possible G&#8242;, since the structure of the local sensitivity of f is so complex.</p></div>
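The brute-force flavor of this computation can be illustrated on toy instances. The sketch below evaluates a truncated smooth sensitivity by explicit breadth-first search over single edge flips, using the size of the largest component as a simplified stand-in for f (the paper's statistic has a far more complex local sensitivity; this stand-in, like all names here, is our own):

```python
import math
from itertools import combinations

def largest_component(n, edges):
    # Size of the largest connected component, by DFS.
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, best = set(), 0
    for v0 in range(n):
        if v0 in seen:
            continue
        stack, size = [v0], 0
        seen.add(v0)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def local_sensitivity(n, g):
    # Max change of the statistic over adding or removing one edge.
    base = largest_component(n, g)
    return max(abs(largest_component(n, g ^ {e}) - base)
               for e in combinations(range(n), 2))

def truncated_smooth_sensitivity(n, edges, beta, K):
    # max over k = 0..K of exp(-beta*k) * max_{d(G,G') = k} LS(G'),
    # exploring graphs by breadth-first search over single edge flips.
    start = frozenset(tuple(sorted(e)) for e in edges)
    frontier, seen = {start}, {start}
    best = 0.0
    for k in range(K + 1):
        level = max(local_sensitivity(n, set(g)) for g in frontier)
        best = max(best, math.exp(-beta * k) * level)
        nxt = set()
        for g in frontier:
            for e in combinations(range(n), 2):
                g2 = frozenset(set(g) ^ {e})
                if g2 not in seen:
                    seen.add(g2)
                    nxt.add(g2)
        frontier = nxt
    return best
```

Even for this toy statistic, the number of graphs explored grows as roughly n to the power 2K, which mirrors the quasi-polynomial blow-up described above.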
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Polynomial-time Algorithm</head><p>In this section, we give a polynomial-time algorithm for estimating the expected number of infections with differential privacy. Our approach is based on the Propose-Test-Release framework of <ref type="bibr">Dwork and Lei (2009)</ref>, which is another instance of the local sensitivity-based methods explored in the previous section. We show that a suitable generalization of the framework suffices to give a polynomial-time algorithm. We then show that the utility guarantees of the algorithm match the ones given in the previous section up to poly-logarithmic factors. For ease of exposition, we consider the case where there is only one random source (i.e., s = 1). Our claims and methods extend to general s, but the expressions are more complicated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Propose-Test-Release Framework</head><p>We first review the propose-test-release framework for an arbitrary statistic f (X) on a database X &#8712; X^n.</p><p>1. Propose a bound &#946; on the local sensitivity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Compute distance</head><p>&#947;(X) = d(X, {X&#8242; : LS f (X&#8242;) &gt; &#946;}), the distance from X to the nearest database whose local sensitivity exceeds &#946;.</p><p>3. Compute the noisy distance &#947;&#771; = &#947; + Lap(1/&#1013;).</p><p>4. If &#947;&#771; &#8804; ln(1/&#948;)/&#1013;, return &#8869;. Otherwise, return f (X) + Lap(&#946;/&#1013;).</p><p><ref type="bibr">Dwork and Lei (2009)</ref> showed that the above mechanism preserves (2&#1013;, &#948;)-differential privacy. Furthermore, the mechanism can provide much stronger utility guarantees if the bound &#946; is much smaller than the global sensitivity. We generalize the framework slightly for our application. One difficulty in applying the framework is that the bound &#946; on the local sensitivity needs to be derived analytically for a specific class of input databases. To provide more generality, we instead use a noisy binary search to find a near-optimal bound &#946; given the input database X. The other difficulty is computing the distance &#947;, which often cannot be done efficiently. Instead, we propose that it suffices to replace &#947;(X) with any non-negative statistic &#981;(X) of sensitivity 1 which is a lower bound on &#947;(X). In some cases, such as ours, the algorithm designer can choose a &#981;(X) which can be computed efficiently and which approximates the original distance &#947;(X) well; this freedom enables us to give computationally efficient algorithms using propose-test-release in new settings. Our mechanism is formalized below.</p><p>1. Let &#1013;&#8242; = &#1013;/log&#8322;(GS f ), where GS f is an upper bound on the global sensitivity.</p><p>2. Binary search over &#946; &#8712; {1, . . . , &#8968;GS f &#8969;} for an upper bound on the local sensitivity LS f (X).</p><p>a. Compute the lower bound &#981;(X) on the distance &#947;(X) = d(X, {X&#8242; : LS f (X&#8242;) &#8805; &#946;}). b. Add Laplace noise to the lower bound: &#966;(X) = &#981;(X) + Lap(1/&#1013;&#8242;).</p><p>c. If &#966; &#8804; ln(1/&#948;)/&#1013;&#8242;, increase the guess &#946;. 
Otherwise, decrease the guess &#946;.</p><p>3. Let &#946;&#771; be the smallest &#946; for which the noisy test passed, i.e., &#966;(X) &gt; ln(1/&#948;)/&#1013;&#8242;.</p><p>4. Return f (X) + Lap(&#946;&#771;/&#1013;).</p><p>We first show that the above algorithm is still (2&#1013;, &#948;)-differentially private. In the next subsection, we illustrate how the above variant of propose-test-release can be applied to our problem of estimating the expected number of infections in a contact network.</p><p>Lemma 5.1. The above algorithm is (2&#1013;, &#948;)-DP.</p><p>Proof. Let us first analyze each iteration of the binary search. Since we have assumed that the statistic &#981;(X) has sensitivity 1, the output &#966;(X) is &#1013;&#8242;-differentially private by the privacy guarantees of the Laplace mechanism. Consequently, the decision of whether to increase or decrease &#946; in the binary search is also &#1013;&#8242;-differentially private by post-processing. By basic composition of adaptive mechanisms, we have (&#1013;, 0)-differential privacy for the output of the binary search. Now consider step 4 of the mechanism and let &#946;&#771; be as defined in the algorithm. Suppose that &#946;&#771; &#8805; LS f (X); then by the privacy guarantees of the Laplace mechanism, we have &#1013;-differential privacy. Now suppose that &#946;&#771; &lt; LS f (X); then we have &#947;(X) = 0 by definition, which implies &#981;(X) = 0, since 0 &#8804; &#981;(X) &#8804; &#947;(X) for all X by assumption on &#981;. But by the PDF of the Laplace distribution, the probability that &#966;(X) &#8805; ln(1/&#948;)/&#1013;&#8242; is at most &#948;. As a result, we can conclude that step 4 is (&#1013;, &#948;)-differentially private.</p><p>Again by basic composition of adaptive mechanisms, we conclude that our generalized propose-test-release framework is (2&#1013;, &#948;)-differentially private.</p></div>
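The control flow of the generalized mechanism can be sketched generically. Here f, gs_f, and phi are caller-supplied; phi must be a non-negative, sensitivity-1 lower bound on the distance gamma(X, beta) for the privacy argument of Lemma 5.1 to apply. This is a simplified sketch with our own names, drawing fresh Laplace noise at each binary-search step:

```python
import math
import random

def sample_laplace(scale, rng):
    # Inverse-CDF sampling of a centered Laplace variable.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def generalized_ptr(x, f, gs_f, phi, eps, delta, seed=0):
    # Binary-search the smallest local-sensitivity bound beta whose noisy
    # distance test passes, then release f(x) with noise scaled to it.
    rng = random.Random(seed)
    steps = max(1, math.ceil(math.log2(gs_f)))
    eps_prime = eps / steps
    threshold = math.log(1.0 / delta) / eps_prime
    lo, hi = 1, math.ceil(gs_f)
    beta_hat = hi  # fall back to the global-sensitivity bound
    while hi >= lo:
        beta = (lo + hi) // 2
        noisy_dist = phi(x, beta) + sample_laplace(1.0 / eps_prime, rng)
        if threshold >= noisy_dist:
            lo = beta + 1    # test failed: propose a larger bound
        else:
            beta_hat = beta  # test passed: try a smaller bound
            hi = beta - 1
    return f(x) + sample_laplace(beta_hat / eps, rng)
```

When the input is far from every high-local-sensitivity database, the search settles on a small beta and the released value carries far less noise than the global-sensitivity Laplace mechanism.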
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">PTR for Infection Size Estimation</head><p>We apply the generalized propose-test-release framework described in Section 5.1 to our problem of interest. The primary difficulty in applying the framework is finding a non-negative statistic &#981;(X) which lower bounds &#947;(X) and is efficiently computable. Recall that &#947;(X) is defined to be the minimum number of edges which need to be added or removed so that the local sensitivity of f exceeds &#946;. Thus, before designing a good function &#981;(&#8226;), we need to understand the local sensitivity LS f (G) and how it is affected by adding or removing edges from the graph. As before, we first consider the case where the transmission is deterministic (Lemma 5.3). We then use the results from the deterministic case to give an algorithm for the general problem with probabilistic transmission (Lemma 5.4).</p><p>For our algorithm, we need the following result from <ref type="bibr">Aissi et al. (2017)</ref>:</p><p>Lemma 5.2. For an undirected graph G = (V, E), we say that a cut (S, V &#8722; S) is a b-balanced cut if there are at least b vertices on the smaller side of the cut. The minimum b-balanced cut of a graph (and its corresponding size) can be computed in polynomial time <ref type="bibr">(Aissi et al., 2017)</ref>.</p><p>Next, we define our statistic &#981;(G, &#946;). For each B &#8712; [n&#178;] and each k &#8804; B, we compute the following quantities: let U 1 (B, k) = |C 3 (G)| + &#183;&#183;&#183; + |C k+2 (G)|, and let U 2 (B, k) denote the maximum b such that there is a b-balanced cut of size B &#8722; k. Finally, we define &#981;(G, &#946;) as the minimum B such that there exists k &#8804; B where U 1 (B, k) + U 2 (B, k) &#8805; &#946;. We now prove that this is a feasible statistic:</p><p>Lemma 5.3. &#981;(G, &#946;) is a lower bound on &#947;(G, &#946;). Further, &#981;(G, &#946;) has sensitivity 1 and can be computed in polynomial time.</p><p>Proof. 
Recall that the statistic f we wish to estimate is the expected number of infections, which for a single random source is governed by the component sizes of G. Since there is only one random source (s = 1), we can write f as a function of the component sizes: f (G) = (1/n) &#8721; i |C i (G)|&#178; &#8804; |C 1 (G)| always, so it suffices to consider the local sensitivity of |C 1 (G)| and how it is affected by adding or removing edges from the graph. More formally, define f&#771;(G) = |C 1 (G)| to be our auxiliary statistic. Since f is a 1-Lipschitz function of f&#771;, we have that LS f (G) &#8804; LS f&#771; (G). This implies that the distance &#947; f (G, &#946;) to the closest dataset which has large local sensitivity is lower bounded by &#947; f&#771; (G, &#946;). As a result, it suffices to find a statistic &#981; f&#771; (G, &#946;) which lower bounds &#947; f&#771; (G, &#946;) in order to obtain our desired statistic &#981; f (G, &#946;). For the remainder of the proof, we work with the auxiliary statistic f&#771;(G) = |C 1 (G)|. Now let &#946; be given; we wish to find the minimum number of edges to add to or remove from G to obtain G&#8242; satisfying LS f&#771; (G&#8242;) &#8805; &#946;. We first characterize the local sensitivity LS f&#771; (G). If one edge is removed from G, the size of the giant component can only change if the removed edge is a bridge; the local sensitivity is then the size of the smaller of the two components created by removing the bridge. If one edge is added to G, the size of the largest component can change by at most |C 2 (G)|, by adding an edge connecting the first and second largest components. We will compute a lower bound &#966; f (G) on the minimum number of edges to add to or remove from G to obtain |C 2 (G&#8242;)| &#8805; &#946;, rather than LS f&#771; (G&#8242;) &#8805; &#946; directly. This is because the two quantities differ by at most 1, by adding or removing the single edge which connects the two components.</p><p>As a consequence of the above claim, we only need to reason about increasing |C 2 (G&#8242;)| by adding or removing edges of G. Suppose for each B and k, we can show that U 1 (B, k) + U 2 (B, k) is an upper bound on how much |C 2 (G)| can increase. 
We claim that this implies that the output &#966; f (G) of the algorithm is a lower bound on the number of edge changes required to obtain |C 2 (G&#8242;)| &#8805; &#946;. Indeed, suppose it were possible to use fewer than &#966; f (G) edge changes to obtain |C 2 (G&#8242;)| &#8805; &#946;. Then, using those exact edge changes, we could obtain U 1 (B, k) + U 2 (B, k) &#8805; &#946;, contradicting the assumption that &#966; f (G) was the minimal B output by the algorithm. Thus, it suffices to show that U 1 (B, k) + U 2 (B, k) is a valid upper bound. The remainder of the proof focuses on showing this claim. The outline is as follows. We show that, given a graph, the amount by which k edge additions can increase |C 2 (G)| is upper bounded by LS add (G) &#8804; U 1 (B, k). In order to make use of the B &#8722; k edge removals, our goal is to find B &#8722; k edges to remove so that the impact LS add (G) of the edge additions is maximized. We then show that the amount by which LS add (G) can be increased by B &#8722; k edge removals is at most U 2 (B, k). Combining the results gives our desired claim.</p><p>When adding edges, the optimal choice for increasing |C 2 (G)| is to iteratively add an edge connecting the second largest component C 2 (G&#8242;) with the third largest component C 3 (G&#8242;); this can be proven by a standard exchange argument. In particular, one can observe that LS add (G) is a monotone and Lipschitz function of |C i (G)| for all i &gt; 1. Furthermore, it is clear that the amount by which the k edge additions can increase |C 2 (G&#8242;)| without using the budget of B &#8722; k edge removals is upper bounded by LS add (G) &#8804; |C 3 (G)| + &#183;&#183;&#183; + |C k+2 (G)|. Next, we wish to find B &#8722; k edges to remove first, so that the k edge additions afterwards can optimally increase |C 2 (G&#8242;)|. Since LS add (G) is a monotone function of all |C i (G)| for i &gt; 1, removing edges from within C i (G) for any i &gt; 1 can never increase LS add (G). 
Thus, we may assume without loss of generality that all edges are removed from the giant component C_1(G).</p><p>Next, we use the results in Lemma 5.3 to solve the general problem. Let N denote the number of samples, which we will specify later. As in the non-private version of the problem, let G_1, . . . , G_N be N samples of the infection process, formed by retaining each edge of G with probability p. Our estimate f_p(G) of the expected number of infections is then the average over the N samples of the corresponding function of |C_1(G_i)|.</p><p>We will now apply the generalized propose-test-release framework to our statistic f_p(G), which uses the following definition of ϕ(G, β): ϕ(G, β) = min_{i∈[N]} ϕ(G_i, β), where ϕ(G_i, β) is defined as in Lemma 5.3. Then ϕ(G, β) is a valid lower bound on γ(G, β) and has sensitivity 1.</p><p>Proof. Observe that we have ϕ(G, β) = min_{i∈[N]} ϕ(G_i, β) ≤ min_{i∈[N]} γ(G_i, β), since we already have that ϕ(G_i, β) ≤ γ(G_i, β) for each i ∈ [N] by Lemma 5.3. Thus, it suffices to show that min_{i∈[N]} γ(G_i, β) ≤ γ(G, β). Suppose for the sake of contradiction that γ(G, β) &lt; min_{i∈[N]} γ(G_i, β). Then there exists a sequence of γ(G, β) edge additions or removals transforming G into some G′ with LS_{f_p}(G′) ≥ β.</p></div>
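The sampling step can be sketched as a simple Monte Carlo estimator. This is our own illustration, simplified to average the giant-component size itself rather than the function of |C_1(G_i)| used in the paper.

```python
import random
from collections import defaultdict

def giant_component_size(n, edges):
    """|C_1| of a graph on vertices 0..n-1 via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = defaultdict(int)
    for x in range(n):
        sizes[find(x)] += 1
    return max(sizes.values())

def estimate_infections(n, edges, p, num_samples, seed=0):
    """Average giant-component size over percolation samples G_1..G_N,
    each keeping every edge of G independently with probability p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        sample = [e for e in edges if rng.random() < p]
        total += giant_component_size(n, sample)
    return total / num_samples
```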
<div xmlns="http://www.tei-c.org/ns/1.0"><p>But note that we have LS_f̂(G′_i) ≥ β for each coupled sample G′_i, where G′_i is obtained by applying the same edge changes to G_i, so that G_i (resp. G′_i) contains an edge if and only if G (resp. G′) contains the edge and the percolation sample retains it. Consequently, we can conclude that γ(G, β) edge changes also suffice to transform G_i into some G′_i so that LS_f̂(G′_i) ≥ β. But this implies that min_{i∈[N]} γ(G_i, β) ≤ γ(G, β), contradicting our original assumption. Our claim that ϕ(G, β) has sensitivity 1 follows directly since each ϕ(G_i, β) has sensitivity 1, so we are done.</p><p>Now that we have a valid statistic ϕ(G, β), we can apply the generalized propose-test-release framework from Section 5.1 to obtain a polynomial-time private mechanism for releasing an estimate of the expected number of infections. In the next subsection, we will analyze its utility on a special class of graphs.</p></div>
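A hedged sketch of the propose-test-release pattern described above: test a noisy version of the distance lower bound ϕ against a threshold, and only on success release the statistic with Laplace noise. The budget split and constants here are our own choices for illustration and may differ from the paper's.

```python
import math
import random

def laplace(scale, rng):
    """Draw from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def propose_test_release(f_value, phi_value, beta, eps, delta, rng=None):
    """Test a noisy lower bound phi on the distance to a high-sensitivity
    dataset; if it clears the threshold, release f with noise scaled to
    the proposed sensitivity bound beta, otherwise refuse to answer."""
    rng = rng or random.Random(0)
    eps_half = eps / 2  # split the privacy budget between test and release
    noisy_phi = phi_value + laplace(1 / eps_half, rng)
    if noisy_phi <= math.log(1 / delta) / eps_half:
        return None  # distance may be small: answering could leak
    return f_value + laplace(beta / eps_half, rng)
```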
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Discussion of Utility Guarantees</head><p>Let us first define the class of graphs we will work with.</p><p>Definition 5.5. An (n, d, λ)-spectral expander is a d-regular graph on n vertices all of whose eigenvalues (that is, the eigenvalues of its normalized adjacency matrix), except the largest, are at most λ in absolute value.</p><p>Spectral expanders have many nice properties which enable us to give theoretical guarantees when computing the expected outbreak size. For example, the giant component of G under percolation has size Θ(n) when an outbreak occurs in G. Such a property is necessary for our approach, since we made the simplifying assumption that the smaller components contribute only lower order terms.</p><p>Lemma 5.6. Let G be an (n, d, λ)-spectral expander and let G(p) denote the resulting (random) subgraph formed by retaining each edge of G with probability p. Then for p = (1+α)/d, there is (with high probability) a unique giant component of size Θ(n), and all other components are of size O(log n).</p><p>Additionally, spectral expanders have the nice property that the giant component remaining after percolation is a vertex expander for sets which are not too small <ref type="bibr">(Diskin and Krivelevich, 2022)</ref>. Using this property, we can show that our statistic ϕ(G, β) is not too small when β = polylog(n). As a result, we can obtain highly accurate estimates of the expected outbreak size.</p><p>Lemma 5.7. Let α &gt; 0 be a small enough constant and let β &gt; 0 be such that β ≤ α/4. Let G = (V, E) be an (n, d, λ)-spectral expander with λ ≤ δ. Fix p ≥ (1+α)/d and let G(p) denote the resulting (random) subgraph formed by retaining each edge of G with probability p.
If C_1 is the largest component of G(p), then there exists an absolute constant c &gt; 0 such that, with high probability, every S ⊆ C_1 with 16 ln(n) ≤ |S| ≤ |C_1|/2 satisfies |N(S)| ≥ c|S|.</p><p>Now that we have stated the necessary properties, let us prove our utility guarantees.</p><p>Theorem 5.8. Let G be an (n, d, λ)-spectral expander. Then our algorithm is a (1 + η, polylog(n)/ε)-approximation algorithm for estimating the expected number of infections, with high probability.</p><p>Proof. Consider β′ = C log^3(n) for some sufficiently large constant C; we will show that for all β ≥ β′, we have φ ≥ ln(1/δ)/ε′ in Line 5 with high probability for our statistic ϕ described above. As a consequence, we have β ≤ β′ in Line 6, so our additive error is at most O(β′ · log(n)/ε) = O(log^3(n)/ε) with high probability by the tail bounds of the Laplace distribution. Since f(X) is within a 1 ± η multiplicative factor of the true expected number of infections by a standard Chernoff-Hoeffding argument, we obtain our desired approximation guarantees.</p><p>Fix β ≥ β′. We will show that the lower bound ϕ on the distance d(X, ·) to the nearest high-sensitivity dataset satisfies ϕ ≥ C′ log^2(n)/ε′ for some large enough constant C′. This would imply that, with high probability, we have φ(X) &gt; ln(1/δ)/ε′ by the concentration of the Laplace distribution and the assumption that δ = 1/poly(n). The claim that ϕ ≥ C′ log^2(n)/ε′ follows by the properties of spectral expander graphs discussed before. If we use edge additions, we know that all non-giant components are of size O(log n), so at least Ω(log^2(n)) edge additions are required in order to increase the local sensitivity above β.
If we use edge deletions, then Ω(log^2(n)) deletions are likewise required, since the sampled graph G(p) is an edge expander with high probability (see Lemma 5.7). Thus, we have proven that ϕ ≥ C′ log^2(n)/ε′ and we are done.</p></div>
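The Laplace tail bound invoked in the proof is easy to make explicit; these helper names are ours, for illustration only.

```python
import math

def laplace_tail(b, t):
    """Pr[|X| > t] for X ~ Laplace(scale b): exp(-t / b)."""
    return math.exp(-t / b)

def noise_bound(b, n):
    """Magnitude the Laplace noise stays below with probability 1 - 1/n:
    setting t = b * ln(n) makes the tail probability exactly 1/n."""
    return b * math.log(n)
```

With noise scale b = β′/ε, this is the source of the extra log(n) factor in the additive error bound above.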
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Lower Bounds</head><p>In this section, we give lower bounds for other natural settings in which one may wish to estimate the expected infection size. For example, our model assumes a random set of sources; we show that when the set of source nodes is given, there are strong lower bounds even when the underlying graph is a spectral expander. Similarly, we show in the Appendix that if we wish to guarantee attribute privacy instead of edge privacy (in the fixed-source model), there are analogous lower bounds showing that Θ(n) additive error is inevitable. These lower bounds explain and justify our use of the random source model in our paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Lower Bound for Fixed Sources</head><p>Here, we justify our use of the random source SIR model by showing that if we allow fixed sources, any differentially private algorithm must incur Θ(n) additive error. We remark that our lower bound applies even to spectral expanders in the interesting regime of sparse graphs, which most social network graphs satisfy. This is in contrast to our positive results in the previous section for the random source model.</p><p>Theorem 6.1. Let ε &gt; 0 be a constant and let δ = o(1). Then any (ε, δ)-differentially private algorithm for estimating the expected number of infections with a fixed source must incur an additive Θ(n) expected error, even if the underlying graph is a constant degree spectral expander.</p><p>Proof. Suppose for contradiction that there exists some (ε, δ)-differentially private algorithm M which guarantees sublinear additive error for the problem with a fixed source. Take any (n, d, λ)-spectral expander G for some constant d and λ, with some fixed source node s, and let the transmission probability be p = 1. Let G′ denote the graph with all edges incident on s removed from G, and note that G′ is a d-edge neighbor of G. Since G is an expander, we know that it is connected, so the expected number of infections in G with source s is n. Because M has sublinear expected error, we know that Pr[M(G) ∈ o(n)] = o(1). But by group privacy properties, we have that Pr[M(G′) ∈ o(n)] ≤ exp(dε) · o(1) + d e^((d-1)ε) δ. Since we have assumed that ε, d = Θ(1) and δ = o(1), the right hand side is o(1), so M(G′) ∈ Θ(n) with high probability.
But in G′, the source node is an isolated vertex, so the expected number of infections is 1; hence the expected error of the algorithm when run on G′ is Θ(n), a contradiction.</p></div>
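The group-privacy step above uses the standard conversion from (ε, δ)-differential privacy to privacy for inputs differing in k edges; a small helper (ours, for illustration) makes the bound explicit.

```python
import math

def group_privacy(eps, delta, k):
    """Standard group-privacy conversion: an (eps, delta)-DP algorithm is
    (k*eps, k * exp((k-1)*eps) * delta)-DP for inputs differing in k
    places; the proof applies this with k = d, the degree of the source."""
    return k * eps, k * math.exp((k - 1) * eps) * delta
```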
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion</head><p>Our work initiates the study of estimating the expected infection size of an epidemic, modeled by an SIR process, under edge differential privacy. Our main result is a polynomial-time edge-differentially private algorithm for the problem with only poly-logarithmic additive error on expander graphs. We believe our algorithms perform well on real-world graphs, because such graphs often have good expansion properties. However, we note that this is not immediately implied by our theoretical results, as our results apply only to d-regular graphs.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>It turns out that this algorithm incurs only polylogarithmic additive error on a class of graphs called expander graphs (defined in Section 5.1), but we omit the proof since the algorithm is superseded by the one in Section 5.1.</p></note>
		</body>
		</text>
</TEI>
