<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Optimal policy for control of epidemics with constrained time intervals and region-based interactions</title></titleStmt>
			<publicationStmt>
				<publisher>AIMS Press</publisher>
				<date>01/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10557742</idno>
					<idno type="doi">10.3934/nhm.2024039</idno>
					<title level='j'>Networks and Heterogeneous Media</title>
<idno>1556-1801</idno>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">2</biblScope>					

					<author>Xia Li</author><author>Andrea L Bertozzi</author><author>P Jeffrey Brantingham</author><author>Yevgeniy Vorobeychik</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p lang='fr'><p>We introduce a policy model coupled with the susceptible-infected-recovered (SIR) epidemic model to study interactions between policy-making and the dynamics of epidemics. We consider both single-region policies and game-theoretic models involving interactions among several regions, as well as hierarchical interactions among policy-makers modeled as multi-layer games. We assume that the policy functions are piece-wise constant with a minimum time interval for each policy stage, since policies that change frequently in time cannot be easily followed. The optimal policy is obtained by minimizing a cost function that consists of an implementation cost, an impact cost, and, in the case of multi-layer games, a non-compliance cost. We show, in a case study of COVID-19 in France, that when the cost function is reduced to the impact cost and parameterized as the final epidemic size, the solution approximates that of the optimal control in Bliman et al. (2021) for a sufficiently small minimum policy time interval. For a larger time interval, however, the optimal policy is a step down function, quite different from the step up structure typically deployed during the COVID-19 pandemic. In addition, we present a counterfactual study of how the pandemic would have evolved if herd immunity had been reached during the second wave in the county of Los Angeles, California. Finally, we study a case of three interacting counties with and without a governing state.</p></p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>1. Introduction. In the course of battling COVID-19, public health policies sought to enforce non-pharmaceutical interventions to slow or halt the spread of the pandemic. Common policies included 'safer-at-home', 'social distancing' and 'mask wearing' mandates, which were seen as crucial during the early stages of the pandemic prior to the availability of vaccines. The timeline of COVID-19 globally and locally (<ref type="bibr">[7, 26]</ref>) indicates that the evolution of policy affected the evolution of the pandemic and vice versa. For example, in the county of Los Angeles, social distancing was first mandated <ref type="bibr">[9]</ref> on March 21, 2020, about a month after the first reported COVID-19 case in LA. Around that time, the Los Angeles Mayor's Office released the 'safer-at-home' policy <ref type="bibr">[1]</ref>. One week later, beaches, hiking trails, dog parks, skate parks, and other public sites and facilities were temporarily closed. On April 15, as infections continued to increase, facial coverings were mandated in many indoor places <ref type="bibr">[19]</ref>. In hindsight, it is important to ask: Were the enforced policies implemented in an optimal way? What can we learn by using mathematical modeling to understand the interplay between policy and the spread of disease? This paper introduces a policy model coupled to a susceptible-infected-recovered (SIR) epidemic model to study interactions between policy-making and the dynamics of epidemics. There have been several studies on the relationship between policies and epidemics <ref type="bibr">[4,</ref><ref type="bibr">5,</ref><ref type="bibr">21,</ref><ref type="bibr">6,</ref><ref type="bibr">18]</ref>. 
In a study of data from 16 US cities during the 1918 pandemic [5], Bootsma and Ferguson analyzed how delays in lockdown policies affected total deaths and how reopening too early led to second waves of outbreaks. The analysis was done by fitting available data to an SEIR model. They also considered optimal control for the simpler SIR model and the end-state of the pandemic, noting that there exists an optimal control level with fewer deaths and no second wave. More recently, Bliman et al. [3] studied optimal control of the SIR model. We depart from their setup in three respects. First, their controls can change continuously in time, which would imply, for example, the ability to shift in three successive instants between no restrictions, perfect "lockdown", and back to no restrictions. As observed during the COVID-19 pandemic, policies that change frequently in time cannot be easily followed. Moreover, policies must be relatively easy to interpret, with a small number of different intensity levels (see Fig. <ref type="figure">3b</ref>). A practical implementation also requires a minimum time duration for a particular stage of the policy. These practical constraints can be modeled together as a piece-wise constant function of time with a minimum time interval for each well-defined policy level (i.e., not continuous). With this idea in mind, we aim to re-examine the optimal practical policy among all possible piece-wise constant policies with a minimal time duration.</p><p>Second, Bliman et al. assume that the only outcome to manage is the final epidemic size. This so-called "impact cost" is clearly a central concern (see below). However, as also seen during the pandemic, there are real trade-offs between decreased infections and the negative impact of strict policies on other aspects of society, such as remote learning for young students, employment curtailment in certain job sectors, and lack of key services provided to the public. 
In the present work, we modify Bliman et al.'s model to take into account these other practical "implementation costs." Specifying a short minimal time interval during which policies must remain constant (e.g., one week), we find that our results resemble Bliman et al.'s bang-bang controller [3] despite the more complex cost structure that includes both impact and implementation costs. With a larger minimal time interval (e.g., 28 days), optimal policies depart from the bang-bang solution.</p><p>Finally, Bliman et al. also assume a pandemic spreading in a single population pool overseen by a single policy-making entity. The reality of the COVID-19 pandemic is that there are policy makers at several (nested) hierarchical scales that oversee different population pools. For example, within the United States, policies may be set at federal, state, county and local levels, not to mention finer-grained institutional and family scales. Populations at any one scale (e.g., counties) may also interact to varying degrees. Inspired by the work of Jia et al. <ref type="bibr">[14]</ref>, we introduce a hierarchical version of Bliman et al.'s model with sequential (Stackelberg) policy-making. Specifically, levels higher in a jurisdictional hierarchy make policy decisions first, while levels lower in the hierarchy make their decisions with full knowledge of the policy recommendations from above. We find that, with a suitable weight on the non-compliance cost, a hierarchical structure can make the policies converge in all regions.</p><p>The remainder of this paper is organized as follows. We first introduce the work in [3] and reproduce its results using our methods. We discuss how different optimal policies result from different parameter choices for model constraints and costs. 
Next, we discuss an empirical case study of the so-called "second wave" of the pandemic (November 6, 2020 to May 12, 2021) in Los Angeles County, California.</p><p>Last, we use simulation to study optimal control of the pandemic in three counties with and without a governing state as an example of the multi-layer multi-regional model.</p><p>2. Policy model using optimal control. A policy function is a continuous function that has a range of [0, 1]. As the numerical value increases, the strictness of the policy decreases: the value 0 denotes a total lockdown and 1 denotes no control. We assume a policy $u(t)$ directly influences the level of a lockdown, which affects the rate at which the population moves from compartment S to compartment I. We use the following policy-incorporated SIR:</p><p>$$\frac{dS}{dt} = -u(t)\,\beta\,\frac{SI}{N}, \qquad \frac{dI}{dt} = u(t)\,\beta\,\frac{SI}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I.$$</p><p>Like the traditional SIR model, the reproduction number is $R_0 = \beta/\gamma$. Herd immunity occurs when a large proportion of the population has become immune to the infection. Mathematically, it is defined as the value of S below which the number of infected decreases, and it can be calculated as $S_{herd} = N/R_0$. In [3], a policy $u(t)$ is assumed to belong to the admissible set $U_{\alpha_{max},T_0}$ defined by</p><p>The constant $T_0$ characterizes the duration of the policy, and $\alpha_{max}$ its maximal intensity. In [3], Theorem 2.1 states that no finite-time intervention is able to stop the epidemic before or exactly at herd immunity. However, one may stop arbitrarily close to herd immunity by having a sufficiently long intervention of sufficient intensity. To determine the closest state S to this threshold attainable by control of maximal intensity $\alpha_{max}$ on the interval $[0, T_0]$, one is led to consider the following optimal control problem, referred to below as problem (2):</p><p>Furthermore, Bliman et al. prove the existence and uniqueness of the optimal solution to problem (2) and that the solution is a bang-bang controller (a control that switches from one extreme to the other). 
More specifically, they have the following theorem:</p><p>with respect to $\alpha_{max}$ and non-decreasing with respect to $T_0$.</p><p>(ii) there exists a unique</p><p>$\mathbb{1}_{[T_0,+\infty)}$ (in particular, the optimal control is bang-bang).</p><p>3. Single region case. We use the same policy-incorporated SIR model for the epidemic dynamics as in <ref type="bibr">[3]</ref>. Instead of minimizing the final epidemic size alone, we adopt a policy-making process similar to that in <ref type="bibr">[14]</ref>, using a cost function that takes into account the cost of implementing the policy, the impact of the infection, and a penalty for non-compliance. The latter cost only applies in hierarchical models, where a lower-level unit can choose not to follow the policy recommendation of a higher-level unit.</p><p>We also consider practical implementation constraints, namely that the policy can only be implemented using a finite number of discrete levels of control and with a minimal time interval during which a policy must remain constant. As an example, consider the policy implementation in France during 2020 and 2021 shown in Fig. <ref type="figure">1</ref> (<ref type="bibr">[27]</ref>). Implemented policies were discrete both in terms of the small number of intervention types and the fixed time intervals of enforcement, the shortest of which was approximately 15 days in duration, with the longest lasting more than a year. A discrete policy model is realistic given the empirical pattern of real-world interventions. Such a model also simplifies the computational problem of finding an optimal policy, since the search runs over a discrete set of candidate policies rather than a continuum.</p><p>3.1. The policy-incorporated SIR model. 
To model the evolution of the pandemic, we discretize the system of ODEs using the forward Euler method with a time</p><p>(i) the implementation cost, which represents the consequences of policies meant to curtail the pandemic on individuals and the broader economic and social systems.</p><p>(ii) the impact cost, which represents the consequences of people getting sick both on individuals and the broader economic and social systems.</p><p>(iii) the non-compliance cost, which is a penalty imposed by a policy-maker upon an agent within its jurisdiction for deviating from its recommendation (e.g., a fine or litigation costs).</p><p>The implementation cost is a non-increasing function of $\alpha$ and the impact cost is a non-decreasing function of $\alpha$. The coefficients satisfy $\kappa, \eta, \kappa + \eta \in [0, 1]$. The cost from time $t_1$ to $t_2$ is defined as the averaged integral of the cost function over a total time period $T$:</p><p>There are different ways to parameterize the cost function. In this paper, the cost function is parameterized in the following way:</p><p>where $R_{t_2}(u)$ is the fraction of the recovered population at time $t_2$ if policy $u$ is adopted during $[t_1, t_2]$ and $\pi(u)$ is the policy of the agent one level above. The parameterizations of the implementation cost and the non-compliance cost are adopted from <ref type="bibr">[14]</ref>. The impact cost is parameterized as the recovered population at time $t_2$ to approximate the impact on the medical system, since a fraction of the recovered represents the hospitalized population. If the policy $u$ is fixed at a constant value $\alpha$ over the time interval $[t_1, t_2]$, the cost can be written as:</p><p>An example of cost functions with different weights using the above parameterization is shown in Fig. <ref type="figure">6</ref>. 
In our simulation for a single region, we use an averaged total cost over a time period $T$ as follows:</p><p>If the SIR model has reached equilibrium at time $T$, we can use $R_T(\alpha)$ to approximate $R_\infty$, the final recovered fraction of the population. To find the optimal policy, we solve the following optimization problem:</p><p>3.3. Algorithm. We discretize time by the minimal policy time interval (MPTI) $\Delta t$ and the policy intensity into multiple levels. Let $T$ be the total time and $A$ be the set of possible policy intensities (e.g., $A = \{0, 0.5, 1\}$). We search over all the policies that lead to $S_{final}$ being close to $S_{herd}$, i.e., $S_{final} &gt; S_{herd} - \epsilon$ for some sufficiently small $\epsilon$, using a depth-first search algorithm <ref type="bibr">[23]</ref>. The depth-first search algorithm stores the cost up to the current time interval and reuses this result to obtain the total cost for each policy function through backtracking. Let $N = T/\Delta t$ denote the number of stages of a policy.</p><p>In total, there are $|A|^N$ policies. We initialize the minimal cost $c_{min}$ to be 9999.</p><p>Assume the initial susceptible and infected populations are $S_0$ and $I_0$, respectively. For the $n$-th time interval ($n &lt; N$), we choose a value from the set of intensity levels $A$ that has not yet been tried for this interval, calculate the cost for that policy intensity, add it to the previous cost, and calculate the susceptible and infected populations at the end of the $n$-th time interval using the chosen intensity. Then we move to the $(n + 1)$-th time interval. If the final time interval is reached, we check whether $S_{final} &gt; S_{herd} - \epsilon$. If so, we calculate the cost for the final time interval and add it to the previous cost to get the current total cost $c$. If the total cost $c$ is smaller than $c_{min}$, we update $c_{min}$ with $c$ and the optimal policy $u_{opt}$ with $u$. Next, we go back to the previous time interval and repeat the same procedure. 
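
The depth-first search described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results: the function names, the one-day Euler step, the stand-in stage cost (a weight `kappa` on strictness plus a weight `eta` on new infections), and all parameter values are hypothetical, and populations are expressed as fractions of $N$. An infinite initial cost replaces the sentinel 9999.

```python
from typing import List, Optional, Tuple


def euler_stage(s: float, i: float, alpha: float, beta: float,
                gamma: float, days: int) -> Tuple[float, float]:
    """Advance the policy-incorporated SIR by `days` one-day Euler steps."""
    for _ in range(days):
        new_inf = alpha * beta * s * i  # the policy level alpha scales transmission
        s, i = s - new_inf, i + new_inf - gamma * i
    return s, i


def stage_cost(alpha: float, s_before: float, s_after: float,
               kappa: float, eta: float) -> float:
    """ASSUMED stand-in cost; the parameterization in the text is not reproduced."""
    implementation = kappa * (1.0 - alpha)  # non-increasing in alpha
    impact = eta * (s_before - s_after)     # new infections during this stage
    return implementation + impact


def dfs_optimal_policy(s0: float, i0: float, levels: List[float],
                       n_stages: int, dt: int, beta: float, gamma: float,
                       kappa: float, eta: float, eps: float
                       ) -> Tuple[Optional[List[float]], float]:
    """Enumerate all len(levels)**n_stages policies by depth-first search."""
    s_herd = gamma / beta  # S_herd / N = 1 / R_0
    best_cost = float("inf")
    best_policy: Optional[List[float]] = None
    current: List[float] = []

    def visit(n: int, s: float, i: float, cost: float) -> None:
        nonlocal best_cost, best_policy
        if n == n_stages:
            # Keep only policies whose final S stays above S_herd - eps.
            if s > s_herd - eps and best_cost > cost:
                best_cost, best_policy = cost, list(current)
            return
        for alpha in levels:
            s_next, i_next = euler_stage(s, i, alpha, beta, gamma, dt)
            current.append(alpha)
            # The cost accumulated so far is passed down and reused.
            visit(n + 1, s_next, i_next,
                  cost + stage_cost(alpha, s, s_next, kappa, eta))
            current.pop()  # backtrack to try the next level at this stage

    visit(0, s0, i0, 0.0)
    return best_policy, best_cost


policy, cost = dfs_optimal_policy(s0=0.99, i0=0.01, levels=[0.0, 0.5, 1.0],
                                  n_stages=3, dt=7, beta=0.29, gamma=0.1,
                                  kappa=0.0, eta=1.0, eps=0.05)
print(policy, cost)
```

With `kappa = 0` the stand-in cost counts infections only, so the sketch selects the strictest level at every stage; a nonzero `kappa` trades strictness against infections.
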
After searching over all policies, the policy with the lowest cost is the optimal policy. The detailed algorithm is presented in Alg. 1.</p><p>Algorithm 1 Single-region policy SIR. Input: time $T$, initial infected population $I_0$, initial susceptible population $S_0$, intensity levels $A$, minimal policy time interval $\Delta t$, policy end time $T_0$, tolerance $\epsilon$. Initialize county policies, minimal cost $c_{min} = 9999$, current cost $c = 0$.</p><p>for intensity level $\alpha \in A$ do</p><p>the general cost function <ref type="bibr">(8)</ref> reduces to the impact cost and is parameterized as the final epidemic size $R_\infty$. Bliman et al. assume that the paths considered all reach herd immunity. Therefore, in our search for the optimal policy, we exclude cases that do not reach herd immunity. Note that without this exclusion, the optimal solution is to adopt and hold the strictest possible policy starting from the beginning of the pandemic, which results in the fewest infections. For ease of computation, we consider three levels of policy intensity (0, 0.5, and 1) and fixed time intervals for the MPTI. We use the same set of parameters for the SIR model as in Bliman et al. [3]: $N = 6.7 \times 10^7$, $I_0 = 10^3$, $S_0 = N - I_0$, $R_0 = 2.9$. Following [3], we also choose the policy end time $T_0$ as close as possible to 100, thus setting $T_0 = 98$, since the time interval needs to be a multiple of the MPTI of 7 days. We show the result our algorithm produces in Fig. <ref type="figure">2a</ref>, which we visually compare to the result from [3], shown in Fig. <ref type="figure">2b</ref>. Note that we normalized the curves by the total population. Both solutions are bang-bang controllers. The solution using our model starts the control on day 63 (a multiple of 7) rather than day 61.9 (continuous). 
Slightly more people are infected under a policy that is forced to use seven-day intervals than under the continuous-time policy of Bliman et al.</p><p>Using a larger minimal policy time interval of 28 days and $T_0 = 112$, the optimal solution is no longer a bang-bang controller, as shown in Fig. <ref type="figure">2c</ref>, and it yields a larger $S_\infty = 0.32$. The optimal policy starts with a looser "intermediate" policy phase followed by a stricter phase. In practice, during COVID-19 it was common for policies to start with the strictest restrictions followed by partial opening <ref type="bibr">[27,</ref><ref type="bibr">9]</ref>. Thus, it is instructive to contrast the optimal policy with a policy in which the two stages are flipped in time, see Fig. <ref type="figure">2d</ref>. The flipped policy is a sub-optimal solution: it results in a larger pandemic size and a second wave of infections, as was often seen during the first two years of the COVID-19 pandemic. Nevertheless, the policy in Fig. <ref type="figure">2d</ref>, while infecting more people, divides the impacted population into two distinct waves, which could decrease daily hospital demand over the course of the outbreak. Our policy model does not optimize for hospital demand. Since many public health agencies (including Los Angeles County) considered hospital demand when making policy decisions, it could be important to consider in future studies.</p><table><row role="label"><cell>Case</cell><cell>$T_0$ (days)</cell><cell>MPTI (days)</cell><cell>$N$</cell><cell>$I_0$</cell><cell>$S_0$</cell><cell>$R_0$</cell><cell>$S_\infty$</cell></row>
<row><cell>A</cell><cell>98</cell><cell>7</cell><cell>$6.7 \times 10^7$</cell><cell>$10^3$</cell><cell>$N - I_0$</cell><cell>2.9</cell><cell>0.296</cell></row>
<row><cell>B</cell><cell>100</cell><cell>Not applicable</cell><cell>$6.7 \times 10^7$</cell><cell>$10^3$</cell><cell>$N - I_0$</cell><cell>2.9</cell><cell>0.31</cell></row>
<row><cell>C</cell><cell>112</cell><cell>28</cell><cell>$6.7 \times 10^7$</cell><cell>$10^3$</cell><cell>$N - I_0$</cell><cell>2.9</cell><cell>0.32</cell></row>
<row><cell>D</cell><cell>112</cell><cell>28</cell><cell>$6.7 \times 10^7$</cell><cell>$10^3$</cell><cell>$N - I_0$</cell><cell>2.9</cell><cell>0.174</cell></row></table><p>Table <ref type="table">1</ref>. Parameters.</p><p>1 $u(i)$ represents the $i$-th entry of vector $u$.</p><p>has generated a wealth of models and results using mechanistic approaches taking explicitly into account the movement of individuals (<ref type="bibr">[13,</ref><ref type="bibr">15,</ref><ref type="bibr">22]</ref>). 
For example, in <ref type="bibr">[22]</ref>, the authors proposed a multi-regional compartmental model using medical geography theory (central place theory) and studied the effect of the travel of individuals (especially those infected and exposed) between regions on the global spread of severe acute respiratory syndrome (SARS). Another way to account for the interplay between regions is to use a cross excitation matrix <ref type="bibr">[28]</ref>. This scheme assumes uniform mixing of the population across regions, so that the infected population in one region can trigger infection in another. The entries of the matrix record the pair-wise cross excitation from one region to another. In this paper, we assume uniform mixing in the population and use an excitation matrix $K = \{K_{aa'}\}$ to model travel and infections across counties. Our network-style SIR is the following:</p><p>For any county $a$, the rate of change from $S_a$ to $I_a$ triggered by $I_{a'}$ depends on</p><p>where $\kappa_f + \eta_f = 1$ and $R_{f,T}(\alpha)$ is the number of recovered in region $f$, which aggregates the epidemic sizes of its leaf nodes.</p><p>4.1. Algorithms. The single-region algorithm minimizes over all admissible piecewise functions, while the multiple-region algorithm minimizes only over one time interval at a time. We assume there are up to three layers: the federal government, the states, and the counties. At the $n$-th time interval, we first determine the optimal policy intensity that minimizes the cost $C^f_{n\Delta t,(n+1)\Delta t}$ for the federal layer. After obtaining the optimal federal policy, each state optimizes its own cost function $C^s_{n\Delta t,(n+1)\Delta t}$ for the period $[n\Delta t, (n + 1)\Delta t]$ unilaterally, i.e., assuming other states follow their previous policies. Next, we choose the optimal policy intensity for the counties in the same manner. 
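
The cross-excitation dynamics for several regions can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `network_sir_step`, the one-day forward Euler step, the use of population fractions (so that each regional population is normalized to 1), and every numerical value below, including the matrix `K`, are hypothetical choices made for the example rather than values taken from the study.

```python
def network_sir_step(S, I, R, alpha, K, beta, gamma):
    """One one-day forward-Euler step for every region (population fractions)."""
    n = len(S)
    S_new, I_new, R_new = [], [], []
    for a in range(n):
        # Infection pressure on region a: its own infected plus cross
        # excitation from every other region a', weighted by K[a][a'].
        excitation = sum(K[a][b] * I[b] for b in range(n))
        new_inf = alpha[a] * beta * excitation * S[a]
        S_new.append(S[a] - new_inf)
        I_new.append(I[a] + new_inf - gamma * I[a])
        R_new.append(R[a] + gamma * I[a])
    return S_new, I_new, R_new


# Hypothetical three-county example: county 1 excites county 2,
# and county 2 excites county 3.
K = [[1.0, 0.0, 0.0],
     [0.1, 1.0, 0.0],
     [0.0, 0.1, 1.0]]
S = [0.8, 0.9, 0.9]
I = [0.2, 0.1, 0.1]
R = [0.0, 0.0, 0.0]
alpha = [1.0, 0.5, 0.5]  # current policy intensity of each county

for _ in range(7):  # simulate one 7-day policy stage
    S, I, R = network_sir_step(S, I, R, alpha, K, beta=0.2, gamma=0.1)

print([round(x, 3) for x in S])
```

In a stage-by-stage search, each region would call such a step for every candidate intensity in $A$ while holding the other regions' policies fixed, and keep the intensity with the lowest cost for that stage.
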
Note that the federal layer does not pay the non-compliance cost, as it is not subject to any higher-level policy-making. The states and counties may pay a non-compliance cost. The full details of the three-layer model are given in Alg. 2.</p><p>4.2. Simulations. In this section, we present results for a three-county example of the multiple-region game and a three-county example with a state. First, we discuss the case in which only one layer exists (i.e., only counties).</p><p>The game between the counties is through cross excitation of infection among the counties. Next, we study the case when a governing state is added.</p><p>Figure 6. Different cost functions vs. policy intensity $\alpha$. Horizontal axis: policy intensity $\alpha$; vertical axis: cost; curves for $(\kappa, \eta)$ = (0.33, 0.33), (0.5, 0.25), (0.25, 0.75), (0.8, 0.1), (0.99, 0.01).</p><p>Algorithm 2 Game Policy SIR
Input: time $T$, excitation matrix $K$, intensity levels $A$, time interval $\Delta t$
Initialize state and county policies
Number of policy stages $N = T/\Delta t$, $n = 1$
while $n \leq N$ do
    $t = n\Delta t$
    while $t &lt; T$ do
        for every state $s$ do
            for every county $a$ in state $s$ do
                update $S_a$, $I_a$, $R_a$ according to the current policy $\alpha_a$ and the excitation matrix $K$:
                $S_a(t) = S_a(t-1) - \alpha_a \beta \sum_{a'} K_{aa'} I_{a'}(t-1) S_a(t-1) / N_a$
                $I_a(t) = I_a(t-1) + \alpha_a \beta \sum_{a'} K_{aa'} I_{a'}(t-1) S_a(t-1) / N_a - \gamma I_a(t-1)$
                $R_a(t) = R_a(t-1) + \gamma I_a(t-1)$
            end for
        end for
        $t$ += 1
    end while
    for every state $s$ do
        choose the policy intensity in $A$ that minimizes the state cost $C^s_{n\Delta t,(n+1)\Delta t}$
        for every county $a$ in state $s$ do
            choose the policy intensity $\alpha_a$ in $A$ that minimizes the county's cost over the same period
        end for
    end for
    $n$ += 1
end while</p><p>We consider three interacting counties with the excitation matrix $K$:</p><p>We set the reproduction number $R_0 = 2$, and therefore $S_{herd} = 0.5$ (as a fraction of the population). Counties 1, 2, and 3 have initial infected fractions $i_0 = 0.2, 0.1, 0.1$, respectively. County 1 thus has a bigger outbreak initially; part of the infection in county 2 is excited from county 1, and part of the infection in county 3 is excited from county 2. The cost functions for all counties consist of an implementation cost and an impact cost with equal weights ($\eta_a = \kappa_a = 1/2$ for all $a$). The minimal policy time interval $\Delta t$ is set to 7 days.</p><p>The left column (Figs. <ref type="figure">7a</ref>, <ref type="figure">7c</ref>, <ref type="figure">7e</ref>) shows simulations for the counties without any intervention, and the right column (Figs. <ref type="figure">7b</ref>, <ref type="figure">7d</ref>, <ref type="figure">7f</ref>) shows simulations with interventions. Without intervention, we see waves of infection propagate from county 1 to county 2 and then to county 3. All of the counties eventually reach herd immunity. With interventions, policy restrictions start on day 7 and, for counties 2 and 3, the infected curves decrease before reaching their uncontrolled peaks. With control, county 1 contains the pandemic and the final $S_\infty$ is close to the herd immunity level $S_{herd}$. With fewer infected individuals to begin with, counties 2 and 3 contain the pandemic before reaching herd immunity. Fig. <ref type="figure">8</ref> shows the results of adding a governing state on top of the county layer. For the counties, we keep the ratio of the weights for the implementation cost and the impact cost at 1:1, the same as in the no-state case in Fig. <ref type="figure">7</ref>. 
The state has slightly different weights, with a 1:2 ratio between the weights of the implementation cost and the impact cost. Compared to Fig. <ref type="figure">7</ref>, adding a state leads the three counties to end up with the same policy. In this case, the non-compliance cost results in each county choosing the same policy as the state rather than different policies.</p><p>In the search for an optimal policy, we used a naive depth-first search algorithm for the one-region model. One can speed up the algorithm by pruning paths that are obviously non-optimal. In our model, the policy intensity $\alpha$ is a heuristic representation of lockdown, social distancing, and mask policies. It remains to be discussed how other policies, for example vaccination policies, affect the spread in different stages of the pandemic. The model ignores some important features, such as limited hospital capacity <ref type="bibr">[24]</ref>, which could be added as constraints when minimizing the cost function. Fig. <ref type="figure">3b</ref> shows that the policy for the first wave is proactive while the one for the second wave is reactive. One possible contributing effect is fatigue from following policies, which increases over time and has memory. So far, the model cannot capture this fatigue. In the future, one could consider an adaptive term in the cost function to model it. The network example considered was rather simplistic, with just three counties within one state. One could consider more complex systems with multiple layers. The computational method here would likely</p></div></body>
		</text>
</TEI>
