Prediction and prevention of pandemics via graphical model inference and convex programming
Abstract Hard-to-predict bursts of the COVID-19 pandemic revealed the significance of statistical modeling that resolves spatio-temporal correlations over geographical areas, for example the spread of infection over a city with census-tract granularity. In this manuscript, we provide algorithmic answers to the following two inter-related public health challenges of immense social impact, which have not been adequately addressed. (1) Inference Challenge: assuming that there are N census blocks (nodes) in the city, and given an initial infection at any set of nodes, e.g. any of the N possible single-node infections, any of the $$N(N-1)/2$$ possible two-node infections, etc., what is the probability for a subset of census blocks to become infected by the time the spread of the infection burst has stabilized? (2) Prevention Challenge: what is the minimal control action one can take to minimize the infected part of the stabilized state footprint? To answer these challenges, we build a Graphical Model of the pandemic of the attractive Ising (pair-wise, binary) type, where each node represents a census tract, each edge factor represents the strength of the pairwise interaction between a pair of nodes, e.g. reflecting inter-node travel, road closures and related measures, and each local bias/field represents the community level of immunization, acceptance of social distancing and mask-wearing practices, etc. Resolving the Inference Challenge requires finding the Maximum-A-Posteriori (MAP), i.e. most probable, state of the Ising Model constrained to the set of initially infected nodes. (An infected node is in the $$+1$$ state and a node which remained safe is in the $$-1$$ state.) We show that almost all attractive Ising Models on dense graphs result in one of two possibilities (modes) for the MAP state: either all nodes which were not infected initially become infected, or all the initially uninfected nodes remain uninfected (susceptible).
This bi-modal solution of the Inference Challenge allows us to re-state the Prevention Challenge as the following tractable convex program: for the bare Ising Model with pair-wise and bias factors representing the system without prevention measures, such that the MAP state is fully infected for at least one of the initial infection patterns, find the closest set of factors, for example in the $$l_1$$, $$l_2$$ or any other convexity-preserving norm, and therefore prevention-optimal, for which all the MAP states of the Ising model, with the optimal prevention measures applied, become safe. We have illustrated the efficiency of the scheme on a quasi-realistic model of Seattle. Our experiments have also revealed useful features, such as sparsity of the prevention solution in the case of the $$l_1$$ norm, and also somewhat unexpected features, such as localization of the sparse prevention solution at pair-wise links which are NOT those which are most utilized/traveled.
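The two-mode behaviour of the MAP state described in the abstract can be illustrated on a toy example. The sketch below is not the authors' method: it is a brute-force enumeration, feasible only for a handful of nodes, and it assumes the log-probability has the standard form $$\sum_{i<j} J_{ij} s_i s_j + \sum_i h_i s_i$$ with $$J_{ij} \ge 0$$ (attractive). The helper names `map_state` and `complete_graph`, the six-node complete graph, and all numerical values are hypothetical choices made for illustration.

```python
from itertools import product


def map_state(J, h, seeds):
    """Brute-force MAP state of an Ising model with the seed nodes clamped to +1.

    Assumed (not from the source) log-probability, up to a constant:
        sum_{i<j} J[i][j] * s_i * s_j + sum_i h[i] * s_i
    """
    n = len(h)
    free = [i for i in range(n) if i not in seeds]
    best_score, best_state = float("-inf"), None
    # Enumerate every +/-1 assignment of the nodes that were not initially infected.
    for values in product((-1, 1), repeat=len(free)):
        s = [1] * n  # seed nodes stay at +1 (infected)
        for i, v in zip(free, values):
            s[i] = v
        score = sum(h[i] * s[i] for i in range(n))
        score += sum(J[i][j] * s[i] * s[j]
                     for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best_score, best_state = score, tuple(s)
    return best_state


def complete_graph(n, coupling):
    """Uniform attractive couplings on a complete (dense) graph."""
    return [[0.0 if i == j else coupling for j in range(n)] for i in range(n)]


N = 6
J = complete_graph(N, 0.5)

# Strong negative bias (e.g. high immunization): the infection stays at the seed.
safe = map_state(J, [-1.0] * N, seeds={0})
# Weak negative bias: the attractive couplings pull every node into the infected state.
infected = map_state(J, [-0.2] * N, seeds={0})

print(safe)      # (1, -1, -1, -1, -1, -1)
print(infected)  # (1, 1, 1, 1, 1, 1)
```

In this toy setting only the bias changes between the two runs, and the MAP state jumps from "all initially uninfected nodes stay safe" to "all nodes infected" with no intermediate pattern, which is the kind of bi-modality that the Prevention Challenge relies on.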
NSF-PAR ID:
10332258
Journal Name:
Scientific Reports
Volume:
12
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
National Science Foundation
##### More Like this
1. The epsilon-approximate degree, deg_epsilon(f), of a Boolean function f is the least degree of a real-valued polynomial that approximates f pointwise to within epsilon. A sound and complete certificate for approximate degree being at least k is a pair of probability distributions, also known as a dual polynomial, that are perfectly k-wise indistinguishable, but are distinguishable by f with advantage 1 - epsilon. Our contributions are: - We give a simple, explicit new construction of a dual polynomial for the AND function on n bits, certifying that its epsilon-approximate degree is Omega (sqrt{n log 1/epsilon}). This construction is the first to extend to the notion of weighted degree, and yields the first explicit certificate that the 1/3-approximate degree of any (possibly unbalanced) read-once DNF is Omega(sqrt{n}). It draws a novel connection between the approximate degree of AND and anti-concentration of the Binomial distribution. - We show that any pair of symmetric distributions on n-bit strings that are perfectly k-wise indistinguishable are also statistically K-wise indistinguishable with at most K^{3/2} * exp (-Omega (k^2/K)) error for all k < K <= n/64. This bound is essentially tight, and implies that any symmetric function f is a reconstruction function with constant advantage for a ramp secret sharing scheme that is secure against size-K coalitions with statistical error K^{3/2} * exp (-Omega (deg_{1/3}(f)^2/K)) for all values of K up to n/64 simultaneously. Previous secret sharing schemes required that K be determined in advance, and only worked for f=AND. Our analysis draws another new connection between approximate degree and concentration phenomena. As a corollary of this result, we show that for any d <= n/64, any degree d polynomial approximating a symmetric function f to error 1/3 must have coefficients of l_1-norm at least K^{-3/2} * exp ({Omega (deg_{1/3}(f)^2/d)}). We also show this bound is essentially tight for any d > deg_{1/3}(f). 
These upper and lower bounds were also previously only known in the case f=AND.
2. Abstract Each year, bovine respiratory disease (BRD) results in significant economic loss in the cattle sector, and novel metabolic profiling for early diagnosis represents a promising tool for developing effective measures for disease management. Here, ¹H-nuclear magnetic resonance (¹H-NMR) spectra were used to characterize metabolites from blood plasma collected from male dairy calves (n = 10) intentionally infected with two of the main BRD causal agents, bovine respiratory syncytial virus (BRSV) and Mannheimia haemolytica (MH), to generate a well-defined metabolomic profile under controlled conditions. In response to infection, 46 metabolites (BRSV = 32, MH = 33) changed in concentration compared to the uninfected state. Fuel substrates and products exhibited a particularly strong effect, reflecting imbalances that occur during the immune response. Furthermore, ¹H-NMR spectra from samples from the uninfected and infected stages were discriminated with an accuracy, sensitivity, and specificity ≥ 95% using chemometrics to model the changes associated with disease, suggesting that metabolic profiles can be used for further development, understanding, and validation of novel diagnostic tools.
3. Establishing the invariance property of an instrument (e.g., a questionnaire or test) is a key step for establishing its measurement validity. Measurement invariance is typically assessed by differential item functioning (DIF) analysis, i.e., detecting DIF items whose response distribution depends not only on the latent trait measured by the instrument but also on the group membership. DIF analysis is confounded by the group difference in the latent trait distributions. Many DIF analyses require knowing several anchor items that are DIF-free in order to draw inferences on whether each of the rest is a DIF item, where the anchor items are used to identify the latent trait distributions. When no prior information on anchor items is available, or some anchor items are misspecified, item purification methods and regularized estimation methods can be used. The former iteratively purifies the anchor set by a stepwise model selection procedure, and the latter selects the DIF-free items by a LASSO-type regularization approach. Unfortunately, unlike the methods based on a correctly specified anchor set, these methods are not guaranteed to provide valid statistical inference (e.g., confidence intervals and p-values). In this paper, we propose a new method for DIF analysis under a multiple indicators and multiple causes (MIMIC) model for DIF. This method adopts a minimal $$L_1$$ norm condition for identifying the latent trait distributions. Without requiring prior knowledge about an anchor set, it can accurately estimate the DIF effects of individual items and further draw valid statistical inferences for quantifying the uncertainty. Specifically, the inference results allow us to control the type-I error for DIF detection, which may not be possible with item purification and regularized estimation methods. 
We conduct simulation studies to evaluate the performance of the proposed method and compare it with the anchor-set-based likelihood ratio test approach and the LASSO approach. The proposed method is applied to analysing the three personality scales of the Eysenck personality questionnaire-revised (EPQ-R).