Title: Evidence-driven spatiotemporal COVID-19 hospitalization prediction with Ising dynamics

In this work, we aim to accurately predict the number of hospitalizations during the COVID-19 pandemic by developing a spatiotemporal prediction model. We propose HOIST, an Ising dynamics-based deep learning model for spatiotemporal COVID-19 hospitalization prediction. By drawing the analogy between locations and lattice sites in statistical mechanics, we use the Ising dynamics to guide the model to extract and utilize spatial relationships across locations and model the complex influence of granular information from real-world clinical evidence. By leveraging rich linked databases, including insurance claims, census information, and hospital resource usage data across the U.S., we evaluate the HOIST model on the large-scale spatiotemporal COVID-19 hospitalization prediction task for 2299 counties in the U.S. In the 4-week hospitalization prediction task, HOIST achieves on average a mean absolute error of 368.7, an $R^{2}$ of 0.6, and a concordance correlation coefficient of 0.89. Our detailed number needed to treat (NNT) and cost analysis suggest that future COVID-19 vaccination efforts may be most impactful in rural areas. This model may serve as a resource for future county and state-level vaccination efforts.
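
The three scores quoted in the abstract (mean absolute error, $R^{2}$, and concordance correlation coefficient) can be illustrated with a minimal sketch. This is not the HOIST evaluation code: the function names, the use of Lin's concordance formula with population variances, and the toy numbers are assumptions made here for illustration only.

```python
# Illustrative metric definitions; names and toy data are hypothetical,
# not taken from the HOIST paper or its code.

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments).

    Assumption: this standard formula is used; the abstract does not
    state which variant the authors computed.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Toy hospitalization counts for four hypothetical counties.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print(mean_absolute_error(observed, predicted))
    print(r_squared(observed, predicted))
    print(concordance_cc(observed, predicted))
```

In the toy data every prediction is shifted by a constant, so the Pearson correlation would be perfect while the concordance coefficient is penalized for the offset. That sensitivity to systematic bias is why the two scores can differ on the same predictions.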

Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    MF-LOGP, a new method for determining single-component octanol–water partition coefficients (LogP), is presented that uses molecular formula as the only input. Octanol–water partition coefficients are useful in many applications, ranging from environmental fate to drug delivery. Currently, partition coefficients are either experimentally measured or predicted as a function of structural fragments, topological descriptors, or thermodynamic properties known or calculated from precise molecular structures. The MF-LOGP method presented here differs from classical methods in that it does not require any structural information and uses molecular formula as the sole model input. MF-LOGP is therefore useful for situations in which the structure is unknown or where low-dimensional, easily automatable, and computationally inexpensive calculations are required. MF-LOGP is a random forest algorithm that is trained and tested on 15,377 data points, using 10 features derived from the molecular formula to make LogP predictions. Using an independent validation set of 2713 data points, MF-LOGP was found to have an average RMSE = 0.77 ± 0.007, MAE = 0.52 ± 0.003, and $R^{2}$ = 0.83 ± 0.003. This performance fell within the spectrum of performances reported in the published literature for conventional higher-dimensional models (RMSE = 0.42–1.54, MAE = 0.09–1.07, and $R^{2}$ = 0.32–0.95). Compared with existing models, MF-LOGP requires a maximum of ten features and no structural information, thereby providing a practical and yet predictive tool. The development of MF-LOGP provides the groundwork for development of more physical prediction models leveraging big data analytical methods or complex multicomponent mixtures.

  2. Abstract

    Developing prediction models for emerging infectious diseases from relatively small numbers of cases is a critical need for improving pandemic preparedness. Using COVID-19 as an exemplar, we propose a transfer learning methodology for developing predictive models from multi-modal electronic healthcare records by leveraging information from more prevalent diseases with shared clinical characteristics. Our novel hierarchical, multi-modal model (TransMED) integrates baseline risk factors from the natural language processing of clinical notes at admission, time-series measurements of biomarkers obtained from laboratory tests, and discrete diagnostic, procedure and drug codes. We demonstrate the alignment of TransMED’s predictions with well-established clinical knowledge about COVID-19 through univariate and multivariate risk factor driven sub-cohort analysis. TransMED’s superior performance over state-of-the-art methods shows that leveraging patient data across modalities and transferring prior knowledge from similar disorders is critical for accurate prediction of patient outcomes, and this approach may serve as an important tool in the early response to future pandemics.

  3. Abstract

    Extending computational harmonic analysis tools from the classical setting of regular lattices to the more general setting of graphs and networks is very important, and much research has been done recently. The generalized Haar–Walsh transform (GHWT) developed by Irion and Saito (2014) is a multiscale transform for signals on graphs, which is a generalization of the classical Haar and Walsh–Hadamard transforms. We propose the extended generalized Haar–Walsh transform (eGHWT), which is a generalization of the adapted time–frequency tilings of Thiele and Villemoes (1996). The eGHWT examines not only the efficiency of graph-domain partitions but also that of “sequency-domain” partitions simultaneously. Consequently, the eGHWT and its associated best-basis selection algorithm for graph signals significantly improve the performance of the previous GHWT at a similar computational cost, $O(N \log N)$, where $N$ is the number of nodes of an input graph. While the GHWT best-basis algorithm seeks the most suitable orthonormal basis for a given task among more than $(1.5)^N$ possible orthonormal bases in $\mathbb{R}^N$, the eGHWT best-basis algorithm can find a better one by searching through more than $0.618\cdot (1.84)^N$ possible orthonormal bases in $\mathbb{R}^N$. This article describes the details of the eGHWT best-basis algorithm and demonstrates its superiority using several examples including genuine graph signals as well as conventional digital images viewed as graph signals. Furthermore, we also show how the eGHWT can be extended to 2D signals and matrix-form data by viewing them as a tensor product of graphs generated from their columns and rows and demonstrate its effectiveness on applications such as image approximation.

  4. Abstract

    We prove multi-point correlation bounds in $\mathbb{Z}^d$ for arbitrary $d\ge 1$ with symmetrized distances, answering open questions proposed by Sims–Warzel (Commun Math Phys 347(3):903–931, 2016) and Aza–Bru–Siqueira Pedra (Commun Math Phys 360(2):715–726, 2018). As applications, we prove multi-point correlation bounds for the Ising model on $\mathbb{Z}^d$, and multi-point dynamical localization in expectation for uniformly localized disordered systems, which provides the first examples of this conjectured phenomenon by Bravyi–König (Commun Math Phys 316(3):641–692, 2012).

  5. Abstract

    We present the first unquenched lattice-QCD calculation of the form factors for the decay $B\rightarrow D^*\ell\nu$ at nonzero recoil. Our analysis includes 15 MILC ensembles with $N_f=2+1$ flavors of asqtad sea quarks, with a strange quark mass close to its physical mass. The lattice spacings range from $a\approx 0.15$ fm down to 0.045 fm, while the ratio between the light- and the strange-quark masses ranges from 0.05 to 0.4. The valence $b$ and $c$ quarks are treated using the Wilson-clover action with the Fermilab interpretation, whereas the light sector employs asqtad staggered fermions. We extrapolate our results to the physical point in the continuum limit using rooted staggered heavy-light meson chiral perturbation theory. Then we apply a model-independent parametrization to extend the form factors to the full kinematic range. With this parametrization we perform a joint lattice-QCD/experiment fit using several experimental datasets to determine the CKM matrix element $|V_{cb}|$. We obtain $\left|V_{cb}\right| = (38.40 \pm 0.68_{\text{th}} \pm 0.34_{\text{exp}} \pm 0.18_{\text{EM}})\times 10^{-3}$. The first error is theoretical, the second comes from experiment and the last one includes electromagnetic and electroweak uncertainties, with an overall $\chi^2/\text{dof} = 126/84$, which illustrates the tensions between the experimental data sets, and between theory and experiment. This result is in agreement with previous exclusive determinations, but the tension with the inclusive determination remains. Finally, we integrate the differential decay rate obtained solely from lattice data to predict $R(D^*) = 0.265 \pm 0.013$, which confirms the current tension between theory and experiment.
