Accurate numerical and physical models play an important role in modeling the spread of infectious disease and in informing policy decisions. Vaccination programs rely on the estimation of disease parameters from limited, error-prone reported data. Using physics-informed neural networks (PINNs) as universal function approximators of the susceptible-infected-recovered (SIR) compartmentalized differential equation model, we create a data-driven framework that uses reported data to estimate disease spread and approximate the corresponding disease parameters. We apply this to data from a London boarding school, demonstrating the framework's ability to produce accurate disease and parameter estimations despite noisy data. However, real-world populations contain sub-populations, each exhibiting different levels of risk and activity. Thus, we expand our framework to model meta-populations of preferentially-mixed subgroups with various contact rates, introducing a new substitution to decrease the number of parameters. Parameters estimated through PINNs are then used in a negative gradient approach to calculate an optimal vaccine distribution plan for informed policy decisions. We also manipulate a new hyperparameter in the loss function of the PINNs to expedite training. Together, our work creates a data-driven tool for future infectious disease vaccination efforts in heterogeneously mixed populations.
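The idea of tying a function approximator to the SIR equations can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `weight` argument, the parameter values, and the use of finite differences in place of a neural network with automatic differentiation are all assumptions made for illustration. The sketch combines a data-misfit term with a penalty on how far a candidate trajectory departs from the SIR right-hand side for given `beta` and `gamma`.

```python
def sir_rhs(s, i, r, beta, gamma):
    """Right-hand side of the SIR equations for a closed population."""
    n = s + i + r
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def simulate(s0, i0, r0, beta, gamma, steps, dt):
    """Forward-Euler trajectory; a stand-in for the curves a network would predict."""
    traj = [(s0, i0, r0)]
    for _ in range(steps):
        s, i, r = traj[-1]
        ds, di, dr = sir_rhs(s, i, r, beta, gamma)
        traj.append((s + dt * ds, i + dt * di, r + dt * dr))
    return traj


def physics_residual(traj, beta, gamma, dt):
    """Mean squared gap between finite-difference derivatives and the SIR right-hand side."""
    total = 0.0
    for now, nxt in zip(traj, traj[1:]):
        rhs = sir_rhs(*now, beta, gamma)
        for x0, x1, f in zip(now, nxt, rhs):
            total += ((x1 - x0) / dt - f) ** 2
    return total / (3 * (len(traj) - 1))


def data_loss(traj, observed_infected):
    """Mean squared error against (step index, reported infected count) pairs."""
    errors = [(traj[k][1] - y) ** 2 for k, y in observed_infected]
    return sum(errors) / len(errors)


def total_loss(traj, observed_infected, beta, gamma, dt, weight=1.0):
    """Data misfit plus weighted physics penalty; `weight` is a hypothetical knob."""
    return data_loss(traj, observed_infected) + weight * physics_residual(traj, beta, gamma, dt)


# Illustrative values only (not taken from the boarding school data set).
trajectory = simulate(990.0, 10.0, 0.0, beta=0.5, gamma=0.1, steps=200, dt=0.1)
print(physics_residual(trajectory, 0.5, 0.1, 0.1))  # near zero for the generating parameters
print(physics_residual(trajectory, 0.3, 0.1, 0.1))  # larger for a mismatched beta
```

Minimizing such a loss jointly over the trajectory and over `beta` and `gamma` is what lets the disease parameters be estimated alongside the fitted curves.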
- PAR ID:
- 10387948
- Editor(s):
- Mourlas, Costas; Pacheco, Diego; Prandi, Catia
- Date Published:
- Journal Name:
- GoodIT 2022: ACM International Conference on Information Technology for Social Good, Limassol, Cyprus, September 7 - 9, 2022
- Page Range / eLocation ID:
- 125 to 131
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The spread dynamics of long-distance-dispersed pathogens are influenced by the dispersal characteristics of a pathogen, anisotropy due to multiple factors, and the presence of multiple sources of inoculum. In this research, we developed a flexible class of phenomenological spatio-temporal models that extend a modeling framework used in plant pathology applications to account for the presence of multiple sources and anisotropy of biological species that can govern disease gradients and spatial spread in time. We use the cucurbit downy mildew pathosystem (caused by Pseudoperonospora cubensis) to formulate a data-driven procedure based on the 2008 to 2010 historical occurrence of the disease in the U.S. available from standardized sentinel plots deployed as part of the Cucurbit Downy Mildew ipmPIPE program. This pathosystem is characterized by annual recolonization and extinction cycles, generating annual disease invasions at the continental scale. This data-driven procedure is amenable to fitting models of disease spread from one or multiple sources of primary inoculum and can be specified to provide estimates of the parameters by regression methods conditional on a function that can accommodate anisotropy in disease occurrence data. Applying this modeling framework to the cucurbit downy mildew data sets, we found a small but consistent reduction in temporal prediction errors by incorporating anisotropy in disease spread. Further, we did not find evidence of an annually occurring, alternative source of P. cubensis in northern latitudes. However, we found a signal indicating an alternative inoculum source on the western edge of the Gulf of Mexico. This modeling framework is tractable for estimating the generalized location and velocity of a disease front from sparsely sampled data with minimal data acquisition costs.
These attributes make this framework applicable and useful for a broad range of ecological data sets where multiple sources of disease may exist and whose subsequent spread is directional.
-
Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. 
Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. Leveraging our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham), we aim to model Corona spread successfully, obtain new results, and help reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analyses, such as Contagion Risk and Infection State, are based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a large volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper.
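Two of the statistics mentioned above, new cases per 100,000 population and a seven-day sliding window, can be sketched in a few lines. This is only an illustration under assumed names (`per_100k`, `sliding_window_sums`) and made-up counts; it does not reproduce the tracker's causal-model indicators, whose definitions are not given here.

```python
def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 residents."""
    return count * 100_000 / population


def sliding_window_sums(daily_counts, window=7):
    """Sum of daily counts over each full window, advancing one day at a time."""
    if window <= 0:
        raise ValueError("window must be positive")
    sums = []
    running = 0
    for idx, value in enumerate(daily_counts):
        running += value
        if idx >= window:
            # Drop the day that just left the window.
            running -= daily_counts[idx - window]
        if idx >= window - 1:
            sums.append(running)
    return sums


# Hypothetical daily new-case counts for one county of 200,000 residents.
daily_new_cases = [1, 2, 3, 4, 5, 6, 7, 8]
weekly = sliding_window_sums(daily_new_cases)
print(weekly)                                   # [28, 35]
print([per_100k(w, 200_000) for w in weekly])   # [14.0, 17.5]
```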
-
Mathematical models are widely recognized as an important tool for analyzing and understanding the dynamics of infectious disease outbreaks, predicting their future trends, and evaluating public health intervention measures for disease control and elimination. We propose a novel stochastic metapopulation state-space model for COVID-19 transmission, which is based on a discrete-time spatio-temporal susceptible, exposed, infected, recovered, and deceased (SEIRD) model. The proposed framework allows the hidden SEIRD states and unknown transmission parameters to be estimated from noisy, incomplete time series of reported epidemiological data, by application of unscented Kalman filtering (UKF), maximum-likelihood adaptive filtering, and metaheuristic optimization. Experiments using both synthetic data and real data from the Fall 2020 COVID-19 wave in the state of Texas demonstrate the effectiveness of the proposed model.
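A discrete-time SEIRD update of the kind a state-space filter would propagate can be sketched as follows. This is a deterministic, single-population simplification written for illustration: the paper's model is stochastic and spatially coupled, and the class name, function names, and rate values below are assumptions rather than the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class SEIRD:
    s: float  # susceptible
    e: float  # exposed (infected, not yet infectious)
    i: float  # infectious
    r: float  # recovered
    d: float  # deceased

    def total(self):
        return self.s + self.e + self.i + self.r + self.d


def seird_step(x, beta, sigma, gamma, mu, dt=1.0):
    """One discrete-time transition of the mean SEIRD dynamics.

    beta: transmission rate, sigma: incubation rate,
    gamma: recovery rate, mu: death rate (all hypothetical per-day values).
    """
    alive = x.s + x.e + x.i + x.r
    new_exposed = beta * x.s * x.i / alive * dt
    new_infectious = sigma * x.e * dt
    new_recovered = gamma * x.i * dt
    new_deaths = mu * x.i * dt
    return SEIRD(
        s=x.s - new_exposed,
        e=x.e + new_exposed - new_infectious,
        i=x.i + new_infectious - new_recovered - new_deaths,
        r=x.r + new_recovered,
        d=x.d + new_deaths,
    )


def run(x0, days, **rates):
    """Iterate the transition and return the list of daily states."""
    states = [x0]
    for _ in range(days):
        states.append(seird_step(states[-1], **rates))
    return states


history = run(SEIRD(9990.0, 0.0, 10.0, 0.0, 0.0), 100,
              beta=0.4, sigma=0.2, gamma=0.1, mu=0.01)
print(history[-1])
```

In a filtering setup, a function like `seird_step` would play the role of the state-transition model, with reported case and death counts entering through a separate observation model.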
-
Background Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent us from systematic monitoring and analyses of human movement on different spatial scales (from local to global). Objective Big data from social media such as geotagged tweets have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data for studying human movement at different geographic scales (eg, from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach using big data from Twitter coupled with other human mobility data sources and artificial intelligence to monitor and analyze human movement at different spatial scales (from global to regional to local). Methods We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems. Results This project was funded as of May 2020.
We have started the data collection, processing, and analysis for the project. Conclusions Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus. International Registered Report Identifier (IRRID) DERR1-10.2196/24432