Title: Unraveling the dynamic importance of county-level features in trajectory of COVID-19
Abstract: The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2787 counties in the United States using data-driven machine learning models. Existing mathematical models of disease spread have usually focused on case prediction under different infection rates, without incorporating the multiple heterogeneous features that could affect the spatial and temporal trajectory of COVID-19. Recognizing this, we trained a data-driven model using 23 features representing six key influencing factors affecting the pandemic spread: social demographics of counties, population activities, mobility within the counties, movement across counties, disease attributes, and social network structure. We also categorized counties into multiple groups according to their population densities, and we divided the trajectory of COVID-19 into three stages: the outbreak stage, the social distancing stage, and the reopening stage. The study aimed to answer two research questions: (1) to what extent did the importance of heterogeneous features evolve across stages; (2) to what extent did the importance of heterogeneous features vary across counties with different characteristics. We fitted a set of random forest models to determine weekly feature importance. The results showed that: (1) social demographic features, such as gross domestic product, population density, and minority status, remained highly important throughout the stages of COVID-19 across the 2787 studied counties; (2) within-county mobility features had the highest importance in counties with higher population densities; (3) the feature reflecting social network structure (the Facebook social connectedness index) had higher importance for counties with higher population densities. The results showed that data-driven machine learning models could provide important insights to inform policymakers regarding feature importance for counties with various population densities and at different stages of a pandemic life cycle.
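The weekly feature-importance idea described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestRegressor and its default impurity-based importance; the abstract does not state which library, importance measure, target variable, or hyperparameters the authors used, so the feature names, the synthetic data, and the helper weekly_feature_importance below are hypothetical and only show the general shape of "one model per week, one importance vector per model".

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def weekly_feature_importance(X_by_week, y_by_week, feature_names, seed=0):
    """Fit one random forest per week; return {week: {feature: importance}}.

    Hypothetical helper: uses scikit-learn's impurity-based importances,
    which may differ from the measure used in the study.
    """
    result = {}
    for week, X in X_by_week.items():
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X, y_by_week[week])
        result[week] = dict(zip(feature_names, model.feature_importances_))
    return result


# Synthetic stand-in data (NOT the study's data): 300 "counties", 3 features.
# In week 0 the target depends on the first feature, in week 1 on the second.
rng = np.random.default_rng(0)
feature_names = ["population_density", "within_county_mobility", "noise"]
X_by_week, y_by_week = {}, {}
for week, weights in enumerate([(3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]):
    X = rng.normal(size=(300, 3))
    y = X @ np.array(weights) + rng.normal(scale=0.1, size=300)
    X_by_week[week] = X
    y_by_week[week] = y

importances = weekly_feature_importance(X_by_week, y_by_week, feature_names)
for week, scores in importances.items():
    ranked = sorted(scores, key=scores.get, reverse=True)
    print(f"week {week}: most important feature = {ranked[0]}")
```

Comparing the per-week dictionaries shows how a feature's rank shifts over time. Grouping counties by population density, as the abstract describes, would amount to running the same helper separately on each group's rows.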
Award ID(s):
2026814
PAR ID:
10319983
Journal Name:
Scientific Reports
Volume:
11
Issue:
1
ISSN:
2045-2322
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The objective of this study is to examine the transmission risk of COVID-19 based on cross-county population co-location data from Facebook. The rapid spread of COVID-19 in the United States has posed a major threat to public health, the real economy, and human well-being. In the absence of effective vaccines, the preventive actions of social distancing, travel reduction, and stay-at-home orders are recognized as essential non-pharmacologic approaches to control the infection and spatial spread of COVID-19. Prior studies demonstrated that human movement and mobility drove the spatiotemporal distribution of COVID-19 in China. Little is known, however, about the patterns and effects of co-location reduction on cross-county transmission risk of COVID-19. This study utilizes Facebook co-location data for all counties in the United States from March to early May 2020 to conduct a spatial network analysis in which nodes represent counties and edge weights are associated with the co-location probability of the counties' populations. The analysis examines the synchronicity and time lag between travel reduction and the pandemic growth trajectory to evaluate the efficacy of social distancing in reducing population co-location probabilities and, subsequently, the growth in weekly new cases across counties. The results show that the mitigation effects of co-location reduction appear in the growth of weekly new confirmed cases with a one-week delay. The analysis categorizes counties based on the number of confirmed COVID-19 cases and examines co-location patterns within and across groups. Significant segregation is found among different county groups. The results suggest that within-group co-location probabilities (e.g., co-location probabilities among counties with high numbers of cases) remain stable, and that social distancing policies primarily resulted in reduced cross-group co-location probabilities (due to travel reduction from counties with large numbers of cases to counties with low numbers of cases). These findings could have important practical implications for local governments to inform their intervention measures for monitoring and reducing the spread of COVID-19, as well as for adoption in future pandemics. Public policy, economic forecasting, and epidemic modeling need to account for population co-location patterns in evaluating transmission risk of COVID-19 across counties.
  2.
    The COVID-19 pandemic severely changed the way of life in the United States (US). From early scattered regional outbreaks to current country-wide spread, and from rural areas to highly populated cities, the contagion exhibits diverse patterns at various timescales and locations. We thus conduct a graph frequency analysis to investigate the spread patterns of COVID-19 in different US counties. The commute flows between all 3142 US counties were used to construct a graph capturing the population mobility. The numbers of daily confirmed COVID-19 cases per county were collected and represented as graph signals, which were then mapped into the frequency domain via the graph Fourier transform. The concept of graph frequency in Graph Signal Processing (GSP) enables the decomposition of graph signals (i.e., daily confirmed cases) into modes with smooth or rapid variations with respect to the underlying mobility graph. These different modes of variability are shown to relate to COVID-19 spread patterns within and across counties. Changes in the nature of spread within geographical regions are also revealed by graph frequency analysis at finer temporal scales. Overall, our GSP-based approach leverages case count and mobility data to unveil spatio-temporal contagion patterns of COVID-19 incidence for each US county. Results here support the promising prospect of using GSP tools for epidemiology knowledge discovery on graphs.
  3.
    Understanding the dynamics of the spread of COVID-19 between connected communities is fundamental in planning appropriate mitigation measures. To that end, we propose and analyze a novel metapopulation network model, particularly suitable for modeling commuter traffic patterns, that takes into account the connectivity between a heterogeneous set of communities, each with its own infection dynamics. In the novel metapopulation model that we propose here, transport schemes developed in optimal transport theory provide an efficient and easily implementable way of describing the temporary population redistribution due to traffic, such as the daily commuter traffic between work and residence. Locally, infection dynamics in individual communities are described in terms of a susceptible-exposed-infected-recovered (SEIR) compartment model, modified to account for the specific features of COVID-19, most notably its spread by asymptomatic and presymptomatic infected individuals. The mathematical foundation of our metapopulation network model is akin to a transport scheme between two population distributions, namely the residential distribution and the workplace distribution, whose interface can be inferred from commuter mobility data made available by the US Census Bureau. We use the proposed metapopulation model to test the dynamics of the spread of COVID-19 on two networks: a smaller one comprising 7 counties in the Greater Cleveland area in Ohio, and a larger one consisting of 74 counties in the Pittsburgh–Cleveland–Detroit corridor following Lake Erie’s American coastline. The model simulations indicate that densely populated regions effectively act as amplifiers of the infection for the surrounding, less densely populated areas, in agreement with the pattern of infections observed in the course of the COVID-19 pandemic. Computed examples show that the model can also be used to test different mitigation strategies, including one based on state-level travel restrictions, another on county-level triggered social distancing, as well as a combination of the two.
  4. The COVID-19 pandemic is a global threat presenting health, economic, and social challenges that continue to escalate. Metapopulation epidemic modeling studies in the susceptible–exposed–infectious–removed (SEIR) style have played important roles in informing public health policy making to mitigate the spread of COVID-19. These models typically rely on a key assumption on the homogeneity of the population. This assumption certainly cannot be expected to hold true in real situations; various geographic, socioeconomic, and cultural environments affect the behaviors that drive the spread of COVID-19 in different communities. What’s more, variation of intracounty environments creates spatial heterogeneity of transmission in different regions. To address this issue, we develop a human mobility flow-augmented stochastic SEIR-style epidemic modeling framework with the ability to distinguish different regions and their corresponding behaviors. This modeling framework is then combined with data assimilation and machine learning techniques to reconstruct the historical growth trajectories of COVID-19 confirmed cases in two counties in Wisconsin. The associations between the spread of COVID-19 and business foot traffic, race and ethnicity, and age structure are then investigated. The results reveal that, in a college town (Dane County), the most important heterogeneity is age structure, while, in a large city area (Milwaukee County), racial and ethnic heterogeneity becomes more apparent. Scenario studies further indicate a strong response of the spread rate to various reopening policies, which suggests that policy makers may need to take these heterogeneities into account very carefully when designing policies for mitigating the ongoing spread of COVID-19 and reopening. 
  5. Acharya, Binod (Ed.)
    This study compares pandemic experiences of Missouri’s 115 counties based on rurality and sociodemographic characteristics during the 1918–20 influenza and 2020–21 COVID-19 pandemics. The state’s counties and overall population distribution have remained relatively stable over the last century, which enables identification of long-lasting pandemic attributes. Sociodemographic data available at the county level for both time periods were taken from U.S. census data and used to create clusters of similar counties. Counties were also grouped by rural status (RSU), including fully (100%) rural, semirural (1–49% living in urban areas), and urban (>50% of the population living in urban areas). Deaths from 1918 through 1920 were collated from the Missouri Digital Heritage database and COVID-19 cases and deaths were downloaded from the Missouri COVID-19 dashboard. Results from sociodemographic analyses indicate that, during both time periods, average farm value, proportion White, and literacy were the most important determinants of sociodemographic clusters. Furthermore, the Urban/Central and Southeastern regions experienced higher mortality during both pandemics than did the North and South. Analyses comparing county groups by rurality indicated that throughout the 1918–20 influenza pandemic, urban counties had the highest and rural had the lowest mortality rates. Early in the 2020–21 COVID-19 pandemic, urban counties saw the most extensive epidemic spread and highest mortality, but as the epidemic progressed, cumulative mortality became highest in semirural counties. Additional results highlight the greater effects both pandemics had on county groups with lower rates of education and a lower proportion of Whites in the population. This was especially true for the far southeastern counties of Missouri (“the Bootheel”) during the COVID-19 pandemic. These results indicate that rural-urban and socioeconomic differences in health outcomes are long-standing problems that continue to be of significant importance, even though the overall quality of health care is substantially better in the 21st century.