Title: Ensemble machine learning of factors influencing COVID-19 across US counties
Abstract Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of COVID-19, is a communicable virus spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological data to create a large feature set to analyze per capita COVID-19 outcomes. Because of the high dimensionality, multicollinearity, and unknown interactions of the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest predictors for both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations, CDC measures for limited English, and proportion of Black- and/or African-American individuals in a county were the most important features for per capita COVID-19 cases within a month after the pandemic started in a county and also at the latest date examined. For per capita all-cause mortality at day 100 and total to date, we find that public transportation use and proportion of Black- and/or African-American individuals in a county are the strongest predictors. The methods predict that a 10% increase in public transportation use, with all other factors held fixed at their observed values, is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]); likewise, a 10% increase in the proportion of Black- and/or African-American individuals in a county is associated with an increase in total deaths at the end of the study of 2067 (95% CI [1189, 2654]). Using data until the end of the study, the same metric suggests ethnicity has twice the association of the next most important factors, which are location, disease prevalence, and transit factors. Our findings shed light on societal patterns that have been reported and experienced in the U.S. by using robust methods to understand the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including how vaccines are distributed, could have the greatest impact if they are given with priority to the highest risk communities.
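The marginal prediction described in the abstract (raise one factor by 10% while every other factor stays at its observed value, then compare ensemble predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: the two member models, their coefficients, the feature names `transit_use` and `pct_black`, and the two toy counties are all hypothetical, and the 10% increase is assumed here to be relative rather than in percentage points.

```python
from statistics import mean

# Hypothetical stand-ins for fitted ensemble members. Each maps a county's
# feature dict to a predicted death count; the coefficients are made up.
def model_a(county):
    return 2.0 * county["transit_use"] + 0.5 * county["pct_black"]

def model_b(county):
    return 4.0 * county["transit_use"] + 1.0

def ensemble_predict(models, county):
    """Average the predictions of all ensemble members for one county."""
    return mean(model(county) for model in models)

def marginal_shift_effect(models, counties, feature, rel_increase=0.10):
    """Summed change in predictions when `feature` rises by `rel_increase`
    in every county while all other features keep their observed values."""
    total = 0.0
    for county in counties:
        shifted = dict(county)  # copy so the observed data stay untouched
        shifted[feature] = county[feature] * (1.0 + rel_increase)
        total += ensemble_predict(models, shifted) - ensemble_predict(models, county)
    return total

# Toy data: two hypothetical counties.
counties = [
    {"transit_use": 10.0, "pct_black": 20.0},
    {"transit_use": 30.0, "pct_black": 5.0},
]
effect = marginal_shift_effect([model_a, model_b], counties, "transit_use")
print(round(effect, 6))  # approximately 12.0 for these toy numbers
```

A confidence interval such as those quoted in the abstract would additionally require resampling or another uncertainty method, which this sketch omits.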
Award ID(s):
2032264
PAR ID:
10329390
Journal Name:
Scientific Reports
Volume:
11
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Turner, Richard (Ed.)
    Background With the availability of multiple Coronavirus Disease 2019 (COVID-19) vaccines and the predicted shortages in supply for the near future, it is necessary to allocate vaccines in a manner that minimizes severe outcomes, particularly deaths. To date, vaccination strategies in the United States have focused on individual characteristics such as age and occupation. Here, we assess the utility of population-level health and socioeconomic indicators as additional criteria for geographical allocation of vaccines. Methods and findings County-level estimates of 14 indicators associated with COVID-19 mortality were extracted from public data sources. Effect estimates of the individual indicators were calculated with univariate models. Presence of spatial autocorrelation was established using Moran’s I statistic. Spatial simultaneous autoregressive (SAR) models that account for spatial autocorrelation in response and predictors were used to assess (i) the proportion of variance in county-level COVID-19 mortality that can be explained by identified health/socioeconomic indicators (R²); and (ii) effect estimates of each predictor. Adjusting for case rates, the selected indicators individually explain 24%–29% of the variability in mortality. Prevalence of chronic kidney disease and proportion of population residing in nursing homes have the highest R². Mortality is estimated to increase by 43 per thousand residents (95% CI: 37–49; p < 0.001) with a 1% increase in the prevalence of chronic kidney disease and by 39 deaths per thousand (95% CI: 34–44; p < 0.001) with a 1% increase in population living in nursing homes. SAR models using multiple health/socioeconomic indicators explain 43% of the variability in COVID-19 mortality in US counties, adjusting for case rates. R² was found not to be sensitive to the choice of SAR model form. Study limitations include the use of mortality rates that are not age standardized, a spatial adjacency matrix that does not capture human flows among counties, and insufficient accounting for interaction among predictors. Conclusions Significant spatial autocorrelation exists in COVID-19 mortality in the US, and population health/socioeconomic indicators account for considerable variability in county-level mortality. In the context of vaccine rollout in the US and globally, national and subnational estimates of burden of disease could inform optimal geographical allocation of vaccines.
  2. Importance Marked elevation in levels of depressive symptoms compared with historical norms has been described during the COVID-19 pandemic, and understanding the extent to which these are associated with diminished in-person social interaction could inform public health planning for future pandemics or other disasters. Objective To describe the association between living in a US county with diminished mobility during the COVID-19 pandemic and self-reported depressive symptoms, while accounting for potential local and state-level confounding factors. Design, Setting, and Participants This survey study used 18 waves of a nonprobability internet survey conducted in the United States between May 2020 and April 2022. Participants included respondents who were 18 years and older and lived in 1 of the 50 US states or Washington DC. Main Outcome and Measure Depressive symptoms measured by the Patient Health Questionnaire-9 (PHQ-9); county-level community mobility estimates from mobile apps; COVID-19 policies at the US state level from the Oxford stringency index. Results The 192 271 survey respondents had a mean (SD) age of 43.1 (16.5) years, and 768 (0.4%) were American Indian or Alaska Native individuals, 11 448 (6.0%) were Asian individuals, 20 277 (10.5%) were Black individuals, 15 036 (7.8%) were Hispanic individuals, 1975 (1.0%) were Pacific Islander individuals, 138 702 (72.1%) were White individuals, and 4065 (2.1%) were individuals of another race. Additionally, 126 381 respondents (65.7%) identified as female and 65 890 (34.3%) as male. Mean (SD) depression severity by PHQ-9 was 7.2 (6.8). In a mixed-effects linear regression model, the mean county-level proportion of individuals not leaving home was associated with a greater level of depression symptoms (β, 2.58; 95% CI, 1.57-3.58) after adjustment for individual sociodemographic features. Results were similar after the inclusion in regression models of local COVID-19 activity, weather, and county-level economic features, and persisted after widespread availability of COVID-19 vaccination. They were attenuated by the inclusion of state-level pandemic restrictions. Two restrictions, mandatory mask-wearing in public (β, 0.23; 95% CI, 0.15-0.30) and policies cancelling public events (β, 0.37; 95% CI, 0.22-0.51), demonstrated modest independent associations with depressive symptom severity. Conclusions and Relevance In this study, depressive symptoms were greater in locales and times with diminished community mobility. Strategies to understand the potential public health consequences of pandemic responses are needed.
  3. COVID-19, known as Coronavirus Disease 2019, is a major health issue resulting from novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Its emergence has posed a significant menace to the global medical community and healthcare system across the world. Notably, on December 12, 2020, the Food and Drug Administration (FDA) approved the utilization of the Pfizer and Moderna COVID-19 vaccines. As of July 31, 2022, the United States has witnessed over 91.3 million cases of COVID-19 and nearly 1.03 million fatalities. An intriguing observation is the recent reduction in the mortality rate of COVID-19, attributed to an augmented focus on early detection, comprehensive screening, and widespread vaccination. Despite this positive trend in some demographics, it is noteworthy that the overall incidence rates of COVID-19 among African American and Hispanic populations have continued to escalate, even as mortality rates have decreased. Therefore, the objective of this research study is to present an overview of COVID-19, spotlighting the disparities among different racial and ethnic groups. It also delves into the management of COVID-19 within the minority populations. To reach our research objective, we used a publicly available COVID-19 dataset from Kaggle: https://www.kaggle.com/datasets/paultimothymooney/covid19-casesand- deaths-by-race. In addition, we obtained COVID-19 datasets from 10 different states with the highest proportion of African American populations. Considerable strides have been made against COVID-19. However, the success rate of treatment in the African American population remains relatively limited when compared to other ethnic groups. Hence, there arises a pressing need for novel strategies and innovative approaches to not only encourage prevention measures against COVID-19, but also to increase survival rates, diminish mortality rates, and ultimately improve the health outcomes of ethnic and racial minorities.
  4. Goller, Carlos C. (Ed.)
    ABSTRACT The global spread of the novel coronavirus first reported in December 2019 led to drastic changes in the social and economic dynamics of everyday life. Nationwide, racial, gender, and geographic disparities in symptom severity, mortality, and access to health care evolved, which impacted stress and anxiety surrounding COVID-19. On university campuses, drastic shifts in learning environments occurred as universities shifted to remote instruction, which further impacted student mental health and anxiety. Our study aimed to understand how students from diverse backgrounds differ in their worry and stress surrounding COVID-19 upon return to hybrid or in-person classes during the Fall of 2020. Specifically, we addressed the differences in COVID-19 worry, stress response, and COVID-19-related food insecurity related to race/ethnicity (Indigenous American, Asian/Asian American, black/African American, Latinx/Hispanic, white, or multiple races), gender (male, female, and gender expressive), and geographic origin (ranging from rural to large metropolitan areas) of undergraduate students attending a regional-serving R2 university in the southeastern U.S. Overall, we found significance in worry, food insecurity, and stress responses with females and gender expressive individuals, along with Hispanic/Latinx, Asian/Asian American, and black/African American students. Additionally, students from large urban areas were more worried about contracting the virus compared to students from rural locations. However, we found fewer differences in self-reported COVID-related stress responses within these students. Our findings highlight the disparities among students’ worry based on gender, racial differences, and geographic origins, with potential implications for the mental health of university students from diverse backgrounds. Our results support the inclusion of diverse voices in university decision making around the transition through the COVID-19 pandemic.
  5. Mixed evidence on the relationship between school closure and COVID-19 prevalence could reflect focus on large-scale levels of geography, limited ability to address endogeneity, and demographic variation. Using county-level Centers for Disease Control and Prevention (CDC) COVID-19 data through June 15, 2020, two matching strategies address potential heterogeneity: nearest geographic neighbor and propensity scores. Within nearest neighboring pairs in different states with different school closure timing, each additional day from a county’s first case until state-ordered school closure is related to 1.5 to 2.4 percent higher cumulative COVID-19 deaths per capita (1,227–1,972 deaths for a county with median population and deaths/capita). Results are consistent using propensity score matching, COVID-19 data from two alternative sources, and additional sensitivity analyses. School closure is more strongly related to COVID-19 deaths in counties with a high concentration of Black or poor residents, suggesting schools play an unequal role in transmission and earlier school closure is related to fewer lives lost in disadvantaged counties.
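The first related record in the list above reports that spatial autocorrelation was established with Moran's I over a spatial adjacency matrix. A minimal sketch of that statistic follows, assuming a binary adjacency matrix with no self-links. The four-county chain and the mortality values are hypothetical and are not taken from any of the studies listed.

```python
def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,
    where d_i is the deviation of unit i from the mean and W is the
    sum of all spatial weights."""
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    weight_total = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_total) * (cross_products / squared_deviations)

# Hypothetical layout: four counties in a row, each adjacent to its neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# A smooth gradient gives positive autocorrelation; an alternating
# pattern gives negative autocorrelation.
gradient_i = morans_i([1.0, 2.0, 3.0, 4.0], adjacency)
alternating_i = morans_i([1.0, -1.0, 1.0, -1.0], adjacency)
print(gradient_i, alternating_i)
```

Values near zero indicate no spatial pattern. Testing significance, as a study would, requires a permutation or analytical reference distribution that this sketch leaves out.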