
This content will become publicly available on December 1, 2022

Title: Ensemble machine learning of factors influencing COVID-19 across US counties
Abstract: COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a communicable disease spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological data to create a large feature set to analyze per capita COVID-19 outcomes. Because of the high dimensionality, multicollinearity, and unknown interactions of the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest predictors for both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations, CDC measures for limited English, and proportion of Black- and/or African-American individuals in a county were the most important features for per capita COVID-19 cases within a month after the pandemic started in a county and also at the latest date examined. For per capita all-cause mortality at day 100 and total to date, we find that public transportation use and proportion of Black- and/or African-American individuals in a county are the strongest predictors. The methods predict that a 10% increase in public transportation use, with all other factors held fixed at their observed values, is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]); likewise, a 10% increase in the proportion of Black- and/or African-American individuals in a county is associated with an increase in total deaths at the end of the study of 2067 (95% CI [1189, 2654]). Using data until the end of the study, the same metric suggests ethnicity has double the association of the next most important factors, which are location, disease prevalence, and transit factors. Our findings shed light on societal patterns that have been reported and experienced in the U.S. by using robust methods to understand the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including vaccine distribution, could have the greatest impact if they are given with priority to the highest-risk communities.
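The marginal prediction described in the abstract can be illustrated with a minimal sketch. It assumes a fitted model exposed as a `predict` function and treats the "10% increase" as a relative shift of one feature; the toy model, the feature names (`transit_pct`, `pct_over_65`), and the numbers are hypothetical and are not taken from the study.

```python
# Hypothetical sketch of a marginal (counterfactual) prediction for one feature.
# The model, feature names, and data below are illustrative assumptions only.

def marginal_effect(predict, rows, feature, relative_shift=0.10):
    """Sum over rows of predict(shifted row) - predict(observed row).

    Only `feature` is multiplied by (1 + relative_shift); every other
    feature stays at its observed value.
    """
    total = 0.0
    for row in rows:
        shifted = dict(row)  # copy so the observed row is left untouched
        shifted[feature] = row[feature] * (1.0 + relative_shift)
        total += predict(shifted) - predict(row)
    return total


def toy_predict(row):
    # Stand-in for a fitted ensemble: predicted deaths for one county.
    return 2.0 * row["transit_pct"] + 0.5 * row["pct_over_65"]


counties = [
    {"transit_pct": 10.0, "pct_over_65": 15.0},
    {"transit_pct": 30.0, "pct_over_65": 12.0},
    {"transit_pct": 5.0, "pct_over_65": 20.0},
]

effect = marginal_effect(toy_predict, counties, "transit_pct")
print(round(effect, 6))
```

Summing the per-county differences gives a single count-scale number, comparable in form to the totals quoted in the abstract. A confidence interval would need an additional step, such as a bootstrap, which this sketch does not attempt.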
Journal Name:
Scientific Reports
Sponsoring Org:
National Science Foundation
More Like this
  1. Turner, Richard (Ed.)
    Background With the availability of multiple Coronavirus Disease 2019 (COVID-19) vaccines and the predicted shortages in supply for the near future, it is necessary to allocate vaccines in a manner that minimizes severe outcomes, particularly deaths. To date, vaccination strategies in the United States have focused on individual characteristics such as age and occupation. Here, we assess the utility of population-level health and socioeconomic indicators as additional criteria for geographical allocation of vaccines. Methods and findings County-level estimates of 14 indicators associated with COVID-19 mortality were extracted from public data sources. Effect estimates of the individual indicators were calculated with univariate models. Presence of spatial autocorrelation was established using Moran’s I statistic. Spatial simultaneous autoregressive (SAR) models that account for spatial autocorrelation in response and predictors were used to assess (i) the proportion of variance in county-level COVID-19 mortality that can be explained by identified health/socioeconomic indicators (R²); and (ii) effect estimates of each predictor. Adjusting for case rates, the selected indicators individually explain 24%–29% of the variability in mortality. Prevalence of chronic kidney disease and proportion of population residing in nursing homes have the highest R². Mortality is estimated to increase by 43 per thousand residents (95% CI: 37–49; p < 0.001) with a 1% increase in the prevalence of chronic kidney disease and by 39 deaths per thousand (95% CI: 34–44; p < 0.001) with a 1% increase in population living in nursing homes. SAR models using multiple health/socioeconomic indicators explain 43% of the variability in COVID-19 mortality in US counties, adjusting for case rates. R² was not sensitive to the choice of SAR model form. Study limitations include the use of mortality rates that are not age standardized, a spatial adjacency matrix that does not capture human flows among counties, and insufficient accounting for interaction among predictors. Conclusions Significant spatial autocorrelation exists in COVID-19 mortality in the US, and population health/socioeconomic indicators account for considerable variability in county-level mortality. In the context of vaccine rollout in the US and globally, national and subnational estimates of burden of disease could inform optimal geographical allocation of vaccines.
  2. Goller, Carlos C. (Ed.)
    ABSTRACT The global spread of the novel coronavirus first reported in December 2019 led to drastic changes in the social and economic dynamics of everyday life. Nationwide, racial, gender, and geographic disparities in symptom severity, mortality, and access to health care evolved, which impacted stress and anxiety surrounding COVID-19. On university campuses, drastic shifts in learning environments occurred as universities shifted to remote instruction, which further impacted student mental health and anxiety. Our study aimed to understand how students from diverse backgrounds differ in their worry and stress surrounding COVID-19 upon return to hybrid or in-person classes during the Fall of 2020. Specifically, we addressed the differences in COVID-19 worry, stress response, and COVID-19-related food insecurity related to race/ethnicity (Indigenous American, Asian/Asian American, black/African American, Latinx/Hispanic, white, or multiple races), gender (male, female, and gender expressive), and geographic origin (ranging from rural to large metropolitan areas) of undergraduate students attending a regional-serving R2 university in the southeastern U.S. Overall, we found significant differences in worry, food insecurity, and stress responses for females and gender expressive individuals, along with Hispanic/Latinx, Asian/Asian American, and black/African American students. Additionally, students from large urban areas were more worried about contracting the virus compared to students from rural locations. However, we found fewer differences in self-reported COVID-related stress responses within these students. Our findings highlight the disparities among students’ worry based on gender, racial differences, and geographic origins, with potential implications for mental health of university students from diverse backgrounds. Our results support the inclusion of diverse voices in university decision making around the transition through the COVID-19 pandemic.
  3. The U.S. has merely 4% of the world population, but accounts for 25% of the world’s COVID-19 cases. Since the COVID-19 outbreak in the U.S., Massachusetts has been leading other states in the total number of COVID-19 cases. Racial residential segregation is a fundamental cause of racial disparities in health. Moreover, disparities of access to health care have a large impact on COVID-19 cases. Thus, this study estimates racial segregation and disparities in testing site access and employs economic, demographic, and transportation variables at the city/town level in Massachusetts. Spatial regression models are applied to evaluate the relationships between COVID-19 incidence rate and related variables. This is the first study to apply spatial analysis methods across neighborhoods in the U.S. to examine the COVID-19 incidence rate. The findings are: (1) Residential segregation of Hispanic and Non-Hispanic Black/African Americans has a significantly positive association with COVID-19 incidence rate, indicating the higher susceptibility to COVID-19 infection among minority groups. (2) Non-Hispanic Black/African Americans have the shortest drive time to testing sites, followed by Hispanic, Non-Hispanic Asians, and Non-Hispanic Whites. The drive time to testing sites is significantly negatively associated with the COVID-19 incidence rate, implying the importance of the accessibility of testing sites by all populations. (3) Poverty rate and road density are significant explanatory variables. Importantly, overcrowding, represented by more than one person per room, is a significant variable found to be positively associated with COVID-19 incidence rate, suggesting the effectiveness of social distancing for reducing infection. (4) Different from the findings of previous studies, the elderly population rate is not statistically significantly correlated with the incidence rate because the elderly population in Massachusetts is less distributed in the hotspot regions of COVID-19 infections. The findings in this study provide useful insights for policymakers to propose new strategies to contain COVID-19 transmission in Massachusetts.
  4. Objective: To identify differences in short-term outcomes of patients with coronavirus disease 2019 (COVID-19) according to various racial/ethnic groups. Design: Analysis of Cerner de-identified COVID-19 dataset. Setting: A total of 62 health care facilities. Participants: The cohort included 49,277 adult COVID-19 patients who were hospitalized from December 1, 2019 to November 13, 2020. Methods: We compared patients’ age, gender, individual components of Charlson and Elixhauser comorbidities, medical complications, use of do-not-resuscitate, use of palliative care, and socioeconomic status between various racial and/or ethnic groups. We further compared the rates of in-hospital mortality and non-routine discharges between various racial and/or ethnic groups. Main Outcome Measures: The primary outcome of interest was in-hospital mortality. The secondary outcome was non-routine discharge (discharge to destinations other than home, such as short-term hospitals or other facilities including intermediate care and skilled nursing homes). Results: Compared with White patients, in-hospital mortality was significantly higher among African American (OR 1.5; 95% CI: 1.3-1.6, P<.001), Hispanic (OR 1.4; 95% CI: 1.3-1.6, P<.001), and Asian or Pacific Islander (OR 1.5; 95% CI: 1.1-1.9, P=.002) patients after adjustment for age and gender, Elixhauser comorbidities, do-not-resuscitate status, palliative care use, and socioeconomic status. Conclusions: Our study found that, among hospitalized patients with COVID-19, African American, Hispanic, and Asian or Pacific Islander patients had increased mortality compared with White patients after adjusting for sociodemographic factors, comorbidities, and do-not-resuscitate/palliative care status. Our findings add additional perspective to other recent studies. Ethn Dis. 2021;31(3):389-398; doi:10.18865/ed.31.3.389
  5. Abstract Background No versatile web app exists that allows epidemiologists and managers around the world to comprehensively analyze the impacts of COVID-19 mitigation. The web app presented here fills this gap. Methods Our web app uses a model that explicitly identifies susceptible, contact, latent, asymptomatic, symptomatic and recovered classes of individuals, and a parallel set of response classes, subject to lower pathogen-contact rates. The user inputs a CSV file of incidence and, if of interest, mortality rate data. A default set of parameters is available that can be overwritten through input or online entry, and a user-selected subset of these can be fitted to the model using maximum-likelihood estimation (MLE). Model fitting and forecasting intervals are specifiable and changes to parameters allow counterfactual and forecasting scenarios. Confidence or credible intervals can be generated using stochastic simulations, based on MLE values, or on an inputted CSV file containing Markov chain Monte Carlo (MCMC) estimates of one or more parameters. Results We illustrate the use of our web app in extracting social distancing, social relaxation, surveillance or virulence switching functions (i.e., time varying drivers) from the incidence and mortality rates of COVID-19 epidemics in Israel, South Africa, and England. The Israeli outbreak exhibits four distinct phases: initial outbreak, social distancing, social relaxation, and a second wave mitigation phase. An MCMC projection of this latter phase suggests the Israeli epidemic will continue to produce into late November an average of around 1500 new cases per day, unless the population practices social-relaxation measures at least 5-fold below the level in August, which itself is 4-fold below the level at the start of July. Our analysis of the relatively late South African outbreak that became the world’s fifth largest COVID-19 epidemic in July revealed that the decline through late July and early August was characterised by a social distancing driver operating at more than twice the per-capita applicable-disease-class (pc-adc) rate of the social relaxation driver. Our analysis of the relatively early English outbreak identified a more than 2-fold improvement in surveillance over the course of the epidemic. It also identified a pc-adc social distancing rate in early August that, though nearly four times the pc-adc social relaxation rate, appeared to barely contain a second wave that would break out if social distancing was further relaxed. Conclusion Our web app provides policy makers and health officers who have no epidemiological modelling or computer coding expertise with an invaluable tool for assessing the impacts of different outbreak mitigation policies and measures. This includes an ability to generate an epidemic-suppression or curve-flattening index that measures the intensity with which behavioural responses suppress or flatten the epidemic curve in the region under consideration.
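The first related record above establishes spatial autocorrelation with Moran's I statistic. As a rough illustration of what that statistic measures, the sketch below computes the standard global Moran's I for a handful of values and a binary adjacency matrix. The four-county chain, the weights, and the values are hypothetical; the record does not say how its adjacency matrix was built beyond noting that it does not capture human flows.

```python
# Hypothetical sketch of global Moran's I:
#   I = (n / W) * sum_ij w_ij * (x_i - mean) * (x_j - mean) / sum_i (x_i - mean)^2
# where W is the sum of all weights. Data below are illustrative only.

def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations between every pair of weighted neighbours.
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denominator = sum(d * d for d in dev)
    return (n / weight_sum) * numerator / denominator


# Four counties in a chain: 0-1, 1-2, 2-3 are neighbours (symmetric weights).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1.0, 2.0, 3.0, 4.0], adjacency)       # neighbours are similar
alternating = morans_i([1.0, 4.0, 1.0, 4.0], adjacency)  # neighbours differ
print(smooth, alternating)
```

Positive values indicate that neighbouring units tend to have similar values and negative values indicate that they tend to differ. Judging significance would require a permutation or analytical test, which is outside this sketch.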