This content will become publicly available on April 11, 2026

Title: A Spatio-temporal Cluster-aware Supervised Learning Framework for Predicting County-level Drug Overdose Deaths
The soaring drug overdose crisis in the United States has claimed more than half a million lives in the past decade and remains a major public health threat. The ability to predict drug overdose deaths at the county level can help local communities develop action plans in response to emerging changes. Applying off-the-shelf machine learning algorithms for prediction can be challenging due to the heterogeneous risk profiles of the counties and suppressed data in common publicly available data sources. To fill these gaps, we develop a cluster-aware supervised learning (CASL) framework to enhance the prediction of county-level drug overdose deaths. This CASL model simultaneously clusters counties into groups based on geographical and socioeconomic characteristics and minimizes the loss function that accounts for suppressed values and cluster-specific regularization. Our computational study uses real-world data from 2010 to 2021, focusing on the ten states most severely impacted by the drug overdose crisis. The results demonstrate that our proposed CASL framework significantly outperforms state-of-the-art methods by achieving a superior balance in prediction accuracy for both unsuppressed and suppressed observations. The proposed model also identifies different clusters of counties, capturing heterogeneous patterns of overdose mortality among counties of diverse characteristics.
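As a rough illustration of what a loss that "accounts for suppressed values and cluster-specific regularization" might look like, the sketch below combines a squared error on reported counts, an interval penalty on suppressed counts, and a per-cluster ridge term. It is not the CASL formulation from the paper, which the abstract does not give. The function name `cluster_aware_loss`, the interval bounds `lo` and `hi`, and the per-cluster weights `lam` are all hypothetical choices made for this example.

```python
import numpy as np

def cluster_aware_loss(y_pred, y_true, suppressed, weights, lam, lo=0.0, hi=9.0):
    """Illustrative loss only; NOT the paper's actual objective.

    y_pred, y_true : per-county predicted and reported death counts
                     (y_true may be NaN where the count is suppressed)
    suppressed     : boolean mask, True where the reported count is hidden
    weights        : dict mapping cluster id -> coefficient vector (assumed)
    lam            : dict mapping cluster id -> ridge strength (assumed)
    lo, hi         : assumed bounds of a suppressed count (hypothetical defaults)
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    suppressed = np.asarray(suppressed, dtype=bool)

    # Reported counts: ordinary squared error.
    obs_err = np.where(suppressed, 0.0, (y_pred - y_true) ** 2)

    # Suppressed counts: penalize only the distance outside [lo, hi].
    below = np.clip(lo - y_pred, 0.0, None)
    above = np.clip(y_pred - hi, 0.0, None)
    sup_err = np.where(suppressed, (below + above) ** 2, 0.0)

    data_term = np.mean(obs_err + sup_err)

    # Cluster-specific ridge penalty: each cluster has its own strength.
    reg = sum(lam[k] * np.sum(np.square(weights[k])) for k in weights)
    return float(data_term + reg)

# Toy usage with three counties, two of which have suppressed counts.
loss = cluster_aware_loss(
    y_pred=[12.0, 5.0, 15.0],
    y_true=[10.0, np.nan, np.nan],
    suppressed=[False, True, True],
    weights={0: np.array([1.0, 2.0]), 1: np.array([3.0])},
    lam={0: 0.1, 1: 0.5},
)
print(loss)
```

In this sketch a prediction for a suppressed county incurs no data penalty as long as it falls inside the assumed interval, so the model is not pushed toward any particular hidden value. A joint framework like the one described would also need to learn the cluster assignments, which this example takes as given.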
Award ID(s):
2240409 2240408
PAR ID:
10609842
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Proceedings of the AAAI Conference on Artificial Intelligence
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
39
Issue:
27
ISSN:
2159-5399
Page Range / eLocation ID:
27978 to 27988
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Galea, Sandro (Ed.)
    The drug-overdose crisis in the United States continues to intensify. Fatalities have increased 5-fold since 1999 reaching a record high of 108,000 deaths in 2021. The epidemic has unfolded through distinct waves of different drug types, uniquely impacting various age, gender, race, and ethnic groups in specific geographical areas. One major challenge in designing interventions and efficiently delivering treatment is forecasting age-specific overdose patterns at the local level. To address this need, we develop a forecasting method that assimilates observational data obtained from the CDC WONDER database with an age-structured model of addiction and overdose mortality. We apply our method nationwide and to three select areas: Los Angeles County, Cook County, and the five boroughs of New York City, providing forecasts of drug-overdose mortality and estimates of relevant epidemiological quantities, such as mortality and age-specific addiction rates.
  2. Midlife non-Hispanic white mortality in the United States is rising, particularly in small metro and rural counties. This article responds to calls for county-level studies. We examine social determinants of morbidity and mortality among adult non-Hispanic whites in Yavapai County, Arizona, as part of an integrative study. We report overall mortality trends in Yavapai County using CDC WONDER data and then examine social determinants of reported physical health and mental distress in Yavapai County data using 6 years (2011–2016) of the Arizona Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS includes 1,024 non-Hispanic white respondents aged 25–64. We also present data from the recently established Yavapai County Overdose Fatality Review Board (YCOFRB). Mortality trends indicate that suicide and drug- and alcohol-related mortality have all increased since 1999. These increases affect all 5-year age groups from 25 to 64 and both men and women. BRFSS data show that low education and unemployment, but not number of children or home ownership, are significantly associated with worse reported health and frequent mental distress in multivariate analyses. The YCOFRB data point to the importance of homelessness and mental health. The mortality crisis in Yavapai County is not restricted to midlife or to drug-related deaths. The unemployed and those with low levels of education are particularly at risk. There is a need for integrative approaches that use local data to elucidate social determinants of morbidity and mortality and to reveal structural determinants.
  3. Drug overdose deaths continue to increase in the United States for all major drug categories. Over the past two decades the total number of overdose fatalities has increased more than fivefold; since 2013 the surge in overdose rates is primarily driven by fentanyl and methamphetamines. Different drug categories and factors such as age, gender, and ethnicity are associated with different overdose mortality characteristics that may also change in time. For example, the average age at death from a drug overdose has decreased from 1940 to 1990 while the overall mortality rate has steadily increased. To provide insight into the population-level dynamics of drug overdose mortality, we develop an age-structured model for drug addiction. Using an augmented ensemble Kalman filter (EnKF), we show through a simple example how our model can be combined with synthetic observation data to estimate mortality rate and an age-distribution parameter. Finally, we use an EnKF to combine our model with observation data on overdose fatalities in the United States from 1999 to 2020 to forecast the evolution of overdose trends and estimate model parameters.
  4. The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2787 counties in the United States using data-driven machine learning models. Existing mathematical models of disease spread usually focused on case prediction with different infection rates without incorporating multiple heterogeneous features that could impact the spatial and temporal trajectory of COVID-19. Recognizing this, we trained a data-driven model using 23 features representing six key influencing factors affecting the pandemic spread: social demographics of counties, population activities, mobility within the counties, movement across counties, disease attributes, and social network structure. Also, we categorized counties into multiple groups according to their population densities, and we divided the trajectory of COVID-19 into three stages: the outbreak stage, the social distancing stage, and the reopening stage. The study aimed to answer two research questions: (1) the extent to which the importance of heterogeneous features evolved at different stages; (2) the extent to which the importance of heterogeneous features varied across counties with different characteristics. We fitted a set of random forest models to determine weekly feature importance. The results showed that: (1) social demographic features, such as gross domestic product, population density, and minority status, remained high-importance features throughout the stages of COVID-19 across the 2787 studied counties; (2) within-county mobility features had the highest importance in counties with higher population densities; (3) the feature reflecting the social network structure (Facebook social connectedness index) had higher importance for counties with higher population densities. The results showed that the data-driven machine learning models could provide important insights to inform policymakers regarding feature importance for counties with various population densities and at different stages of a pandemic life cycle.
  5. Deep learning for time series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, including observed cases and a number of static features (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process, which can help effective public health decision-making.