


This content will become publicly available on September 13, 2025

Title: Spatiotemporal Forecasting of Opioid-Related Fatal Overdoses: Towards Best Practices for Modeling and Evaluation
To inform public health interventions, researchers have developed models to forecast opioid-related overdose mortality. These efforts often have limited overlap in the models and datasets employed, presenting challenges to assessing progress in this field. Furthermore, common error-based performance metrics, such as root mean squared error (RMSE), cannot directly assess a key modeling purpose: the identification of priority areas for interventions. We recommend a new intervention-aware performance metric, Percentage of Best Possible Reach (%BPR). We compare metrics for many published models across two distinct geographic settings, Cook County, Illinois, and Massachusetts, assuming the budget to intervene in 100 census tracts out of thousands in each setting. The top-performing models based on RMSE recommend areas that do not always reach the most possible overdose events. In Massachusetts, the top models preferred by %BPR could have reached 18 additional fatal overdoses per year in 2020-2021 compared to models favored by RMSE. In Cook County, the different metrics select similar top-performing models, yet other models with similar RMSE can have significant variation in %BPR. We further find that simple models often perform as well as recently published ones. We release open code and data for others to build upon.
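The contrast between an error-based metric and an intervention-aware one can be illustrated with a small sketch. The abstract does not give a formula for %BPR, so the definition below is an assumption: the events observed in the K tracts a model ranks highest, divided by the events observed in the K tracts with the highest actual counts, times 100. The function names `rmse` and `percent_bpr` and the toy numbers are hypothetical and are not taken from the authors' released code.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def percent_bpr(y_true, y_pred, k):
    """Assumed definition of Percentage of Best Possible Reach.

    reached: observed events in the k areas with the highest predictions.
    best:    observed events in the k areas with the highest observed counts.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Stable sort on negated predictions: ties go to the lower index.
    chosen = np.argsort(-y_pred, kind="stable")[:k]
    reached = y_true[chosen].sum()
    best = np.sort(y_true)[::-1][:k].sum()
    if best == 0:
        # No events anywhere in the best case; the ratio is undefined.
        return float("nan")
    return float(100.0 * reached / best)


# Toy data (hypothetical): six areas, budget to intervene in k = 2 of them.
observed = [5, 0, 3, 1, 0, 2]
# Model A: predictions close to the mean, so errors are small but the ranking is poor.
model_a = [1.8, 2.0, 1.9, 2.1, 1.7, 1.8]
# Model B: predictions badly scaled, so errors are large but the ranking is perfect.
model_b = [50, 0, 30, 10, 0, 20]

rmse_a, rmse_b = rmse(observed, model_a), rmse(observed, model_b)
bpr_a, bpr_b = percent_bpr(observed, model_a, 2), percent_bpr(observed, model_b, 2)
```

In this toy case Model A has the lower RMSE while Model B has the higher %BPR, which mirrors the abstract's point that the model preferred by RMSE need not be the one whose recommended areas reach the most events.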
Award ID(s):
2338962 1908617 2149871
PAR ID:
10569557
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
American Journal of Epidemiology
ISSN:
0002-9262
Subject(s) / Keyword(s):
predictive modeling, spatiotemporal forecasting, machine learning, opioid, overdose, public health practice
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Galea, Sandro (Ed.)
    Abstract

    The drug-overdose crisis in the United States continues to intensify. Fatalities have increased 5-fold since 1999 reaching a record high of 108,000 deaths in 2021. The epidemic has unfolded through distinct waves of different drug types, uniquely impacting various age, gender, race, and ethnic groups in specific geographical areas. One major challenge in designing interventions and efficiently delivering treatment is forecasting age-specific overdose patterns at the local level. To address this need, we develop a forecasting method that assimilates observational data obtained from the CDC WONDER database with an age-structured model of addiction and overdose mortality. We apply our method nationwide and to three select areas: Los Angeles County, Cook County, and the five boroughs of New York City, providing forecasts of drug-overdose mortality and estimates of relevant epidemiological quantities, such as mortality and age-specific addiction rates.

     
  2. Abstract

    An inadequate characterization of hydrogeological properties can significantly decrease the trustworthiness of subsurface flow and transport model predictions. A variety of data assimilation methods have been proposed in order to estimate hydrogeological parameters from spatially scarce data by incorporating them into the governing physical models. In order to quantify the accuracy of the estimations, several metrics have been used such as Rank Histograms, root‐mean‐square error (RMSE), and Ensemble Spread. However, these commonly used metrics do not regard the spatial correlation of the aquifer's properties. This can cause permeability fields with very different spatial structures to have similar histograms or RMSE. In this paper, we propose an approach based on color coherence vectors (CCV) for evaluating the performance of these estimation methods. CCV is a histogram‐based technique for comparing images that incorporate spatial information. We represent estimated fields as digital three‐channel images and use CCV to compare and quantify the accuracy of estimations. The appealing feature of this technique is that it considers the spatial structure embedded in the estimated fields. The sensitivity of CCV to spatial information makes it a suitable metric for assessing the performance of data assimilation techniques. Under various factors, such as numbers of measurements and structural parameters of the log conductivity field, we compare the performance of CCV with the RMSE.

     
  3. Abstract

    An intelligent sensing framework using Machine Learning (ML) and Deep Learning (DL) architectures to precisely quantify dielectrophoretic force invoked on microparticles in a textile electrode-based DEP sensing device is reported. The prediction accuracy and generalization ability of the framework were validated using experimental results. Images of pearl chain alignment at varying input voltages were used to build deep regression models using modified ML and CNN architectures that can correlate pearl chain alignment patterns of Saccharomyces cerevisiae (yeast) cells and polystyrene microbeads to DEP force. Various ML models such as K-Nearest Neighbor, Support Vector Machine, Random Forest, Neural Networks, and Linear Regression along with DL models such as Convolutional Neural Network (CNN) architectures of AlexNet, ResNet-50, MobileNetV2, and GoogLeNet have been analyzed in order to build an effective regression framework to estimate the force induced on yeast cells and microbeads. The efficiencies of the models were evaluated using Mean Absolute Error, Mean Absolute Relative, Mean Squared Error, R-squared, and Root Mean Square Error (RMSE) as evaluation metrics. ResNet-50 with RMSPROP gave the best performance, with a validation RMSE of 0.0918 on yeast cells while AlexNet with ADAM optimizer gave the best performance, with a validation RMSE of 0.1745 on microbeads. This provides a baseline for further studies in the application of deep learning in DEP aided Lab-on-Chip devices.

     
  4. Abstract

    Grid independence studies have emerged as essential methodological frameworks for comprehending the impact of domain resolution on simulating anisotropic turbulence at the river‐reach scale using large eddy simulation models. This study proposes a methodology to assess the loss of information in turbulent flow patterns when coarsening the computational domain, examined in a 1‐km transect of the Colorado River along Marble Canyon. Seven computational domain resolutions are explored to analyse the sensitivity of turbulent flow to spatial resolution changes, utilizing the turbulent kinetic energy (TKE) spectrum technique and spatiotemporal analysis of eddy structures via statistical metrics such as root mean square error (RMSE), Kullback‐Leibler (KL) divergence, Nash‐Sutcliffe model efficiency coefficient (NSE), wavelet power spectrum and grid convergence index (GCI). Based on physical principles and statistics, these metrics quantify information loss and assess domain resolutions. A computational fluid dynamic (CFD) model is developed by employing the detached eddy simulation (DES) technique, with boundary condition (BC) integrating the rough wall extension of the Spalart‐Allmaras model in cells near the bed. Evaluation of domain resolutions aims to identify grid cell sizes capturing flow behaviour and hydraulic characteristics, including primary and secondary flows, return currents, shear layers and primary and secondary eddies. The study observes an increase in data representation of the TKE spectrum with finer spatial domain resolution. Additionally, surface analysis, conducted via RMSE, KL and NSE metrics, identifies specific areas within the flow field showing high sensitivity to refining the grid cell sizes.

     
  5. Abstract

    Microbial ecological functions are an emergent property of community composition. For some ecological functions, this link is strong enough that community composition can be used to estimate the quantity of an ecological function. Here, we apply random forest regression models to compare the predictive performance of community composition and environmental data for bacterial production (BP). Using data from two independent long-term ecological research sites—Palmer LTER in Antarctica and Station SPOT in California—we found that community composition was a strong predictor of BP. The top performing model achieved an R² of 0.84 and RMSE of 20.2 pmol L⁻¹ hr⁻¹ on independent validation data, outperforming a model based solely on environmental data (R² = 0.32, RMSE = 51.4 pmol L⁻¹ hr⁻¹). We then operationalized our top performing model, estimating BP for 346 Antarctic samples from 2015 to 2020 for which only community composition data were available. Our predictions resolved spatial trends in BP with significance in the Antarctic (P value = 1 × 10⁻⁴) and highlighted important taxa for BP across ocean basins. Our results demonstrate a strong link between microbial community composition and microbial ecosystem function and begin to leverage long-term datasets to construct models of BP based on microbial community composition.

     