Title: Better null models for assessing predictive accuracy of disease models
Null models provide a critical baseline for the evaluation of predictive disease models. Many studies consider only the grand mean null model (i.e. R²) when evaluating the predictive ability of a model, an approach that is insufficient to convey a model's predictive power. We evaluated ten null models for human cases of West Nile virus (WNV), a zoonotic mosquito-borne disease introduced to the United States in 1999. The Negative Binomial, Historical (i.e. using previous cases to predict future cases) and Always Absent null models were the strongest overall, and the majority of null models significantly outperformed the grand mean. Increasing the length of the training time series improved the performance of most null models in US counties where WNV cases were frequent, but the improvements were similar across most null models, so their relative scores remained unchanged. We argue that a combination of null models is needed to evaluate the forecasting performance of predictive models for infectious diseases, and that the grand mean is the lowest bar.
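The abstract contrasts the grand mean null model (the baseline implicit in R²) with alternatives such as the Historical and Always Absent null models. Below is a minimal Python sketch of three point-forecast null models, assuming annual county case counts and mean absolute error as the scoring rule. The data, the function names, the choice of "repeat the last observed year" as the Historical variant, and the error metric are illustrative assumptions, not the study's implementation. The sketch also shows why R² is a grand-mean comparison: it is one minus the ratio of the model's squared error to that of the observations' own mean.

```python
from statistics import mean

# Hypothetical annual WNV case counts for one county (illustrative, not study data).
train = [0, 3, 0, 0, 1, 0, 3]  # earlier years used to build the null models
test = [0, 0, 2]               # later years the null models must predict


def grand_mean_null(history, horizon):
    """Predict the mean of the training years for every future year."""
    return [mean(history)] * horizon


def historical_null(history, horizon):
    """Assumed 'Historical' variant: repeat the most recent observed count."""
    return [history[-1]] * horizon


def always_absent_null(history, horizon):
    """Predict zero cases in every future year."""
    return [0] * horizon


def mae(pred, obs):
    """Mean absolute error between point forecasts and observed counts."""
    return mean(abs(p - o) for p, o in zip(pred, obs))


def skill(model_error, null_error):
    """1 = perfect, 0 = no better than the null model, < 0 = worse than the null."""
    return 1 - model_error / null_error


def r_squared(pred, obs):
    """R^2 as a skill score under squared error whose null model is the
    grand mean of the very observations being predicted."""
    obs_mean = mean(obs)
    sse_model = sum((p - o) ** 2 for p, o in zip(pred, obs))
    sse_grand_mean = sum((o - obs_mean) ** 2 for o in obs)
    return 1 - sse_model / sse_grand_mean


nulls = {
    "grand mean": grand_mean_null,
    "historical": historical_null,
    "always absent": always_absent_null,
}
for name, null_model in nulls.items():
    forecast = null_model(train, len(test))
    print(f"{name:>13}: MAE = {mae(forecast, test):.3f}")
```

In this toy county, where cases are rare, predicting zero cases beats the grand mean under absolute error, and the training-period grand mean scores a negative R² on the test years because R² compares it with the mean of the test years themselves.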
Award ID(s): 1911853
PAR ID: 10424997
Author(s) / Creator(s): ;
Editor(s): Ansari, Ali R.
Date Published:
Journal Name: PLOS ONE
Volume: 18
Issue: 5
ISSN: 1932-6203
Page Range / eLocation ID: e0285215
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Wen, Feng (Ed.)
    Background: Since 1999, West Nile virus (WNV) has moved rapidly across the United States, resulting in tens of thousands of human cases. Both the number of human cases and the minimum infection rate (MIR) in vector mosquitoes vary across time and space and are driven by numerous abiotic and biotic forces, ranging from differences in microclimates to socio-demographic factors. Because the interactions among these factors affect the locally variable risk of WNV illness, it has been especially difficult to model human disease risk across varying spatial and temporal scales. Cook and DuPage Counties, comprising the city of Chicago and surrounding suburbs, experience some of the highest numbers of human neuroinvasive WNV cases in the United States. Despite active mosquito control efforts, WNV is present every year, resulting in more than 285 confirmed human WNV cases and 20 deaths in Cook County alone from 2014 to 2018. Methods: A previous Chicago-area WNV model identified the fifty-five highest- and lowest-risk locations in the Northwest Mosquito Abatement District (NWMAD), an enclave ¼ the size of the combined Cook and DuPage County area. In these locations, human WNV risk was stratified by model performance, as indicated by differences in studentized residuals. For these areas, two additional years of field collections and data processing were added to a 12-year WNV dataset that includes human cases, MIR, vector abundance, land use, historical climate, and socio-economic and demographic variables, and the expanded dataset was analyzed with an ultra-fine-scale (1 km spatial × 1 week temporal resolution) multivariate logistic regression model. Results: Multivariate statistical methods applied to the ultra-fine-scale model identified fewer explanatory variables while improving upon the fit of the previous model. Beyond MIR and climatic factors, acquiring additional covariates only slightly improved the model's predictive performance.
Conclusions: These results suggest that human WNV illness in the Chicago area may be associated with fewer, but increasingly critical, key variables at finer scales. Given limited resources, these findings suggest that model performance varies considerably depending on covariate availability, and they provide guidance on variable selection for optimal WNV human illness modeling.
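MIR appears in this abstract both as a driver of human risk and as a model covariate. It is conventionally computed as the number of WNV-positive mosquito pools divided by the total number of mosquitoes tested, multiplied by 1,000. The sketch below aggregates hypothetical pool records to (grid cell, week) units to mirror the 1 km × 1 week resolution described above; the record layout, identifiers, and values are assumptions, and this is not the study's processing code.

```python
from collections import defaultdict

# Hypothetical pool-level records: (grid_cell_id, week, mosquitoes_in_pool, wnv_positive).
# Cell IDs stand in for 1 km grid cells; all values are illustrative.
pools = [
    ("cell_A", 27, 50, True),
    ("cell_A", 27, 50, False),
    ("cell_A", 28, 25, False),
    ("cell_B", 27, 40, True),
    ("cell_B", 27, 10, True),
]


def minimum_infection_rate(records):
    """Return {(cell, week): MIR}, where MIR = positive pools per 1,000 mosquitoes tested.

    MIR assumes each positive pool holds exactly one infected mosquito,
    so it is a lower-bound estimate of the true infection rate.
    """
    positive_pools = defaultdict(int)
    mosquitoes_tested = defaultdict(int)
    for cell, week, pool_size, is_positive in records:
        key = (cell, week)
        mosquitoes_tested[key] += pool_size
        positive_pools[key] += int(is_positive)
    return {key: 1000 * positive_pools[key] / n for key, n in mosquitoes_tested.items()}


for (cell, week), mir in sorted(minimum_infection_rate(pools).items()):
    print(cell, week, round(mir, 1))
```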
  2. Background: West Nile virus (WNV) is the leading cause of mosquito-borne illness in the continental USA. WNV occurrence has high spatiotemporal variation, and current approaches to targeted control of the virus are limited, making forecasting a public health priority. However, little research has compared the strengths and weaknesses of WNV disease forecasting approaches at the national scale. We used forecasts submitted to the 2020 WNV Forecasting Challenge, an open challenge organized by the Centers for Disease Control and Prevention, to assess the status of WNV neuroinvasive disease (WNND) prediction and identify avenues for improvement. Methods: We performed a multi-model comparative assessment of probabilistic forecasts submitted by 15 teams for annual WNND cases in US counties for 2020 and assessed forecast accuracy, calibration, and discriminatory power. In the evaluation, we included forecasts produced by comparison models of varying complexity as benchmarks of forecast performance. We also used regression analysis to identify modeling approaches and contextual factors that were associated with forecast skill. Results: Simple models based on historical WNND cases generally scored better than more complex models and combined higher discriminatory power with better calibration of uncertainty. Forecast skill improved across updated forecasts submitted during the 2020 season. Among models using additional data, inclusion of climate or human demographic data was associated with higher skill, while inclusion of mosquito or land use data was associated with lower skill. We also identified population size, extreme minimum winter temperature, and interannual variation in WNND cases as county-level characteristics associated with variation in forecast skill. Conclusions: Historical WNND cases were strong predictors of future cases, with minimal gains in skill from models that included other factors.
Although there may be opportunities to improve predictions specifically for areas with large populations and low or high winter temperatures, areas with high case-count variability are intrinsically more difficult to predict. Predicting outbreaks, which are outliers relative to typical case numbers, also remains difficult. Further improvements to prediction could come from better calibration of forecast uncertainty and access to real-time data streams (e.g. current weather and preliminary human cases).
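The challenge evaluation described above covers accuracy, calibration, and discriminatory power. The abstract does not specify the scoring rules, so the following Python sketch uses two common, generic choices as assumptions: the area under the ROC curve for discrimination (do higher forecast probabilities of any WNND cases go with counties that actually report cases?) and the empirical coverage of forecast intervals for calibration. All county values are hypothetical.

```python
# Hypothetical county-level forecasts and outcomes (illustrative only).
prob_any_case = [0.9, 0.6, 0.3, 0.2, 0.7]  # forecast P(at least one WNND case)
lower_90 = [1, 0, 0, 0, 1]                 # lower end of a 90% forecast interval
upper_90 = [6, 2, 1, 1, 4]                 # upper end of a 90% forecast interval
observed = [3, 0, 1, 0, 5]                 # reported WNND cases


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form):
    the chance a county with cases outscores a county without, ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need counties both with and without cases")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def interval_coverage(lower, upper, obs):
    """Share of observations inside their intervals; about 0.9 is expected for a
    well-calibrated 90% interval over many counties."""
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lower, upper, obs))
    return hits / len(obs)


had_cases = [y > 0 for y in observed]
print("AUC:", round(auc(prob_any_case, had_cases), 3))
print("90% interval coverage:", interval_coverage(lower_90, upper_90, observed))
```

Discrimination and calibration are measured separately here because a forecast can rank counties well while still producing intervals that are too narrow, and vice versa.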
  3. Objectives: West Nile virus (WNV) is the most common mosquito-borne disease in the United States. Predicting the location and timing of outbreaks would allow targeting of disease prevention and mosquito control activities. Our objective was to develop software (ArboMAP) for routine WNV forecasting using public health surveillance data and meteorological observations. Materials and Methods: ArboMAP was implemented using an R Markdown script for data processing, modeling, and report generation. A Google Earth Engine application was developed to summarize and download weather data. Generalized additive models were used to make county-level predictions of WNV cases. Results: ArboMAP minimized the number of manual steps required to make weekly forecasts, generated information that was useful for decision-makers, and has been tested and implemented in multiple public health institutions. Discussion and Conclusion: Routine prediction of mosquito-borne disease risk is feasible and can be implemented by public health departments using ArboMAP.
  4. Temperature is widely known to influence the spatio-temporal dynamics of vector-borne disease transmission, particularly as temperatures vary across critical thermal thresholds. When temperature conditions exhibit such ‘transcritical variation’, abrupt spatial or temporal discontinuities may result, generating sharp geographical or seasonal boundaries in transmission. Here, we develop a spatio-temporal machine learning algorithm to examine the implications of transcritical variation for West Nile virus (WNV) transmission in the Los Angeles metropolitan area (LA). Analysing a large vector and WNV surveillance dataset spanning 2006–2016, we found that mean temperatures in the previous month strongly predicted the probability of WNV presence in pools of Culex quinquefasciatus mosquitoes, forming distinctive inhibitory (10.0–21.0°C) and favourable (22.7–30.2°C) mean temperature ranges that bound a narrow 1.7°C transitional zone (21–22.7°C). Temperatures during the most intense months of WNV transmission (August/September) were more strongly associated with infection probability in Cx. quinquefasciatus pools in coastal LA, where temperature variation more frequently traversed the narrow transitional temperature range compared to warmer inland locations. This contributed to a pronounced expansion in the geographical distribution of human cases near the coast during warmer-than-average periods. Our findings suggest that transcritical variation may influence the sensitivity of transmission to climate warming, and that especially vulnerable locations may occur where present climatic fluctuations traverse critical temperature thresholds.
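The temperature ranges reported in this abstract can be expressed as a simple lookup. This sketch classifies a previous-month mean temperature using those bounds and counts how often a monthly series switches range, a rough proxy for the "transcritical variation" the abstract describes. How the boundary values themselves are assigned, the label for temperatures outside 10.0–30.2°C, and the example series are assumptions for illustration; the ranges were an output of the study's spatio-temporal machine learning model, and this lookup does not reproduce that model.

```python
# Ranges (degrees C) as reported in the abstract; boundary handling is an assumption.
INHIBITORY = (10.0, 21.0)
TRANSITIONAL = (21.0, 22.7)
FAVOURABLE = (22.7, 30.2)


def classify_prior_month_temperature(mean_temp_c):
    """Map a previous-month mean temperature to the reported range it falls in."""
    if INHIBITORY[0] <= mean_temp_c < INHIBITORY[1]:
        return "inhibitory"
    if TRANSITIONAL[0] <= mean_temp_c < TRANSITIONAL[1]:
        return "transitional"
    if FAVOURABLE[0] <= mean_temp_c <= FAVOURABLE[1]:
        return "favourable"
    return "outside reported ranges"


def count_range_crossings(monthly_means):
    """Count month-to-month switches between ranges (a rough proxy for transcritical variation)."""
    labels = [classify_prior_month_temperature(t) for t in monthly_means]
    return sum(a != b for a, b in zip(labels, labels[1:]))


coastal_like = [20.5, 21.8, 23.0, 22.0]  # hypothetical series hovering near the transition
inland_like = [24.0, 25.5, 26.0, 24.5]   # hypothetical series staying in the favourable range
print(count_range_crossings(coastal_like), count_range_crossings(inland_like))
```

The hypothetical coastal-like series switches range three times, while the inland-like series, which stays above 22.7°C, never does.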
  5. The strain on healthcare resources brought forth by the recent COVID-19 pandemic has highlighted the need for efficient resource planning and allocation through the prediction of future consumption. Machine learning can predict resource utilization, such as the need for hospitalization, based on past medical data stored in electronic medical records (EMR). We conducted this study on 3194 patients (46% male with mean age 56.7 (±16.8), 56% African American, 7% Hispanic) flagged as COVID-19 positive cases in 12 centers in the Emory Healthcare network from February 2020 to September 2020, to assess whether a COVID-19 positive patient’s need for hospitalization can be predicted at the time of the RT-PCR test using EMR data from before the test. Five main modalities of EMR, i.e., demographics, medication, past medical procedures, comorbidities, and laboratory results, were used as features for predictive modeling, both individually and fused together using late, middle, and early fusion. Models were evaluated in terms of precision, recall, and F1-score, with 95% confidence intervals. The early fusion model was the most effective predictor, with an overall F1-score of 84% [CI 82.1–86.1]. The model's predictive performance drops by 6% when only recent clinical data are used and the long-term medical history is omitted. Feature importance analysis indicates that a history of cardiovascular disease, emergency room visits in the year prior to testing, and demographic factors are predictive of the disease trajectory. We conclude that fusion modeling using medical history and current treatment data can forecast the need for hospitalization for patients infected with COVID-19 at the time of the RT-PCR test.
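The early and late fusion strategies and the F1-score named in the last abstract can be sketched in plain Python. Early fusion concatenates the feature vectors from each EMR modality before a single model is trained; late fusion trains one model per modality and combines their predicted probabilities, here by a weighted average (an assumed combination rule). Middle fusion, which merges intermediate learned representations, is not shown. The feature values and predictions are hypothetical, and no model training is included.

```python
def early_fusion(*modality_features):
    """Concatenate per-modality feature vectors into one input for a single model."""
    fused = []
    for features in modality_features:
        fused.extend(features)
    return fused


def late_fusion(*modality_probs, weights=None):
    """Combine per-modality predicted probabilities by a weighted average (assumed rule)."""
    if weights is None:
        weights = [1.0] * len(modality_probs)
    return sum(w * p for w, p in zip(weights, modality_probs)) / sum(weights)


def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 (their harmonic mean) for binary hospitalization labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical modality features for one patient (names and values are illustrative).
demographics = [56.7, 1.0]       # e.g. age, sex indicator
comorbidities = [1.0, 0.0, 0.0]  # e.g. binary flags such as cardiovascular disease
print(early_fusion(demographics, comorbidities))

# Hypothetical per-modality probabilities of hospitalization for one patient.
print(late_fusion(0.8, 0.6, 0.4))

# Hypothetical predicted vs. actual hospitalization labels.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
print(precision_recall_f1(predicted, actual))
```

The practical difference is where modalities meet: early fusion lets one model learn interactions across modalities, while late fusion keeps per-modality models independent and only combines their outputs.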