
Title: Ensemble bootstrap methodology for forecasting dynamic growth processes using differential equations: application to epidemic outbreaks
Abstract Background Ensemble modeling aims to boost forecasting performance by systematically integrating the predictive accuracy of individual models. Here we introduce a simple yet powerful ensemble methodology for forecasting the trajectory of dynamic growth processes that are defined by a system of non-linear differential equations, with applications to infectious disease spread. Methods We propose and assess the performance of two ensemble modeling schemes with different parametric bootstrapping procedures for trajectory forecasting and uncertainty quantification. Specifically, we conduct sequential probabilistic forecasts to evaluate their forecasting performance using simple dynamical growth models with good track records, including the Richards model, the generalized-logistic growth model, and the Gompertz model. We first test and verify the functionality of the method using simulated data from phenomenological models and a mechanistic transmission model. Next, the performance of the method is demonstrated using a diversity of epidemic datasets, including scenario outbreak data of the Ebola Forecasting Challenge and real-world outbreak data for influenza, plague, Zika, and COVID-19. Results We found that the ensemble method that randomly selects a model from the set of individual models for each time point of the epidemic trajectory frequently outcompeted the individual models as well as an alternative ensemble method based on the weighted combination of the individual models. It also yields broader and more realistic uncertainty bounds for the trajectory envelope, achieving not only a better coverage rate of the 95% prediction interval but also improved mean interval scores across a diversity of epidemic datasets.
Conclusion Our new methodology for ensemble forecasting outcompetes the component models and an alternative ensemble model that differs in how the variance is evaluated for the generation of the prediction intervals of the forecasts.
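The two ensemble schemes and the two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each component model has already produced a set of bootstrap forecast trajectories, stored as `trajs[model][replicate][time]`. It also assumes that the weighted scheme is a weighted average of matching replicates, which the abstract does not specify. All function names and the toy data are hypothetical. The interval score follows the standard definition: interval width plus a `2/alpha` penalty for observations that fall outside the interval.

```python
import random


def random_selection_ensemble(trajs, rng):
    """For every replicate and time point, copy the value of one randomly chosen model."""
    n_models, n_boot, n_times = len(trajs), len(trajs[0]), len(trajs[0][0])
    return [[trajs[rng.randrange(n_models)][b][t] for t in range(n_times)]
            for b in range(n_boot)]


def weighted_ensemble(trajs, weights):
    """Weighted average of the component models, replicate by replicate (assumed form)."""
    total = sum(weights)
    w = [x / total for x in weights]
    n_boot, n_times = len(trajs[0]), len(trajs[0][0])
    return [[sum(w[m] * trajs[m][b][t] for m in range(len(trajs)))
             for t in range(n_times)]
            for b in range(n_boot)]


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def coverage_and_interval_score(samples, observed, alpha=0.05):
    """Return (coverage of the (1 - alpha) prediction interval, mean interval score)."""
    hits, scores = 0, []
    for t, y in enumerate(observed):
        column = [row[t] for row in samples]
        lower = quantile(column, alpha / 2)
        upper = quantile(column, 1 - alpha / 2)
        hits += lower <= y <= upper
        penalty = (2 / alpha) * (max(lower - y, 0) + max(y - upper, 0))
        scores.append((upper - lower) + penalty)
    return hits / len(observed), sum(scores) / len(scores)


# Toy stand-in for bootstrap forecasts from three component models
# (200 replicates, 5 time points each); the numbers are made up.
rng = random.Random(0)
toy = [[[base * (t + 1) + rng.gauss(0, 1) for t in range(5)]
        for _ in range(200)]
       for base in (9.0, 10.0, 11.0)]
observed = [10.0 * (t + 1) for t in range(5)]

mixed = random_selection_ensemble(toy, rng)
averaged = weighted_ensemble(toy, [1, 1, 1])
print("random selection:", coverage_and_interval_score(mixed, observed))
print("weighted average:", coverage_and_interval_score(averaged, observed))
```

In this sketch the random-selection ensemble keeps the spread between the component models in its sample, whereas averaging shrinks it. This is one plausible reading of why the abstract reports broader uncertainty bounds for the first scheme.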
Award ID(s):
2034003
NSF-PAR ID:
10273071
Journal Name:
BMC Medical Research Methodology
Volume:
21
Issue:
1
ISSN:
1471-2288
Sponsoring Org:
National Science Foundation
More Like this
  1. Real-time forecasting of non-stationary time series is a challenging problem, especially when the time series evolves rapidly. For such cases, it has been observed that ensemble models consisting of a diverse set of model classes can perform consistently better than individual models. In order to account for the nonstationarity of the data and the lack of availability of training examples, the models are retrained in real-time using the most recent observed data samples. Motivated by the robust performance properties of ensemble models, we developed a Bayesian model averaging ensemble technique consisting of statistical, deep learning, and compartmental models for forecasting epidemiological signals, specifically, COVID-19 signals. We observed the epidemic dynamics go through several phases (waves). In our ensemble model, we observed that different model classes performed differently during the various phases. Armed with this understanding, in this paper, we propose a modification to the ensembling method to employ this phase information and use different weighting schemes for each phase to produce improved forecasts. However, predicting the phases of such time series is a significant challenge, especially when behavioral and immunological adaptations govern the evolution of the time series. We explore multiple datasets that can serve as leading indicators of trend changes and employ transfer entropy techniques to capture the relevant indicator. We propose a phase prediction algorithm to estimate the phases using the leading indicators. Using the knowledge of the estimated phase, we selectively sample the training data from similar phases. We evaluate our proposed methodology on our currently deployed COVID-19 forecasting model and the COVID-19 ForecastHub models. The overall performance of the proposed model is consistent across the pandemic. More importantly, it is ranked second during two critical rapid growth phases in cases, regimes where the performance of most models from the ForecastHub dropped significantly.
  2. Abstract

    We propose a piecewise linear quantile trend model to analyse the trajectory of the COVID-19 daily new cases (i.e. the infection curve) simultaneously across multiple quantiles. The model is intuitive, interpretable and naturally captures the phase transitions of the epidemic growth rate via change-points. Unlike the mean trend model and least squares estimation, our quantile-based approach is robust to outliers, captures heteroscedasticity (commonly exhibited by COVID-19 infection curves) and automatically delivers both point and interval forecasts with minimal assumptions. Building on a self-normalized (SN) test statistic, this paper proposes a novel segmentation algorithm for multiple change-point estimation. Theoretical guarantees such as segmentation consistency are established under mild and verifiable assumptions. Using the proposed method, we analyse the COVID-19 infection curves in 35 major countries and discover patterns with potentially relevant implications for effectiveness of the pandemic responses by different countries. A simple change-adaptive two-stage forecasting scheme is further designed to generate short-term prediction of COVID-19 cumulative new cases and is shown to deliver accurate forecast valuable to public health decision-making.

  3. Abstract Background

    Beginning May 7, 2022, multiple nations reported an unprecedented surge in monkeypox cases. Unlike past outbreaks, differences in affected populations, transmission mode, and clinical characteristics have been noted. With the existing uncertainties of the outbreak, real-time short-term forecasting can guide and evaluate the effectiveness of public health measures.

    Methods

    We obtained publicly available data on confirmed weekly cases of monkeypox at the global level and for seven countries (with the highest burden of disease at the time this study was initiated) from the Our World in Data (OWID) GitHub repository and CDC website. We generated short-term forecasts of new cases of monkeypox across the study areas using an ensemble n-sub-epidemic modeling framework based on weekly cases using 10-week calibration periods. We report and assess the weekly forecasts with quantified uncertainty from the top-ranked, second-ranked, and ensemble sub-epidemic models. Overall, we conducted 324 weekly sequential 4-week ahead forecasts across the models from the week of July 28th, 2022, to the week of October 13th, 2022.

    Results

    The last 10 of 12 forecasting periods (starting the week of August 11th, 2022) show either a plateauing or declining trend of monkeypox cases for all models and areas of study. According to our latest 4-week ahead forecast from the top-ranked model, a total of 6232 (95% PI 487.8, 12,468.0) cases could be added globally from the week of 10/20/2022 to the week of 11/10/2022. At the country level, the top-ranked model predicts that the USA will report the highest cumulative number of new cases for the 4-week forecasts (median based on OWID data: 1806 (95% PI 0.0, 5544.5)). The top-ranked and weighted ensemble models outperformed all other models in short-term forecasts.

    Conclusions

    Our top-ranked model consistently predicted a decreasing trend in monkeypox cases on the global and country-specific scale during the last ten sequential forecasting periods. Our findings reflect the potential impact of increased immunity, and behavioral modification among high-risk populations.

  4. Population forecasting, in which past dynamics are used to make predictions of future state, has many real-world applications. While time series of animal abundance are often modeled in ways that aim to capture the underlying biological processes involved, doing so is neither necessary nor sufficient for making good predictions. Here we report on a data science competition focused on modelling time series of Antarctic penguin abundance. We describe the best performing submitted models and compare them to a Bayesian model previously developed by domain experts and build an ensemble model that outperforms the individual component models in prediction accuracy. The top performing models varied tremendously in model complexity, ranging from very simple forward extrapolations of average growth rate to ensembles of models integrating recently developed machine learning techniques. Despite the short time frame for the competition, four of the submitted models outperformed the model previously created by the team of domain experts. We discuss the structure of the best performing models and components therein that might be useful for other ecological applications, the benefit of creating ensembles of models for ecological prediction, and the costs and benefits of including detailed domain expertise in ecological modelling. Additionally, we discuss the benefits of data science competitions, among which are increased visibility for challenging science questions, the generation of new techniques not yet adopted within the ecological community, and the ability to generate ensemble model forecasts that directly address model uncertainty.
  5. Solar flare prediction is a central problem in space weather forecasting and has captivated the attention of a wide spectrum of researchers due to recent advances in both remote sensing as well as machine learning and deep learning approaches. The experimental findings based on both machine and deep learning models reveal significant performance improvements for task specific datasets. Along with building models, the practice of deploying such models to production environments under operational settings is a more complex and often time-consuming process which is often not addressed directly in research settings. We present a set of new heuristic approaches to train and deploy an operational solar flare prediction system for ≥M1.0-class flares with two prediction modes: full-disk and active region-based. In full-disk mode, predictions are performed on full-disk line-of-sight magnetograms using deep learning models whereas in active region-based models, predictions are issued for each active region individually using multivariate time series data instances. The outputs from individual active region forecasts and full-disk predictors are combined to a final full-disk prediction result with a meta-model. We utilized an equal weighted average ensemble of two base learners’ flare probabilities as our baseline meta learner and improved the capabilities of our two base learners by training a logistic regression model.
The major findings of this study are: 1) We successfully coupled two heterogeneous flare prediction models trained with different datasets and model architecture to predict a full-disk flare probability for the next 24 h, 2) Our proposed ensembling model, i.e., logistic regression, improves on the predictive performance of two base learners and the baseline meta learner measured in terms of two widely used metrics, True Skill Statistic (TSS) and Heidke Skill Score (HSS), and 3) Our result analysis suggests that the logistic regression-based ensemble (Meta-FP) improves on the full-disk model (base learner) by ∼9% in terms of TSS and ∼10% in terms of HSS. Similarly, it improves on the AR-based model (base learner) by ∼17% and ∼20% in terms of TSS and HSS respectively. Finally, when compared to the baseline meta model, it improves on TSS by ∼10% and HSS by ∼15%.
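For readers unfamiliar with the two skill scores named in item 5, the following Python sketch computes them from a binary confusion matrix using their commonly used definitions. The function names and the example counts are hypothetical and are not taken from the cited study. The HSS form shown is the two-class version widely used in flare forecasting, which is assumed here to be the one the abstract refers to.

```python
def true_skill_statistic(tp, fp, tn, fn):
    """TSS = recall minus false-alarm rate; ranges from -1 to 1."""
    return tp / (tp + fn) - fp / (fp + tn)


def heidke_skill_score(tp, fp, tn, fn):
    """HSS = improvement over the agreement expected from random forecasts."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


# Made-up counts: 40 flaring and 160 non-flaring instances.
example = dict(tp=30, fp=20, tn=140, fn=10)
print("TSS:", true_skill_statistic(**example))
print("HSS:", heidke_skill_score(**example))
```

Both scores equal 1 for a perfect forecast and 0 for a forecast that never predicts a flare. TSS does not depend on the class-imbalance ratio, while HSS does.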