Title: Context-dependent representation of within- and between-model uncertainty: aggregating probabilistic predictions in infectious disease epidemiology
Probabilistic predictions support public health planning and decision making, especially in infectious disease emergencies. Aggregating outputs from multiple models yields more robust predictions of outcomes and associated uncertainty. While the selection of an aggregation method can be guided by retrospective performance evaluations, this is not always possible. For example, if predictions are conditional on assumptions about how the future will unfold (e.g. possible interventions), these assumptions may never materialize, precluding any direct comparison between predictions and observations. Here, we summarize literature on aggregating probabilistic predictions, illustrate various methods for infectious disease predictions via simulation, and present a strategy for choosing an aggregation method when empirical validation cannot be used. We focus on the linear opinion pool (LOP) and Vincent average, common methods that make different assumptions about between-prediction uncertainty. We contend that assumptions of the aggregation method should align with a hypothesis about how uncertainty is expressed within and between predictions from different sources. The LOP assumes that between-prediction uncertainty is meaningful and should be retained, while the Vincent average assumes that between-prediction uncertainty is akin to sampling error and should not be preserved. We provide an R package for implementation. Given the rising importance of multi-model infectious disease hubs, our work provides useful guidance on aggregation and a deeper understanding of the benefits and risks of different approaches.
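The contrast between the two methods can be illustrated with a minimal sketch. It assumes the standard definitions (the LOP averages the predictive CDFs at each value, while the Vincent average averages the quantiles at each probability level) and uses two hypothetical normal predictive distributions. The function names and numbers below are illustrative only and are not the API of the R package mentioned in the abstract.

```python
from statistics import NormalDist


def _weights(n, weights=None):
    """Return normalized weights; equal weights if none are given."""
    if weights is None:
        return [1.0 / n] * n
    total = sum(weights)
    return [w / total for w in weights]


def vincent_quantile(dists, p, weights=None):
    """Vincent average: weighted mean of the component quantiles at level p."""
    w = _weights(len(dists), weights)
    return sum(wi * d.inv_cdf(p) for wi, d in zip(w, dists))


def lop_cdf(dists, x, weights=None):
    """Linear opinion pool: weighted mean of the component CDFs at value x."""
    w = _weights(len(dists), weights)
    return sum(wi * d.cdf(x) for wi, d in zip(w, dists))


def lop_quantile(dists, p, weights=None, tol=1e-9):
    """Invert the LOP CDF by bisection.

    The LOP quantile at level p lies between the smallest and largest
    component quantiles at p, which gives a valid starting bracket.
    """
    lo = min(d.inv_cdf(p) for d in dists)
    hi = max(d.inv_cdf(p) for d in dists)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lop_cdf(dists, mid, weights) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two hypothetical models that disagree on the outcome (assumed values).
models = [NormalDist(mu=100, sigma=10), NormalDist(mu=200, sigma=10)]

for name, q in [("LOP", lop_quantile), ("Vincent", vincent_quantile)]:
    lower, upper = q(models, 0.05), q(models, 0.95)
    print(f"{name}: 90% interval = [{lower:.1f}, {upper:.1f}]")
```

Under these assumed inputs, the LOP interval stretches across both models' predictions, retaining the disagreement between them, whereas the Vincent average yields a single distribution centered between the two with the same spread as each component.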
Award ID(s): 2126278, 2028301, 2037885
PAR ID: 10409931
Journal Name: Journal of The Royal Society Interface
Volume: 20
Issue: 198
ISSN: 1742-5662
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract Estimating and predicting the state of the atmosphere is a probabilistic problem for which an ensemble modeling approach often is taken to represent uncertainty in the system. Common methods for examining uncertainty and assessing performance for ensembles emphasize pointwise statistics or marginal distributions. However, these methods lose specific information about individual ensemble members. This paper explores contour band depth (cBD), a method of analyzing uncertainty in terms of contours of scalar fields. cBD is fully nonparametric and induces an ordering on ensemble members that leads to box-and-whisker-plot-type visualizations of uncertainty for two-dimensional data. By applying cBD to synthetic ensembles, we demonstrate that it provides enhanced information about the spatial structure of ensemble uncertainty. We also find that the usefulness of the cBD analysis depends on the presence of multiple modes and multiple scales in the ensemble of contours. Finally, we apply cBD to compare various convection-permitting forecasts from different ensemble prediction systems and find that the value it provides in real-world applications compared to standard analysis methods exhibits clear limitations. In some cases, contour boxplots can provide deeper insight into differences in spatial characteristics between the different ensemble forecasts. Nevertheless, identification of outliers using cBD is not always intuitive, and the method can be especially challenging to implement for flow that exhibits multiple spatial scales (e.g., discrete convective cells embedded within a mesoscale weather system).
Significance Statement: Predictions of Earth’s atmosphere inherently come with some degree of uncertainty owing to incomplete observations and the chaotic nature of the system. Understanding that uncertainty is critical when drawing scientific conclusions or making policy decisions from model predictions. In this study, we explore a method for describing model uncertainty when the quantities of interest are well represented by contours. The method yields a quantitative visualization of uncertainty in both the location and the shape of contours to an extent that is not possible with standard uncertainty quantification methods and may eventually prove useful for the development of more robust techniques for evaluating and validating numerical weather models.
  2. In order to learn about broad-scale ecological patterns, data from large-scale surveys must allow us to either estimate the correlations between the environment and an outcome and/or accurately predict ecological patterns. An important part of data collection is the sampling effort used to collect observations, which we decompose into two quantities: the number of observations or plots (n) and the per-observation/plot effort (E; e.g., area per plot). If we want to understand the relationships between predictors and a response variable, then lower model parameter uncertainty is desirable. If the goal is to predict a response variable, then lower prediction error is preferable. We aim to learn if and when aggregating data can help attain these goals. We find that a small sample size coupled with large observation effort (few large) can yield better predictions than a large number of observations with low observation effort (many small). We also show that the combination of the two values (n and E), rather than one alone, has an impact on parameter uncertainty. In an application to Forest Inventory and Analysis (FIA) data, we model the tree density of selected species at various amounts of aggregation using linear regression in order to compare the findings from simulated data to real data. The application supports the theoretical findings that increasing observational effort through aggregation can lead to improved predictions, conditional on the thoughtful aggregation of the observational plots. In particular, aggregations over extremely large and variable covariate space may lead to poor prediction and high parameter uncertainty. Analyses of large-range data can improve with aggregation, with implications for both model evaluation and sampling design: testing model prediction accuracy without an underlying knowledge of the datasets and the scale at which predictor variables operate can obscure meaningful results.
  3. Abstract For data assimilation to provide faithful state estimates for dynamical models, specifications of observation uncertainty need to be as accurate as possible. Innovation-based methods based on Desroziers diagnostics are commonly used to estimate observation uncertainty, but such methods can depend greatly on the prescribed background uncertainty. For ensemble data assimilation, this uncertainty comes from statistics calculated from ensemble forecasts, which require inflation and localization to address undersampling. In this work, we use an ensemble Kalman filter (EnKF) with a low-dimensional Lorenz model to investigate the interplay between the Desroziers method and inflation. Two inflation techniques are used for this purpose: 1) a rigorously tuned fixed multiplicative scheme and 2) an adaptive state-space scheme. We document how inaccuracies in observation uncertainty affect errors in EnKF posteriors and study the combined impacts of misspecified initial observation uncertainty, sampling error, and model error on Desroziers estimates. We find that whether observation uncertainty is over- or underestimated greatly affects the stability of data assimilation and the accuracy of Desroziers estimates and that preference should be given to initial overestimates. Inline estimates of Desroziers tend to remove the dependence between ensemble spread–skill and the initially prescribed observation error. In addition, we find that the inclusion of model error introduces spurious correlations in observation uncertainty estimates. Further, we note that the adaptive inflation scheme is less robust than fixed inflation at mitigating multiple sources of error. Last, sampling error strongly exacerbates existing sources of error and greatly degrades EnKF estimates, which translates into biased Desroziers estimates of observation error covariance.
Significance Statement: To generate accurate predictions of various components of the Earth system, numerical models require an accurate specification of state variables at the current time. This step adopts a probabilistic consideration of our current state estimate versus information provided from environmental measurements of the true state. Various strategies exist for estimating uncertainty in observations within this framework, but they are sensitive to a host of assumptions, which are investigated in this study.
  4. Abstract How individuals use space and, thus, the rate and the nature of their interactions with others are shaped by their environment. Exogenous changes that alter aggregation patterns, such as resource pulses, can therefore have a significant impact on seemingly unrelated processes like disease spread. White-tailed deer (Odocoileus virginianus) aggregate in oak forests during mast events, and chronic wasting disease (CWD) transmission patterns vary with deer density, so we hypothesize a link between the masting cycle and CWD dynamics. We investigate various possible effects of masting on deer, including shifts to more frequency-dependent CWD transmission due to aggregation, as well as elevated fecundity and decreased mortality of deer in response to the resource pulse, using a simplified compartment model of CWD spread. When masting affects epidemiological parameters, including the strength of frequency dependence in CWD transmission, disease spread during masting events significantly reduces the size of deer populations but, paradoxically, without any change in the proportion of the population in the CWD-diseased state. In contrast, demographic parameters were found in principle to be capable of altering both population size and disease incidence, though the observed effects were very small. While our quantitative findings should be validated using more detailed models of CWD transmission before they are taken as specific predictions about this system, our fundamental qualitative result appears to be quite general. That is, our conclusion that epidemiological rates only influence population size, but demographic rates may affect both population size and disease incidence, can be derived not only from the model we studied but also from classical epidemiological models.
Our work extends the understanding of the far-reaching impacts of resource pulses through ecological communities by highlighting the vastly different consequences of the same resource pulse acting in different ways. 
  5. Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods that combine multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions for robotic decision-making, yet they often lack uncertainty metrics associated with the predictions. Here, we present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture. Furthermore, our class of networks, Ajna, adds minimal computation and requires only a small change to the loss function while training neural networks to obtain uncertainty of predictions, enabling real-time operation even on resource-constrained robots. In addition, we study the informational cues present in the uncertainties of predicted values and their utility in the unification of common robotics problems. In particular, we present an approach to dodge dynamic obstacles, navigate through a cluttered scene, fly through unknown gaps, and segment an object pile, without computing depth but rather using the uncertainties of optical flow obtained from a monocular camera with onboard sensing and computation. We successfully evaluate and demonstrate the proposed Ajna network on the four aforementioned common robotics and computer vision tasks and show comparable results to methods directly using depth. Our work presents a generalized deep uncertainty method and demonstrates its use in robotics applications.