In situ sensors that collect high-frequency data are increasingly used to monitor aquatic environments. These sensors are prone to technical errors, resulting in unrecorded observations and/or anomalous values that are subsequently removed and create gaps in time series data. We present a framework based on generalized additive and auto-regressive models to recover these missing data. To mimic sporadically missing (i) single observations and (ii) periods of contiguous observations, we randomly removed (i) point data and (ii) day- and week-long sequences of data from a two-year time series of nitrate concentration data collected from Arikaree River, USA, where synoptically collected water temperature, turbidity, conductance, elevation, and dissolved oxygen data were available. In 72% of cases with missing point data, predicted values were within the sensor precision interval of the original value, although predictive ability declined when sequences of missing data occurred. Precision also depended on the availability of other water quality covariates. When covariates were available, even a sudden, event-based peak in nitrate concentration was reconstructed well. By providing a promising method for accurate prediction of missing data, this framework should increase the utility of, and confidence in, summary statistics and statistical trends, thereby assisting the effective monitoring and management of fresh waters and other at-risk ecosystems.
Evaluating the Impact of Uncertainty on Risk Prediction: Towards More Robust Prediction Models
Risk prediction models are crucial for assessing the pretest probability of cancer and are applied to stratify patient management strategies. These models are frequently based on multivariate regression analysis, requiring that all risk factors be specified, and do not convey the confidence in their predictions. We present a framework for uncertainty analysis that accounts for variability in input values. Uncertain or missing values are replaced with a range of plausible values. These ranges are used to compute individualized risk confidence intervals. We demonstrate our approach using the Gail model to evaluate the impact of uncertainty on management decisions. Up to 13% of cases (uncertain) had a risk interval that falls within the decision threshold (e.g., 1.67% 5-year absolute risk). A small number of cases changed from low- to high-risk when missing values were present. Our analysis underscores the need for better communication of input assumptions that influence the resulting predictions.
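The interval idea in this abstract can be sketched in a few lines: each missing input is swept over a set of plausible values, and the spread of the resulting risks forms an individualized risk interval that is then compared with a decision threshold. The logistic form, the coefficients, the input names, and the plausible value sets below are all hypothetical placeholders chosen for illustration; they are not the Gail model and not the authors' implementation. Only the 1.67% threshold comes from the abstract.

```python
import itertools
import math

# Hypothetical coefficients for illustration only; NOT Gail model parameters.
INTERCEPT = -5.0
COEFFS = {"age_group": 0.3, "num_relatives": 0.6, "num_biopsies": 0.4}

# Assumed plausible values to try when an input is missing.
PLAUSIBLE = {
    "age_group": [0, 1, 2],
    "num_relatives": [0, 1, 2],
    "num_biopsies": [0, 1, 2],
}


def risk(inputs):
    """Point risk from a toy logistic model over fully specified inputs."""
    z = INTERCEPT + sum(COEFFS[k] * inputs[k] for k in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))


def risk_interval(inputs):
    """Return (low, high) risk over every completion of the missing (None) inputs."""
    missing = [k for k in COEFFS if inputs.get(k) is None]
    known = {k: v for k, v in inputs.items() if v is not None}
    risks = []
    # With nothing missing, product() yields one empty combination,
    # so the interval collapses to the point estimate.
    for combo in itertools.product(*(PLAUSIBLE[k] for k in missing)):
        completed = dict(known, **dict(zip(missing, combo)))
        risks.append(risk(completed))
    return min(risks), max(risks)


def classify(interval, threshold):
    """Label a case 'uncertain' when its risk interval straddles the threshold."""
    low, high = interval
    if high < threshold:
        return "low"
    if low >= threshold:
        return "high"
    return "uncertain"


THRESHOLD = 0.0167  # the 1.67% 5-year absolute risk threshold cited in the abstract

patient = {"age_group": 1, "num_relatives": None, "num_biopsies": 0}
interval = risk_interval(patient)
label = classify(interval, THRESHOLD)
```

Enumerating every combination is only practical for a few missing inputs with small value sets; sampling the plausible ranges would be the natural substitute for larger problems.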
- Award ID(s): 1722516
- PAR ID: 10064294
- Date Published:
- Journal Name: AMIA ... Annual Symposium proceedings
- ISSN: 1559-4076
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
BACKGROUND Timely interventions, such as antibiotics and intravenous fluids, have been associated with reduced mortality in patients with sepsis. Artificial intelligence (AI) models that accurately predict risk of sepsis onset could speed the delivery of these interventions. Although sepsis models generally aim to predict its onset, clinicians might recognize and treat sepsis before the sepsis definition is met. Predictions occurring after sepsis is clinically recognized (i.e., after treatment begins) may be of limited utility. Researchers have not previously investigated the accuracy of sepsis risk predictions that are made before treatment begins. Thus, we evaluate the discriminative performance of AI sepsis predictions made throughout a hospitalization relative to the time of treatment.
METHODS We used a large retrospective inpatient cohort from the University of Michigan’s academic medical center (2018–2020) to evaluate the Epic sepsis model (ESM). The ability of the model to predict sepsis, both before sepsis criteria are met and before indications of treatment plans for sepsis, was evaluated in terms of the area under the receiver operating characteristic curve (AUROC). Indicators of a treatment plan were identified through electronic data capture and included the receipt of antibiotics, fluids, blood culture, and/or lactate measurement. The definition of sepsis was a composite of the Centers for Disease Control and Prevention’s surveillance criteria and the severe sepsis and septic shock management bundle definition.
RESULTS The study included 77,582 hospitalizations. Sepsis occurred in 3766 hospitalizations (4.9%). ESM achieved an AUROC of 0.62 (95% confidence interval [CI], 0.61 to 0.63) when including predictions before sepsis criteria were met and in some cases, after clinical recognition. When excluding predictions after clinical recognition, the AUROC dropped to 0.47 (95% CI, 0.46 to 0.48).
CONCLUSIONS We evaluate a sepsis risk prediction model to measure its ability to predict sepsis before clinical recognition. Our work has important implications for future work in model development and evaluation, with the goal of maximizing the clinical utility of these models. (Funded by Cisco Research and others.)
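The comparison described in this abstract, AUROC with and without predictions made after treatment begins, can be illustrated with a small sketch. The record layout (`septic`, `treated_at`, `preds`), the choice of the maximum score per hospitalization as the summary, and the toy data are all assumptions made for illustration; the abstract does not state how the study aggregated predictions, and the numbers below are not study data.

```python
def auroc(scores, labels):
    """AUROC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hospitalization_score(h, before_treatment_only):
    """Maximum risk score, optionally restricted to predictions made before treatment."""
    cutoff = h["treated_at"]
    eligible = [
        score
        for time, score in h["preds"]
        if not before_treatment_only or cutoff is None or time < cutoff
    ]
    return max(eligible) if eligible else None


def evaluate(hospitalizations, before_treatment_only):
    """AUROC over hospitalizations that have at least one eligible prediction."""
    scores, labels = [], []
    for h in hospitalizations:
        s = hospitalization_score(h, before_treatment_only)
        if s is not None:
            scores.append(s)
            labels.append(h["septic"])
    return auroc(scores, labels)


# Toy records: (hour, score) pairs; treated_at is the hour treatment began, if any.
toy = [
    {"septic": True, "treated_at": 5, "preds": [(1, 0.2), (6, 0.9)]},
    {"septic": True, "treated_at": 4, "preds": [(2, 0.3), (7, 0.8)]},
    {"septic": False, "treated_at": None, "preds": [(1, 0.25), (3, 0.4)]},
    {"septic": False, "treated_at": None, "preds": [(2, 0.1)]},
]

auroc_all = evaluate(toy, before_treatment_only=False)
auroc_pre = evaluate(toy, before_treatment_only=True)
```

In this toy data the high scores for septic patients all arrive after treatment starts, so restricting to pre-treatment predictions lowers the AUROC, which mirrors the direction of the effect the abstract reports.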
-
ABSTRACT Conformal predictions transform a measurable, heuristic notion of uncertainty into statistically valid confidence intervals such that, for a future sample, the true class will be included in the conformal prediction set at a predetermined confidence. From a Bayesian perspective, common estimates of uncertainty in multivariate classification, namely p‐values, only provide the probability that the data fits the presumed class model, P(D|M). Conformal predictions, on the other hand, address the more meaningful probability that a model fits the data, P(M|D). Herein, two methods to perform inductive conformal predictions are investigated: the traditional Split Conformal Prediction, which uses an external calibration set, and a novel Bagged Conformal Prediction, closely related to Cross Conformal Predictions, which utilizes bagging to calibrate the heuristic notions of uncertainty. Methods for preprocessing the conformal prediction scores to improve performance are discussed and investigated. These conformal prediction strategies are applied to identifying four non‐steroidal anti‐inflammatory drugs (NSAIDs) from hyperspectral Raman imaging data. In addition to assigning meaningful confidence intervals to the model results, we herein demonstrate how conformal predictions can provide additional diagnostics for model quality and method stability.
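As a rough illustration of the split conformal procedure named in this abstract, the sketch below calibrates a threshold on held-out nonconformity scores and then builds a prediction set for a new sample. It assumes the common score "one minus the predicted probability of the true class" and the usual finite-sample quantile rule; the abstract does not specify the authors' score or preprocessing, and the class labels `A` to `D` and the probabilities are made up. The bagged variant is not shown.

```python
import math

CLASSES = ["A", "B", "C", "D"]  # hypothetical stand-ins for the four classes


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate the nonconformity cutoff on a held-out calibration set.

    Score = 1 - predicted probability of the true class. The cutoff is the
    ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: every class must be kept.
        return float("inf")
    return scores[k - 1]


def prediction_set(probs, qhat):
    """Keep every class whose nonconformity score does not exceed the cutoff."""
    return {c for c, p in probs.items() if 1.0 - p <= qhat}


def toy_probs(true_class, p_true):
    """Build a made-up probability vector that gives p_true to the true class."""
    rest = (1.0 - p_true) / (len(CLASSES) - 1)
    return {c: (p_true if c == true_class else rest) for c in CLASSES}


# Made-up calibration set: probability assigned to the true class for 9 samples.
true_class_probs = [0.9, 0.8, 0.85, 0.7, 0.95, 0.6, 0.75, 0.5, 0.3]
cal_labels = ["A", "B", "C", "D", "A", "B", "C", "D", "A"]
cal_probs = [toy_probs(c, p) for c, p in zip(cal_labels, true_class_probs)]

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
new_sample = {"A": 0.55, "B": 0.40, "C": 0.03, "D": 0.02}
pred = prediction_set(new_sample, qhat)
```

A prediction set with more than one class signals an ambiguous sample, and an empty set signals a sample unlike anything in the calibration data, which is one way such sets can serve as a diagnostic.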
-
Abstract Estimating uncertainty in flood model predictions is important for many applications, including risk assessment and flood forecasting. We focus on uncertainty in physics‐based urban flooding models. We consider the effects of the model's complexity and uncertainty in key input parameters. The effect of rainfall intensity on the uncertainty in water depth predictions is also studied. As a test study, we choose the Interconnected Channel and Pond Routing (ICPR) model of a part of the city of Minneapolis. The uncertainty in the ICPR model's predictions of the floodwater depth is quantified in terms of the ensemble variance using the multilevel Monte Carlo (MC) simulation method. Our results show that uncertainties in the studied domain are highly localized. Model simplifications, such as disregarding the groundwater flow, lead to overly confident predictions, that is, predictions that are both less accurate and less uncertain than those of the more complex model. We find that for the same number of uncertain parameters, increasing the model resolution reduces uncertainty in the model predictions (and increases the MC method's computational cost). We employ the multilevel MC method to reduce the cost of estimating uncertainty in a high‐resolution ICPR model. Finally, we use the ensemble estimates of the mean and covariance of the flood depth for real‐time flood depth forecasting using the physics‐informed Gaussian process regression method. We show that even with few measurements, the proposed framework results in a more accurate forecast than that provided by the mean prediction of the ICPR model.
-
We focus on an efficient approach for quantification of uncertainty in complex chemical reaction networks with a large number of uncertain parameters and input conditions. Parameter dimension reduction is accomplished by computing an active subspace that predominantly captures the variability in the quantity of interest (QoI). In the present work, we compute the active subspace for an H2/O2 mechanism that involves 19 chemical reactions, using an efficient iterative strategy. The active subspace is first computed for a 19-parameter problem wherein only the uncertainty in the pre-exponents of the individual reaction rates is considered. This is followed by the analysis of a 36-dimensional case wherein the activation energies and initial conditions are also considered uncertain. In both cases, a 1-dimensional active subspace is observed to capture the uncertainty in the QoI, which indicates enormous potential for efficient statistical analysis of complex chemical systems. In addition, we explore links between active subspaces and global sensitivity analysis, and exploit these links for identification of key contributors to the variability in the model response.
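To make the notion of a 1-dimensional active subspace concrete, the sketch below estimates the gradient covariance matrix C = E[∇f ∇fᵀ] by plain Monte Carlo and extracts its dominant eigenvector with power iteration. The QoI here is a toy function f(x) = exp(w·x) in three dimensions with an analytic gradient, chosen because it varies only along w; it stands in for the chemical kinetics model and is purely an assumption. The abstract's "efficient iterative strategy" is not specified and is not what is implemented here.

```python
import math
import random

W_TRUE = [0.7, -0.3, 0.1]  # hypothetical direction along which the toy QoI varies


def qoi_gradient(x, w=W_TRUE):
    """Analytic gradient of the toy QoI f(x) = exp(w . x)."""
    f = math.exp(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi * f for wi in w]


def gradient_covariance(grads):
    """Monte Carlo estimate of C = E[grad f grad f^T]."""
    d = len(grads[0])
    C = [[0.0] * d for _ in range(d)]
    for g in grads:
        for i in range(d):
            for j in range(d):
                C[i][j] += g[i] * g[j] / len(grads)
    return C


def matvec(C, v):
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]


def dominant_eigvec(C, iters=200):
    """Power iteration; the start vector must not be orthogonal to the answer."""
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        u = matvec(C, v)
        norm = math.sqrt(sum(ui * ui for ui in u))
        v = [ui / norm for ui in u]
    return v


rng = random.Random(0)
samples = [[rng.uniform(-1.0, 1.0) for _ in W_TRUE] for _ in range(500)]
C = gradient_covariance([qoi_gradient(x) for x in samples])

direction = dominant_eigvec(C)
top_eigenvalue = sum(vi * ui for vi, ui in zip(direction, matvec(C, direction)))
trace = sum(C[i][i] for i in range(len(C)))
explained = top_eigenvalue / trace  # share of gradient variability on this direction
```

When `explained` is close to 1, a single direction in parameter space accounts for nearly all of the QoI's variability, which is the situation the abstract reports for both the 19- and 36-dimensional cases.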

