Title: An Uncertainty Estimation Framework for Risk Assessment in Deep Learning-based Atrial Fibrillation Classification
Atrial Fibrillation (AF) is one of the most common types of heart arrhythmia, afflicting more than 3 million people in the U.S. alone. AF is estimated to be the cause of death of 1 in 4 individuals. Recent advancements in Artificial Intelligence (AI) algorithms have led to the capability of reliably detecting AF from ECG signals. While these algorithms can detect AF with high precision, their discrete, deterministic outputs mean that a network can still misclassify a given ECG signal without signaling any doubt. This paper proposes a variational autoencoder classifier network that provides an uncertainty estimate for the network's output in addition to reliable classification accuracy. This framework can increase physicians' trust in AI-based AF detection algorithms by providing a confidence score that reflects how uncertain the algorithm is about a case, and by recommending that they pay closer attention to cases with a lower confidence score. The uncertainty is estimated by passing the input through the network multiple times to build a distribution; the mean of the standard deviations is reported as the network's uncertainty. Our proposed network obtains 97.64% accuracy in addition to reporting the uncertainty.
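The multi-pass uncertainty estimate described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: `stochastic_forward` is a hypothetical stand-in that adds Gaussian noise to a small latent code before classifying, and the weights `enc` and `cls`, the noise level, and the number of passes are all assumed values chosen for illustration. The sketch also assumes that "the mean of the standard deviations" means the per-class standard deviation across passes, averaged over classes.

```python
import math
import random
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(x, enc, cls, rng, latent_noise):
    """One stochastic pass (hypothetical toy model, not the paper's network).

    enc holds one weight row per latent dimension, cls one row per class.
    """
    mu = [sum(w * xi for w, xi in zip(row, x)) for row in enc]
    # Sample a latent code around its mean, as a VAE-style encoder would.
    z = [m + latent_noise * rng.gauss(0.0, 1.0) for m in mu]
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in cls]
    return softmax(logits)


def predict_with_uncertainty(x, enc, cls, n_passes=50, latent_noise=0.5, seed=0):
    """Run several passes and summarize the spread of the class probabilities."""
    rng = random.Random(seed)
    passes = [stochastic_forward(x, enc, cls, rng, latent_noise)
              for _ in range(n_passes)]
    per_class = list(zip(*passes))  # one tuple of probabilities per class
    mean_probs = [statistics.fmean(p) for p in per_class]
    stds = [statistics.pstdev(p) for p in per_class]
    uncertainty = statistics.fmean(stds)  # mean of the per-class std devs
    label = max(range(len(mean_probs)), key=mean_probs.__getitem__)
    return label, uncertainty, mean_probs


# Assumed toy weights and input: 3 input features, 2 latent dims, 2 classes.
enc = [[0.8, -0.3, 0.5], [-0.2, 0.9, 0.1]]
cls = [[1.5, -1.0], [-1.5, 1.0]]
x = [0.4, -1.2, 0.7]

label, uncertainty, mean_probs = predict_with_uncertainty(x, enc, cls)
print(label, round(uncertainty, 4), [round(p, 4) for p in mean_probs])
```

A larger `uncertainty` value indicates that the passes disagree more, which is the kind of case a physician would be asked to review more closely.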
Award ID(s):
1657260
PAR ID:
10233266
Journal Name:
IEEE Asilomar Conference on Signals, Systems, and Computers
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Background: Atrial fibrillation (AF) is often asymptomatic and thus under-observed. Given the high risks of stroke and heart failure among patients with AF, early prediction and effective management are crucial. Importantly, obstructive sleep apnea is highly prevalent among AF patients (60–90%); therefore, electrocardiogram (ECG) analysis from polysomnography (PSG), a standard diagnostic tool for subjects with suspected sleep apnea, presents a unique opportunity for the early prediction of AF. Our goal is to identify individuals at a high risk of developing AF in the future from a single-lead ECG recorded during standard PSGs. Methods: We analyzed 18,782 single-lead ECG recordings from 13,609 subjects at Massachusetts General Hospital, identifying AF presence using ICD-9/10 codes in medical records. Our dataset comprises 15,913 recordings without a medical record for AF and 2,056 recordings from patients who were first diagnosed with AF between 1 day and 15 years after the PSG recording. The PSG data were partitioned into training, validation, and test cohorts. In the first phase, a signal quality index (SQI) was calculated in 30-second windows, and windows with SQI < 0.95 were removed. From each remaining window, 150 hand-crafted features were extracted from the time, frequency, and time-frequency domains and from phase-space reconstructions of the ECG. A set of 12 statistical features summarized these window-specific features per recording, resulting in 1,800 features. We then used transfer learning to update a deep neural network pre-trained on data from the PhysioNet Challenge 2021, so that it discriminates between recordings with and without AF using the same Challenge data. The model was applied to the PSG ECGs in 16-second windows to generate the probability of AF for each window. From the resulting probability sequence, 13 statistical features were extracted. Subsequently, we trained a shallow neural network to predict future AF using the extracted ECG and probability features. Results: On the test set, our model demonstrated a sensitivity of 0.67, specificity of 0.81, and precision of 0.3 for predicting AF. Further, survival analysis for AF outcomes, using the log-rank test, revealed a hazard ratio of 8.36 (p-value of 1.93 × 10^-52). Conclusions: Our proposed ECG analysis method, utilizing overnight PSG data, shows promise in AF prediction despite a modest precision indicating the presence of false positive cases. This approach could potentially enable low-cost screening and proactive treatment for high-risk patients. Ongoing refinement, such as integrating additional physiological parameters, could significantly reduce false positives, enhancing its clinical utility and accuracy.
  2. Keathley, H.; Enos, J.; Parrish, M. (Ed.)
    The role of human-machine teams in society is increasing as big data and computing power explode. One popular approach to AI is deep learning, which is useful for classification, feature identification, and predictive modeling. However, deep learning models often suffer from inadequate transparency and poor explainability. One aspect of human systems integration is the design of interfaces that support human decision-making. AI models have multiple types of uncertainty embedded, which may be difficult for users to understand. People who use these tools need to understand how much they should trust the AI. This study evaluates one simple approach for communicating uncertainty, a visual confidence bar ranging from 0 to 100%. We perform a human-subject online experiment using an existing image recognition deep learning model to test the effect of (1) providing single vs. multiple recommendations from the AI and (2) including uncertainty information. For each image, participants described the subject in an open textbox and rated their confidence in their answers. Performance was evaluated at four levels of accuracy, ranging from the same as the image label to the correct category of the image. The results suggest that AI recommendations increase accuracy, even if the human and AI have different definitions of accuracy. In addition, providing multiple ranked recommendations, with or without the confidence bar, increases operator confidence and reduces perceived task difficulty. More research is needed to determine how people approach uncertain information from an AI system and to develop effective visualizations for communicating uncertainty.
  3. The use of Artificial Intelligence (AI) decision support is increasing in high-stakes contexts, such as healthcare, defense, and finance. Uncertainty information may help users better leverage AI predictions, especially when combined with their domain knowledge. We conducted a human-subject experiment with an online sample to examine the effects of presenting uncertainty information with AI recommendations. The experimental stimuli and task, which included identifying plant and animal images, are from an existing image recognition deep learning model, a popular approach to AI. The uncertainty information was predicted probabilities for whether each label was the true label. This information was presented numerically and visually. In the study, we tested the effect of AI recommendations in a within-subject comparison and uncertainty information in a between-subject comparison. The results suggest that AI recommendations increased both participants’ accuracy and confidence. Further, providing uncertainty information significantly increased accuracy but not confidence, suggesting that it may be effective for reducing overconfidence. In this task, participants tended to have higher domain knowledge for animals than plants based on a self-reported measure of domain knowledge. Participants with more domain knowledge were appropriately less confident when uncertainty information was provided. This suggests that people use AI and uncertainty information differently, such as an expert versus second opinion, depending on their level of domain knowledge. These results suggest that if presented appropriately, uncertainty information can potentially decrease overconfidence that is induced by using AI recommendations. 
  4. Inter-beat interval (IBI) measurement enables estimation of heart-rate variability (HRV), which, in turn, can provide early indication of potential cardiovascular diseases (CVDs). However, extracting IBIs from noisy signals is challenging because the morphology of the signal is distorted in the presence of noise. The electrocardiogram (ECG) of a person in heavy motion is highly corrupted with noise, known as motion artifact, and IBIs extracted from it are inaccurate. As part of remote health monitoring and wearable system development, denoising ECG signals and estimating IBIs correctly from them have become an emerging topic among signal-processing researchers. Apart from conventional methods, deep-learning techniques have recently been used successfully in signal denoising, and the diagnosis process has become easier, leading to accuracy levels that were previously unachievable. We propose a deep-learning approach leveraging a tiramisu autoencoder model to suppress motion-artifact noise and make the R-peaks of the ECG signal prominent even in the presence of high-intensity motion. After denoising, IBIs are estimated more accurately, expediting diagnosis tasks. Results illustrate that our method enables IBI estimation from noisy ECG signals with SNR as low as -30 dB, with an average root mean square error (RMSE) of 13 milliseconds for estimated IBIs. At this noise level, our error percentage remains below 8% and outperforms other state-of-the-art techniques.
  5. Discourse parsing has proven to be useful for a number of NLP tasks that require complex reasoning. However, over a decade since the advent of the Penn Discourse Treebank, predicting implicit discourse relations in text remains challenging. There are several possible reasons for this, and we hypothesize that models should be exposed to more context, as it plays an important role in accurate human annotation; meanwhile, adding uncertainty measures can improve model accuracy and calibration. To thoroughly investigate this phenomenon, we perform a series of experiments to determine 1) the effects of context on human judgments, and 2) the effect of quantifying uncertainty with annotator confidence ratings on model accuracy and calibration (which we measure using the Brier score (Brier et al., 1950)). We find that including annotator accuracy and confidence improves model accuracy, and incorporating confidence in the model's temperature function can lead to models with significantly better-calibrated confidence measures. We also find some insightful qualitative results regarding human and model behavior on these datasets.
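The last related record measures calibration with the Brier score. As a rough illustration of that metric only, the sketch below computes the multi-class form: the squared difference between each predicted probability vector and the one-hot true label, summed over classes and averaged over examples. The function name `brier_score` and the sample predictions are assumptions for illustration and are not taken from any of the listed papers.

```python
def brier_score(probs, labels):
    """Multi-class Brier score: mean over examples of the squared error
    between predicted probabilities and the one-hot true label.
    Lower is better; 0.0 means perfectly confident, correct predictions."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(probs)


# Hypothetical predictions for three two-class examples.
example_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
example_labels = [0, 1, 1]
print(round(brier_score(example_probs, example_labels), 4))
```

The second example is a confident mistake (0.6 on the wrong class), so it contributes most of the score, which is how the metric penalizes poorly calibrated confidence.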