Title: Evaluating Alarm Classifiers with High-confidence Data Programming
Classification of clinical alarms is at the heart of prioritization, suppression, integration, postponement, and other methods of mitigating alarm fatigue. Since these methods directly affect clinical care, alarm classifiers, such as intelligent suppression systems, need to be evaluated in terms of their sensitivity and specificity, which are typically calculated on a labeled dataset of alarms. Unfortunately, collecting and, in particular, labeling such datasets requires substantial effort and time, which deters hospitals from investigating mitigations of alarm fatigue. This article develops a lightweight method for evaluating alarm classifiers without perfect alarm labels. The method relies on probabilistic labels obtained from data programming, a labeling paradigm that combines noisy and cheap-to-obtain labeling heuristics. Based on these labels, the method produces confidence bounds for the sensitivity/specificity values that would result from a hypothetical evaluation with manual labeling. Our experiments on five alarm datasets collected at Children’s Hospital of Philadelphia show that the proposed method provides accurate bounds on the classifier’s sensitivity/specificity, appropriately reflecting the uncertainty from noisy labeling and limited sample sizes.
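As a rough illustration of the idea in the abstract, the sketch below estimates a classifier's sensitivity and specificity when each alarm carries only a probability of being positive rather than a hard label, and then samples plausible hard labelings to obtain an interval. This is not the article's algorithm: the function names, the independence of labels across alarms, and the simple percentile interval are all assumptions made for illustration.

```python
import random

def expected_sens_spec(probs, preds):
    """Expected sensitivity/specificity under probabilistic labels.

    probs[i] is the (assumed calibrated) probability that alarm i is positive;
    preds[i] is the classifier's 0/1 prediction for alarm i.
    """
    pos_mass = sum(probs)
    neg_mass = sum(1 - p for p in probs)
    tp_mass = sum(p for p, c in zip(probs, preds) if c == 1)
    tn_mass = sum(1 - p for p, c in zip(probs, preds) if c == 0)
    return tp_mass / pos_mass, tn_mass / neg_mass

def sampled_bounds(probs, preds, n_draws=2000, alpha=0.05, seed=0):
    """Percentile intervals over hard labelings drawn from the probabilistic labels.

    Assumption: labels are independent across alarms (a simplification).
    """
    rng = random.Random(seed)
    sens, spec = [], []
    for _ in range(n_draws):
        labels = [1 if rng.random() < p else 0 for p in probs]
        n_pos = sum(labels)
        n_neg = len(labels) - n_pos
        if n_pos == 0 or n_neg == 0:
            continue  # sensitivity or specificity undefined for this draw
        tp = sum(1 for y, c in zip(labels, preds) if y == 1 and c == 1)
        tn = sum(1 for y, c in zip(labels, preds) if y == 0 and c == 0)
        sens.append(tp / n_pos)
        spec.append(tn / n_neg)
    if not sens:
        raise ValueError("no draw contained both positive and negative labels")

    def interval(values):
        ordered = sorted(values)
        last = len(ordered) - 1
        return ordered[int(alpha / 2 * last)], ordered[int((1 - alpha / 2) * last)]

    return interval(sens), interval(spec)
```

With fully certain labels (probabilities of exactly 0 or 1) the interval collapses to a single point; as the probabilistic labels move toward 0.5, or the number of alarms shrinks, the sampled interval widens, which is the qualitative behavior the abstract describes.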
Award ID(s):
1915398
PAR ID:
10461825
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Computing for Healthcare
Volume:
3
Issue:
4
ISSN:
2691-1957
Page Range / eLocation ID:
1 to 24
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. False alarms generated by physiological monitors can overwhelm clinical caretakers with a variety of alarms. The resulting alarm fatigue can be mitigated with alarm suppression. Before being deployed, such suppression mechanisms need to be evaluated through a costly observational study, which would determine and label the truly suppressible alarms. This paper proposes a lightweight method for evaluating alarm suppression without access to the true alarm labels. The method is based on the data programming paradigm, which combines noisy and cheap-to-obtain labeling heuristics into probabilistic labels. Based on these labels, the method estimates the sensitivity/specificity of a suppression mechanism and describes the likely outcomes of an observational study in the form of confidence bounds. We evaluate the proposed method in a case study of low SpO2 alarms using a dataset collected at Children's Hospital of Philadelphia and show that our method provides tight and accurate bounds that significantly outperform the naive comparative method. 
  2. Alarm fatigue is a complex phenomenon that needs to be assessed within the context of the clinical setting. Considering that complexity, the available information on how to address alarm fatigue and improve alarm system safety is relatively scarce. This article summarizes the state of science in alarm system safety based on the eight dimensions of a sociotechnical model for studying health information technology in complex adaptive healthcare systems. The summary and recommendations were guided by available systematic reviews on the topic, interventional studies published between January 2019 and February 2022, and recommendations and evidence-based practice interventions published by professional organizations. The current article suggests implications to help researchers respond to the gap in science related to alarm safety, help vendors design safe monitoring systems, and help clinical leaders apply evidence-based strategies to improve alarm safety in their settings. Physiologic monitors in intensive care units, the devices most commonly used in complex care environments and associated with the highest number of alarms and deaths, are the focus of the current work. 
  3. A significant proportion of clinical physiologic monitoring alarms are false. This often leads to alarm fatigue in clinical personnel, inevitably compromising patient safety. To combat this issue, researchers have attempted to build Machine Learning (ML) models capable of accurately adjudicating Vital Sign (VS) alerts raised at the bedside of hemodynamically monitored patients as real or artifact. Previous studies have utilized supervised ML techniques that require substantial amounts of hand-labeled data. However, manually harvesting such data can be costly, time-consuming, and mundane, and is a key factor limiting the widespread adoption of ML in healthcare (HC). Instead, we explore the use of multiple, individually imperfect heuristics to automatically assign probabilistic labels to unlabeled training data using weak supervision. Our weakly supervised models perform competitively with traditional supervised techniques and require less involvement from domain experts, demonstrating their use as efficient and practical alternatives to supervised learning in HC applications of ML. 
  4. The high false alarm rate in intensive care units (ICUs) has been identified as one of the most critical medical challenges in recent years. It often overwhelms clinical staff with numerous false or nonurgent alarms and lowers the quality of care by increasing the probability of missing true alarms and by causing delirium, stress, sleep deprivation, and depressed immune systems in patients. One major cause of false alarms in clinical practice is that the signals collected from different devices are processed individually to trigger an alarm, even though there is a considerable chance that the signal collected from any one device is corrupted by noise or motion artifacts. In this paper, we propose a low-computational-complexity yet accurate game-theoretic feature selection method, based on a genetic algorithm, that identifies the most informative biomarkers across the signals collected from various monitoring devices and can considerably reduce the rate of false alarms. 
  5. Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts. Most existing methods first use the label names as static keyword-based features to generate pseudo labels, which are then used for final classifier training. While reasonable, such a commonly adopted framework suffers from two limitations: (1) keywords can have different meanings in different contexts and some text may not have any keyword, so keyword matching can induce noisy and inadequate pseudo labels; (2) the errors made in the pseudo label generation stage will directly propagate to the classifier training stage without a chance of being corrected. In this paper, we propose a new method, PIEClass, consisting of two modules: (1) a pseudo label acquisition module that uses zero-shot prompting of pre-trained language models (PLM) to get pseudo labels based on contextualized text understanding beyond static keyword matching, and (2) a noise-robust iterative ensemble training module that iteratively trains classifiers and updates pseudo labels by utilizing two PLM fine-tuning methods that regularize each other. Extensive experiments show that PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets and even achieves similar performance to fully-supervised classifiers on sentiment classification tasks. 
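Several of the abstracts above describe combining noisy, cheap-to-obtain labeling heuristics into probabilistic labels. A minimal sketch of that combination step follows, assuming each heuristic votes +1 (positive), -1 (negative), or 0 (abstain), that heuristics err independently, and that each heuristic's accuracy is already known. In actual data programming these accuracies are estimated from the heuristics' agreements and disagreements without ground truth; here they are simply passed in, and the function name `probabilistic_label` is hypothetical.

```python
import math

def probabilistic_label(votes, accuracies, prior=0.5):
    """Combine heuristic votes into P(label is positive) via naive-Bayes log-odds.

    votes[j] is +1, -1, or 0 (abstain) from heuristic j;
    accuracies[j] is the assumed probability that heuristic j is right
    when it does not abstain (must lie strictly between 0 and 1).
    """
    log_odds = math.log(prior / (1 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote == 0:
            continue  # an abstaining heuristic contributes no evidence
        log_odds += vote * math.log(acc / (1 - acc))
    return 1 / (1 + math.exp(-log_odds))

# Two heuristics vote positive and one votes negative, all assumed 80% accurate.
p = probabilistic_label([1, 1, -1], [0.8, 0.8, 0.8])
```

Probabilities produced this way can then serve as soft labels for training or evaluating a downstream classifier.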