-
Deep neural networks, including the Transformer architecture, have achieved remarkable performance in various time series tasks. However, their effectiveness in handling clinical time series data is hindered by specific challenges: 1) sparse event sequences collected asynchronously with multivariate time series, and 2) limited availability of labeled data. To address these challenges, we propose TransEHR, a self-supervised Transformer model designed to efficiently encode multi-sourced asynchronous sequential data, such as structured Electronic Health Records (EHRs). Our code is available at https://github.com/SigmaTsing/TransEHR.git. We introduce three pretext tasks for pre-training the Transformer model, utilizing large amounts of unlabeled structured EHR data, followed by fine-tuning on downstream prediction tasks using the limited labeled data. Through extensive experiments on three real-world health datasets, we demonstrate that our model achieves state-of-the-art performance on benchmark clinical tasks, including in-hospital mortality classification, phenotyping, and length-of-stay prediction. Our findings highlight the efficacy of TransEHR in addressing the challenges associated with clinical time series data, thus contributing to advancements in healthcare analytics.
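The abstract describes a pretrain-then-finetune recipe: self-supervised pretext tasks on unlabeled EHR sequences, then supervised fine-tuning of the same encoder on a small labeled set. Below is a minimal PyTorch sketch of that recipe; the masked-reconstruction pretext task, the model dimensions, and the mean-pooled classification head are illustrative assumptions, not the paper's actual three pretext tasks or architecture.

```python
# Minimal sketch of the pretrain-then-finetune recipe described above.
# The masking pretext task and all hyperparameters are illustrative
# placeholders, not TransEHR's actual tasks or configuration.
import torch
import torch.nn as nn

class EHREncoder(nn.Module):
    """Transformer encoder over a sequence of clinical-event feature vectors."""
    def __init__(self, n_features: int, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                       # x: (batch, time, n_features)
        return self.encoder(self.embed(x))      # -> (batch, time, d_model)

def pretrain_step(encoder, recon_head, x, mask_prob=0.15):
    """Self-supervised step: zero out random timesteps, reconstruct them."""
    mask = torch.rand(x.shape[:2]) < mask_prob  # (batch, time) bool
    corrupted = x.clone()
    corrupted[mask] = 0.0
    recon = recon_head(encoder(corrupted))      # (batch, time, n_features)
    return nn.functional.mse_loss(recon[mask], x[mask])

def finetune_step(encoder, clf_head, x, y):
    """Supervised step on limited labels: pool encoded states, classify."""
    logits = clf_head(encoder(x).mean(dim=1))   # (batch, n_classes)
    return nn.functional.cross_entropy(logits, y)

if __name__ == "__main__":
    enc = EHREncoder(n_features=16)
    recon_head = nn.Linear(64, 16)      # pretext-task head (pretraining only)
    clf_head = nn.Linear(64, 2)         # e.g. in-hospital mortality
    x = torch.randn(8, 24, 16)          # 8 stays, 24 hourly steps, 16 features
    y = torch.randint(0, 2, (8,))
    print(pretrain_step(enc, recon_head, x).item())
    print(finetune_step(enc, clf_head, x, y).item())
```

In this pattern the reconstruction head is discarded after pretraining and only the encoder weights carry over, which is what lets the limited labeled data go further during fine-tuning.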
-
Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A. (Eds.)
Machine Learning (ML) research has focused on maximizing the accuracy of predictive tasks. ML models, however, are increasingly more complex, resource intensive, and costlier to deploy in resource-constrained environments. These issues are exacerbated for prediction tasks with sequential classification over progressively transitioned stages with a "happens-before" relation between them. We argue that it is possible to "unfold" a monolithic single multi-class classifier, typically trained for all stages using all data, into a series of single-stage classifiers. Each single-stage classifier can be cascaded gradually from cheaper to more expensive binary classifiers that are trained using only the necessary data modalities or features required for that stage. UnfoldML is a cost-aware and uncertainty-based dynamic 2D prediction pipeline for multi-stage classification that enables (1) navigation of the accuracy/cost tradeoff space, (2) reducing the spatio-temporal cost of inference by orders of magnitude, and (3) early prediction of subsequent stages. UnfoldML achieves orders-of-magnitude better cost in clinical settings while detecting multi-stage disease development in real time. It comes within 0.1% accuracy of the highest-performing multi-class baseline, while saving close to 20X on the spatio-temporal cost of inference and predicting disease onset earlier (by 3.5 hrs). We also show that UnfoldML generalizes to image classification, where it can predict labels at different levels (from coarse to fine) given different levels of abstraction of an image, saving close to 5X in cost with as little as 0.4% accuracy reduction.
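The core mechanism the abstract describes is an uncertainty-gated cascade: a cheap classifier handles easy inputs, and only uncertain cases escalate to a more expensive model. Below is a minimal Python sketch of that idea; the two-stage setup, the confidence threshold, the synthetic data, and the scikit-learn model choices are all illustrative assumptions, not UnfoldML's actual pipeline.

```python
# Minimal sketch of an uncertainty-gated cascade in the spirit of UnfoldML.
# Thresholds, models, and the cheap/expensive feature split are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic data: cheap modality (first 4 columns) + expensive modality (rest).
X = rng.normal(size=(1000, 12))
y = (X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

cheap = LogisticRegression().fit(X[:500, :4], y[:500])          # cheap features only
expensive = GradientBoostingClassifier().fit(X[:500], y[:500])  # all modalities

def cascade_predict(x_row, confidence=0.9):
    """Escalate to the expensive model only when the cheap one is uncertain."""
    p = cheap.predict_proba(x_row[:4].reshape(1, -1))[0]
    if p.max() >= confidence:           # confident: stop early, save cost
        return int(p.argmax()), "cheap"
    p = expensive.predict_proba(x_row.reshape(1, -1))[0]
    return int(p.argmax()), "expensive"

preds, routes = zip(*(cascade_predict(x) for x in X[500:]))
acc = np.mean(np.array(preds) == y[500:])
print(f"accuracy={acc:.3f}, escalated={routes.count('expensive')}/{len(routes)}")
```

Raising the confidence threshold routes more inputs to the expensive model (higher cost, usually higher accuracy); lowering it saves cost at some accuracy risk, which is the accuracy/cost tradeoff space the abstract refers to.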