NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Federated learning of robust individualized decision rules with application to heterogeneous multihospital sepsis population

https://doi.org/10.1214/25-AOAS2017

Chen, Xinlei; Talisa, Victor B; Tan, Xiaoqing; Qi, Zhengling; Kennedy, Jason N; Chang, Chung-Chou H; Seymour, Christopher W; Tang, Lu (June 2025, The Annals of Applied Statistics)

Free, publicly-accessible full text available June 1, 2026
Development and evaluation of a machine learning model to predict acute care for opioid use disorder among Medicaid enrollees engaged in a community‐based treatment program

https://doi.org/10.1111/add.70079

Xue, Lingshu; Yin, Ruofei; Cole, Evan S; Lo‐Ciganic, Wei‐Hsuan; Gellad, Walid F; Donohue, Julie; Tang, Lu (April 2025, Addiction)

Aims: To develop machine‐learning algorithms for predicting the risk of a hospitalization or emergency department (ED) visit for opioid use disorder (OUD) (i.e. OUD acute events) among Pennsylvania Medicaid enrollees in the Opioid Use Disorder Centers of Excellence (COE) program, and to evaluate the fairness of model performance across racial groups. Methods: We studied 20 983 United States Medicaid enrollees aged 18 years or older who had COE visits between April 2019 and March 2021. We applied multivariate logistic regression, least absolute shrinkage and selection operator (LASSO) models, random forests, and eXtreme Gradient Boosting (XGB) to predict OUD acute events following the initial COE visit. Our models included predictors at the system, patient, and regional levels. We assessed model performance using multiple metrics by racial group. Individuals were divided into low‐, medium‐ and high‐risk groups based on predicted risk scores. Results: The training (n = 13 990) and testing (n = 6993) samples displayed similar characteristics (mean age 38.1 ± 9.3 years, 58% male, 80% White enrollees), with 4% experiencing OUD acute events at baseline. XGB demonstrated the best prediction performance (C‐statistic = 76.6% [95% confidence interval = 75.6%–77.7%] vs. 72.8%–74.7% for other methods). At the balanced cutoff, XGB achieved a sensitivity of 68.2%, specificity of 70.0%, and positive predictive value of 8.3%. The XGB model classified the testing sample into high‐risk (6%), medium‐risk (30%), and low‐risk (63%) groups. In the high‐risk group, 40.7% had OUD acute events vs. 16.5% and 5.0% in the medium‐ and low‐risk groups. The high‐ and medium‐risk groups captured 44% and 26% of individuals with OUD events. The XGB model exhibited lower false negative rates and higher false positive rates in racial/ethnic minority groups than in White enrollees.
Conclusions: New machine‐learning algorithms perform well in predicting the risk of OUD acute care use among United States Medicaid enrollees and improve the fairness of prediction across racial and ethnic groups compared with previous OUD‐related models.
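The fairness finding above rests on comparing error rates across racial groups at a single cutoff. A minimal sketch of that bookkeeping in plain Python follows, assuming binary outcome labels (1 = OUD acute event), one predicted risk score per person, and one threshold applied to everyone. The function names, the toy data, and the tier cutoffs are hypothetical; this is not the study's code, model, or cutoffs.

```python
from collections import defaultdict


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, FNR and FPR for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    # FNR and FPR are the complements of sensitivity and specificity.
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv,
            "fnr": 1 - sens, "fpr": 1 - spec}


def metrics_by_group(y_true, scores, groups, cutoff):
    """Apply one cutoff to everyone, then compute the metrics within each group."""
    labels = defaultdict(list)
    preds = defaultdict(list)
    for t, s, g in zip(y_true, scores, groups):
        labels[g].append(t)
        preds[g].append(1 if s >= cutoff else 0)
    return {g: classification_metrics(labels[g], preds[g]) for g in labels}


def risk_tier(score, medium_cut, high_cut):
    """Map a predicted risk to a tier; both cutoffs are hypothetical inputs."""
    if score >= high_cut:
        return "high"
    if score >= medium_cut:
        return "medium"
    return "low"


# Toy data only (not study data): two hypothetical groups "A" and "B".
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.2, 0.1, 0.7, 0.8, 0.3, 0.4]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
by_group = metrics_by_group(y_true, scores, groups, cutoff=0.5)
for g, m in sorted(by_group.items()):
    print(g, round(m["fnr"], 3), round(m["fpr"], 3))
```

Because sensitivity is 1 − FNR and specificity is 1 − FPR, a group with a lower FNR and a higher FPR at the same cutoff is flagged more often, whether or not its members go on to have an event.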
Free, publicly-accessible full text available April 29, 2026
PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

https://doi.org/10.1145/3637528.3671611

Ye, Yuyang; Tang, Lu-An; Wang, Haoyu; Yu, Runlong; Yu, Wenchao; He, Erhu; Chen, Haifeng; Xiong, Hui (August 2024, ACM)

Full Text Available
Outcome-guided disease subtyping by generative model and weighted joint likelihood in transcriptomic applications

https://doi.org/10.1214/23-AOAS1865

Li, Yujia; Liu, Peng; Wang, Wenjia; Zong, Wei; Fang, Yusi; Ren, Zhao; Tang, Lu; Celedón, Juan C; Oesterreich, Steffi; Tseng, George C (September 2024, The Annals of Applied Statistics)

With advances in high-throughput technology, molecular disease subtyping using high-dimensional omics data has been recognized as an effective approach for identifying subtypes of complex diseases with distinct disease mechanisms and prognoses. Conventional cluster analysis takes omics data as input and generates patient clusters with similar gene expression patterns. The omics data, however, usually contain multifaceted cluster structures that can be defined by different sets of genes. If a gene set associated with irrelevant clinical variables (e.g., sex or age) dominates the clustering process, the resulting clusters may not capture clinically meaningful disease subtypes. This motivates the clustering framework developed in this paper, which is guided by a prespecified disease outcome such as lung function measurements or survival. We propose two outcome-guided disease subtyping methods for omics data, one based on a generative model and one based on a weighted joint likelihood. Both methods connect an outcome association model and a disease subtyping model through a latent variable of cluster labels. Compared with the generative model, the weighted joint likelihood contains a data-driven weight parameter that balances the likelihood contributions from outcome association and gene cluster separation; this improves generalizability in independent validation but is more computationally demanding. Extensive simulations and two real applications, in lung disease and triple-negative breast cancer, demonstrate the superior performance of the outcome-guided clustering methods in terms of subtyping accuracy, gene selection, and outcome association. Unlike existing clustering methods, the outcome-guided disease subtyping framework creates a new precision medicine paradigm for directly identifying patient subgroups with clinical associations.
Full Text Available
Heterogeneity in the Effect of Early Goal-Directed Therapy for Septic Shock: A Secondary Analysis of Two Multicenter International Trials

https://doi.org/10.1097/CCM.0000000000006463

Shah, Faraaz Ali; Talisa, Victor B; Chang, Chung-Chou H; Triantafyllou, Sofia; Tang, Lu; Mayr, Florian B; Higgins, Alisa M; Peake, Sandra L; Mouncey, Paul; Harrison, David A; et al (January 2025, Critical Care Medicine)

OBJECTIVES: The optimal approach to resuscitation in septic shock remains unclear despite multiple randomized controlled trials (RCTs). Our objective was to investigate whether previously uncharacterized variation across individuals in their response to resuscitation strategies may contribute to the conflicting average treatment effects in prior RCTs. DESIGN: We randomly split study sites from the Australasian Resuscitation in Sepsis Evaluation (ARISE) and Protocolized Care for Early Septic Shock (ProCESS) trials into derivation and validation cohorts. We trained machine learning models to predict individual absolute risk differences (iARDs) in 90-day mortality in the derivation cohorts, tested for heterogeneity of treatment effect (HTE) in the validation cohorts, and swapped these cohorts in sensitivity analyses. We fit the best-performing model in a combined dataset to explore the roles of patient characteristics and of individual components of early goal-directed therapy (EGDT) in determining treatment response. SETTING: Eighty-one sites in Australia, New Zealand, Hong Kong, Finland, the Republic of Ireland, and the United States. PATIENTS: Adult patients presenting to the emergency department with severe sepsis or septic shock. INTERVENTIONS: EGDT vs. usual care. MEASUREMENTS AND MAIN RESULTS: A local-linear random forest model performed best in predicting iARDs. In the validation cohort, HTE was confirmed, as evidenced by an interaction between iARD prediction and treatment (p < 0.001). When patients were grouped by predicted iARD, treatment response increased from the lowest to the highest quintile (absolute risk difference [95% CI], –8% [–19% to 4%] and relative risk, 1.34 [0.89–2.01] in quintile 1, suggesting harm from EGDT; 12% [1–23%] and 0.64 [0.42–0.96] in quintile 5, suggesting benefit). Sensitivity analyses showed similar findings. Pre-intervention albumin contributed the most to HTE. Analyses of individual EGDT components were inconclusive.
CONCLUSIONS: Treatment response to EGDT varied across patients in two multicenter RCTs, with large benefits for some patients while others were harmed. Patient characteristics, including albumin, were most important in identifying HTE.
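The quintile comparison described above can be sketched as follows, assuming each patient already has a predicted iARD from some model (the local-linear random forest itself is not reproduced here), a treatment-arm flag, and a 90-day mortality indicator. The `Patient` type and all function names are hypothetical. Sign conventions are chosen to match the reported numbers: the absolute risk difference is usual-care mortality minus EGDT mortality, so a negative value indicates harm, and the relative risk is EGDT mortality divided by usual-care mortality, so a value above 1 indicates harm. Confidence intervals are omitted.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    predicted_iard: float  # model-predicted benefit from EGDT (higher = more benefit)
    treated: bool          # True = EGDT arm, False = usual care
    died_90d: bool         # 90-day mortality indicator


def split_into_quantiles(patients, k=5):
    """Sort by predicted iARD and cut into k groups of (nearly) equal size."""
    ordered = sorted(patients, key=lambda p: p.predicted_iard)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def mortality(group):
    """Fraction of patients in the group who died within 90 days."""
    return sum(p.died_90d for p in group) / len(group) if group else float("nan")


def observed_effects(group):
    """Observed (ARD, RR) within one group, comparing EGDT with usual care."""
    m_egdt = mortality([p for p in group if p.treated])
    m_usual = mortality([p for p in group if not p.treated])
    ard = m_usual - m_egdt  # positive = fewer deaths with EGDT
    rr = m_egdt / m_usual if m_usual else float("nan")  # > 1 = more deaths with EGDT
    return ard, rr
```

Applied to a full cohort, `split_into_quantiles(cohort)` followed by `observed_effects` on each group yields one ARD and one relative risk per quintile; a treatment response that rises from quintile 1 to quintile 5 is the pattern the abstract reports.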
Full Text Available
Covariate-guided Bayesian mixture of spline experts for the analysis of multivariate high-density longitudinal data

https://doi.org/10.1093/biostatistics/kxad034

Fu, Haoyi; Tang, Lu; Rosen, Ori; Hipwell, Alison E; Huppert, Theodore J; Krafty, Robert T (December 2023, Biostatistics)

Summary: With the rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which leads to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes that each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via the logistic weights of a mixture-of-experts model. We formulate this approach in a fully Bayesian framework using Gibbs sampling, with the number of components selected by the deviance information criterion. The proposed method is compared with existing methods via simulation studies and is applied to a functional near-infrared spectroscopy study that aims to understand infant emotional reactivity and recovery from stress. The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.
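The covariate-dependent mixing weights described above follow the usual mixture-of-experts construction, in which a multinomial logistic (softmax) gate turns each subject's covariates into component probabilities. A minimal sketch of that gate follows, assuming fixed coefficient vectors and fixed component mean curves given as lists on a common time grid. It does not implement the smoothing splines, the Gibbs sampler, or DIC-based selection, and all names and values are hypothetical.

```python
import math


def gating_weights(x, gammas):
    """Softmax mixing weights pi_k(x) = exp(x . gamma_k) / sum_j exp(x . gamma_j).

    x: covariate vector (put 1.0 first to include an intercept).
    gammas: one coefficient vector per mixture component.
    """
    scores = [sum(g * xi for g, xi in zip(gamma, x)) for gamma in gammas]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_trajectory(x, gammas, component_means):
    """Mixture mean at each time point: sum_k pi_k(x) * mu_k(t)."""
    weights = gating_weights(x, gammas)
    n_times = len(component_means[0])
    return [sum(w * mu[t] for w, mu in zip(weights, component_means))
            for t in range(n_times)]


# Hypothetical two-component example with an intercept and one covariate;
# the first component's coefficients are fixed at zero as the reference.
gammas = [[0.0, 0.0], [0.0, math.log(3.0)]]
means = [[0.0, 2.0], [2.0, 4.0]]
print(gating_weights([1.0, 1.0], gammas))
print(expected_trajectory([1.0, 1.0], gammas, means))
```

Fixing one component's coefficients at zero is a common identifiability convention for softmax gates; the summary does not state the paper's exact parameterization.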
Full Text Available
ProvIoT: Detecting Stealthy Attacks in IoT through Federated Edge-Cloud Security

Mukherjee, Kunal; Wiedemeier, Joshua; Wang, Qi; Kamimura, Junpei; Rhee, John Junghwan; Wei, James; Li, Zhichun; Yu, Xiao; Tang, Lu-An; Gui, Jiaping; et al (March 2024, ACM)

Internet of Things (IoT) devices have increased drastically in complexity and prevalence within the last decade. Alongside the proliferation of IoT devices and applications, attacks targeting them have become more common. Recent large-scale attacks such as Mirai and VPNFilter highlight the lack of comprehensive defenses for IoT devices. Existing security solutions are inadequate against skilled adversaries mounting sophisticated, stealthy attacks against IoT devices. Powerful provenance-based intrusion detection systems have been successfully deployed on resource-rich servers and desktops to identify advanced stealthy attacks. However, IoT devices lack the memory, storage, and computing resources to apply these provenance analysis techniques directly on the device. This paper presents ProvIoT, a novel federated edge-cloud security framework that enables on-device syscall-level behavioral anomaly detection in IoT devices. ProvIoT applies federated learning techniques to overcome data and privacy limitations while minimizing network overhead. Infrequent on-device training of the local model requires less than 10% CPU overhead; syncing with the global model requires sending and receiving 2 MB over the network. During normal offline operation, ProvIoT periodically incurs less than 10% CPU overhead and less than 65 MB of memory usage for data summarization and anomaly detection. Our evaluation shows that ProvIoT detects fileless malware and stealthy APT attacks with an average F1 score of 0.97 in heterogeneous real-world IoT applications. ProvIoT is a step toward extending provenance analysis to resource-constrained IoT devices, beginning with well-resourced IoT devices such as the Raspberry Pi, Jetson Nano, and Google TPU.
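The abstract does not say how ProvIoT merges local models, so the following is only a generic illustration of the federated averaging (FedAvg) idea that many federated learning systems build on: each device trains on its own data and shares only parameters, and a server combines them weighted by local sample count. The function name and toy values are hypothetical, and parameters are plain lists of floats rather than any real model format.

```python
def federated_average(client_params, client_sizes):
    """Combine client parameter vectors into a global vector, weighting each
    client by its number of local training samples (FedAvg-style)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for j, value in enumerate(params):
            global_params[j] += (n / total) * value
    return global_params


# Two hypothetical devices: raw syscall logs stay local; only these vectors are sent.
device_a = [1.0, 2.0]  # trained on 100 local samples
device_b = [3.0, 4.0]  # trained on 300 local samples
print(federated_average([device_a, device_b], [100, 300]))
```

Only parameter vectors cross the network in this scheme, so the per-round payload scales with model size rather than with the amount of data collected on each device.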
Full Text Available
Sketching AI Concepts with Capabilities and Examples: AI Innovation in the Intensive Care Unit

https://doi.org/10.1145/3613904.3641896

Yildirim, Nur; Zlotnikov, Susanna; Sayar, Deniz; Kahn, Jeremy M; Bukowski, Leigh A; Amin, Sher Shah; Riman, Kathryn A; Davis, Billie S; Minturn, John S; King, Andrew J; et al (May 2024, ACM)
Mueller, Florian Floyd; Kyburz, Penny; Williamson, Julie R; Sas, Corina; Wilson, Max L; Dugas, Phoebe Toups; Shklovski, Irina (Ed.)
Advances in artificial intelligence (AI) have enabled unprecedented capabilities, yet innovation teams struggle when envisioning AI concepts. Data science teams think of innovations users do not want, while domain experts think of innovations that cannot be built. A lack of effective ideation seems to be a breakdown point. How might multidisciplinary teams identify buildable and desirable use cases? This paper presents a firsthand account of ideating AI concepts to improve critical care medicine. As a team of data scientists, clinicians, and HCI researchers, we conducted a series of design workshops to explore more effective approaches to AI concept ideation and problem formulation. We detail our process, the challenges we encountered, and the practices and artifacts that proved effective. We discuss the research implications for improved collaboration and stakeholder engagement, and consider the role HCI might play in reducing the high failure rate of AI innovation.
Full Text Available
Machine learning-based prediction of low-value care for hospitalized patients

https://doi.org/10.1016/j.ibmed.2023.100115

King, Andrew J; Tang, Lu; Davis, Billie S; Preum, Sarah M; Bukowski, Leigh A; Zimmerman, John; Kahn, Jeremy M (January 2023, Intelligence-Based Medicine)

Full Text Available
Distributed simultaneous inference in generalized linear models via confidence distribution

https://doi.org/10.1016/j.jmva.2019.104567

Tang, Lu; Zhou, Ling; Song, Peter X.-K. (March 2020, Journal of Multivariate Analysis)

Full Text Available

