Search for: All records

Creators/Authors contains: "Wiens, Jenna"


  1. Current causal inference approaches for estimating conditional average treatment effects (CATEs) often prioritize accuracy. However, in resource-constrained settings, decision makers may only need a ranking of individuals based on their estimated CATE. In these scenarios, exact CATE estimation may be an unnecessarily challenging task, particularly when the underlying function is difficult to learn. In this work, we study the relationship between CATE estimation and optimizing for CATE ranking, demonstrating that optimizing for ranking may be more appropriate than optimizing for accuracy in certain settings. Guided by our analysis, we propose an approach that directly optimizes the ranking of individuals used to inform treatment assignment, with the aim of maximizing benefit. Our tree-based approach maximizes the expected benefit of the treatment assignment using a novel splitting criterion. In an empirical case study across synthetic datasets, our approach leads to better treatment assignments than CATE estimation methods, as measured by expected total benefit. By providing a practical and efficient approach to learning a CATE ranking, this work offers an important step towards bridging the gap between CATE estimation techniques and their downstream applications.
    Free, publicly-accessible full text available May 2, 2025
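
    The abstract above does not give the splitting criterion, so the following is only a rough, generic sketch of the idea of growing a tree to rank individuals for treatment: a candidate split is scored by the expected benefit of treating everyone in a child node whose estimated effect is positive. The function names, toy data, and the benefit score itself are illustrative assumptions, not the authors' criterion.

```python
import numpy as np

def node_effect(y, t):
    """Difference in mean outcome between treated (t == 1) and control (t == 0) individuals in a node."""
    return y[t == 1].mean() - y[t == 0].mean()

def split_benefit(x_feat, y, t, threshold):
    """Generic 'expected benefit' score for a candidate split:
    treat every individual in a child whose estimated effect is positive."""
    left = x_feat <= threshold
    score = 0.0
    for mask in (left, ~left):
        if mask.sum() == 0 or t[mask].min() == t[mask].max():
            return -np.inf            # child lacks treated or control samples
        score += mask.sum() * max(node_effect(y[mask], t[mask]), 0.0)
    return score

# toy usage: pick the threshold on a single feature that maximizes the benefit score
rng = np.random.default_rng(0)
x = rng.normal(size=500)
t = rng.integers(0, 2, size=500)
y = 0.5 * t * (x > 0) + rng.normal(scale=0.1, size=500)   # treatment only helps when x > 0
best = max(np.quantile(x, np.linspace(0.1, 0.9, 9)),
           key=lambda thr: split_benefit(x, y, t, thr))
print(f"chosen threshold: {best:.2f}")
```
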
  2. BACKGROUND

    Timely interventions, such as antibiotics and intravenous fluids, have been associated with reduced mortality in patients with sepsis. Artificial intelligence (AI) models that accurately predict risk of sepsis onset could speed the delivery of these interventions. Although sepsis models generally aim to predict its onset, clinicians might recognize and treat sepsis before the sepsis definition is met. Predictions occurring after sepsis is clinically recognized (i.e., after treatment begins) may be of limited utility. Researchers have not previously investigated the accuracy of sepsis risk predictions that are made before treatment begins. Thus, we evaluate the discriminative performance of AI sepsis predictions made throughout a hospitalization relative to the time of treatment.

    METHODS

    We used a large retrospective inpatient cohort from the University of Michigan’s academic medical center (2018–2020) to evaluate the Epic sepsis model (ESM). The ability of the model to predict sepsis, both before sepsis criteria are met and before indications of treatment plans for sepsis, was evaluated in terms of the area under the receiver operating characteristic curve (AUROC). Indicators of a treatment plan were identified through electronic data capture and included the receipt of antibiotics, fluids, blood culture, and/or lactate measurement. The definition of sepsis was a composite of the Centers for Disease Control and Prevention’s surveillance criteria and the severe sepsis and septic shock management bundle definition.

    RESULTS

    The study included 77,582 hospitalizations. Sepsis occurred in 3766 hospitalizations (4.9%). ESM achieved an AUROC of 0.62 (95% confidence interval [CI], 0.61 to 0.63) when including predictions made before sepsis criteria were met and, in some cases, after clinical recognition. When excluding predictions after clinical recognition, the AUROC dropped to 0.47 (95% CI, 0.46 to 0.48).

    CONCLUSIONS

    We evaluated a sepsis risk prediction model to measure its ability to predict sepsis before clinical recognition. Our work has important implications for future work in model development and evaluation, with the goal of maximizing the clinical utility of these models. (Funded by Cisco Research and others.)
    Free, publicly-accessible full text available February 7, 2025
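
    A minimal sketch of the kind of evaluation described above: AUROC computed over all predictions versus only those made before the first treatment indicator (antibiotics, fluids, blood culture, or lactate measurement). The table layout, column names, and use of scikit-learn are assumptions for illustration, not the study's actual pipeline.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# hypothetical per-prediction table: one row per model score during a hospitalization
preds = pd.DataFrame({
    "hosp_id":            [1, 1, 1, 2, 2, 3, 3],
    "pred_time_hr":       [2, 10, 20, 5, 30, 4, 12],
    "risk_score":         [0.1, 0.4, 0.7, 0.2, 0.3, 0.1, 0.2],
    "sepsis_label":       [1, 1, 1, 0, 0, 0, 0],               # hospitalization-level outcome
    "first_treatment_hr": [15, 15, 15, None, None, None, None], # first antibiotics/fluids/culture/lactate
})

# AUROC including predictions made after clinical recognition (i.e., after treatment begins)
auroc_all = roc_auc_score(preds["sepsis_label"], preds["risk_score"])

# AUROC restricted to predictions made before any treatment indicator
before = preds["first_treatment_hr"].isna() | (preds["pred_time_hr"] < preds["first_treatment_hr"])
auroc_pre_treatment = roc_auc_score(preds.loc[before, "sepsis_label"],
                                    preds.loc[before, "risk_score"])
print(auroc_all, auroc_pre_treatment)
```
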
  3. Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent of the input features. In practice, however, label noise is often feature- or instance-dependent, and therefore biased (i.e., some instances are more likely to be mislabeled than others). For example, in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence of instance-dependent label noise. Our approach utilizes anchor points, a small subset of data for which we know the observed and ground-truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (of AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01), while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
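
    The abstract describes the two-stage idea only at a high level, so the sketch below is a loose illustration rather than the authors' method: estimate an instance-dependent mislabeling probability from the anchor points (where observed and ground-truth labels are both known), then down-weight likely-mislabeled instances when training the final model. All names, the logistic noise model, and the reweighting scheme are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y_true = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
# instance-dependent noise: flips are more likely when feature 2 is large
flip = rng.random(2000) < 0.3 * (X[:, 2] > 0.5)
y_obs = np.where(flip, 1 - y_true, y_true)

# anchor points: a small subset with known ground-truth labels
anchor = rng.choice(2000, size=100, replace=False)

# stage 1: model the probability that an observed label is wrong, as a function of the features
noise_model = LogisticRegression().fit(X[anchor], (y_obs[anchor] != y_true[anchor]).astype(int))
p_flip = noise_model.predict_proba(X)[:, 1]

# stage 2: train on observed labels, down-weighting instances that are likely mislabeled
clf = LogisticRegression().fit(X, y_obs, sample_weight=1.0 - p_flip)
print("accuracy vs. ground truth:", (clf.predict(X) == y_true).mean())
```
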
  4. Abstract Background

    Vestibular deficits can impair an individual’s ability to maintain postural and/or gaze stability. Characterizing gait abnormalities among individuals affected by vestibular deficits could help identify patients at high risk of falling and inform rehabilitation programs. Commonly used gait assessment tools rely on simple measures such as timing and visual observations of path deviations by clinicians. These simple measures may not capture subtle changes in gait kinematics. Therefore, we investigated the use of wearable inertial measurement units (IMUs) and machine learning (ML) approaches to automatically discriminate between gait patterns of individuals with vestibular deficits and age-matched controls. The goal of this study was to examine the effects of IMU placement and gait task selection on the performance of automatic vestibular gait classifiers.

    Methods

    Thirty study participants (15 with vestibular deficits and 15 age-matched controls) participated in a single-session gait study during which they performed seven gait tasks while donning a full-body set of IMUs. Classification performance was reported in terms of area under the receiver operating characteristic curve (AUROC) scores for Random Forest models trained on data from each IMU placement for each gait task.

    Results

    Several models were able to classify vestibular gait better than random (AUROC > 0.5), but their performance varied according to IMU placement and gait task selection. Results indicated that a single IMU placed on the left arm when walking with eyes closed resulted in the highest AUROC score for a single IMU (AUROC = 0.88 [0.84, 0.89]). Feature permutation results indicated that participants with vestibular deficits reduced their arm swing compared to age-matched controls while they walked with eyes closed.

    Conclusions

    These findings highlighted differences in upper extremity kinematics during walking with eyes closed that were characteristic of vestibular deficits and showed evidence of the discriminative ability of IMU-based automated screening for vestibular deficits. Further research should explore the mechanisms driving arm swing differences in the vestibular population.

     
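
    A minimal sketch of the modeling setup described above: a Random Forest trained on features from a single IMU placement for a single gait task, scored with AUROC. The feature layout, the simulated "reduced arm swing" signal, and the cross-validation scheme are assumptions for illustration and do not reproduce the study's data handling.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# hypothetical per-participant feature matrix for one IMU placement / one gait task,
# e.g. summary statistics of left-arm angular velocity while walking with eyes closed
X = rng.normal(size=(30, 12))
y = np.array([1] * 15 + [0] * 15)   # 1 = vestibular deficit, 0 = age-matched control
X[y == 1, 0] -= 1.0                 # simulate reduced arm swing in the deficit group

model = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
print("AUROC:", roc_auc_score(y, scores))
```
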
  5. During training, models can exploit spurious correlations as shortcuts, resulting in poor generalization performance when shortcuts do not persist. In this work, assuming access to a representation based on domain knowledge (i.e., known concepts) that is invariant to shortcuts, we aim to learn robust and accurate models from biased training data. In contrast to previous work, we do not rely solely on known concepts, but allow the model to also learn unknown concepts. We propose two approaches for mitigating shortcuts that incorporate domain knowledge while accounting for potentially important yet unknown concepts. The first approach proceeds in two stages: after fitting a model using known concepts, it accounts for the residual using unknown concepts. While flexible, we show that this approach is vulnerable when shortcuts are correlated with the unknown concepts. This limitation is addressed by our second approach, which extends a recently proposed regularization penalty. Applied to two real-world datasets, we demonstrate that both approaches can successfully mitigate shortcut learning.
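
    A rough sketch of the first (two-stage) approach described above, under the assumption of a simple regression setting: fit a model on the known, shortcut-invariant concepts, then fit a second model to the residual using the remaining (unknown-concept) representation. The models, synthetic data, and residual definition below are illustrative placeholders, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
known = rng.normal(size=(n, 3))      # domain-knowledge concepts, invariant to shortcuts
unknown = rng.normal(size=(n, 5))    # remaining representation (may contain shortcuts)
y = known @ np.array([1.0, -0.5, 0.2]) + 0.8 * unknown[:, 0] + rng.normal(scale=0.1, size=n)

# stage 1: explain as much of the outcome as possible with the known concepts
stage1 = LinearRegression().fit(known, y)
residual = y - stage1.predict(known)

# stage 2: model what is left over with the unknown concepts
stage2 = LinearRegression().fit(unknown, residual)

y_hat = stage1.predict(known) + stage2.predict(unknown)
print("residual variance explained by stage 2:", stage2.score(unknown, residual))
```
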
  6. Many reinforcement learning (RL) applications have combinatorial action spaces, where each action is a composition of sub-actions. A standard RL approach ignores this inherent factorization structure, resulting in a potential failure to make meaningful inferences about rarely observed sub-action combinations; this is particularly problematic for offline settings, where data may be limited. In this work, we propose a form of linear Q-function decomposition induced by factored action spaces. We study the theoretical properties of our approach, identifying scenarios where it is guaranteed to lead to zero bias when used to approximate the Q-function. Outside the regimes with theoretical guarantees, we show that our approach can still be useful because it leads to better sample efficiency without necessarily sacrificing policy optimality, allowing us to achieve a better bias-variance trade-off. Across several offline RL problems using simulators and real-world datasets motivated by healthcare, we demonstrate that incorporating factored action spaces into value-based RL can result in better-performing policies. Our approach can help an agent make more accurate inferences within underexplored regions of the state-action space when applying RL to observational datasets.
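
    The decomposition at the heart of the approach can be written as Q(s, a) = Σ_i Q_i(s, a_i), where a = (a_1, ..., a_K) is the composite action; every transition then updates each sub-action component it touches. Below is a small tabular sketch of that parameterization with made-up dimensions; it illustrates the linear decomposition, not the paper's full algorithm or experiments.

```python
import numpy as np

n_states = 4
sub_action_sizes = [2, 3]        # action = (a1, a2); combinatorial space of size 6
gamma, lr = 0.99, 0.1

# one Q-table per sub-action; Q(s, a) is defined as their sum
q_components = [np.zeros((n_states, k)) for k in sub_action_sizes]

def q_value(s, action):
    return sum(q[s, a_i] for q, a_i in zip(q_components, action))

def best_value(s):
    # with a linear decomposition, the greedy action maximizes each component independently
    return sum(q[s].max() for q in q_components)

def td_update(s, action, reward, s_next):
    target = reward + gamma * best_value(s_next)
    td_error = target - q_value(s, action)
    for q, a_i in zip(q_components, action):
        q[s, a_i] += lr * td_error   # each sub-action component shares the same TD error

# toy transition: state 0, composite action (1, 2), reward 1.0, next state 3
td_update(0, (1, 2), 1.0, 3)
print(q_value(0, (1, 2)))
```
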
  7. Abstract INTRODUCTION

    Identifying mild cognitive impairment (MCI) patients at risk for dementia could facilitate early interventions. Using electronic health records (EHRs), we developed a model to predict MCI to all‐cause dementia (ACD) conversion at 5 years.

    METHODS

    A Cox proportional hazards model was used to identify predictors of ACD conversion from EHR data in veterans with MCI. Model performance (area under the receiver operating characteristic curve [AUC] and Brier score) was evaluated on a held‐out data subset.

    RESULTS

    Of 59,782 MCI patients, 15,420 (25.8%) converted to ACD. The model had good discriminative performance (AUC 0.73 [95% confidence interval (CI) 0.72–0.74]), and calibration (Brier score 0.18 [95% CI 0.17–0.18]). Age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors, while body mass index, alcohol abuse, and sleep apnea were protective factors.

    DISCUSSION

    EHR‐based prediction model had good performance in identifying 5‐year MCI to ACD conversion and has potential to assist triaging of at‐risk patients.

    Highlights

    Of 59,782 veterans with mild cognitive impairment (MCI), 15,420 (25.8%) converted to all‐cause dementia within 5 years.

    Electronic health record prediction models demonstrated good performance (area under the receiver operating characteristic curve 0.73; Brier 0.18).

    Age and vascular‐related morbidities were predictors of dementia conversion.

    Synthetic data was comparable to real data in modeling MCI to dementia conversion.

    Key Points

    An electronic health record–based model using demographic and co‐morbidity data had good performance in identifying veterans who convert from mild cognitive impairment (MCI) to all‐cause dementia (ACD) within 5 years.

    Increased age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors for 5‐year conversion from MCI to ACD.

    High body mass index, alcohol abuse, and sleep apnea were protective factors for 5‐year conversion from MCI to ACD.

    Models using synthetic data, analogs of real patient data that retain the distribution, density, and covariance between variables of real patient data but are not attributable to any specific patient, performed just as well as models using real patient data. This could have significant implications in facilitating widely distributed computing of health‐care data with minimized patient privacy concern that could accelerate scientific discoveries.

     
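
    A minimal sketch of fitting a Cox proportional hazards model to EHR-style predictors of time to dementia conversion, using the lifelines library on synthetic data; the variable names, cohort, and simulated hazards are placeholders, not the study's actual predictors or data.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age":          rng.normal(75, 6, n),
    "stroke":       rng.integers(0, 2, n),
    "hypertension": rng.integers(0, 2, n),
    "bmi":          rng.normal(27, 4, n),
})

# synthetic time to conversion (years) and event indicator, censored at 5 years
hazard = 0.03 * np.exp(0.04 * (df["age"] - 75) + 0.5 * df["stroke"]
                       + 0.3 * df["hypertension"] - 0.02 * (df["bmi"] - 27))
time_to_event = rng.exponential(1.0 / hazard)
df["duration"] = np.minimum(time_to_event, 5.0)
df["converted"] = (time_to_event <= 5.0).astype(int)

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="converted")
cph.print_summary()   # hazard ratios for each predictor
```
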
  8. Abstract Background

    Recently, machine learning techniques have been applied to data collected from inertial measurement units to automatically assess balance, but these approaches rely on hand-engineered features. We explore the utility of machine learning to automatically extract important features from inertial measurement unit data for balance assessment.

    Findings

    Ten participants with balance concerns performed multiple balance exercises in a laboratory setting while wearing an inertial measurement unit on their lower back. Physical therapists watched video recordings of participants performing the exercises and rated balance on a 5-point scale. We trained machine learning models using different representations of the unprocessed inertial measurement unit data to estimate physical therapist ratings. On a held-out test set, we compared these learned models to one another, to participants’ self-assessments of balance, and to models trained using hand-engineered features. Utilizing the unprocessed kinematic data from the inertial measurement unit provided significant improvements over both self-assessments and models using hand-engineered features (AUROC of 0.806 vs. 0.768, 0.665).

    Conclusions

    Unprocessed data from an inertial measurement unit used as input to a machine learning model produced accurate estimates of balance performance. The ability to learn from unprocessed data presents a potentially generalizable approach for assessing balance without the need for labor-intensive feature engineering, while maintaining comparable model performance.

     
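
    A minimal sketch of the comparison described above: one classifier trained on a few hand-engineered summary features and one trained on flattened, unprocessed IMU windows, both scored with AUROC. The windowing, feature choices, simulated signal, and binary labels are assumptions for illustration; the study estimated 5-point therapist ratings with its own models and validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_trials, window = 200, 150                      # hypothetical: 200 exercise repetitions, 150-sample windows
raw = rng.normal(size=(n_trials, window, 3))     # unprocessed tri-axial IMU windows
y = rng.integers(0, 2, n_trials)                 # 1 = poor balance rating, 0 = good (binarized for the sketch)
raw[y == 1, :, 0] += 0.3                         # toy signal: extra lean on one axis for the poor-balance class

# hand-engineered features: per-axis mean and standard deviation
features = np.concatenate([raw.mean(axis=1), raw.std(axis=1)], axis=1)
# "unprocessed" representation: just the flattened kinematic window
flattened = raw.reshape(n_trials, -1)

for name, X in [("hand-engineered", features), ("unprocessed", flattened)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    print(name, "AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```
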