
Search for: All records

Award ID contains: 1664644

  1. Introduction

    Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.


    This is a retrospective cohort study using a safety-net hospital’s electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was a PCOS ICD-9 diagnosis; the additional model outcomes were algorithm-defined PCOS. The latter were based on the Rotterdam criteria, merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.


    We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved average AUCs of 85%, 81%, 80%, and 82% for Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG.


    Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.
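The AUC figures quoted above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that rank-based definition in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = case labeled PCOS, 0 = case not labeled PCOS.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 11 of 12 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation would be the usual choice at the scale of an EHR cohort.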

    Free, publicly-accessible full text available January 30, 2025
  2. Abstract Purpose of Review

    Preparing for pandemics requires a degree of interdisciplinary work that is challenging under the current paradigm. This review summarizes the challenges faced by the field of pandemic science and proposes how to address them.

    Recent Findings

    The structure of current siloed systems of research organizations hinders effective interdisciplinary pandemic research. Moreover, effective pandemic preparedness requires stakeholders in public policy and health to interact and integrate new findings rapidly, relying on a robust, responsive, and productive research domain. Neither of these requirements is well supported under the current system.


    We propose a new paradigm for pandemic preparedness wherein interdisciplinary research and close collaboration with public policy and health practitioners can improve our ability to prevent, detect, and treat pandemics through tighter integration among domains; rapid and accurate integration and translation of science to public policy, outreach, and education; and improved venues and incentives for sustainable and robust interdisciplinary work.

  3. Abstract Background

    Hypertension is a prevalent cardiovascular disease with severe longer-term implications. Conventional management based on clinical guidelines does not facilitate personalized treatment that accounts for a richer set of patient characteristics.


    Records from 1/1/2012 to 1/1/2020 at the Boston Medical Center were used, selecting patients with either a hypertension diagnosis or meeting diagnostic criteria (≥ 130 mmHg systolic or ≥ 90 mmHg diastolic, n = 42,752). Models were developed to recommend a class of antihypertensive medications for each patient based on their characteristics. Regression immunized against outliers was combined with a nearest neighbor approach to associate with each patient an affinity group of other patients. This group was then used to make predictions of future Systolic Blood Pressure (SBP) under each prescription type. For each patient, we leveraged these predictions to select the class of medication that minimized their future predicted SBP.


    The proposed model, built with a distributionally robust learning procedure, leads to a reduction of 14.28 mmHg in SBP, on average. This reduction is 70.30% larger than the reduction achieved by the standard-of-care and 7.08% better than the corresponding reduction achieved by the second-best model, which uses ordinary least squares regression. All derived models outperform both following the previous prescription and the current ground-truth prescription in the record. We randomly sampled and manually reviewed 350 patient records; 87.71% of these model-generated prescription recommendations passed a sanity check by clinicians.


    Our data-driven approach for personalized hypertension treatment yielded significant improvement compared to the standard-of-care. The model suggests potential benefits of computational deprescribing and can support situations with clinical equipoise.
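The prescription step described above (form an affinity group of similar patients, predict future SBP under each medication class, pick the class with the lowest prediction) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses plain Euclidean distance and an unweighted mean over the nearest neighbors, not the distributionally robust regression from the abstract. The class names `class_A` and `class_B`, the two features, and all numbers are hypothetical.

```python
import math


def predict_sbp(patient, records, k):
    """Mean follow-up SBP of the k records whose features are closest to `patient`."""
    nearest = sorted(records, key=lambda r: math.dist(patient, r[0]))[:k]
    return sum(sbp for _, sbp in nearest) / len(nearest)


def recommend_class(patient, history, k=2):
    """Return (best_class, predictions); best_class has the lowest predicted SBP."""
    predictions = {
        med_class: predict_sbp(patient, records, k)
        for med_class, records in history.items()
    }
    best = min(predictions, key=predictions.get)
    return best, predictions


# Hypothetical history: medication class -> list of ((age, baseline SBP), follow-up SBP).
toy_history = {
    "class_A": [((50, 150), 138), ((52, 148), 136), ((70, 160), 155)],
    "class_B": [((49, 152), 130), ((51, 149), 132), ((30, 135), 125)],
}
best_class, predicted = recommend_class((50, 150), toy_history, k=2)
```

In this toy history the two nearest `class_A` neighbors average 137 mmHg and the two nearest `class_B` neighbors average 131 mmHg, so the sketch recommends `class_B`. In practice the features would need scaling before distances are meaningful, and the choice of `k` would be tuned on held-out data.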

  4. Abstract Objective

    To develop predictive models of coronavirus disease 2019 (COVID-19) outcomes, elucidate the influence of socioeconomic factors, and assess algorithmic racial fairness using a racially diverse patient population with high social needs.

    Materials and Methods

    Data included 7,102 patients with a positive RT-PCR test for severe acute respiratory syndrome coronavirus 2 at a safety-net system in Massachusetts. Linear and nonlinear classification methods were applied. A score based on a recurrent neural network and a transformer architecture was developed to capture the dynamic evolution of vital signs. Combined with patient characteristics, clinical variables, and hospital occupancy measures, this dynamic vital score was used to train predictive models.


    Hospitalizations can be predicted with an area under the receiver-operating characteristic curve (AUC) of 92% using symptoms, hospital occupancy, and patient characteristics, including social determinants of health. Parsimonious models to predict intensive care, mechanical ventilation, and mortality that used the most recent labs and vitals exhibited AUCs of 92.7%, 91.2%, and 94%, respectively. Early predictive models, using labs and vital signs closer to admission, had AUCs of 81.1%, 84.9%, and 92%, respectively.


    The most accurate models exhibit racial bias, being more likely to falsely predict that Black patients will be hospitalized. Models that are only based on the dynamic vital score exhibited accuracies close to the best parsimonious models, although the latter also used laboratories.


    This large study demonstrates that COVID-19 severity may accurately be predicted using a score that accounts for the dynamic evolution of vital signs. Further, race, social determinants of health, and hospital occupancy play an important role.
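The bias finding above, that the most accurate models more often falsely predict hospitalization for Black patients, corresponds to a gap in false positive rates between groups. The abstract does not state which fairness metric the authors computed, so the sketch below is one plausible check and not the study's procedure. The group labels `a` and `b` and all values are made up.

```python
from collections import defaultdict


def false_positive_rate_by_group(groups, y_true, y_pred):
    """Per-group FPR: share of truly negative cases that were predicted positive."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for group, truth, pred in zip(groups, y_true, y_pred):
        if truth == 0:
            negatives[group] += 1
            if pred == 1:
                false_pos[group] += 1
    # Groups with no negative cases are omitted because their FPR is undefined.
    return {group: false_pos[group] / negatives[group] for group in negatives}


# Hypothetical data: 1 = hospitalized (truth) or predicted hospitalized (pred).
toy_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
toy_truth = [0, 0, 0, 1, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 1, 1, 0, 0, 1]
fpr = false_positive_rate_by_group(toy_groups, toy_truth, toy_pred)
```

Here group `a` has two false positives among three non-hospitalized patients and group `b` has one among three, so a model could look equally accurate overall while still treating the groups differently.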

  5. Abstract Introduction

    Automated computational assessment of neuropsychological tests would enable widespread, cost‐effective screening for dementia.


    A novel natural language processing approach is developed and validated to identify different stages of dementia based on automated transcription of digital voice recordings of subjects’ neuropsychological tests conducted by the Framingham Heart Study (n = 1084). Transcribed sentences from the test were encoded into quantitative data and several models were trained and tested using these data and the participants’ demographic characteristics.


    Average area under the curve (AUC) on the held‐out test data reached 92.6%, 88.0%, and 74.4% for differentiating Normal cognition from Dementia, Normal or Mild Cognitive Impairment (MCI) from Dementia, and Normal from MCI, respectively.


    The proposed approach offers a fully automated identification of MCI and dementia based on a recorded neuropsychological test, providing an opportunity to develop a remote screening tool that could be adapted easily to any language.

  6. Abstract

    The aim of this study is to determine the most informative pre- and in-cycle variables for predicting success for a first autologous oocyte in-vitro fertilization (IVF) cycle. This is a retrospective study using 22,413 first autologous oocyte IVF cycles from 2001 to 2018. Models were developed to predict pregnancy following an IVF cycle with a fresh embryo transfer. The importance of each variable was determined by its coefficient in a logistic regression model and the prediction accuracy based on different variable sets was reported. The area under the receiver operating characteristic curve (AUC) on a validation patient cohort was the metric for prediction accuracy. Three factors were found to be of importance when predicting IVF success: age in three groups (38–40, 41–42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos. For predicting first-cycle IVF pregnancy using all available variables, the predictive model achieved an AUC of 68% ± 0.01%. A parsimonious predictive model utilizing age (38–40, 41–42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos achieved an AUC of 65% ± 0.01%. The proposed models accurately predict a single IVF cycle pregnancy outcome and identify important predictive variables associated with the outcome. These models are limited to predicting pregnancy immediately after the IVF cycle and not live birth. These models do not include indicators of multiple gestation and are not intended for clinical application.

  7. Abstract

    Behavioral data shows that humans and animals have the capacity to learn rules of associations applied to specific examples, and generalize these rules to a broad variety of contexts. This article focuses on neural circuit mechanisms to perform a context‐dependent association task that requires linking sensory stimuli to behavioral responses and generalizing to multiple other symmetrical contexts. The model uses neural gating units that regulate the pattern of physiological connectivity within the circuit. These neural gating units can be used in a learning framework that performs low‐rank matrix factorization analogous to recommender systems, allowing generalization with high accuracy to a wide range of additional symmetrical contexts. The neural gating units are trained with a biologically inspired framework involving traces of Hebbian modification that are updated based on the correct behavioral output of the network. This modeling demonstrates potential neural mechanisms for learning context‐dependent association rules and for the change in selectivity of neurophysiological responses in the hippocampus. The proposed computational model is evaluated using simulations of the learning process and the application of the model to new stimuli. Further, human subject behavioral experiments were performed and the results validate the key observation of a low‐rank synaptic matrix structure linking stimuli to responses.
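The recommender-system analogy above rests on the idea that a low-rank matrix linking stimuli to responses can be completed from a subset of its entries, which is what allows generalization to unseen contexts. The sketch below illustrates only that idea on a toy rank-1 matrix. It swaps in alternating least squares for the Hebbian-trace learning rule of the article, and the function name `rank1_complete` and the matrix values are assumptions made for illustration.

```python
def rank1_complete(matrix, iterations=50):
    """Fit matrix[i][j] ~ u[i] * v[j] using only observed (non-None) entries."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # Least-squares update of each row factor with the column factors held fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            u[i] = sum(matrix[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Least-squares update of each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            v[j] = sum(matrix[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return u, v


# Rows are hypothetical contexts, columns hypothetical stimuli, entries +1/-1 responses.
# The entry at row 2, column 2 is withheld and must be inferred from the rest.
toy = [
    [1, 1, -1],
    [-1, -1, 1],
    [1, 1, None],
]
u, v = rank1_complete(toy)
predicted = u[2] * v[2]  # estimate for the withheld entry
```

Because the observed entries are consistent with a single outer product, the fitted factors imply a negative response for the withheld entry, matching the pattern of the first row. Real data would be noisy and of higher rank, so a regularized factorization would be the more typical choice.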

  8. Free, publicly-accessible full text available January 1, 2025
  9. Free, publicly-accessible full text available December 13, 2024
  10. Free, publicly-accessible full text available December 10, 2024