Search for: All records

Award ID contains: 1914792

  1. Introduction

    Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.


    Methods

    This is a retrospective cohort study from a safety-net hospital’s electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria and merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.


    Results

    We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved average AUCs of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG.
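
    The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `auc` and the toy labels and scores are hypothetical and carry no study data.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diagnosed, 0 = not diagnosed.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auc(labels, scores)   # 8 of the 9 positive/negative pairs are ordered correctly
```

    The pairwise form is quadratic in sample size, so it suits illustration rather than large cohorts, where a rank-based computation would be the usual choice.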


    Conclusion

    Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.

    Free, publicly-accessible full text available January 30, 2025
  2. Abstract

    Purpose of Review

    Preparing for pandemics requires a degree of interdisciplinary work that is challenging under the current paradigm. This review summarizes the challenges faced by the field of pandemic science and proposes how to address them.

    Recent Findings

    The structure of current siloed systems of research organizations hinders effective interdisciplinary pandemic research. Moreover, effective pandemic preparedness requires stakeholders in public policy and health to interact and integrate new findings rapidly, relying on a robust, responsive, and productive research domain. Neither of these requirements is well supported under the current system.


    Summary

    We propose a new paradigm for pandemic preparedness wherein interdisciplinary research and close collaboration with public policy and health practitioners can improve our ability to prevent, detect, and treat pandemics through tighter integration among domains, rapid and accurate integration, and translation of science to public policy, outreach and education, and improved venues and incentives for sustainable and robust interdisciplinary work.

  3. Objectives

    To evaluate the association between preconception contraceptive use and miscarriage.


    Design

    Prospective cohort study.


    Setting

    Residents of the United States of America or Canada, recruited from 2013 until the end of 2022.


    Participants

    13 460 female-identified participants aged 21-45 years who were planning a pregnancy were included, of whom 8899 conceived. Participants reported data for contraceptive history, early pregnancy, miscarriage, and potential confounders during preconception and pregnancy.

    Main outcome measure

    Miscarriage, defined as pregnancy loss before 20 weeks of gestation.


    Results

    Preconception use of combined and progestin-only oral contraceptives, hormonal intrauterine devices, copper intrauterine devices, rings, implants, or natural methods was not associated with miscarriage compared with use of barrier methods. Participants who most recently used patch (incidence rate ratio 1.34 (95% confidence interval 0.81 to 2.21)) or injectable contraceptives (1.44 (0.99 to 2.12)) had higher rates of miscarriage compared with recent users of barrier methods, although results were imprecise due to the small numbers of participants who used patch and injectable contraceptives.
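
    The imprecision noted above can be made concrete. Assuming the reported intervals are Wald intervals that are symmetric on the log scale (an assumption, as the abstract does not state how they were computed), the limits alone imply a point estimate and a log-scale standard error. The helper name `log_scale_summary` is hypothetical; this is a sketch of the arithmetic, not the study's analysis.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Point estimate and log-scale standard error implied by a ratio's CI.

    Assumes the interval is exp(log(estimate) +/- z * se), i.e. a Wald
    interval built on the log scale.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)   # geometric midpoint of the limits
    se = (log_hi - log_lo) / (2 * z)          # half-width on the log scale / z
    return point, se


# Confidence limits taken from the abstract.
patch_point, patch_se = log_scale_summary(0.81, 2.21)     # patch users
inject_point, inject_se = log_scale_summary(0.99, 2.12)   # injectable users
```

    Under that assumption the geometric midpoints land close to the reported ratios, and the larger implied standard error for patch users is consistent with the wider interval around that estimate.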


    Conclusions

    Use of most contraceptives before conception was not appreciably associated with miscarriage rate. Individuals who used patch and injectable contraceptives had higher rates of miscarriage relative to users of barrier methods, although these results were imprecise and residual confounding was possible.

    Free, publicly-accessible full text available September 1, 2024
  4. Abstract

    STUDY QUESTION

    To what extent is male fatty acid intake associated with fecundability among couples planning pregnancy?


    SUMMARY ANSWER

    We observed weak positive associations of male dietary intakes of total and saturated fatty acids with fecundability; no other fatty acid subtypes were appreciably associated with fecundability.


    WHAT IS KNOWN ALREADY

    Male fatty acid intake has been associated with semen quality in previous studies. However, little is known about the extent to which male fatty acid intake is associated with fecundability among couples attempting spontaneous conception.


    STUDY DESIGN, SIZE, DURATION

    We conducted an internet-based preconception prospective cohort study of 697 couples who enrolled during 2015–2022. During 12 cycles of observation, 53 couples (7.6%) were lost to follow-up.


    PARTICIPANTS/MATERIALS, SETTING, METHODS

    Participants were residents of the USA or Canada, aged 21–45 years, and not using fertility treatment at enrollment. At baseline, male participants completed a food frequency questionnaire from which we estimated intakes of total fat and fatty acid subtypes. We ascertained time to pregnancy using questionnaires completed every 8 weeks by female participants until conception or up to 12 months. We used proportional probabilities regression models to estimate fecundability ratios (FRs) and 95% CIs for the associations of fat intakes with fecundability, adjusting for male and female partner characteristics. We used the multivariate nutrient density method to account for energy intake, allowing for interpretation of results as fat intake replacing carbohydrate intake. We conducted several sensitivity analyses to assess the potential for confounding, selection bias, and reverse causation.


    MAIN RESULTS AND THE ROLE OF CHANCE

    Among 697 couples, we observed 465 pregnancies during 2970 menstrual cycles of follow-up. The cumulative incidence of pregnancy during 12 cycles of follow-up after accounting for censoring was 76%. Intakes of total and saturated fatty acids were weakly, positively associated with fecundability. Fully adjusted FRs for quartiles of total fat intake were 1.32 (95% CI 1.01–1.71), 1.16 (95% CI 0.88–1.51), and 1.43 (95% CI 1.09–1.88) for the second, third, and fourth vs the first quartile, respectively. Fully adjusted FRs for saturated fatty acid intake were 1.21 (95% CI 0.94–1.55), 1.16 (95% CI 0.89–1.51), and 1.23 (95% CI 0.94–1.62) for the second, third, and fourth vs the first quartile, respectively. Intakes of monounsaturated, polyunsaturated, trans-, omega-3, and omega-6 fatty acids were not strongly associated with fecundability. Results were similar after adjustment for the female partner’s intakes of trans- and omega-3 fats.
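
    The crude proportion of couples who conceived (465 of 697, about 67%) is lower than the 76% cumulative incidence because couples who were censored leave the risk set before their outcome is known. A life-table estimate handles this by multiplying per-cycle probabilities of not conceiving. The sketch below assumes that general approach (the abstract does not name the exact estimator); the function name and the per-cycle counts are hypothetical.

```python
def cumulative_incidence(at_risk, events):
    """Life-table estimate: 1 - product over cycles of (1 - events / at_risk)."""
    not_yet_pregnant = 1.0
    for n, d in zip(at_risk, events):
        if n <= 0 or d < 0 or d > n:
            raise ValueError("each cycle needs 0 <= events <= at_risk and at_risk > 0")
        not_yet_pregnant *= 1.0 - d / n
    return 1.0 - not_yet_pregnant


# Crude proportion from the counts in the abstract (ignores censoring).
crude = 465 / 697

# Hypothetical three-cycle example. The risk set shrinks by more than the
# number of pregnancies because some couples are censored between cycles.
at_risk = [100, 70, 50]
events = [20, 14, 5]
toy_incidence = cumulative_incidence(at_risk, events)   # 1 - 0.8 * 0.8 * 0.9
```

    In the toy data, 39 of 100 couples conceive, yet the life-table estimate is higher than 39% because the censored couples contribute only the cycles they were observed.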


    LIMITATIONS, REASONS FOR CAUTION

    Dietary intakes estimated from the food frequency questionnaire may be subject to non-differential misclassification, which is expected to bias results toward the null in the extreme categories when exposures are modeled as quartiles. There may be residual confounding by unmeasured dietary, lifestyle, or environmental factors. Sample size was limited, especially in subgroup analyses.


    WIDER IMPLICATIONS OF THE FINDINGS

    Our results do not support a strong causal effect of male fatty acid intakes on fecundability among couples attempting to conceive spontaneously. The weak positive associations we observed between male dietary fat intakes and fecundability may reflect a combination of causal associations, measurement error, chance, and residual confounding.


    STUDY FUNDING/COMPETING INTEREST(S)

    The study was funded by the National Institutes of Health, grant numbers R01HD086742 and R01HD105863. In the last 3 years, PRESTO has received in-kind donations from Swiss Precision Diagnostics (home pregnancy tests) and (fertility app). L.A.W. is a consultant for AbbVie, Inc. M.L.E. is an advisor to Sandstone, Ro, Underdog, Dadi, Hannah, Doveras, and VSeat. The other authors have no competing interests to report.



    Free, publicly-accessible full text available May 23, 2024
  5. Abstract

    Background

    Hypertension is a prevalent cardiovascular disease with severe longer-term implications. Conventional management based on clinical guidelines does not facilitate personalized treatment that accounts for a richer set of patient characteristics.


    Methods

    Records from 1/1/2012 to 1/1/2020 at the Boston Medical Center were used, selecting patients with either a hypertension diagnosis or meeting diagnostic criteria (≥ 130 mmHg systolic or ≥ 90 mmHg diastolic, n = 42,752). Models were developed to recommend a class of antihypertensive medications for each patient based on their characteristics. Regression immunized against outliers was combined with a nearest neighbor approach to associate with each patient an affinity group of other patients. This group was then used to make predictions of future Systolic Blood Pressure (SBP) under each prescription type. For each patient, we leveraged these predictions to select the class of medication that minimized their future predicted SBP.
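
    The predict-then-select step described above can be sketched in a few lines. This is a deliberately simplified illustration: it replaces the authors' robust regression with a plain average over the k nearest patients in each medication class, and the feature choice (age, baseline SBP), the class labels "A" and "B", and all numbers are hypothetical.

```python
import math
from collections import defaultdict


def recommend_class(patient, records, k=3):
    """Pick the medication class with the lowest predicted future SBP.

    records: list of (features, med_class, later_sbp) tuples.
    The prediction for a class is the mean later SBP among the k patients
    on that class who are closest to `patient` (unscaled Euclidean distance).
    """
    by_class = defaultdict(list)
    for features, med_class, later_sbp in records:
        by_class[med_class].append((math.dist(patient, features), later_sbp))

    predictions = {}
    for med_class, pairs in by_class.items():
        nearest = sorted(pairs)[:k]              # affinity group for this class
        predictions[med_class] = sum(sbp for _, sbp in nearest) / len(nearest)

    best = min(predictions, key=predictions.get)  # class minimizing predicted SBP
    return best, predictions


# Hypothetical records: (age, baseline SBP), class label, SBP at follow-up.
records = [
    ((50, 150), "A", 135), ((52, 148), "A", 137), ((70, 160), "A", 150),
    ((51, 151), "B", 140), ((49, 149), "B", 142), ((72, 158), "B", 138),
]
best, predictions = recommend_class((50, 150), records, k=2)
```

    In practice the features would need scaling before distances are meaningful, and observational treatment data would need adjustment for who tends to receive which class; the sketch omits both.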


    Results

    The proposed model, built with a distributionally robust learning procedure, leads to a reduction of 14.28 mmHg in SBP, on average. This reduction is 70.30% larger than the reduction achieved by the standard of care and 7.08% better than the corresponding reduction achieved by the second-best model, which uses ordinary least squares regression. All derived models outperform following the previous prescription or the current ground truth prescription in the record. We randomly sampled and manually reviewed 350 patient records; 87.71% of these model-generated prescription recommendations passed a sanity check by clinicians.


    Conclusions

    Our data-driven approach for personalized hypertension treatment yielded significant improvement compared to the standard-of-care. The model implied potential benefits of computationally deprescribing and can support situations with clinical equipoise.

  6. Abstract

    Background

    Psychological stress is prevalent among reproductive‐aged men. Assessment of semen quality for epidemiological studies is challenging as data collection is expensive and cumbersome, and studies evaluating the effect of perceived stress on semen quality are inconsistent.


    Objective

    To examine the association between perceived stress and semen quality.

    Material and methods

    We analyzed baseline data on 644 men (1,159 semen samples) from two prospective preconception cohort studies during 2015–2021: 592 in Pregnancy Study Online (PRESTO) and 52 in (SF). At study entry, men aged ≥21 years (PRESTO) and ≥18 years (SF) trying to conceive without fertility treatment completed a questionnaire on reproductive and medical history, socio‐demographics, lifestyle, and the 10‐item version of the Perceived Stress Scale (PSS; range of scores: 0–40). After enrollment (median weeks: 2.1, interquartile range [IQR]: 1.3–3.7), men were invited to perform in‐home semen testing, twice with 7–10 days between tests, using the Trak Male Fertility Testing System. Semen quality was characterized by semen volume, sperm concentration, and total sperm count. We fit generalized estimating equation linear regression models to estimate the percent difference in mean log‐transformed semen parameters by four PSS groups (<10, 10–14, 15–19, ≥20), adjusting for potential confounders.


    Results

    The median PSS score and IQR was 15 (10–19), and 136 men (21.1%) had a PSS score ≥20. Comparing men with PSS scores ≥20 with <10, the adjusted percent difference was −2.7 (95% CI: −9.8; 5.0) for semen volume, 6.8 (95% CI: −10.9; 28.1) for sperm concentration, and 4.3 (95% CI: −13.8; 26.2) for total sperm count.
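
    Percent differences from a model fit to log-transformed outcomes are usually obtained by exponentiating the regression coefficient. The conversion below is a minimal sketch that assumes natural logarithms, which the abstract does not state; the function names are hypothetical.

```python
import math


def percent_difference(beta):
    """Percent difference implied by a coefficient on a natural-log outcome."""
    return 100.0 * (math.exp(beta) - 1.0)


def log_coefficient(pct):
    """Inverse mapping: the log-scale coefficient behind a percent difference."""
    return math.log(1.0 + pct / 100.0)


# Coefficients that would correspond to the reported point estimates.
beta_volume = log_coefficient(-2.7)          # semen volume, PSS >= 20 vs < 10
beta_concentration = log_coefficient(6.8)    # sperm concentration
```

    Because the transformation is nonlinear, confidence limits are converted endpoint by endpoint, which is why the reported intervals are not symmetric around the point estimates.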


    Conclusion

    Our findings indicate that perceived stress is not materially associated with semen volume, sperm concentration, or total sperm count.

  7. Abstract

    Objective

    To develop predictive models of coronavirus disease 2019 (COVID-19) outcomes, elucidate the influence of socioeconomic factors, and assess algorithmic racial fairness using a racially diverse patient population with high social needs.

    Materials and Methods

    Data included 7,102 patients with positive (RT-PCR) severe acute respiratory syndrome coronavirus 2 test at a safety-net system in Massachusetts. Linear and nonlinear classification methods were applied. A score based on a recurrent neural network and a transformer architecture was developed to capture the dynamic evolution of vital signs. Combined with patient characteristics, clinical variables, and hospital occupancy measures, this dynamic vital score was used to train predictive models.


    Results

    Hospitalizations can be predicted with an area under the receiver-operating characteristic curve (AUC) of 92% using symptoms, hospital occupancy, and patient characteristics, including social determinants of health. Parsimonious models to predict intensive care, mechanical ventilation, and mortality that used the most recent labs and vitals exhibited AUCs of 92.7%, 91.2%, and 94%, respectively. Early predictive models, using labs and vital signs closer to admission, had AUCs of 81.1%, 84.9%, and 92%, respectively.


    Discussion

    The most accurate models exhibit racial bias, being more likely to falsely predict that Black patients will be hospitalized. Models that are only based on the dynamic vital score exhibited accuracies close to the best parsimonious models, although the latter also used laboratories.
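
    The bias described above corresponds to unequal false positive rates across groups: among patients who were not hospitalized, the share predicted to be hospitalized differs by group. One way to check for this is sketched below; the abstract does not say which fairness metric the authors used, and the function name, the group labels "g1" and "g2", and the data are hypothetical.

```python
def false_positive_rate_by_group(y_true, y_pred, groups):
    """False positive rate (predicted 1 among true 0) for each group."""
    counts = {}   # group -> (false positives, true negatives + false positives)
    for y, p, g in zip(y_true, y_pred, groups):
        if y == 0:
            fp, negatives = counts.get(g, (0, 0))
            counts[g] = (fp + (1 if p == 1 else 0), negatives + 1)
    return {g: fp / negatives for g, (fp, negatives) in counts.items()}


# Hypothetical outcomes (1 = hospitalized) and model predictions.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1]
groups = ["g1"] * 5 + ["g2"] * 5
rates = false_positive_rate_by_group(y_true, y_pred, groups)
```

    In the toy data both groups have the same accuracy on hospitalized patients, but group g1 is falsely flagged twice as often as g2, which is the kind of gap an overall AUC does not reveal.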


    Conclusion

    This large study demonstrates that COVID-19 severity may accurately be predicted using a score that accounts for the dynamic evolution of vital signs. Further, race, social determinants of health, and hospital occupancy play an important role.

  8. Abstract

    Introduction

    Automated computational assessment of neuropsychological tests would enable widespread, cost‐effective screening for dementia.


    Methods

    A novel natural language processing approach is developed and validated to identify different stages of dementia based on automated transcription of digital voice recordings of subjects’ neuropsychological tests conducted by the Framingham Heart Study (n = 1084). Transcribed sentences from the test were encoded into quantitative data and several models were trained and tested using these data and the participants’ demographic characteristics.


    Results

    Average area under the curve (AUC) on the held‐out test data reached 92.6%, 88.0%, and 74.4% for differentiating Normal cognition from Dementia, Normal or Mild Cognitive Impairment (MCI) from Dementia, and Normal from MCI, respectively.


    Discussion

    The proposed approach offers a fully automated identification of MCI and dementia based on a recorded neuropsychological test, providing an opportunity to develop a remote screening tool that could be adapted easily to any language.

  9. Abstract

    The aim of this study is to determine the most informative pre- and in-cycle variables for predicting success for a first autologous oocyte in-vitro fertilization (IVF) cycle. This is a retrospective study using 22,413 first autologous oocyte IVF cycles from 2001 to 2018. Models were developed to predict pregnancy following an IVF cycle with a fresh embryo transfer. The importance of each variable was determined by its coefficient in a logistic regression model and the prediction accuracy based on different variable sets was reported. The area under the receiver operating characteristic curve (AUC) on a validation patient cohort was the metric for prediction accuracy. Three factors were found to be of importance when predicting IVF success: age in three groups (38–40, 41–42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos. For predicting first-cycle IVF pregnancy using all available variables, the predictive model achieved an AUC of 68% ± 0.01%. A parsimonious predictive model utilizing age (38–40, 41–42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos achieved an AUC of 65% ± 0.01%. The proposed models accurately predict a single IVF cycle pregnancy outcome and identify important predictive variables associated with the outcome. These models are limited to predicting pregnancy immediately after the IVF cycle and not live birth. These models do not include indicators of multiple gestation and are not intended for clinical application.

  10. Abstract

    Behavioral data show that humans and animals have the capacity to learn rules of associations applied to specific examples, and generalize these rules to a broad variety of contexts. This article focuses on neural circuit mechanisms to perform a context‐dependent association task that requires linking sensory stimuli to behavioral responses and generalizing to multiple other symmetrical contexts. The model uses neural gating units that regulate the pattern of physiological connectivity within the circuit. These neural gating units can be used in a learning framework that performs low‐rank matrix factorization analogous to recommender systems, allowing generalization with high accuracy to a wide range of additional symmetrical contexts. The neural gating units are trained with a biologically inspired framework involving traces of Hebbian modification that are updated based on the correct behavioral output of the network. This modeling demonstrates potential neural mechanisms for learning context‐dependent association rules and for the change in selectivity of neurophysiological responses in the hippocampus. The proposed computational model is evaluated using simulations of the learning process and the application of the model to new stimuli. Further, human subject behavioral experiments were performed and the results validate the key observation of a low‐rank synaptic matrix structure linking stimuli to responses.
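
    Why a low-rank structure permits generalization can be seen in the smallest case. If the table of correct responses (contexts by stimuli) has rank one, every entry is the product of a context factor and a stimulus factor, so unseen context and stimulus pairs follow from a single observed row and column. The sketch below shows only this rank-one completion; it is not the article's gating network or its Hebbian learning rule, and the function name and the ±1 response coding are hypothetical.

```python
def complete_rank_one(matrix):
    """Fill None entries of a rank-one matrix from its first row and column.

    For M[i][j] = u[i] * v[j], any entry equals M[i][0] * M[0][j] / M[0][0],
    so the first row and first column must be fully observed and M[0][0] != 0.
    """
    pivot = matrix[0][0]
    if pivot is None or pivot == 0:
        raise ValueError("the top-left entry must be observed and non-zero")
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is None:
                if matrix[i][0] is None or matrix[0][j] is None:
                    raise ValueError("first row and first column must be observed")
                filled[i][j] = matrix[i][0] * matrix[0][j] / pivot
    return filled


# Rows are contexts, columns are stimuli, entries are responses coded +1 or -1.
# Only the first context and the first stimulus have been fully trained.
observed = [
    [1, 1, -1],
    [-1, None, None],
    [1, None, None],
]
completed = complete_rank_one(observed)
```

    Five observed associations determine all nine entries here, which mirrors the recommender-system analogy in the abstract: a few ratings constrain the factors, and the factors predict the rest.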
