skip to main content


Title: Learning endometriosis phenotypes from patient-generated data
Abstract

Endometriosis is a systemic and chronic condition in women of childbearing age, yet a highly enigmatic disease with unresolved questions: there are no known biomarkers, nor established clinical stages. We here investigate the use of patient-generated health data and data-driven phenotyping to characterize endometriosis patient subtypes, based on their reported signs and symptoms. We aim at unsupervised learning of endometriosis phenotypes using self-tracking data from personal smartphones. We leverage data from an observational research study of over 4000 women with endometriosis that track their condition over more than 2 years. We extend a classical mixed-membership model to accommodate the idiosyncrasies of the data at hand, i.e., the multimodality and uncertainty of the self-tracked variables. The proposed method, by jointly modeling a wide range of observations (i.e., participant symptoms, quality of life, treatments), identifies clinically relevant endometriosis subtypes. Experiments show that our method is robust to different hyperparameter choices and the biases of self-tracking data (e.g., the wide variations in tracking frequency among participants). With this work, we show the promise of unsupervised learning of endometriosis subtypes from self-tracked data, as learned phenotypes align well with what is already known about the disease, but also suggest new clinically actionable findings. More generally, we argue that a continued research effort on unsupervised phenotyping methods with patient-generated health data via new mobile and digital technologies will have significant impact on the study of enigmatic diseases in particular, and health in general.

 
more » « less
NSF-PAR ID:
10164406
Author(s) / Creator(s):
; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
npj Digital Medicine
Volume:
3
Issue:
1
ISSN:
2398-6352
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The menstrual cycle is a key indicator of overall health for women of reproductive age. Previously, menstruation was primarily studied through survey results; however, as menstrual tracking mobile apps become more widely adopted, they provide an increasingly large, content-rich source of menstrual health experiences and behaviors over time. By exploring a database of user-tracked observations from the Clue app by BioWink GmbH of over 378,000 users and 4.9 million natural cycles, we show that self-reported menstrual tracker data can reveal statistically significant relationships between per-person cycle length variability and self-reported qualitative symptoms. A concern for self-tracked data is that they reflect not only physiological behaviors, but also the engagement dynamics of app users. To mitigate such potential artifacts, we develop a procedure to exclude cycles lacking user engagement, thereby allowing us to better distinguish true menstrual patterns from tracking anomalies. We uncover that women located at different ends of the menstrual variability spectrum, based on the consistency of their cycle length statistics, exhibit statistically significant differences in their cycle characteristics and symptom tracking patterns. We also find that cycle and period length statistics are stationary over the app usage timeline across the variability spectrum. The symptoms that we identify as showing statistically significant association with timing data can be useful to clinicians and users for predicting cycle variability from symptoms, or as potential health indicators for conditions like endometriosis. Our findings showcase the potential of longitudinal, high-resolution self-tracked data to improve understanding of menstruation and women’s health as a whole.

     
    more » « less
  2. Polycystic Ovary Syndrome (PCOS) is a condition that causes hormonal imbalance and infertility in women and people with female reproductive organs. PCOS causes different symptoms for different people, with no singular or universal cure. Being a stigmatized and enigmatic condition, it is challenging to discover, diagnose, and manage PCOS. This work aims to inform the design of inclusive health technologies through an understanding of people’s lived experiences and challenges with PCOS. We conducted semi-structured interviews with 10 women diagnosed with PCOS and analyzed a PCOS-specific subreddit forum. We report people’s support-seeking, sense-making, and self-experimentation practices, and find uncertainty and stigma to be key in shaping their unique experiences of the condition. We further identify potential avenues for designing technology to support their diverse needs, such as personalized and contextual tracking, accelerated self-discovery, and co-management, contributing to a growing body of HCI literature on stigmatized topics in women’s health and well-being. 
    more » « less
  3. Abstract Background Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative—an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients ( N =36,736). Methods We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. Results We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals’ SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p -value=2.32×10 −16 , EAA p -value=6.73×10 −11 ). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. Conclusions Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping. 
    more » « less
  4. Abstract

    Logistic regression is the primary analysis tool for binary traits in genome‐wide association studies (GWAS). Multinomial regression extends logistic regression to multiple categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of clinical information and (b) derived phenotypes generated by specific phenotyping algorithms for electronic health records (EHR). GWAS of ordinal traits have been problematic. Dichotomizing can lead to a range of arbitrary cutoff values, generating inconsistent, hard to interpret results. Using multinomial regression ignores trait value hierarchy and potentially loses power. Treating ordinal data as quantitative can lead to misleading inference. To address these issues, we analyze ordinal traits with an ordered, multinomial model. This approach increases power and leads to more interpretable results. We derive efficient algorithms for computing test statistics, making ordinal trait GWAS computationally practical for Biobank scale data. Our method is available as a Julia packageOrdinalGWAS.jl. Application to a COPDGene study confirms previously found signals based on binary case–control status, but with more significance. Additionally, we demonstrate the capability of our package to run on UK Biobank data by analyzing hypertension as an ordinal trait.

     
    more » « less
  5. Abstract STUDY QUESTION To what extent does the use of mobile computing apps to track the menstrual cycle and the fertile window influence fecundability among women trying to conceive? SUMMARY ANSWER After adjusting for potential confounders, use of any of several different apps was associated with increased fecundability ranging from 12% to 20% per cycle of attempt. WHAT IS KNOWN ALREADY Many women are using mobile computing apps to track their menstrual cycle and the fertile window, including while trying to conceive. STUDY DESIGN, SIZE, DURATION The Pregnancy Study Online (PRESTO) is a North American prospective internet-based cohort of women who are aged 21–45 years, trying to conceive and not using contraception or fertility treatment at baseline. PARTICIPANTS/MATERIALS, SETTING, METHODS We restricted the analysis to 8363 women trying to conceive for no more than 6 months at baseline; the women were recruited from June 2013 through May 2019. Women completed questionnaires at baseline and every 2 months for up to 1 year. The main outcome was fecundability, i.e. the per-cycle probability of conception, which we assessed using self-reported data on time to pregnancy (confirmed by positive home pregnancy test) in menstrual cycles. On the baseline and follow-up questionnaires, women reported whether they used mobile computing apps to track their menstrual cycles (‘cycle apps’) and, if so, which one(s). We estimated fecundability ratios (FRs) for the use of cycle apps, adjusted for female age, race/ethnicity, prior pregnancy, BMI, income, current smoking, education, partner education, caffeine intake, use of hormonal contraceptives as the last method of contraception, hours of sleep per night, cycle regularity, use of prenatal supplements, marital status, intercourse frequency and history of subfertility. We also examined the impact of concurrent use of fertility indicators: basal body temperature, cervical fluid, cervix position and/or urine LH. MAIN RESULTS AND THE ROLE OF CHANCE Among 8363 women, 6077 (72.7%) were using one or more cycle apps at baseline. A total of 122 separate apps were reported by women. We designated five of these apps before analysis as more likely to be effective (Clue, Fertility Friend, Glow, Kindara, Ovia; hereafter referred to as ‘selected apps’). The use of any app at baseline was associated with 20% increased fecundability, with little difference between selected apps versus other apps (selected apps FR (95% CI): 1.20 (1.13, 1.28); all other apps 1.21 (1.13, 1.30)). In time-varying analyses, cycle app use was associated with 12–15% increased fecundability (selected apps FR (95% CI): 1.12 (1.04, 1.21); all other apps 1.15 (1.07, 1.24)). When apps were used at baseline with one or more fertility indicators, there was higher fecundability than without fertility indicators (selected apps with indicators FR (95% CI): 1.23 (1.14, 1.34) versus without indicators 1.17 (1.05, 1.30); other apps with indicators 1.30 (1.19, 1.43) versus without indicators 1.16 (1.06, 1.27)). In time-varying analyses, results were similar when stratified by time trying at study entry (<3 vs. 3–6 cycles) or cycle regularity. For use of the selected apps, we observed higher fecundability among women with a history of subfertility: FR 1.33 (1.05–1.67). LIMITATIONS, REASONS FOR CAUTION Neither regularity nor intensity of app use was ascertained. The prospective time-varying assessment of app use was based on questionnaires completed every 2 months, which would not capture more frequent changes. Intercourse frequency was also reported retrospectively and we do not have data on timing of intercourse relative to the fertile window. Although we controlled for a wide range of covariates, we cannot exclude the possibility of residual confounding (e.g. choosing to use an app in this observational study may be a marker for unmeasured health habits promoting fecundability). Half of the women in the study received a free premium subscription for one of the apps (Fertility Friend), which may have increased the overall prevalence of app use in the time-varying analyses, but would not affect app use at baseline. Most women in the study were college educated, which may limit application of results to other populations. WIDER IMPLICATIONS OF THE FINDINGS Use of a cycle app, especially in combination with observation of one or more fertility indicators (basal body temperature, cervical fluid, cervix position and/or urine LH), may increase fecundability (per-cycle pregnancy probability) by about 12–20% for couples trying to conceive. We did not find consistent evidence of improved fecundability resulting from use of one specific app over another. STUDY FUNDING/COMPETING INTEREST(S) This research was supported by grants, R21HD072326 and R01HD086742, from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, USA. In the last 3 years, Dr L.A.W. has served as a fibroid consultant for AbbVie.com. Dr L.A.W. has also received in-kind donations from Sandstone Diagnostics, Swiss Precision Diagnostics, FertilityFriend.com and Kindara.com for primary data collection and participant incentives in the PRESTO cohort. Dr J.B.S. reports personal fees from Swiss Precision Diagnostics, outside the submitted work. The remaining authors have nothing to declare. TRIAL REGISTRATION NUMBER N/A. 
    more » « less