-
ABSTRACT Objectives: Most bipolar disorder (BD) patients initially present with depressive symptoms, resulting in a delayed diagnosis of BD and poor clinical outcomes. This study aims to identify features predictive of the conversion from Major Depressive Disorder (MDD) to BD by leveraging electronic health record (EHR) data from the Clínica San Juan de Dios Manizales in Colombia. Methods: We employed a multivariable Cox regression model to identify important predictors of conversion from MDD to BD. Results: In an analysis of 15 years of EHR data from 13,607 patients diagnosed with MDD, a total of 1610 (11.8%) transitioned to BD. Predictive features of the conversion to BD included severity of the initial MDD episode, presence of psychosis and hospitalization at first episode, family history of BD, and female gender. Additionally, we observed associations with medication classes (positive associations with prescriptions of mood stabilizers and antipsychotics, and negative associations with antidepressants) and a positive association with suicidality, a feature derived from natural language processing (NLP) of clinical notes. Together, these risk factors predicted BD conversion within 5 years of the initial MDD diagnosis, with a recall of 72% and a precision of 38%. Conclusions: Our study confirms risk factors previously identified through registry-based studies (female gender and psychotic depression at the index MDD episode) and identifies novel ones (suicidality extracted from clinical notes). These results simultaneously demonstrate the validity of using EHR data for predicting BD conversion and underscore its potential for the identification of novel risk factors, thereby improving early diagnosis. Free, publicly accessible full text available February 1, 2026.
-
Gao, Xin (Ed.) Abstract Motivation: Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and it also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance. Results: While conditional testing can be both more powerful and more precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct “group knockoffs.” While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. Availability and implementation: The described algorithms are implemented in an open-source Julia package, Knockoffs.jl. R and Python wrappers are available as the knockoffsr and knockoffspy packages.
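As a rough illustration of how a knockoff-based selection yields a false discovery rate guarantee, the sketch below applies the standard knockoff+ threshold to a vector of feature statistics. It is a minimal sketch in plain Python, not the Knockoffs.jl API: the function name `knockoff_plus_select` and the example statistics `w` are hypothetical, and it assumes the statistics have already been computed so that large positive values favor the original variable (or group) over its knockoff. In a group-knockoff analysis, each entry would correspond to one group of correlated variables.

```python
def knockoff_plus_select(w, q):
    """Return indices selected by the knockoff+ filter at target FDR level q.

    w: list of knockoff feature statistics (hypothetical input); a large
       positive value is evidence for the original variable or group.
    q: target false discovery rate.
    """
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # estimates false positives
        positives = sum(1 for x in w if x >= t)   # candidate discoveries
        # knockoff+ adds 1 to the numerator for a conservative estimate.
        if (1 + negatives) / max(1, positives) <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []  # no threshold meets the target level


# Made-up statistics for ten variables (or groups), for illustration only.
w = [5.0, 4.0, 3.5, 3.0, -0.5, 2.5, 0.2, -0.1, 2.0, 1.5]
selected = knockoff_plus_select(w, q=0.25)
```

The smallest threshold whose estimated false discovery proportion falls below `q` is used, so that as many variables as possible are reported at the chosen level.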
-
Abstract We consider problems where many, somewhat redundant, hypotheses are tested and we are interested in reporting the most precise rejections, with false discovery rate (FDR) control. This is the case, for example, when researchers are interested in both individual hypotheses and group hypotheses corresponding to intersections of sets of the original hypotheses, at several resolution levels. A concrete application is in genome-wide association studies, where, depending on the signal strengths, it might be possible to resolve the influence of individual genetic variants on a phenotype with greater or lower precision. To adapt to the unknown signal strength, analyses are conducted at multiple resolutions and researchers are most interested in the more precise discoveries. Assuring FDR control on the reported findings with these adaptive searches is, however, often impossible. To design a multiple comparison procedure that allows for an adaptive choice of resolution with FDR control, we leverage e-values and linear programming. We adapt this approach to problems where knockoffs and group knockoffs have been successfully applied to test conditional independence hypotheses. We demonstrate its efficacy by analysing data from the UK Biobank.
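For readers unfamiliar with e-values, the sketch below shows the basic e-BH procedure, which rejects the hypotheses with the k largest e-values for the largest k such that the k-th largest e-value is at least n / (alpha * k). This is only the single-resolution building block, offered as an assumption about useful background; it is not the multi-resolution linear-programming procedure described in the abstract. The function name `ebh` and the example e-values are hypothetical.

```python
def ebh(e_values, alpha):
    """Return sorted indices rejected by the e-BH procedure at FDR level alpha.

    e_values: list of nonnegative e-values, one per hypothesis
              (hypothetical input).
    alpha:    target false discovery rate.
    """
    n = len(e_values)
    # Indices ordered from the largest e-value to the smallest.
    order = sorted(range(n), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, i in enumerate(order, start=1):
        # The k-th largest e-value must clear the threshold n / (alpha * k).
        if e_values[i] >= n / (alpha * k):
            k_star = k
    return sorted(order[:k_star])


# Made-up e-values for five hypotheses, for illustration only.
rejected = ebh([100, 40, 1, 0.5, 30], alpha=0.2)
```

Because the threshold relaxes as more hypotheses are rejected, a moderately large e-value can be rejected when it is accompanied by several stronger ones.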
-
Universities have been expanding undergraduate data science programs. Involving graduate students in these new opportunities can foster their growth as data science educators. We describe two programs that employ a near-peer mentoring structure, in which graduate students mentor undergraduates, to (a) strengthen their teaching and mentoring skills and (b) provide research and learning experiences for undergraduates from diverse backgrounds. In the Data Science for Social Good program, undergraduate participants work in teams to tackle a data science project with social impact. Graduate mentors guide project work and provide just-in-time teaching and feedback. The Stanford Mentoring in Data Science course offers training in effective and inclusive mentorship strategies. In an experiential learning framework, enrolled graduate students are paired with undergraduate students from non-R1 schools, whom they mentor through weekly one-on-one remote meetings. In end-of-program surveys, mentors reported growth through both programs. Drawing from these experiences, we developed a self-paced mentor training guide, which engages teaching, mentoring and project management abilities. These initiatives and the shared materials can serve as prototypes of future programs that cultivate mutual growth of both undergraduate and graduate students in a high-touch, inclusive, and encouraging environment. Free, publicly accessible full text available October 2, 2026.
-
In recent decades, there has been an explosion of data streams spanning the entire spectrum of biomedicine, opening novel opportunities to tackle biological and medical research questions and increasing our ability to provide effective and efficient health care. In parallel, augmented computational power has allowed the development and deployment of quantitative approaches at unprecedented scales. To effectively take advantage of this progress, it is important to invest in the training of a new generation of biomedical data scientists. Designing a graduate curriculum against the backdrop of a rapidly changing landscape of data, methods, and computing power demands flexibility and openness to adaptation. At the same time, we strive to ensure that the students acquire foundational competencies that might fuel productive and evolving careers, without being constrained to and defined by a trendy niche topic. We offer here a view of graduate training in biomedical data science from the standpoint of our experience at Stanford University. We conclude with a series of open challenges, the answers to which we believe will shape training in biomedical data science. Free, publicly accessible full text available August 11, 2026.
-
Abstract Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer’s disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
