
Award ID contains: 2054253


  1. Importance

    Body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) is a commonly used estimate of obesity, which is a complex trait affected by genetic and lifestyle factors. Marked weight gain and loss could be associated with adverse biological processes.


    Objective

    To evaluate the association between BMI variability and incident cardiovascular disease (CVD) events in 2 distinct cohorts.

    Design, Setting, and Participants

    This cohort study used data from the Million Veteran Program (MVP) between 2011 and 2018 and participants in the UK Biobank (UKB) enrolled between 2006 and 2010. Participants were followed up for a median of 3.8 (5th-95th percentile, 3.5) years. Participants with baseline CVD or cancer were excluded. Data were analyzed from September 2022 to September 2023.


    Exposures

    BMI variability was calculated as the retrospective SD and coefficient of variation (CV) of multiple clinical BMI measurements up to baseline.
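    As a concrete illustration, the retrospective SD and CV can be computed directly from a patient's pre-baseline BMI history. A minimal pure-Python sketch (`bmi_variability` is a hypothetical helper, not the study's code):

```python
import statistics

def bmi_variability(bmi_history):
    """Return (SD, CV) for a list of clinical BMI measurements.

    SD is the sample standard deviation of the measurements; CV is
    the SD divided by the mean, making it dimensionless and
    comparable across patients with different mean BMI.
    """
    mean_bmi = statistics.mean(bmi_history)
    sd = statistics.stdev(bmi_history)  # sample SD (n - 1 denominator)
    cv = sd / mean_bmi
    return sd, cv

# Example: five pre-baseline BMI measurements (kg/m^2)
sd, cv = bmi_variability([28.0, 29.5, 27.2, 30.1, 28.6])
```

    Because the CV normalizes by the mean, it separates variability from mean BMI, which the models adjust for separately.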

    Main Outcomes and Measures

    The main outcome was incident composite CVD events (incident nonfatal myocardial infarction, acute ischemic stroke, and cardiovascular death), assessed using Cox proportional hazards modeling after adjustment for CVD risk factors, including age, sex, mean BMI, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, smoking status, diabetes status, and statin use. Secondary analysis assessed whether associations were dependent on the polygenic score of BMI.
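    In a Cox proportional hazards model, the hazard ratio for a k-unit increase in a covariate is exp(k·β), so pre-scaling the exposure to SD units yields per-1-SD hazard ratios of the kind this abstract reports. A stdlib sketch with purely illustrative numbers (β and its standard error below are hypothetical, not the study's fitted values):

```python
import math

beta = 0.148  # log-hazard per 1-SD of BMI variability (hypothetical)
se = 0.013    # standard error of beta (hypothetical)

hr = math.exp(beta)                   # hazard ratio per 1-SD increase
ci_low = math.exp(beta - 1.96 * se)   # lower bound of the 95% CI
ci_high = math.exp(beta + 1.96 * se)  # upper bound of the 95% CI
```

    With these illustrative inputs the HR works out to roughly 1.16 with a 95% CI of about 1.13 to 1.19, the same scale as the MVP association reported below.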


    Results

    Among 92 363 US veterans in the MVP cohort (81 675 [88%] male; mean [SD] age, 56.7 [14.1] years), there were 9695 Hispanic participants, 22 488 non-Hispanic Black participants, and 60 180 non-Hispanic White participants. A total of 4811 composite CVD events were observed from 2011 to 2018. The CV of BMI was associated with a 16% higher risk for composite CVD across all groups (hazard ratio [HR], 1.16; 95% CI, 1.13-1.19). These associations were unchanged among subgroups and after adjustment for the polygenic score of BMI. The UKB cohort included 65 047 individuals (mean [SD] age, 57.30 [7.77] years; 38 065 [59%] female) and had 6934 composite CVD events. Each 1-SD increase in BMI variability in the UKB cohort was associated with an 8% increased risk of cardiovascular death (HR, 1.08; 95% CI, 1.04-1.11).

    Conclusions and Relevance

    This cohort study found that among US veterans, higher BMI variability was a significant risk marker associated with adverse cardiovascular events independent of mean BMI across major racial and ethnic groups. Results were consistent in the UKB for the cardiovascular death end point. Further studies should investigate the phenotype of high BMI variability.

    Free, publicly-accessible full text available March 4, 2025

  2. Objective

    To characterize high type 1 diabetes (T1D) genetic risk in a population where type 2 diabetes (T2D) predominates.


    Research Design and Methods

    Characteristics typically associated with T1D were assessed in 109,594 Million Veteran Program participants with adult-onset diabetes, 2011–2021, who had T1D genetic risk scores (GRS) defined as low (0 to <45%), medium (45 to <90%), high (90 to <95%), or highest (≥95%).
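    The four GRS bins are simple percentile cutoffs taken from the definitions above; a minimal sketch (`grs_category` is a hypothetical helper name):

```python
def grs_category(percentile):
    """Map a T1D genetic risk score percentile (0-100) to the
    study's four bins: low, medium, high, or highest."""
    if percentile < 45:
        return "low"
    elif percentile < 90:
        return "medium"
    elif percentile < 95:
        return "high"
    return "highest"
```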


    Results

    T1D characteristics increased progressively with higher genetic risk (P < 0.001 for trend). A GRS ≥ 90% was more common with diabetes diagnoses before age 40 years, but 95% of those participants were diagnosed at age ≥40 years, and they resembled T2D in mean age (64.3 years) and BMI (32.3 kg/m2). Compared with the low risk group, the highest-risk group was more likely to have diabetic ketoacidosis (low 0.9% vs. highest GRS 3.7%), hypoglycemia prompting emergency visits (3.7% vs. 5.8%), outpatient plasma glucose <50 mg/dL (7.5% vs. 13.4%), a shorter median time to start insulin (3.5 vs. 1.4 years), use of a T1D diagnostic code (16.3% vs. 28.1%), low C-peptide levels if tested (1.8% vs. 32.4%), and glutamic acid decarboxylase antibodies (6.9% vs. 45.2%), all P < 0.001.


    Conclusions

    Characteristics associated with T1D increased with higher genetic risk, especially in the top 10% of risk. However, the age and BMI of those participants resembled those of people with T2D, and a substantial proportion did not have diagnostic testing or a T1D diagnostic code. T1D genetic screening could aid identification of adult-onset T1D in settings in which T2D predominates.

    Free, publicly-accessible full text available April 12, 2025
  3. Marschall, Tobias (Ed.)
    Abstract

    Motivation

    In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association studies operate marker by marker and are computationally intensive.


    Results

    We present a sparsity-constrained regression algorithm for multivariate genome-wide association studies based on iterative hard thresholding and implement it in a convenient Julia package, MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding achieves true positive rates similar to those of GEMMA's linear mixed models and mv-PLINK's canonical correlation analysis, with lower false positive rates and faster execution times. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n = 185 656) in 20 h and an 18-trait joint analysis (n = 104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits.
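    The core of iterative hard thresholding is a gradient step on the least-squares loss followed by projection onto the sparsity constraint: keep the k largest-magnitude coefficients and zero the rest. A toy single-trait, pure-Python sketch of the idea (not MendelIHT.jl's actual Julia implementation, which is far more elaborate):

```python
def iht(X, y, k, step=0.01, iters=500):
    """Sparsity-constrained least squares by iterative hard thresholding.

    Alternates a gradient step on the squared-error loss with hard
    thresholding, which retains only the k largest-magnitude
    coefficients.  Toy illustration, not MendelIHT.jl itself.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residual r = y - X @ beta
        r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient step: beta += step * X^T @ r
        beta = [beta[j] + step * sum(X[i][j] * r[i] for i in range(n))
                for j in range(p)]
        # Hard threshold: zero all but the k largest |beta_j|
        keep = set(sorted(range(p), key=lambda j: -abs(beta[j]))[:k])
        beta = [beta[j] if j in keep else 0.0 for j in range(p)]
    return beta

# Toy data where y depends only on predictors 0 and 2
X = [[1.0, 0.5, 2.0], [2.0, 1.0, 0.0], [0.0, 0.3, 1.0], [1.5, 2.0, 1.0]]
y = [row[0] * 1.0 + row[2] * 2.0 for row in X]  # true beta = (1, 0, 2)
beta = iht(X, y, k=2)  # recovers approximately (1.0, 0.0, 2.0)
```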

    Availability and implementation

    Software, documentation, and scripts to reproduce our results are available from

  4. Abstract

    Allogeneic Vγ9Vδ2 (Vδ2) T cells have emerged as attractive candidates for developing cancer therapy due to their established safety in allogeneic contexts and inherent tumor-fighting capabilities. Nonetheless, the limited clinical success of Vδ2 T cell-based treatments may be attributed to donor variability, short-lived persistence, and tumor immune evasion. To address these constraints, we engineer Vδ2 T cells with enhanced attributes. By employing CD16 as a donor selection biomarker, we harness Vδ2 T cells characterized by heightened cytotoxicity and potent antibody-dependent cell-mediated cytotoxicity (ADCC) functionality. RNA sequencing analysis supports the augmented effector potential of Vδ2 T cells derived from CD16-high (CD16Hi) donors. Substantial enhancements are further achieved through CAR and IL-15 engineering methodologies. Preclinical investigations in two ovarian cancer models substantiate the effectiveness and safety of engineered CD16Hi Vδ2 T cells. These cells target tumors through multiple mechanisms, exhibit sustained in vivo persistence, and do not elicit graft-versus-host disease. These findings underscore the promise of engineered CD16Hi Vδ2 T cells as a viable therapeutic option for cancer treatment.

    Free, publicly-accessible full text available December 1, 2024
  5. Both long- and short-term glycemic variability have been associated with incident diabetes complications. We evaluated their relative and potential additive effects on incident renal complications in the Action to Control Cardiovascular Risk in Diabetes trial. A marker of short-term glycemic variability, 1,5-anhydroglucitol (1,5-AG), was measured in 4,000 random 12-month postrandomization plasma samples (when hemoglobin A1c [HbA1c] was stable). Visit-to-visit fasting plasma glucose coefficient of variation (CV-FPG) was determined from 4 months postrandomization until the end point of microalbuminuria or macroalbuminuria. Using Cox proportional hazards models, high CV-FPG and low 1,5-AG were independently associated with microalbuminuria after adjusting for clinical risk factors. However, only the CV-FPG association remained after additional adjustment for average HbA1c. Only CV-FPG was a significant risk factor for macroalbuminuria. This post hoc analysis indicates that long-term rather than short-term glycemic variability better predicts the risk of renal disease in type 2 diabetes.

    Article Highlights

    The relative and potential additive effects of long- and short-term glycemic variability on the development of diabetic complications are unknown. We aimed to assess the individual and combined relationships of long-term visit-to-visit glycemic variability, measured as the coefficient of variation of fasting plasma glucose, and short-term glucose fluctuation, estimated by the biomarker 1,5-anhydroglucitol, with the development of proteinuria. Both estimates of glycemic variability were independently associated with microalbuminuria, but only long-term glycemic variability remained significant after adjusting for average hemoglobin A1c. Our findings suggest that longer-term visit-to-visit glucose variability improves renal disease prediction in type 2 diabetes.

    Free, publicly-accessible full text available September 19, 2024

  6. Objective

    To determine the benefit of starting continuous glucose monitoring (CGM) in adult-onset type 1 diabetes (T1D) and type 2 diabetes (T2D) with regard to longer-term glucose control and serious clinical events.


    Research Design and Methods

    In a retrospective observational cohort study within the Veterans Affairs Health Care System, we compared glucose control, hypoglycemia- or hyperglycemia-related admission to an emergency room or hospital, and all-cause hospitalization over 12 months between propensity score overlap weighted initiators of CGM and nonusers.
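    Overlap weighting gives each CGM initiator the weight 1 − e(x) and each nonuser the weight e(x), where e(x) is the estimated propensity of initiating CGM; this concentrates the comparison on patients who plausibly could have gone either way. A minimal sketch (hypothetical helper; the study's propensity model itself is not shown here):

```python
def overlap_weight(initiated_cgm, propensity):
    """Propensity score overlap weight: treated units are weighted by
    1 - e(x), untreated units by e(x), down-weighting patients whose
    covariates make their treatment status nearly deterministic."""
    return (1.0 - propensity) if initiated_cgm else propensity

# A CGM initiator who was very likely to initiate is down-weighted;
# a comparable nonuser with the same propensity is up-weighted.
w_user = overlap_weight(True, 0.8)
w_nonuser = overlap_weight(False, 0.8)
```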


    Results

    CGM users receiving insulin (n = 5,015 with T1D and n = 15,706 with T2D) and similar numbers of nonusers were identified from 1 January 2015 to 31 December 2020. Declines in HbA1c were significantly greater in CGM users with T1D (−0.26%; 95% CI −0.33, −0.19%) and T2D (−0.35%; 95% CI −0.40, −0.31%) than in nonusers at 12 months. Percentages of patients achieving HbA1c <8% and <9% after 12 months were greater in CGM users. In T1D, CGM initiation was associated with significantly reduced risk of hypoglycemia (hazard ratio [HR] 0.69; 95% CI 0.48, 0.98) and all-cause hospitalization (HR 0.75; 95% CI 0.63, 0.90). In patients with T2D, there was a reduction in risk of hyperglycemia in CGM users (HR 0.87; 95% CI 0.77, 0.99) and all-cause hospitalization (HR 0.89; 95% CI 0.83, 0.97). Several subgroups (based on baseline age, HbA1c, hypoglycemic risk, or follow-up CGM use) had even greater responses.


    Conclusions

    In a large national cohort, initiation of CGM was associated with sustained improvement in HbA1c in patients with later-onset T1D and patients with T2D using insulin. This was accompanied by a clear pattern of reduced risk of admission to an emergency room or hospital for hypoglycemia or hyperglycemia and of all-cause hospitalization.

  7. Abstract

    Background

    Idiopathic pulmonary fibrosis (IPF) is a progressive, irreversible, and usually fatal lung disease of unknown cause that generally affects the elderly population. Early diagnosis of IPF is crucial for triaging patients into anti-fibrotic treatment or treatments for other causes of pulmonary fibrosis. However, the current IPF diagnosis workflow is complicated and time-consuming: it involves collaborative efforts from radiologists, pathologists, and clinicians, and is largely subject to inter-observer variability.


    Purpose

    The purpose of this work is to develop a deep learning-based automated system that can diagnose IPF among subjects with interstitial lung disease (ILD) using an axial chest computed tomography (CT) scan. This work can potentially enable timely diagnosis decisions and reduce inter-observer variability.


    Methods

    Our dataset contains CT scans from 349 IPF patients and 529 non-IPF ILD patients. We used 80% of the dataset for training and validation and held out 20% as the test set. We proposed a two-stage model: at stage one, we built a multi-scale, domain knowledge-guided attention model (MSGA), with both high- and medium-resolution attention, that encouraged the model to focus on specific areas of interest and enhanced explainability; at stage two, we fed the output of MSGA into a random forest (RF) classifier for patient-level diagnosis to further boost accuracy. The RF classifier serves as the final decision stage because it is interpretable, computationally fast, and handles correlated variables. Model utility was examined by (1) accuracy, measured as the area under the receiver operating characteristic curve (AUC) with standard deviation (SD), and (2) explainability, illustrated by visual examination of the estimated attention maps, which highlight the areas important for model diagnostics.


    Results

    During the training and validation stage, we observed that with no guidance from domain knowledge, the IPF diagnosis model reached acceptable performance (AUC±SD = 0.93±0.07) but lacked explainability; when including only guided high- or medium-resolution attention, the learned attention maps were not satisfactory; when including both high- and medium-resolution attention, under certain hyperparameter settings, the model reached the highest AUC among all experiments (AUC±SD = 0.99±0.01) and the estimated attention maps concentrated on the regions of interest for this task. The three best-performing hyperparameter selections for MSGA were applied to the holdout test set and reached model performance comparable to that of the validation set.


    Conclusions

    Our results suggest that, for a task with only scan-level labels available, MSGA+RF can use population-level domain knowledge to guide network training, increasing both model accuracy and explainability.

  8. Summary

    Nan Laird has an enormous and growing impact on computational statistics. Her paper with Dempster and Rubin on the expectation‐maximisation (EM) algorithm is the second most cited paper in statistics. Her papers and book on longitudinal modelling are nearly as impressive. In this brief survey, we revisit the derivation of some of her most useful algorithms from the perspective of the minorisation‐maximisation (MM) principle. The MM principle generalises the EM principle and frees it from the shackles of missing data and conditional expectations. Instead, the focus shifts to the construction of surrogate functions via standard mathematical inequalities. The MM principle can deliver a classical EM algorithm with less fuss or an entirely new algorithm with a faster rate of convergence. In any case, the MM principle enriches our understanding of the EM principle and suggests new algorithms of considerable potential in high‐dimensional settings where standard algorithms such as Newton's method and Fisher scoring falter.
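    The minorisation-maximisation principle described above rests on two properties of the surrogate; writing f for the objective and θₙ for the current iterate, the ascent guarantee is a two-line argument:

```latex
% Minorisation: the surrogate g lies below f everywhere and touches it
% at the current iterate \theta_n.
g(\theta \mid \theta_n) \le f(\theta) \ \ \text{for all } \theta,
\qquad
g(\theta_n \mid \theta_n) = f(\theta_n).

% Ascent property: maximising the surrogate cannot decrease f.
\theta_{n+1} = \operatorname*{arg\,max}_{\theta} g(\theta \mid \theta_n)
\;\Longrightarrow\;
f(\theta_{n+1}) \ge g(\theta_{n+1} \mid \theta_n)
\ge g(\theta_n \mid \theta_n) = f(\theta_n).
```

    The EM algorithm is the special case in which g is the conditional expectation of the complete-data log-likelihood, with Jensen's inequality supplying the minorisation.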

  9. Abstract

    Background

    Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate these uncertainties while maintaining statistical power, we introduce a method to analyze the population structure of low-depth sequencing data.


    The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele and is more linear when both variants are common.
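    The Ky Fan k-norm of a matrix is the sum of its k largest singular values; for a covariance matrix this equals the variance captured by the top k principal components, which is why maximising it sharpens the structure recovered by PCA. A toy pure-Python sketch for the symmetric 2×2 case (hypothetical helper; the method works on the full sample covariance matrix):

```python
import math

def ky_fan_norm_2x2(a, b, c, k):
    """Ky Fan k-norm of the symmetric PSD matrix [[a, b], [b, c]]:
    the sum of its k largest singular values.  For a symmetric PSD
    matrix the singular values equal the eigenvalues, which the
    2x2 closed form gives directly."""
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    eigvals = [half_trace + radius, half_trace - radius]  # descending
    return sum(eigvals[:k])

# Covariance matrix [[2, 1], [1, 2]] has eigenvalues 3 and 1
top1 = ky_fan_norm_2x2(2.0, 1.0, 2.0, k=1)  # 3.0
total = ky_fan_norm_2x2(2.0, 1.0, 2.0, k=2)  # 4.0 (the trace)
```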


    We apply the method to samples from two indigenous Siberian populations and accurately reveal hidden population structure using only a single chromosome. The package is available on

  10. Abstract

    Background

    Statistical geneticists employ simulation to estimate the power of proposed studies, test new analysis tools, and evaluate properties of causal models. Although there are existing trait simulators, there is ample room for modernization. For example, most phenotype simulators are limited to Gaussian traits or traits transformable to normality, while ignoring qualitative traits and realistic, non-normal trait distributions. Also, modern computer languages, such as Julia, that accommodate parallelization and cloud-based computing are now mainstream but rarely used in older applications. To meet the challenges of contemporary big studies, it is important for geneticists to adopt new computational tools.


    We present an open-source Julia package that makes it trivial to quickly simulate phenotypes under a variety of genetic architectures. The package is integrated into our OpenMendel suite for easy downstream analyses. Julia was purpose-built for scientific programming and provides tremendous speed and memory efficiency, with easy access to multi-CPU and GPU hardware and to distributed and cloud-based parallelization. The package is designed to encourage flexible trait simulation, including via the standard devices of applied statistics: generalized linear models (GLMs) and generalized linear mixed models (GLMMs). It also accommodates many study designs: unrelateds, sibships, pedigrees, or a mixture of all three. (Of course, for data with pedigrees or cryptic relationships, the simulation process must include the genetic dependencies among the individuals.) We consider an assortment of trait models and study designs to illustrate integrated simulation and analysis pipelines. Step-by-step instructions for these analyses are available in our electronic Jupyter notebooks on GitHub. These interactive notebooks are ideal for reproducible research.
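    A toy stand-in for GLM-style phenotype simulation (the helper below is hypothetical and not the package's API, which supports GLMs, GLMMs, and pedigree structure):

```python
import random

def simulate_trait(dosages, beta, intercept=0.0, sigma=1.0, seed=2024):
    """Simulate a Gaussian quantitative trait under a linear model:
    y_i = intercept + x_i . beta + e_i, with e_i ~ N(0, sigma^2).
    dosages is a list of per-individual genotype dosage vectors."""
    rng = random.Random(seed)  # seeded for reproducible simulations
    traits = []
    for x in dosages:
        mean = intercept + sum(d * b for d, b in zip(x, beta))
        traits.append(mean + rng.gauss(0.0, sigma))
    return traits

# Three individuals typed at two SNPs (dosages 0/1/2),
# with effect sizes 0.5 and -0.2 and little noise
y = simulate_trait([[0, 1], [2, 0], [1, 2]], beta=[0.5, -0.2], sigma=0.1)
```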


    The package has three main advantages. (1) It leverages the computational efficiency and ease of use of Julia to provide extremely fast, straightforward simulation of even the most complex genetic models, including GLMs and GLMMs. (2) It can be operated entirely within, but is not limited to, the integrated analysis pipeline of OpenMendel. And finally, (3) by allowing a wider range of more realistic phenotype models, it brings power calculations and diagnostic tools closer to what investigators might see in real-world analyses.
