
This content will become publicly available on June 29, 2023

Title: Assessing the contribution of rare genetic variants to phenotypes of chronic obstructive pulmonary disease using whole-genome sequence data
Abstract Rationale: Genetic variation makes a substantial contribution to chronic obstructive pulmonary disease (COPD) and lung function measurements. Heritability estimates using genome-wide genotyping data can be biased if analyses do not appropriately account for the nonuniform distribution of genetic effects across the allele frequency and linkage disequilibrium (LD) spectrum. In addition, the contribution of rare variants has been unclear. Objectives: We sought to assess the heritability of COPD and lung function using whole-genome sequence data from the Trans-Omics for Precision Medicine program. Methods: Using the genome-based restricted maximum likelihood method, we partitioned the genome into bins based on minor allele frequency and LD scores and estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio in 11 051 European ancestry and 5853 African-American participants. Measurements and Main Results: In European ancestry participants, the estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio was 35.5%, 55.6% and 32.5%, respectively, of which 18.8%, 19.7% and 17.8% were from common variants, and 16.6%, 35.8% and 14.6% were from rare variants. These estimates had wide confidence intervals, with common variants and some sets of rare variants showing a statistically significant contribution (P-value < 0.05). In African-Americans, common variant heritability was similar to that in European ancestry participants, but the lower sample size precluded calculation of rare variant heritability. Conclusions: Our study provides updated and unbiased estimates of heritability for COPD and lung function, and suggests an important contribution of rare variants. Larger studies of more diverse ancestry will improve the accuracy of these estimates.
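The partition reported above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis: it only re-adds the common and rare variant components quoted in the abstract and computes the share attributable to rare variants. The dictionary layout and the function name `partition_summary` are illustrative assumptions.

```python
# Heritability point estimates (%) for European ancestry participants,
# copied from the abstract. The data layout is an illustrative assumption.
reported = {
    "COPD": {"total": 35.5, "common": 18.8, "rare": 16.6},
    "FEV1% predicted": {"total": 55.6, "common": 19.7, "rare": 35.8},
    "FEV1/FVC": {"total": 32.5, "common": 17.8, "rare": 14.6},
}


def partition_summary(est):
    """Return (common + rare, rare share of total in %), each rounded to 0.1."""
    summed = round(est["common"] + est["rare"], 1)
    rare_share = round(100 * est["rare"] / est["total"], 1)
    return summed, rare_share


for trait, est in reported.items():
    summed, rare_share = partition_summary(est)
    print(f"{trait}: common + rare = {summed}% "
          f"(reported total {est['total']}%), rare share = {rare_share}%")
```

For each trait the two components add up to within 0.1 percentage points of the reported total, which is consistent with rounding of the published figures. The shares are computed from point estimates only and ignore the wide confidence intervals noted in the abstract.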
Journal Name:
Human Molecular Genetics
Sponsoring Org:
National Science Foundation
More Like this
  1. Buchner, David A. (Ed.)
    Several studies have found associations between higher pancreatic fat content and adverse health outcomes, such as diabetes and the metabolic syndrome, but investigations into the genetic contributions to pancreatic fat are limited. This genome-wide association study, comprising 804 participants with MRI-assessed pancreatic fat measurements, was conducted in the ethnically diverse Multiethnic Cohort-Adiposity Phenotype Study (MEC-APS). Two genetic variants reaching genome-wide significance, rs73449607 on chromosome 13q21.2 (Beta = -0.67, P = 4.50 × 10⁻⁸) and rs79967607 on chromosome 6q14 (Beta = -0.90, P = 4.91 × 10⁻⁸), were associated with percent pancreatic fat on the log scale. Rs73449607 was most common in the African American population (13%) and rs79967607 was most common in the European American population (6%). Rs73449607 was also associated with lower risk of type 2 diabetes (OR = 0.95, 95% CI = 0.89–1.00, P = 0.047) in the Population Architecture Genomics and Epidemiology (PAGE) Study and the DIAbetes Genetics Replication and Meta-analysis (DIAGRAM), which included substantial numbers of non-European ancestry participants (53,102 cases and 193,679 controls). Rs73449607 is located in an intergenic region between GSX1 and PLUTO, and rs79967607 is in intron 1 of EPM2A. PLUTO, a lncRNA, regulates transcription of an adjacent gene, PDX1, that controls beta-cell function in the mature pancreas, and EPM2A encodes the protein laforin, which plays a critical role in regulating glycogen production. If validated, these variants may suggest a genetic component for pancreatic fat and a common etiologic link between pancreatic fat and type 2 diabetes.
  2. INTRODUCTION To faithfully distribute genetic material to daughter cells during cell division, spindle fibers must couple to DNA by means of a structure called the kinetochore, which assembles at each chromosome’s centromere. Human centromeres are located within large arrays of tandemly repeated DNA sequences known as alpha satellite (αSat), which often span millions of base pairs on each chromosome. Arrays of αSat are frequently surrounded by other types of tandem satellite repeats, which have poorly understood functions, along with nonrepetitive sequences, including transcribed genes. Previous genome sequencing efforts have been unable to generate complete assemblies of satellite-rich regions because of their scale and repetitive nature, limiting the ability to study their organization, variation, and function. RATIONALE Pericentromeric and centromeric (peri/centromeric) satellite DNA sequences have remained almost entirely missing from the assembled human reference genome for the past 20 years. Using a complete, telomere-to-telomere (T2T) assembly of a human genome, we developed and deployed tailored computational approaches to reveal the organization and evolutionary patterns of these satellite arrays at both large and small length scales. We also performed experiments to map precisely which αSat repeats interact with kinetochore proteins. Last, we compared peri/centromeric regions among multiple individuals to understand how these sequences vary across diverse genetic backgrounds. RESULTS Satellite repeats constitute 6.2% of the T2T-CHM13 genome assembly, with αSat representing the single largest component (2.8% of the genome).
By studying the sequence relationships of αSat repeats in detail across each centromere, we found genome-wide evidence that human centromeres evolve through “layered expansions.” Specifically, distinct repetitive variants arise within each centromeric region and expand through mechanisms that resemble successive tandem duplications, whereas older flanking sequences shrink and diverge over time. We also revealed that the most recently expanded repeats within each αSat array are more likely to interact with the inner kinetochore protein Centromere Protein A (CENP-A), which coincides with regions of reduced CpG methylation. This suggests a strong relationship between local satellite repeat expansion, kinetochore positioning, and DNA hypomethylation. Furthermore, we uncovered large and unexpected structural rearrangements that affect multiple satellite repeat types, including active centromeric αSat arrays. Last, by comparing sequence information from nearly 1600 individuals’ X chromosomes, we observed that individuals with recent African ancestry possess the greatest genetic diversity in the region surrounding the centromere, which sometimes contains a predominantly African αSat sequence variant. CONCLUSION The genetic and epigenetic properties of centromeres are closely interwoven through evolution. These findings raise important questions about the specific molecular mechanisms responsible for the relationship between inner kinetochore proteins, DNA hypomethylation, and layered αSat expansions. Even more questions remain about the function and evolution of non-αSat repeats. To begin answering these questions, we have produced a comprehensive encyclopedia of peri/centromeric sequences in a human genome, and we demonstrated how these regions can be studied with modern genomic tools. Our work also illuminates the rich genetic variation hidden within these formerly missing regions of the genome, which may contribute to health and disease. 
This unexplored variation underlines the need for more T2T human genome assemblies from genetically diverse individuals. Gapless assemblies illuminate centromere evolution. (Top) The organization of peri/centromeric satellite repeats. (Bottom left) A schematic portraying (i) evidence for centromere evolution through layered expansions and (ii) the localization of inner-kinetochore proteins in the youngest, most recently expanded repeats, which coincide with a region of DNA hypomethylation. (Bottom right) An illustration of the global distribution of chrX centromere haplotypes, showing increased diversity in populations with recent African ancestry.
  3. Abstract

    Infertility is a major health problem affecting ~15% of couples worldwide. Except for cases involving readily detectable chromosome aberrations, confident identification of a causative genetic defect is problematic. Despite the advent of genome sequencing for diagnostic purposes, the preponderance of segregating genetic variants complicates identification of culprit genetic alleles or mutations. Many algorithms have been developed to predict the effects of ‘variants of unknown significance’, typically single nucleotide polymorphisms (SNPs), but these predictions are not sufficiently accurate for clinical action. As part of a project to identify population variants that impact fertility, we have been generating clustered regularly interspaced short palindromic repeats-Cas9 edited mouse models of suspect SNPs in genes that are known to be required for fertility in mice. Here, we present data on a non-synonymous (amino acid altering) SNP (rs140107488) in the meiosis gene Mnd1, which is predicted bioinformatically to be deleterious to protein function. We report that when modeled in mice, this allele (MND1K85M), which is present at an allele frequency of ~3% in East Asians, has no discernible effect upon fertility, fecundity or gametogenesis, although it may cause sex skewing of progeny from homozygous males. In sum, assuming the mouse model accurately reflects the impact of this variant in humans, rs140107488 appears to be a benign allele that can be eliminated or de-prioritized in clinical genomic analyses of infertility patients.

  4. Abstract Aims

    To investigate whether the cumulative exposure risks of incident T2D are shared with other common chronic diseases.

    Research design and methods

    We first establish and report the cross-sectional prevalence, cross-sectional co-prevalence, and incidence of seven T2D-associated chronic diseases [hypertension, atrial fibrillation, coronary artery disease, obesity, chronic obstructive pulmonary disease (COPD), and chronic kidney and liver diseases] in the UK Biobank. We use published weights of genetic variants and exposure variables to derive the T2D polygenic (PGS) and polyexposure (PXS) risk scores and test their associations to incident diseases.

    Results
    PXS was associated with higher levels of clinical risk factors including BMI, systolic blood pressure, blood glucose, triglycerides, and HbA1c in individuals without overt or diagnosed T2D. In addition to predicting incident T2D, PXS and PGS were significantly and positively associated with the incidence of all 7 other chronic diseases. There were 4% and 8% of individuals in the bottom deciles of PXS and PGS, respectively, who were prediabetic at baseline but had low risks of T2D and other chronic diseases. Compared to the remaining population, individuals in the top deciles of PGS and PXS had particularly high risks of developing chronic diseases. For instance, the hazard ratio of COPD and obesity for individuals in the top T2D PXS deciles was 2.82 (95% CI 2.39–3.35, P = 4.00 × 10⁻³³) and 2.54 (95% CI 2.24–2.87, P = 9.86 × 10⁻⁵⁰), respectively, compared to the remaining population. We also found that PXS and PGS were both significantly (P < 0.0001) and positively associated with the total number of incident diseases.

    Conclusions
    T2D shares polyexposure risks with other common chronic diseases. Individuals with an elevated genetic and non-genetic risk of T2D also have high risks of cardiovascular, liver, lung, and kidney diseases.
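The abstract above describes deriving polygenic (PGS) and polyexposure (PXS) scores from published weights. A score of this kind is commonly computed as a weighted sum of per-individual values (allele dosages for a PGS, exposure measurements for a PXS). The sketch below illustrates only that general idea; the function name, variant labels, weights, and dosages are all hypothetical and are not taken from the study.

```python
def weighted_risk_score(weights, values):
    """Sum weight * value over the features present in both mappings.

    Features that have a weight but no observed value are skipped;
    this is a simplifying assumption, not a statement about how the
    study handled missing data.
    """
    return sum(w * values[name] for name, w in weights.items() if name in values)


# Hypothetical per-variant weights (e.g. log odds ratios) and one
# individual's allele dosages (0, 1, or 2 copies of the effect allele).
pgs_weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 0.125}
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = weighted_risk_score(pgs_weights, dosages)
print(score)
```

The same function could be applied to exposure variables by swapping in exposure weights and measured exposure values, which is why the scoring step is written independently of the feature type.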

  5. Kim, Yuseob (Ed.)
    Abstract Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make its inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.