skip to main content


Title: Assessing the contribution of rare genetic variants to phenotypes of chronic obstructive pulmonary disease using whole-genome sequence data
Abstract Rationale: Genetic variation has a substantial contribution to chronic obstructive pulmonary disease (COPD) and lung function measurements. Heritability estimates using genome-wide genotyping data can be biased if analyses do not appropriately account for the nonuniform distribution of genetic effects across the allele frequency and linkage disequilibrium (LD) spectrum. In addition, the contribution of rare variants has been unclear. Objectives: We sought to assess the heritability of COPD and lung function using whole-genome sequence data from the Trans-Omics for Precision Medicine program. Methods: Using the genome-based restricted maximum likelihood method, we partitioned the genome into bins based on minor allele frequency and LD scores and estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio in 11 051 European ancestry and 5853 African-American participants. Measurements and Main Results: In European ancestry participants, the estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio were 35.5%, 55.6% and 32.5%, of which 18.8%, 19.7%, 17.8% were from common variants, and 16.6%, 35.8%, and 14.6% were from rare variants. These estimates had wide confidence intervals, with common variants and some sets of rare variants showing a statistically significant contribution (P-value < 0.05). In African-Americans, common variant heritability was similar to European ancestry participants, but lower sample size precluded calculation of rare variant heritability. Conclusions: Our study provides updated and unbiased estimates of heritability for COPD and lung function, and suggests an important contribution of rare variants. Larger studies of more diverse ancestry will improve accuracy of these estimates.  more » « less
Award ID(s):
2054253 2205441
NSF-PAR ID:
10340242
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; « less
Date Published:
Journal Name:
Human Molecular Genetics
ISSN:
0964-6906
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Curated databases of genetic variants assist clinicians and researchers in interpreting genetic variation. Yet, these databases contain some misclassified variants. It is unclear whether variant misclassification is abating as these databases rapidly grow and implement new guidelines.

    Methods

    Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over 6 years, across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that were classified by the databases as pathogenic. Due to the rarity of IEMs, nearly all such classified pathogenic genotypes indicate likely variant misclassification in ClinVar or HGMD.

    Results

    While the false-positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently imply two orders of magnitude more affected individuals in 1KGP than ClinVar variants. We observed that African ancestry individuals have a significantly increased chance of being incorrectly indicated to be affected by a screened IEM when HGMD variants are used. However, this bias affecting genomes of African ancestry was no longer significant once common variants were removed in accordance with recent variant classification guidelines. We discovered that ClinVar variants classified as Pathogenic or Likely Pathogenic are reclassified sixfold more often than DM or DM? variants in HGMD, which has likely resulted in ClinVar’s lower false-positive rate.

    Conclusions

    Considering misclassified variants that have since been reclassified reveals our increasing understanding of rare genetic variation. We found that variant classification guidelines and allele frequency databases comprising genetically diverse samples are important factors in reclassification. We also discovered that ClinVar variants common in European and South Asian individuals were more likely to be reclassified to a lower confidence category, perhaps due to an increased chance of these variants being classified by multiple submitters. We discuss features for variant classification databases that would support their continued improvement.

     
    more » « less
  2. Buchner, David A. (Ed.)
    Several studies have found associations between higher pancreatic fat content and adverse health outcomes, such as diabetes and the metabolic syndrome, but investigations into the genetic contributions to pancreatic fat are limited. This genome-wide association study, comprised of 804 participants with MRI-assessed pancreatic fat measurements, was conducted in the ethnically diverse Multiethnic Cohort-Adiposity Phenotype Study (MEC-APS). Two genetic variants reaching genome-wide significance, rs73449607 on chromosome 13q21.2 (Beta = -0.67, P = 4.50x10 -8 ) and rs7996760 on chromosome 6q14 (Beta = -0.90, P = 4.91x10 -8 ) were associated with percent pancreatic fat on the log scale. Rs73449607 was most common in the African American population (13%) and rs79967607 was most common in the European American population (6%). Rs73449607 was also associated with lower risk of type 2 diabetes (OR = 0.95, 95% CI = 0.89–1.00, P = 0.047) in the Population Architecture Genomics and Epidemiology (PAGE) Study and the DIAbetes Genetics Replication and Meta-analysis (DIAGRAM), which included substantial numbers of non-European ancestry participants (53,102 cases and 193,679 controls). Rs73449607 is located in an intergenic region between GSX1 and PLUTO , and rs79967607 is in intron 1 of EPM2A . PLUTO , a lncRNA , regulates transcription of an adjacent gene, PDX1 , that controls beta-cell function in the mature pancreas, and EPM2A encodes the protein laforin, which plays a critical role in regulating glycogen production. If validated, these variants may suggest a genetic component for pancreatic fat and a common etiologic link between pancreatic fat and type 2 diabetes. 
    more » « less
  3. Abstract Objectives

    Height is a complex, highly heritable polygenic trait subject to both genetic composition and environmental influences. Recent studies suggest that a large proportion of height heritability is determined by the cumulative effect of many low allele frequency variants across the genome. Previous research has also identified an inverse relationship between height and runs of homozygosity (ROH); however, this has yet to be examined within African populations. We aim to identify this association within the Himba, an endogamous Namibian population who are recently bottlenecked, resulting in elevated haplotype sharing and increased homozygosity.

    Materials and Methods

    Here, we calculate the fraction of the genome composed of long runs of homozygosity (FROH) in a sample of 245 adults and use mixed effects models to assess its effect on height.

    Results

    We find that Himba adults exhibit increased homozygosity. However, in contrast to previous studies in other populations, we do not find a significant effect ofFROHon height within the Himba. We further estimated heritability of height, noting both an enrichment of distant relatives and greater developmental homogeneity across households; we find that (SE ± 0.146), comparable to estimates reported in Europeans.

    Discussion

    Our results may be due to other environmental variables we were not able to include, measurement error, or low statistical power, but may also imply that phenotypic expression resulting from increased homozygosity may vary from population to population.

     
    more » « less
  4. INTRODUCTION Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge. RATIONALE We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx). RESULTS Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10 −308 ). Pathogenic ClinVar variants are more constrained than benign variants ( P < 2.2 × 10 −16 ). The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base. Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes. CONCLUSION Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease. Using evolutionary constraint in genomic studies of human diseases. ( A ) Constraint was calculated across 240 mammal species, including 43 primates (teal line). ( B ) Pathogenic ClinVar variants ( N = 73,885) are more constrained across mammals than benign variants ( N = 231,642; P < 2.2 × 10 −16 ). ( C ) More-constrained bases are more enriched for trait-associated variants (63 GWASs). ( D ) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). ( E ) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability. 
    more » « less
  5. Abstract Background Alzheimer’s disease (AD) is a complex neurodegenerative disorder and the most common type of dementia. AD is characterized by a decline of cognitive function and brain atrophy, and is highly heritable with estimated heritability ranging from 60 to 80 $$\%$$ % . The most straightforward and widely used strategy to identify AD genetic basis is to perform genome-wide association study (GWAS) of the case-control diagnostic status. These GWAS studies have identified over 50 AD related susceptibility loci. Recently, imaging genetics has emerged as a new field where brain imaging measures are studied as quantitative traits to detect genetic factors. Given that many imaging genetics studies did not involve the diagnostic outcome in the analysis, the identified imaging or genetic markers may not be related or specific to the disease outcome. Results We propose a novel method to identify disease-related genetic variants enriched by imaging endophenotypes, which are the imaging traits associated with both genetic factors and disease status. Our analysis consists of three steps: (1) map the effects of a genetic variant (e.g., single nucleotide polymorphism or SNP) onto imaging traits across the brain using a linear regression model, (2) map the effects of a diagnosis phenotype onto imaging traits across the brain using a linear regression model, and (3) detect SNP-diagnosis association via correlating the SNP effects with the diagnostic effects on the brain-wide imaging traits. We demonstrate the promise of our approach by applying it to the Alzheimer’s Disease Neuroimaging Initiative database. Among 54 AD related susceptibility loci reported in prior large-scale AD GWAS, our approach identifies 41 of those from a much smaller study cohort while the standard association approaches identify only two of those. Clearly, the proposed imaging endophenotype enriched approach can reveal promising AD genetic variants undetectable using the traditional method. Conclusion We have proposed a novel method to identify AD genetic variants enriched by brain-wide imaging endophenotypes. This approach can not only boost detection power, but also reveal interesting biological pathways from genetic determinants to intermediate brain traits and to phenotypic AD outcomes. 
    more » « less