Title: ClinVar and HGMD genomic variant classification accuracy has improved over time, as measured by implied disease burden
Abstract Background

Curated databases of genetic variants assist clinicians and researchers in interpreting genetic variation. Yet, these databases contain some misclassified variants. It is unclear whether variant misclassification is abating as these databases rapidly grow and implement new guidelines.

Methods

Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over 6 years, across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that were classified by the databases as pathogenic. Due to the rarity of IEMs, nearly all such classified pathogenic genotypes indicate likely variant misclassification in ClinVar or HGMD.
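
The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the data layout, the function name `implied_affected`, the toy sample, variant, and gene identifiers, and the rule that two unphased pathogenic alleles in a recessive gene imply disease are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed rule: alleles needed in one gene before a genotype implies disease.
# "AR" = autosomal recessive, "AD" = autosomal dominant.
REQUIRED_ALLELES = {"AR": 2, "AD": 1}


def implied_affected(genotypes, pathogenic_variants, inheritance):
    """Return the samples whose genotypes imply a screened disorder.

    genotypes:           sample ID -> {variant ID: alternate allele count (0, 1, 2)}
    pathogenic_variants: variant ID -> gene, for variants a database calls pathogenic
    inheritance:         gene -> "AR" or "AD"
    """
    affected = set()
    for sample, calls in genotypes.items():
        # Sum pathogenic alleles per gene. Phase is ignored, so two different
        # heterozygous variants in one gene are treated as compound heterozygous.
        alleles_per_gene = defaultdict(int)
        for variant, alt_count in calls.items():
            gene = pathogenic_variants.get(variant)
            if gene is not None:
                alleles_per_gene[gene] += alt_count
        for gene, n_alleles in alleles_per_gene.items():
            if n_alleles >= REQUIRED_ALLELES[inheritance[gene]]:
                affected.add(sample)
                break
    return affected


# Toy data with hypothetical identifiers.
toy_genotypes = {
    "S1": {"v1": 2},           # homozygous in a recessive gene
    "S2": {"v1": 1, "v2": 1},  # two heterozygous variants in the same recessive gene
    "S3": {"v1": 1},           # carrier only
    "S4": {"v3": 1},           # heterozygous in a dominant gene
}
toy_pathogenic = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
toy_inheritance = {"GENE_A": "AR", "GENE_B": "AD"}

if __name__ == "__main__":
    print(sorted(implied_affected(toy_genotypes, toy_pathogenic, toy_inheritance)))
```

Running the same count against successive database archives would give one implied disease burden per release, which is the quantity the study tracks over time.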

Results

While the false-positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently imply two orders of magnitude more affected individuals in 1KGP than ClinVar variants. We observed that African ancestry individuals have a significantly increased chance of being incorrectly indicated to be affected by a screened IEM when HGMD variants are used. However, this bias affecting genomes of African ancestry was no longer significant once common variants were removed in accordance with recent variant classification guidelines. We discovered that ClinVar variants classified as Pathogenic or Likely Pathogenic are reclassified sixfold more often than DM or DM? variants in HGMD, which has likely resulted in ClinVar’s lower false-positive rate.
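
The common-variant filter mentioned above might look like the following sketch. The abstract does not state the frequency cutoff, so the `max_af` default of 0.05 is an illustrative assumption, as are the function name, the data layout, and the toy frequencies.

```python
def remove_common_variants(variants, population_af, max_af=0.05):
    """Keep variants that are not common in any population.

    variants:      list of variant IDs classified as pathogenic
    population_af: variant ID -> {population label: allele frequency}
    max_af:        assumed cutoff; a variant above it in any population is dropped
    """
    kept = []
    for variant in variants:
        freqs = population_af.get(variant, {})
        # Variants with no frequency data are kept, since nothing shows them to be common.
        if all(freq <= max_af for freq in freqs.values()):
            kept.append(variant)
    return kept


# Toy data with hypothetical identifiers and frequencies.
toy_variants = ["v1", "v2", "v3"]
toy_af = {
    "v1": {"AFR": 0.12, "EUR": 0.001},    # common in one population only
    "v2": {"AFR": 0.002, "EUR": 0.0005},  # rare everywhere
}

if __name__ == "__main__":
    print(remove_common_variants(toy_variants, toy_af))
```

Checking every population rather than a pooled frequency matters here: a variant that is rare overall can still be common in one ancestry group, which is the situation the African ancestry result points to.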

Conclusions

Considering misclassified variants that have since been reclassified reveals our increasing understanding of rare genetic variation. We found that variant classification guidelines and allele frequency databases comprising genetically diverse samples are important factors in reclassification. We also discovered that ClinVar variants common in European and South Asian individuals were more likely to be reclassified to a lower confidence category, perhaps due to an increased chance of these variants being classified by multiple submitters. We discuss features for variant classification databases that would support their continued improvement.

 
Award ID(s):
2109912
NSF-PAR ID:
10431579
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Genome Medicine
Volume:
15
Issue:
1
ISSN:
1756-994X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    The goal of this study is to evaluate germline genetic variants in African American men with metastatic prostate cancer as compared to those in Caucasian men with metastatic prostate cancer in an effort to understand the role of genetic factors in these populations.

    Methods

    African American and Caucasian men with metastatic prostate cancer who had germline testing using multigene panels were used to generate comparisons. Germline genetic results, clinical parameters, and family histories between the two populations were analyzed.

    Results

    A total of 867 patients were included in this retrospective study, including 188 African American and 669 Caucasian patients. There was no significant difference in the likelihood of pathogenic or likely‐pathogenic variants (PV/LPVs) between African American and Caucasian patients (p = .09). African American patients were more likely than Caucasian patients to have a variant of unknown significance (odds ratio [OR] = 1.95; p < .0001). BRCA1 PV/LPVs were more frequent in African American patients (OR = 4.86; p = .04). African American patients were less likely to have a PV/LPV in non‐BRCA DNA repair genes (OR = 0.30; p = .008). Family history of breast (OR = 2.09; p = .002) or ovarian cancer (OR = 2.33; p = .04) predicted PV/LPVs in Caucasian patients but not in African American patients. This underscores the limitations of family history in African American men and the importance of personal history in guiding germline testing for them.

    Conclusions

    In metastatic prostate cancer patients, PV/LPVs of tested genes did not vary by race overall, although BRCA1 PV/LPVs were more common in the African American subset. However, PV/LPVs in non‐BRCA DNA repair genes were less likely to be encountered in African Americans. Family history was associated with genetic testing results in Caucasians only.

     
  2. Abstract Background

    Up to one of every six individuals diagnosed with one cancer will be diagnosed with a second primary cancer in their lifetime. Genetic factors contributing to the development of multiple primary cancers, beyond known cancer syndromes, have been underexplored.

    Methods

    To characterize genetic susceptibility to multiple cancers, we conducted a pan-cancer, whole-exome sequencing study of individuals drawn from two large multi-ancestry populations (6429 cases, 165,853 controls). We created two groupings of individuals diagnosed with multiple primary cancers: (1) an overall combined set with at least two cancers across any of 36 organ sites and (2) cancer-specific sets defined by an index cancer at one of 16 organ sites with at least 50 cases from each study population. We then investigated whether variants identified from exome sequencing were associated with these sets of multiple cancer cases in comparison to individuals with one and, separately, no cancers.

    Results

    We identified 22 variant-phenotype associations, 10 of which have not been previously discovered and were significantly overrepresented among individuals with multiple cancers, compared to those with a single cancer.

    Conclusions

    Overall, we describe variants and genes that may play a fundamental role in the development of multiple primary cancers and improve our understanding of shared mechanisms underlying carcinogenesis.

     
  3. Abstract Rationale

    Genetic variation has a substantial contribution to chronic obstructive pulmonary disease (COPD) and lung function measurements. Heritability estimates using genome-wide genotyping data can be biased if analyses do not appropriately account for the nonuniform distribution of genetic effects across the allele frequency and linkage disequilibrium (LD) spectrum. In addition, the contribution of rare variants has been unclear.

    Objectives

    We sought to assess the heritability of COPD and lung function using whole-genome sequence data from the Trans-Omics for Precision Medicine program.

    Methods

    Using the genome-based restricted maximum likelihood method, we partitioned the genome into bins based on minor allele frequency and LD scores and estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio in 11 051 European ancestry and 5853 African-American participants.

    Measurements and Main Results

    In European ancestry participants, the estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio were 35.5%, 55.6% and 32.5%, of which 18.8%, 19.7%, 17.8% were from common variants, and 16.6%, 35.8%, and 14.6% were from rare variants. These estimates had wide confidence intervals, with common variants and some sets of rare variants showing a statistically significant contribution (P-value < 0.05). In African-Americans, common variant heritability was similar to European ancestry participants, but lower sample size precluded calculation of rare variant heritability.

    Conclusions

    Our study provides updated and unbiased estimates of heritability for COPD and lung function, and suggests an important contribution of rare variants. Larger studies of more diverse ancestry will improve accuracy of these estimates.
  4. ABSTRACT Background

    Epistasis and gene‐environment interactions are known to contribute significantly to variation of complex phenotypes in model organisms. However, their identification in human association studies remains challenging for myriad reasons. In the case of epistatic interactions, the large number of potential interacting sets of genes presents computational, multiple hypothesis correction, and other statistical power issues. In the case of gene‐environment interactions, the lack of consistently measured environmental covariates in most disease studies precludes searching for interactions and creates difficulties for replicating studies.

    Results

    In this work, we develop a new statistical approach to address these issues that leverages genetic ancestry, defined as the proportion of ancestry derived from each ancestral population (e.g., the fraction of European/African ancestry in African Americans), in admixed populations. We applied our method to gene expression and methylation data from African American and Latino admixed individuals, respectively, identifying nine interactions that were significant at. We show that two of the interactions in methylation data replicate, and the remaining six are significantly enriched for low P‐values ().

    Conclusion

    We show that genetic ancestry can be a useful proxy for unknown and unmeasured covariates in the search for interaction effects. These results have important implications for our understanding of the genetic architecture of complex traits.

     
  5. Summary Objective

    Copy number variations (CNVs) represent a significant genetic risk for several neurodevelopmental disorders including epilepsy. As knowledge increases, reanalysis of existing data is essential. Reliable estimates of the contribution of CNVs to epilepsies from sizeable populations are not available.

    Methods

    We assembled a cohort of 1255 patients with preexisting array comparative genomic hybridization or single nucleotide polymorphism array based CNV data. All patients had “epilepsy plus,” defined as epilepsy with comorbid features, including intellectual disability, psychiatric symptoms, and other neurological and nonneurological features. CNV classification was conducted using a systematic filtering workflow adapted to epilepsy.

    Results

    Of 1097 patients remaining after genetic data quality control, 120 individuals (10.9%) carried at least one autosomal CNV classified as pathogenic; 19 individuals (1.7%) carried at least one autosomal CNV classified as possibly pathogenic. Eleven patients (1%) carried more than one (possibly) pathogenic CNV. We identified CNVs covering recently reported (HNRNPU) or emerging (RORB) epilepsy genes, and further delineated the phenotype associated with mutations of these genes. Additional novel epilepsy candidate genes emerge from our study. Comparing phenotypic features of pathogenic CNV carriers to those of noncarriers of pathogenic CNVs, we show that patients with nonneurological comorbidities, especially dysmorphism, were more likely to carry pathogenic CNVs (odds ratio = 4.09, confidence interval = 2.51‐6.68; P = 2.34 × 10⁻⁹). Meta‐analysis including data from published control groups showed that the presence or absence of epilepsy did not affect the detected frequency of CNVs.

    Significance

    The use of a specifically adapted workflow enabled identification of pathogenic autosomal CNVs in 10.9% of patients with epilepsy plus, which rose to 12.7% when we also considered possibly pathogenic CNVs. Our data indicate that epilepsy with comorbid features should be considered an indication for patients to be selected for a diagnostic algorithm including CNV detection. Collaborative large‐scale CNV reanalysis leads to novel declaration of pathogenicity in unexplained cases and can promote discovery of promising candidate epilepsy genes.

     