

Title: Integrative modeling of transmitted and de novo variants identifies novel risk genes for congenital heart disease
Background

Whole‐exome sequencing (WES) studies have identified multiple genes enriched for de novo mutations (DNMs) in congenital heart disease (CHD) probands. However, identifying risk genes from DNMs alone remains statistically challenging because of the heterogeneous etiology of CHD and the low mutation rate in each gene.
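To make the low-mutation-rate problem concrete, the following minimal sketch computes a DNM-only enrichment p-value for a single gene under a simple Poisson null. It is not the authors' method. The function names and the per-gene mutation rate of 1e-5 are hypothetical illustration values; the cohort size of 2,645 trios is the one reported in this abstract.

```python
import math


def expected_dnm_count(n_trios: int, mu_per_chrom: float) -> float:
    """Expected de novo mutations in one gene across a cohort.

    Each proband carries two parental chromosomes, hence the factor of 2.
    """
    return 2.0 * n_trios * mu_per_chrom


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    cdf = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)


# Hypothetical gene: mutation rate of 1e-5 per chromosome per generation.
lam = expected_dnm_count(n_trios=2645, mu_per_chrom=1e-5)
for observed in (1, 2, 3):
    print(observed, poisson_upper_tail(observed, lam))
```

Because the expected count per gene is far below one, a gene needs several DNMs before the tail probability can survive an exome-wide multiple-testing correction, and genuine risk genes with only one or two observed DNMs are easily missed.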

Methods

In this manuscript, we introduce a hierarchical Bayesian framework for gene‐level association testing that jointly analyzes de novo and rare transmitted variants. Through integrative modeling of multiple types of genetic variants, gene‐level annotations, and reference data from large population cohorts, our method accurately characterizes the expected frequencies of both de novo and transmitted variants and shows improved statistical power compared to analyses based on DNMs only.
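The abstract does not spell out the model, so the sketch below is only a generic illustration of how evidence from two variant classes can be pooled in a Bayesian gene-level test. It is not the authors' framework. It assumes a Poisson null with a known expected count, a Gamma prior on the relative risk under the alternative, and conditional independence between de novo and transmitted evidence. All function names, priors, and counts are hypothetical.

```python
import math


def log_bf_poisson_gamma(x: int, expected: float, shape: float, rate: float) -> float:
    """Log Bayes factor for an observed variant count x in one gene.

    H0: x ~ Poisson(expected)
    H1: x ~ Poisson(expected * gamma), gamma ~ Gamma(shape, rate),
        whose marginal is a negative binomial distribution.
    """
    log_m0 = -expected + x * math.log(expected) - math.lgamma(x + 1)
    log_m1 = (
        math.lgamma(x + shape) - math.lgamma(x + 1) - math.lgamma(shape)
        + shape * math.log(rate / (rate + expected))
        + x * math.log(expected / (rate + expected))
    )
    return log_m1 - log_m0


def posterior_risk_probability(log_bfs: list, prior: float) -> float:
    """Posterior probability of being a risk gene from summed log Bayes factors.

    Summing log Bayes factors assumes the evidence sources are
    conditionally independent given the gene's risk status.
    """
    log_odds = math.log(prior / (1.0 - prior)) + sum(log_bfs)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical gene: 2 DNMs against 0.05 expected, and 6 rare transmitted
# variants against 3.0 expected from population reference data.
lbf_dn = log_bf_poisson_gamma(x=2, expected=0.05, shape=20.0, rate=1.0)
lbf_tr = log_bf_poisson_gamma(x=6, expected=3.0, shape=2.0, rate=1.0)
print(posterior_risk_probability([lbf_dn], prior=0.05))
print(posterior_risk_probability([lbf_dn, lbf_tr], prior=0.05))
```

In this toy setup the transmitted variants add to the log Bayes factor, so a gene with too few DNMs to be convincing on its own can still reach a high posterior probability.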

Results

Applied to WES data of 2,645 CHD proband‐parent trios, our method identified 15 significant genes, half of which are novel, leading to new insights into the genetic bases of CHD.

Conclusion

These results showcase the power of integrative analysis of transmitted and de novo variants for disease gene discovery.

 
NSF-PAR ID:
10474507
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Quantitative Biology
Volume:
9
Issue:
2
ISSN:
2095-4689
Format(s):
Medium: X
Size(s):
p. 216-227
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    CpG sites within the same genomic region often share similar methylation patterns and tend to be co-regulated by multiple genetic variants that may interact with one another.

    Results

    We propose a multi-trait methylation random field (multi-MRF) method to evaluate the joint association between a set of CpG sites and a set of genetic variants. The proposed method has several advantages. First, it is a multi-trait method that allows flexible correlation structures between neighboring CpG sites (e.g. distance-based correlation). Second, it is also a multi-locus method that integrates the effect of multiple common and rare genetic variants. Third, it models the methylation traits with a beta distribution to characterize their bimodal and interval properties. Through simulations, we demonstrated that the proposed method had improved power over some existing methods under various disease scenarios. We further illustrated the proposed method via an application to a study of congenital heart defects (CHDs) with 83 cardiac tissue samples. Our results suggested that gene BACE2, a methylation quantitative trait locus (QTL) candidate, colocalized with expression QTLs in tibial artery and harbored genetic variants with nominally significant associations in two genome-wide association studies of CHD.

    Availability and implementation

    https://github.com/chenlyu2656/Multi-MRF.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract Background

    Co‐occurrence of two genetic diseases is challenging for accurate diagnosis and genetic counseling. The recent availability of whole exome sequencing (WES) has dramatically improved the molecular diagnosis of rare genetic diseases, particularly in consanguineous populations.

    Methods

    We report here on a consanguineous family from Southern Tunisia including three members affected with congenital ichthyosis. The index case had hearing loss (HL) and ichthyosis and was initially suspected of having keratitis‐ichthyosis‐deafness (KID) syndrome. WES was performed for the index case, and all members of the nuclear family were sequenced (Sanger method).

    Results

    The WES approach allowed the identification of two strong candidate variants in two different genes: a missense mutation c.1334T>G (p.Leu445Trp) in exon 11 of the SLC26A4 gene, associated with isolated HL, and a novel missense mutation c.728G>T (p.Arg243Leu) in exon 8 of the CYP4F22 gene, likely responsible for ichthyosis. These two mutations were predicted to be pathogenic by three pathogenicity prediction tools (Scale‐Invariant Feature Transform [SIFT], Polymorphism Phenotyping [PolyPhen], Mutation Taster) and to underlie the HL and ichthyosis, respectively.

    Conclusions

    The present study raises awareness about the importance of familial history for accurate diagnosis of syndromic genetic diseases and for differential diagnosis when two distinct clinical entities co‐occur. In addition, in countries with limited resources, WES of a single individual provides a cost‐effective tool for molecular diagnosis confirmation and genetic counseling.

     
  3. Abstract Motivation

    Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed and produce functional proteins.

    Results

    We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and non-coding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or non-coding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products and we propose that they may commonly act as cryptic factors in disease.

    Availability and implementation

    The software is available from geneprediction.org/SGRF.

    Supplementary information

    Supplementary information is available at Bioinformatics online.

     
  4. ABSTRACT Genome-wide association studies (GWAS) can identify genetic variants responsible for naturally occurring and quantitative phenotypic variation. Association studies therefore provide a powerful complement to approaches that rely on de novo mutations for characterizing gene function. Although bacteria should be amenable to GWAS, few GWAS have been conducted on bacteria, and the extent to which nonindependence among genomic variants (e.g., linkage disequilibrium [LD]) and the genetic architecture of phenotypic traits will affect GWAS performance is unclear. We apply association analyses to identify candidate genes underlying variation in 20 biochemical, growth, and symbiotic phenotypes among 153 strains of Ensifer meliloti. For 11 traits, we find genotype-phenotype associations that are stronger than expected by chance, with the candidates in relatively small linkage groups, indicating that LD does not preclude resolving association candidates to relatively small genomic regions. The significant candidates show an enrichment for nucleotide polymorphisms (SNPs) over gene presence-absence variation (PAV), and for five traits, candidates are enriched in large linkage groups, a possible signature of epistasis. Many of the variants most strongly associated with symbiosis phenotypes were in genes previously identified as being involved in nitrogen fixation or nodulation. For other traits, apparently strong associations were not stronger than the range of associations detected in permuted data. In sum, our data show that GWAS in bacteria may be a powerful tool for characterizing genetic architecture and identifying genes responsible for phenotypic variation. However, careful evaluation of candidates is necessary to avoid false signals of association.

    IMPORTANCE

    Genome-wide association analyses are a powerful approach for identifying gene function. These analyses are becoming commonplace in studies of humans, domesticated animals, and crop plants but have rarely been conducted in bacteria. We applied association analyses to 20 traits measured in Ensifer meliloti, an agriculturally and ecologically important bacterium because it fixes nitrogen when in symbiosis with leguminous plants. We identified candidate alleles and gene presence-absence variants underlying variation in symbiosis traits, antibiotic resistance, and use of various carbon sources; some of these candidates are in genes previously known to affect these traits whereas others were in genes that have not been well characterized. Our results point to the potential power of association analyses in bacteria, but also to the need to carefully evaluate the potential for false associations.
  5. Genomic structural variants (SVs) can play important roles in adaptation and speciation. Yet the overall fitness effects of SVs are poorly understood, partly because accurate population-level identification of SVs requires multiple high-quality genome assemblies. Here, we use 31 chromosome-scale, haplotype-resolved genome assemblies of Theobroma cacao (an outcrossing, long-lived tree species that is the source of chocolate) to investigate the fitness consequences of SVs in natural populations. Among the 31 accessions, we find over 160,000 SVs, which together cover eight times more of the genome than single-nucleotide polymorphisms and short indels (125 versus 15 Mb). Our results indicate that a vast majority of these SVs are deleterious: they segregate at low frequencies and are depleted from functional regions of the genome. We show that SVs influence gene expression, which likely impairs gene function and contributes to the detrimental effects of SVs. We also provide empirical support for a theoretical prediction that SVs, particularly inversions, increase genetic load through the accumulation of deleterious nucleotide variants as a result of suppressed recombination. Despite the overall detrimental effects, we identify individual SVs bearing signatures of local adaptation, several of which are associated with genes differentially expressed between populations. Genes involved in pathogen resistance are strongly enriched among these candidates, highlighting the contribution of SVs to this important local adaptation trait. Beyond revealing empirical evidence for the evolutionary importance of SVs, these 31 de novo assemblies provide a valuable resource for genetic and breeding studies in T. cacao.

     