skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A two‐phase Bayesian methodology for the analysis of binary phenotypes in genome‐wide association studies
Abstract Recent advances in sequencing and genotyping technologies are contributing to a data revolution in genome‐wide association studies that is characterized by the challenging largepsmallnproblem in statistics. That is, given these advances, many such studies now consider evaluating an extremely large number of genetic markers (p) genotyped on a small number of subjects (n). Given the dimension of the data, a joint analysis of the markers is often fraught with many challenges, while a marginal analysis is not sufficient. To overcome these obstacles, herein, we propose a Bayesian two‐phase methodology that can be used to jointly relate genetic markers to binary traits while controlling for confounding. The first phase of our approach makes use of a marginal scan to identify a reduced set of candidate markers that are then evaluated jointly via a hierarchical model in the second phase. Final marker selection is accomplished through identifying a sparse estimator via a novel and computationally efficient maximum a posteriori estimation technique. We evaluate the performance of the proposed approach through extensive numerical studies, and consider a genome‐wide application involving colorectal cancer.  more » « less
Award ID(s):
1826715
PAR ID:
10458722
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Biometrical Journal
Volume:
62
Issue:
1
ISSN:
0323-3847
Page Range / eLocation ID:
p. 191-201
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The study of ecological speciation is inherently linked to the study of selection. Methods for estimating phenotypic selection within a generation based on associations between trait values and fitness (e.g. survival) of individuals are established. These methods attempt to disentangle selection acting directly on a trait from indirect selection caused by correlations with other traits via multivariate statistical approaches (i.e. inference of selection gradients). The estimation of selection on genotypic or genomic variation could also benefit from disentangling direct and indirect selection on genetic loci. However, achieving this goal is difficult with genomic data because the number of potentially correlated genetic loci (p) is very large relative to the number of individuals sampled (n). In other words, the number of model parameters exceeds the number of observations (p ≫ n). We present simulations examining the utility of whole‐genome regression approaches (i.e. Bayesian sparse linear mixed models) for quantifying direct selection in cases wherep ≫ n. Such models have been used for genome‐wide association mapping and are common in artificial breeding. Our results show they hold promise for studies of natural selection in the wild and thus of ecological speciation. But we also demonstrate important limitations to the approach and discuss study designs required for more robust inferences. 
    more » « less
  2. Abstract Background Alzheimer’s disease (AD) is a complex neurodegenerative disorder and the most common type of dementia. AD is characterized by a decline of cognitive function and brain atrophy, and is highly heritable with estimated heritability ranging from 60 to 80 $$\%$$ % . The most straightforward and widely used strategy to identify AD genetic basis is to perform genome-wide association study (GWAS) of the case-control diagnostic status. These GWAS studies have identified over 50 AD related susceptibility loci. Recently, imaging genetics has emerged as a new field where brain imaging measures are studied as quantitative traits to detect genetic factors. Given that many imaging genetics studies did not involve the diagnostic outcome in the analysis, the identified imaging or genetic markers may not be related or specific to the disease outcome. Results We propose a novel method to identify disease-related genetic variants enriched by imaging endophenotypes, which are the imaging traits associated with both genetic factors and disease status. Our analysis consists of three steps: (1) map the effects of a genetic variant (e.g., single nucleotide polymorphism or SNP) onto imaging traits across the brain using a linear regression model, (2) map the effects of a diagnosis phenotype onto imaging traits across the brain using a linear regression model, and (3) detect SNP-diagnosis association via correlating the SNP effects with the diagnostic effects on the brain-wide imaging traits. We demonstrate the promise of our approach by applying it to the Alzheimer’s Disease Neuroimaging Initiative database. Among 54 AD related susceptibility loci reported in prior large-scale AD GWAS, our approach identifies 41 of those from a much smaller study cohort while the standard association approaches identify only two of those. Clearly, the proposed imaging endophenotype enriched approach can reveal promising AD genetic variants undetectable using the traditional method. Conclusion We have proposed a novel method to identify AD genetic variants enriched by brain-wide imaging endophenotypes. This approach can not only boost detection power, but also reveal interesting biological pathways from genetic determinants to intermediate brain traits and to phenotypic AD outcomes. 
    more » « less
  3. Springer, Mark (Ed.)
    Abstract Despite the increasing feasibility of sequencing whole genomes from diverse taxa, a persistent problem in phylogenomics is the selection of appropriate genetic markers or loci for a given taxonomic group or research question. In this review, we aim to streamline the decision-making process when selecting specific markers to use in phylogenomic studies by introducing commonly used types of genomic markers, their evolutionary characteristics, and their associated uses in phylogenomics. Specifically, we review the utilities of ultraconserved elements (including flanking regions), anchored hybrid enrichment loci, conserved nonexonic elements, untranslated regions, introns, exons, mitochondrial DNA, single nucleotide polymorphisms, and anonymous regions (nonspecific regions that are evenly or randomly distributed across the genome). These various genomic elements and regions differ in their substitution rates, likelihood of neutrality or of being strongly linked to loci under selection, and mode of inheritance, each of which are important considerations in phylogenomic reconstruction. These features may give each type of marker important advantages and disadvantages depending on the biological question, number of taxa sampled, evolutionary timescale, cost effectiveness, and analytical methods used. We provide a concise outline as a resource to efficiently consider key aspects of each type of genetic marker. There are many factors to consider when designing phylogenomic studies, and this review may serve as a primer when weighing options between multiple potential phylogenomic markers. 
    more » « less
  4. Abstract Classical genetic studies have identified many cases of pleiotropy where mutations in individual genes alter many different phenotypes. Quantitative genetic studies of natural genetic variants frequently examine one or a few traits, limiting their potential to identify pleiotropic effects of natural genetic variants. Widely adopted community association panels have been employed by plant genetics communities to study the genetic basis of naturally occurring phenotypic variation in a wide range of traits. High-density genetic marker data—18M markers—from 2 partially overlapping maize association panels comprising 1,014 unique genotypes grown in field trials across at least 7 US states and scored for 162 distinct trait data sets enabled the identification of of 2,154 suggestive marker-trait associations and 697 confident associations in the maize genome using a resampling-based genome-wide association strategy. The precision of individual marker-trait associations was estimated to be 3 genes based on a reference set of genes with known phenotypes. Examples were observed of both genetic loci associated with variation in diverse traits (e.g., above-ground and below-ground traits), as well as individual loci associated with the same or similar traits across diverse environments. Many significant signals are located near genes whose functions were previously entirely unknown or estimated purely via functional data on homologs. This study demonstrates the potential of mining community association panel data using new higher-density genetic marker sets combined with resampling-based genome-wide association tests to develop testable hypotheses about gene functions, identify potential pleiotropic effects of natural genetic variants, and study genotype-by-environment interaction. 
    more » « less
  5. IntroductionGene expression is often controlled via cis-regulatory elements (CREs) that modulate the production of transcripts. For multi-gene genetic engineering and synthetic biology, precise control of transcription is crucial, both to insulate the transgenes from unwanted native regulation and to prevent readthrough or cross-regulation of transgenes within a multi-gene cassette. To prevent this activity, insulator-like elements, more properly referred to as transcriptional blockers, could be inserted to separate the transgenes so that they are independently regulated. However, only a few validated insulator-like elements are available for plants, and they tend to be larger than ideal. MethodsTo identify additional potential insulator-like sequences, we conducted a genome-wide analysis ofUtricularia gibba(humped bladderwort), one of the smallest known plant genomes, with genes that are naturally close together. The 10 best insulator-like candidates were evaluated in vivo for insulator-like activity. ResultsWe identified a total of 4,656 intergenic regions with expression profiles suggesting insulator-like activity. Comparisons of these regions across 45 other plant species (representing Monocots, Asterids, and Rosids) show low levels of syntenic conservation of these regions. Genome-wide analysis of unmethylated regions (UMRs) indicates ~87% of the targeted regions are unmethylated; however, interpretation of this is complicated becauseU. gibbahas remarkably low levels of methylation across the genome, so that large UMRs frequently extend over multiple genes and intergenic spaces. We also could not identify any conserved motifs among our selected intergenic regions or shared with existing insulator-like elements for plants. Despite this lack of conservation, however, testing of 10 selected intergenic regions for insulator-like activity found two elements on par with a previously published element (EXOB) while being significantly smaller. DiscussionGiven the small number of insulator-like elements currently available for plants, our results make a significant addition to available tools. The high hit rate (2 out of 10) also implies that more useful sequences are likely present in our selected intergenic regions; additional validation work will be required to identify which will be most useful for plant genetic engineering. 
    more » « less