skip to main content

Search for: All records

Creators/Authors contains: "Bates, Stephen"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available February 1, 2024
  2. We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations. This allows the generation of imperfect copies (knockoffs) of these variables that serve as ideal negative controls, correcting for linkage disequilibrium and accounting for unknown population structure, which may be due to diverse ancestries or familial relatedness. The validity and effectiveness of our method are demonstrated by extensive simulations and by applications to the UK Biobank data. These analyses confirm our method is powerful relative to state-of-the-art alternatives, while comparisons with other studies validate most of our discoveries. Finally, fast software is made available for researchers to analyze Biobank-scale datasets. 
    more » « less
  3. We introduce a method to draw causal inferences—inferences immune to all possible confounding—from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.

    more » « less
  4. null (Ed.)
  5. Larochelle, H ; Ranzato, M ; null ; null ; null ; Lin, H. (Ed.)
  6. Abstract

    In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report onKnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we applyKnockoffZoomto data from 350k subjects in the UK Biobank and report many new findings.

    more » « less