NSF PAR Search | NSF Public Access Repository

Second-order group knockoffs with applications to genome-wide association studies

https://doi.org/10.1093/bioinformatics/btae580

Chu, Benjamin B; Gu, Jiaqi; Chen, Zhaomeng; Morrison, Tim; Candès, Emmanuel; He, Zihuai; Sabatti, Chiara (October 2024, Bioinformatics)
Gao, Xin (Ed.)
Abstract. Motivation: Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance. Results: While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct “group knockoffs.” While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. Availability and implementation: The described algorithms are implemented in an open-source Julia package Knockoffs.jl. R and Python wrappers are available as knockoffsr and knockoffspy packages.
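The false discovery rate guarantee mentioned in the abstract comes from a data-dependent selection threshold applied to knockoff feature statistics. The sketch below illustrates the standard "knockoff+" threshold from the general knockoff literature in plain Python. It is not the Knockoffs.jl API: the function names `knockoff_threshold` and `select`, and the assumption that one statistic W is already computed per variable (or per group, in the group setting), are hypothetical choices made for illustration.

```python
def knockoff_threshold(w, q=0.1, plus=True):
    """Smallest t > 0 whose estimated false discovery proportion is <= q.

    w : list of feature statistics, one per variable or group (assumed
        already computed; large positive values favor the original
        variable over its knockoff).
    q : target false discovery rate.
    plus : if True, use the "knockoff+" offset of 1 in the numerator.
    """
    offset = 1 if plus else 0
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # discoveries at threshold t
        if (offset + negatives) / max(positives, 1) <= q:
            return t
    return float("inf")  # no threshold meets the target: select nothing


def select(w, q=0.1):
    """Indices of variables (or groups) whose statistic reaches the threshold."""
    t = knockoff_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

For example, `select([3.0, 2.0, -1.0, 0.5], q=0.5)` keeps the first two entries, while a stricter target such as `q=0.1` selects nothing on so few statistics.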
Full Text Available
Multivariate genome-wide association analysis by iterative hard thresholding

https://doi.org/10.1093/bioinformatics/btad193

Chu, Benjamin B; Ko, Seyoon; Zhou, Jin J; Jensen, Aubrey; Zhou, Hua; Sinsheimer, Janet S; Lange, Kenneth (April 2023, Bioinformatics)
Marschall, Tobias (Ed.)
Abstract. Motivation: In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association study operate marker-by-marker and are computationally intensive. Results: We present a sparsity constrained regression algorithm for multivariate genome-wide association study based on iterative hard thresholding and implement it in a convenient Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA’s linear mixed models and mv-PLINK’s canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits. Availability and implementation: Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.
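The core idea of iterative hard thresholding can be sketched for the simplest case: a single trait and a least-squares loss. This is a minimal illustration in plain Python, not the MendelIHT.jl implementation, which handles multiple traits and much larger data. The function names, the fixed `step` size, and the fixed iteration count are assumptions made here for brevity.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and zero the rest."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht(X, y, k, step=1.0, iters=100):
    """Sparse least squares by projected gradient steps.

    X : list of n rows, each a list of p predictor values.
    y : list of n responses.
    k : number of nonzero coefficients allowed.
    step : fixed step size (assumed small enough for the given X).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residuals y - X beta under the current sparse model.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Descent direction X^T (y - X beta) for the loss 0.5 * ||y - X beta||^2.
        direction = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step, then projection onto the set of k-sparse vectors.
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta
```

With an identity design matrix and `k=2`, the sketch retains only the two largest responses as coefficients, which shows how the sparsity constraint replaces a marker-by-marker scan with one joint model.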
Full Text Available
Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets

https://doi.org/10.1016/j.ajhg.2022.12.008

Ko, Seyoon; Chu, Benjamin B.; Peterson, Daniel; Okenwa, Chidera; Papp, Jeanette C.; Alexander, David H.; Sobel, Eric M.; Zhou, Hua; Lange, Kenneth L. (February 2023, The American Journal of Human Genetics)

Full Text Available
A fast data-driven method for genotype imputation, phasing and local ancestry inference: MendelImpute.jl

https://doi.org/10.1093/bioinformatics/btab489

Chu, Benjamin B; Sobel, Eric M; Wasiolek, Rory; Ko, Seyoon; Sinsheimer, Janet S; Zhou, Hua; Lange, Kenneth (July 2021, Bioinformatics)
Kelso, Janet (Ed.)
Abstract. Motivation: Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models (HMMs). Existing programs all have essentially the same imputation accuracy, are computationally intensive and generally require prephasing the typed markers. Results: We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for HMM calculations. This strategy, embodied in our Julia program MendelImpute.jl, avoids explicit assumptions about recombination and population structure while delivering similar prediction accuracy, better memory usage and an order of magnitude or better run-times compared to the fastest competing method. MendelImpute operates on both dosage data and unphased genotype data and simultaneously imputes missing genotypes and phase at both the typed and untyped SNPs (single nucleotide polymorphisms). Finally, MendelImpute naturally extends to global and local ancestry estimation and lends itself to new strategies for data compression and hence faster data transport and sharing. Availability and implementation: Software, documentation and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelImpute.jl. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
