Abstract Along with the development of high-throughput sequencing technologies, both sample size and SNP number are increasing rapidly in genome-wide association studies (GWAS), and the associated computation is more challenging than ever. Here, we present a memory-efficient, visualization-enhanced, and parallel-accelerated R package called “rMVP” to address the need for improved GWAS computation. rMVP can 1) effectively process large GWAS data, 2) rapidly evaluate population structure, 3) efficiently estimate variance components by Efficient Mixed-Model Association eXpedited (EMMAX), Factored Spectrally Transformed Linear Mixed Models (FaST-LMM), and Haseman-Elston (HE) regression algorithms, 4) implement parallel-accelerated association tests of markers using general linear model (GLM), mixed linear model (MLM), and fixed and random model circulating probability unification (FarmCPU) methods, 5) compute fast with a globally efficient design in the GWAS processes, and 6) generate various visualizations of GWAS-related information. Accelerated by block matrix multiplication strategy and multiple threads, the association test methods embedded in rMVP are significantly faster than PLINK, GEMMA, and FarmCPU_pkg. rMVP is freely available at https://github.com/xiaolei-lab/rMVP.
Matrix sketching framework for linear mixed models in association studies
Linear mixed models (LMMs) have been widely used in genome-wide association studies to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive, necessitating large-scale matrix operations to build the genetic relationship matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging matrix sketching, which often results in provably accurate, fast, and efficient approximations. We leverage matrix sketching to develop a fast and efficient LMM method called Matrix-Sketching LMM (MaSk-LMM) by sketching the genotype matrix to reduce its dimensions and speed up computations. Our framework comes with both theoretical guarantees and strong empirical performance compared to the current state-of-the-art for simulated traits and complex diseases.
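To make the idea of sketching a genotype matrix concrete, here is a minimal sketch in Python. It is not the MaSk-LMM implementation: the choice of a Gaussian sketch, the decision to compress the marker dimension, the simulated genotypes, and all function names are assumptions made for illustration only. It compares an exact GRM with one built from a sketched genotype matrix.

```python
import numpy as np


def standardize(genotypes):
    """Center and scale each marker (column) to mean 0 and variance 1."""
    centered = genotypes - genotypes.mean(axis=0)
    scale = genotypes.std(axis=0)
    scale[scale == 0] = 1.0  # leave monomorphic markers unscaled
    return centered / scale


def exact_grm(X):
    """Exact GRM: K = X X^T / m for n samples and m markers."""
    return X @ X.T / X.shape[1]


def sketched_grm(X, sketch_size, rng):
    """Approximate GRM from a Gaussian sketch of the marker dimension.

    Assumption for illustration: S has i.i.d. N(0, 1/sketch_size) entries,
    so E[S S^T] = I and X S S^T X^T / m is an unbiased estimate of K.
    """
    m = X.shape[1]
    S = rng.standard_normal((m, sketch_size)) / np.sqrt(sketch_size)
    X_s = X @ S  # n x sketch_size instead of n x m
    return X_s @ X_s.T / m


def relative_error(K, K_hat):
    """Relative Frobenius-norm error of the approximation."""
    return np.linalg.norm(K - K_hat) / np.linalg.norm(K)


# Simulated data (hypothetical): 50 samples, 2000 biallelic markers.
rng = np.random.default_rng(0)
n_samples, n_markers = 50, 2000
genotypes = rng.binomial(2, 0.3, size=(n_samples, n_markers)).astype(float)

X = standardize(genotypes)
K = exact_grm(X)
err_small = relative_error(K, sketched_grm(X, 50, rng))
err_large = relative_error(K, sketched_grm(X, 1000, rng))
print(f"relative error, sketch size 50:   {err_small:.3f}")
print(f"relative error, sketch size 1000: {err_large:.3f}")
```

The example only illustrates the accuracy side of the trade-off: a larger sketch gives a closer approximation of the GRM. A dense Gaussian sketch is itself costly to apply, so practical speedups would typically rely on a sketch size much smaller than the dimension being compressed, or on a structured or sparse sketching matrix.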
- Award ID(s): 2152687
- PAR ID: 10613477
- Publisher / Repository: Genome Research
- Date Published:
- Journal Name: Genome Research
- Volume: 34
- Issue: 9
- ISSN: 1088-9051
- Page Range / eLocation ID: 1304 to 1311
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We propose a fast algorithm for computing the entire ridge regression regularization path in nearly linear time. Our method constructs a basis on which the solution of ridge regression can be computed instantly for any value of the regularization parameter. Consequently, linear models can be tuned via cross-validation or other risk estimation strategies with substantially better efficiency. The algorithm is based on iteratively sketching the Krylov subspace with a binomial decomposition over the regularization path. We provide a convergence analysis with various sketching matrices and show that it improves the state-of-the-art computational complexity. We also provide a technique to adaptively estimate the sketching dimension. This algorithm works for both over-determined and under-determined problems. We also provide an extension for matrix-valued ridge regression. The numerical results on real medium and large-scale ridge regression tasks illustrate the effectiveness of the proposed method compared to standard baselines which require super-linear computational time.
- Paczkowski, Jon (Ed.) Many pathogenic bacteria form biofilms as a protective measure against environmental and host hazards. The underlying structure of the biofilm matrix consists of secreted macromolecules, often including exopolysaccharides. To escape the biofilm, bacteria may produce a number of matrix-degrading enzymes, including glycosidic enzymes that digest exopolysaccharide scaffolds. The human pathogen Vibrio cholerae assembles and secretes an exopolysaccharide called VPS (Vibrio polysaccharide) which is essential in most cases for the formation of biofilms and consists of a repeating tetrasaccharide unit. Previous studies have indicated that a secreted glycosidase called RbmB is involved in V. cholerae biofilm dispersal, although the mechanism by which this occurs is not understood. To approach the question of RbmB function, we recombinantly expressed and purified RbmB and tested its activity against purified VPS. Using a fluorescence-based biochemical assay, we show that RbmB specifically cleaves VPS in vitro under physiological conditions. Analysis of the cleavage process using mass spectrometry, solid-state NMR, and solution NMR indicates that RbmB cleaves VPS at a specific site (at the α-1,4 linkage between D-galactose and a modified L-gulose) into a mixture of tetramers and octamers. We demonstrate that the product of the cleavage contains a double bond in the modified guluronic acid ring, strongly suggesting that RbmB is cleaving using a glycoside lyase mechanism. Finally, we show that recombinant RbmB from V. cholerae and the related aquatic species Vibrio coralliilyticus are both able to disrupt living V. cholerae biofilms. Our results support the role of RbmB as a polysaccharide lyase involved in biofilm dispersal, as well as an additional glycolytic enzyme to add to the toolbox of potential therapeutic antibacterial enzymes.
- Interpolative decompositions (ID) involve “natural bases” of row and column subsets, or skeletons, of a given matrix that approximately span its row and column spaces. Although finding optimal skeleton subsets is combinatorially hard, classical greedy pivoting algorithms with rank‐revealing properties like column‐pivoted QR (CPQR) often provide good heuristics in practice. To select skeletons efficiently for large matrices, randomized sketching is commonly leveraged as a preprocessing step to reduce the problem dimension while preserving essential information in the matrix. In addition to accelerating computations, randomization via sketching improves robustness against adversarial inputs while relaxing the rank‐revealing assumption on the pivoting scheme. This enables faster skeleton selection based on LU with partial pivoting (LUPP) as a reliable alternative to rank‐revealing pivoting methods like CPQR. However, while coupling sketching with LUPP provides an efficient solution for ID with a given rank, the lack of rank‐revealing properties of LUPP makes it challenging to adaptively determine a suitable rank without prior knowledge of the matrix spectrum. As a remedy, in this work, we introduce an adaptive randomized LUPP algorithm that approximates the desired rank via fast estimation of the residual error. The resulting algorithm is not only adaptive but also parallelizable, attaining much higher practical speed due to the lower communication requirements of LUPP over CPQR. The method has been implemented for both CPUs and GPUs, and the resulting software has been made publicly available.
- In recent years, wave-based analog computing has been at the center of attention for providing ultra-fast and power-efficient signal processing enabled by wave propagation through artificially engineered structures. Building on these structures, various proposals have been put forward for performing computations with waves. Most of these proposals have been aimed at linear operations, such as vector-matrix multiplications. The weak and hardly controllable nonlinear response of electromagnetic materials imposes challenges in the design of wave-based structures for performing nonlinear operations. In the present work, first, by using the method of inverse design we propose a three-port device, which consists of a combination of linear and Kerr nonlinear materials, exhibiting the desired power-dependent transmission properties. Then, combining a proper arrangement of such devices with a collection of Mach–Zehnder interferometers (MZIs), we propose a reconfigurable nonlinear optical architecture capable of implementing a variety of nonlinear functions of the input signal. The proposed device may pave the way for wave-based reconfigurable nonlinear signal processing that can be combined with linear networks for full-fledged wave-based analog computing.