Abstract Along with the development of high-throughput sequencing technologies, both sample size and SNP number are increasing rapidly in genome-wide association studies (GWAS), and the associated computation is more challenging than ever. Here, we present a memory-efficient, visualization-enhanced, and parallel-accelerated R package called “rMVP” to address the need for improved GWAS computation. rMVP can 1) effectively process large GWAS data, 2) rapidly evaluate population structure, 3) efficiently estimate variance components by Efficient Mixed-Model Association eXpedited (EMMAX), Factored Spectrally Transformed Linear Mixed Models (FaST-LMM), and Haseman-Elston (HE) regression algorithms, 4) implement parallel-accelerated association tests of markers using general linear model (GLM), mixed linear model (MLM), and fixed and random model circulating probability unification (FarmCPU) methods, 5) run fast thanks to a globally efficient design across the GWAS workflow, and 6) generate various visualizations of GWAS-related information. Accelerated by a block matrix multiplication strategy and multithreading, the association test methods embedded in rMVP are significantly faster than PLINK, GEMMA, and FarmCPU_pkg. rMVP is freely available at https://github.com/xiaolei-lab/rMVP.
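The marker-by-marker GLM scan that rMVP accelerates can be illustrated with a minimal NumPy sketch. It is not rMVP's R implementation: it fits no covariates (a simplification), assumes no monomorphic SNPs, and processes SNPs in column blocks so that each block is handled by one matrix product instead of a Python loop over markers. The names `glm_scan_blockwise` and `block_size` are hypothetical.

```python
import numpy as np

def glm_scan_blockwise(genotypes, y, block_size=1000):
    """Per-SNP simple linear regression t statistics, computed block by block.

    genotypes: (n, m) allele dosages (individuals x SNPs)
    y:         (n,) phenotype vector
    Assumption: no covariates and no monomorphic SNPs (zero variance).
    """
    n, m = genotypes.shape
    yc = y - y.mean()
    syy = yc @ yc
    t_stats = np.empty(m)
    for start in range(0, m, block_size):
        stop = min(start + block_size, m)
        block = genotypes[:, start:stop].astype(float)  # copy of this SNP block
        block -= block.mean(axis=0)                      # center each SNP
        sxx = np.einsum("ij,ij->j", block, block)        # per-SNP sum of squares
        sxy = block.T @ yc                               # one product per block
        beta = sxy / sxx                                 # effect-size estimates
        rss = syy - beta * sxy                           # residual sum of squares
        se = np.sqrt(rss / (n - 2) / sxx)                # standard errors
        t_stats[start:stop] = beta / se
    return t_stats
```

Blocking keeps peak memory at n x block_size rather than n x m, which is the same trade-off that motivates processing large genotype files in chunks.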
Matrix sketching framework for linear mixed models in association studies
Linear mixed models (LMMs) have been widely used in genome-wide association studies to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive, requiring large-scale matrix operations to build the genetic relationship matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging matrix sketching, which often yields provably accurate, fast, and efficient approximations. We leverage matrix sketching to develop a fast and efficient LMM method called Matrix-Sketching LMM (MaSk-LMM), which sketches the genotype matrix to reduce its dimensions and speed up computations. Our framework comes with both theoretical guarantees and strong empirical performance compared to the current state of the art for simulated traits and complex diseases.
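The core idea of sketching the genotype matrix before forming the GRM can be illustrated with a minimal NumPy sketch. The abstract does not say which dimension MaSk-LMM sketches or which sketching matrix it uses; here we assume a dense Gaussian sketch that compresses the m markers into s random combinations. With entries drawn from N(0, 1/s), E[S Sᵀ] = I, so the sketched GRM is an unbiased estimate of K = X Xᵀ / m. The functions `grm` and `sketched_grm` are hypothetical.

```python
import numpy as np

def grm(X):
    """GRM from a column-standardized genotype matrix X (n x m): K = X X^T / m."""
    return X @ X.T / X.shape[1]

def sketched_grm(X, s, rng):
    """Approximate GRM after compressing m markers to s random combinations.

    Assumption: Gaussian sketch S (m x s) with i.i.d. N(0, 1/s) entries,
    so E[(X S)(X S)^T] = X X^T. Other sketches (e.g. sparse or
    subsampled transforms) follow the same pattern.
    """
    n, m = X.shape
    S = rng.normal(scale=1.0 / np.sqrt(s), size=(m, s))
    Xs = X @ S                  # n x s sketch instead of n x m
    return Xs @ Xs.T / m        # keep the original 1/m scaling
```

The approximation error shrinks roughly like 1/sqrt(s), so the sketch size trades accuracy for speed; a dense Gaussian sketch is chosen here only for clarity, since forming X S still costs O(nms).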
- Award ID(s): 2152687
- PAR ID: 10613477
- Publisher / Repository: Genome Research
- Date Published:
- Journal Name: Genome Research
- Volume: 34
- Issue: 9
- ISSN: 1088-9051
- Page Range / eLocation ID: 1304 to 1311
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We propose a fast algorithm for computing the entire ridge regression regularization path in nearly linear time. Our method constructs a basis on which the solution of ridge regression can be computed instantly for any value of the regularization parameter. Consequently, linear models can be tuned via cross-validation or other risk estimation strategies with substantially better efficiency. The algorithm is based on iteratively sketching the Krylov subspace with a binomial decomposition over the regularization path. We provide a convergence analysis with various sketching matrices and show that it improves the state-of-the-art computational complexity. We also provide a technique to adaptively estimate the sketching dimension. This algorithm works for both over-determined and under-determined problems. We also provide an extension for matrix-valued ridge regression. The numerical results on real medium- and large-scale ridge regression tasks illustrate the effectiveness of the proposed method compared to standard baselines, which require super-linear computational time.
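The idea of a basis on which ridge solutions are available "instantly" for any regularization parameter can be illustrated with the classical exact approach: one thin SVD, after which each solution is a cheap diagonal rescaling. This is a standard super-linear baseline, not the sketched Krylov method described above; `ridge_path` is a hypothetical name.

```python
import numpy as np

def ridge_path(X, y, lambdas):
    """Ridge solutions w(lam) = argmin ||X w - y||^2 + lam ||w||^2 for many lam > 0.

    Exact SVD baseline (not the sketched Krylov method): the thin SVD
    X = U diag(s) V^T is computed once, then each
    w(lam) = V diag(s / (s^2 + lam)) U^T y costs only O(p r), r = min(n, p).
    Works for both over-determined (n > p) and under-determined (n < p) X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y                               # reused for every lambda
    return np.array([Vt.T @ (s / (s**2 + lam) * Uty) for lam in lambdas])
```

The expensive step here is the SVD itself, which costs O(n p min(n, p)); the abstract's contribution is replacing that step with a sketched Krylov basis so the whole path is obtained in nearly linear time.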
-
ABSTRACT Interpolative decompositions (ID) involve “natural bases” of row and column subsets, or skeletons, of a given matrix that approximately span its row and column spaces. Although finding optimal skeleton subsets is combinatorially hard, classical greedy pivoting algorithms with rank‐revealing properties like column‐pivoted QR (CPQR) often provide good heuristics in practice. To select skeletons efficiently for large matrices, randomized sketching is commonly leveraged as a preprocessing step to reduce the problem dimension while preserving essential information in the matrix. In addition to accelerating computations, randomization via sketching improves robustness against adversarial inputs while relaxing the rank‐revealing assumption on the pivoting scheme. This enables faster skeleton selection based on LU with partial pivoting (LUPP) as a reliable alternative to rank‐revealing pivoting methods like CPQR. However, while coupling sketching with LUPP provides an efficient solution for ID with a given rank, the lack of rank‐revealing properties of LUPP makes it challenging to adaptively determine a suitable rank without prior knowledge of the matrix spectrum. As a remedy, in this work, we introduce an adaptive randomized LUPP algorithm that approximates the desired rank via fast estimation of the residual error. The resulting algorithm is not only adaptive but also parallelizable, attaining much higher practical speed due to the lower communication requirements of LUPP over CPQR. The method has been implemented for both CPUs and GPUs, and the resulting software has been made publicly available.
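The non-adaptive building block described above (sketch, then select skeleton columns with LUPP) can be illustrated with a minimal NumPy sketch for a fixed rank k. It does not include the paper's adaptive rank estimation; the Gaussian sketch, the oversampling of 5, and the names `lupp_pivots` and `randomized_lupp_id` are assumptions for illustration.

```python
import numpy as np

def lupp_pivots(M, k):
    """First k row pivots chosen by LU with partial pivoting on M (needs >= k columns).

    Assumes the first k pivots are nonzero (true a.s. for a Gaussian sketch
    of a matrix with rank >= k).
    """
    M = np.array(M, dtype=float)
    perm = np.arange(M.shape[0])
    for j in range(k):
        p = j + np.argmax(np.abs(M[j:, j]))     # largest remaining entry in column j
        M[[j, p]] = M[[p, j]]                   # swap rows j and p
        perm[[j, p]] = perm[[p, j]]
        # eliminate entries below the pivot
        M[j + 1:, j:] -= np.outer(M[j + 1:, j] / M[j, j], M[j, j:])
    return perm[:k]

def randomized_lupp_id(A, k, oversample=5, rng=None):
    """Column ID A ~ A[:, J] @ T with a fixed, user-supplied rank k."""
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.standard_normal((k + oversample, A.shape[0]))
    Y = omega @ A                           # (k + p) x n sketch of A's row space
    J = lupp_pivots(Y.T, k)                 # pivot rows of Y^T = skeleton columns of A
    T, *_ = np.linalg.lstsq(Y[:, J], Y, rcond=None)  # interpolation matrix from the sketch
    return J, T
```

Partial pivoting only compares entries within one column per step, which is why it needs less communication than CPQR's column-norm updates; the price is the missing rank-revealing guarantee that the adaptive method above addresses.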
-
Abstract In recent years, wave-based analog computing has been at the center of attention for providing ultra-fast and power-efficient signal processing enabled by wave propagation through artificially engineered structures. Building on these structures, various proposals have been put forward for performing computations with waves. Most of these proposals have been aimed at linear operations, such as vector-matrix multiplications. The weak and poorly controllable nonlinear response of electromagnetic materials poses challenges in the design of wave-based structures for performing nonlinear operations. In the present work, we first use inverse design to propose a three-port device, consisting of a combination of linear and Kerr nonlinear materials, that exhibits the desired power-dependent transmission properties. Then, by combining a proper arrangement of such devices with a collection of Mach–Zehnder interferometers (MZIs), we propose a reconfigurable nonlinear optical architecture capable of implementing a variety of nonlinear functions of the input signal. The proposed device may pave the way for wave-based reconfigurable nonlinear signal processing that can be combined with linear networks for full-fledged wave-based analog computing.
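How an intensity-dependent (Kerr) phase turns an otherwise linear MZI into a nonlinear, power-dependent transfer function can be illustrated with a textbook model. This is not the authors' inverse-designed three-port device: it assumes ideal lossless 50:50 couplers and an idealized phase shift phi = phi0 + kappa * P in the upper arm, where `kappa` and `mzi_kerr` are hypothetical.

```python
import numpy as np

# Ideal lossless 50:50 directional coupler (symmetric convention)
COUPLER = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)

def mzi_kerr(p_in, phi0, kappa):
    """Output powers (port 0, port 1) of an MZI with a Kerr phase in its upper arm.

    Hypothetical model: phase = phi0 + kappa * (optical power in the upper arm).
    With kappa = 0 this reduces to the linear MZI:
    port 0 = p_in * sin^2(phi/2), port 1 = p_in * cos^2(phi/2).
    """
    a = np.array([np.sqrt(p_in), 0.0], dtype=complex)  # light enters port 0
    a = COUPLER @ a                                     # first coupler splits power
    phi = phi0 + kappa * abs(a[0]) ** 2                 # intensity-dependent phase
    a = np.array([np.exp(1j * phi), 1.0]) * a           # phase only on the upper arm
    a = COUPLER @ a                                     # second coupler recombines
    return abs(a[0]) ** 2, abs(a[1]) ** 2
```

For example, with phi0 = pi/2 and kappa = 1.0, the fraction of input power leaving port 0 grows from about 0.52 at p_in = 0.1 to about 0.92 at p_in = 2.0, a saturating, activation-like response while total output power still equals the input.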
-
Trent, M Stephen; Konovalova, Anna (Ed.)
ABSTRACT Almost all integral membrane proteins that reside in the outer membrane (OM) of gram-negative bacteria contain a closed amphipathic β sheet (“β barrel”) that serves as a membrane anchor. The membrane integration of β barrel structures is catalyzed by a highly conserved heterooligomer called the barrel assembly machine (BAM). Although charged residues that are exposed to the lipid bilayer are infrequently found in outer membrane protein β barrels, the β barrels of OmpC/OmpF-type trimeric porins produced by Enterobacterales contain multiple conserved lipid-facing basic residues located near the extracellular side of the OM. Here, we show that these residues are required for the efficient insertion of the Escherichia coli OmpC protein into the OM in vivo. We found that the mutation of multiple basic residues to glutamine or alanine slowed insertion and reduced insertion efficiency. Furthermore, molecular dynamics simulations provided evidence that the basic residues promote the formation of hydrogen bonds and salt bridges with lipopolysaccharide (LPS), a unique glycolipid located exclusively in the outer leaflet of the OM. Taken together, our results support a model in which hydrophilic interactions between OmpC and LPS help to anchor the protein in the OM when the local environment is perturbed by BAM during membrane insertion and suggest a surprising role for membrane lipids in the insertion reaction.
IMPORTANCE The assembly (folding and membrane insertion) of bacterial outer membrane proteins (OMPs) is an essential cellular process that is a potential target for novel antibiotics. A heterooligomer called the barrel assembly machine (BAM) plays a major role in catalyzing OMP assembly. Here, we show that a group of highly conserved lipid-facing basic residues in Escherichia coli OmpC, a member of a major family of abundant OMPs known as trimeric porins, is required for the efficient integration of the protein into the outer membrane (OM). Based on our work and previous studies, we propose that the basic residues form interactions with a unique OM lipid (lipopolysaccharide) that promote the insertion reaction. Our results provide strong evidence that interactions between specific membrane lipids and at least a subset of OMPs are required to supplement the activity of BAM and facilitate the integration of the proteins into the membrane.

