skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Computing rank‐revealing factorizations of matrices stored out‐of‐core
Summary This paper describes efficient algorithms for computing rank‐revealing factorizations of matrices that are too large to fit in main memory (RAM), and must instead be stored on slow external memory devices such as disks (out‐of‐core or out‐of‐memory). Traditional algorithms for computing rank‐revealing factorizations (such as the column pivoted QR factorization and the singular value decomposition) are very communication intensive as they require many vector‐vector and matrix‐vector operations, which become prohibitively expensive when data is not in RAM. Randomization allows to reformulate new methods so that large contiguous blocks of the matrix are processed in bulk. The paper describes two distinct methods. The first is a blocked version of column pivoted Householder QR, organized as a “left‐looking” method to minimize the number of the expensive write operations. The second method results employs a UTV factorization. It is organized as an algorithm‐by‐blocks to overlap computations and I/O operations. As it incorporates power iterations, it is much better at revealing the numerical rank. Numerical experiments on several computers demonstrate that the new algorithms are almost as fast when processing data stored on slow memory devices as traditional algorithms are for data stored in RAM.  more » « less
Award ID(s):
1952735
PAR ID:
10454845
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Concurrency and Computation: Practice and Experience
ISSN:
1532-0626
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Randomized singular value decomposition (RSVD) is by now a well-established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin RSVD, the recently proposed algorithm “randUTV” computes a full factorization of a given matrix that provides low-rank approximations with near-optimal error. Because the bulk of randUTV is cast in terms of communication-efficient operations such as matrix-matrix multiplication and unpivoted QR factorizations, it is faster than competing rank-revealing factorization methods such as column-pivoted QR in most high-performance computational settings. In this article, optimized randUTV implementations are presented for both shared-memory and distributed-memory computing environments. For shared memory, randUTV is redesigned in terms of an algorithm-by-blocks that, together with a runtime task scheduler, eliminates bottlenecks from data synchronization points to achieve acceleration over the standard blocked algorithm based on a purely fork-join approach. The distributed-memory implementation is based on the ScaLAPACK library. The performance of our new codes compares favorably with competing factorizations available on both shared-memory and distributed-memory architectures. 
    more » « less
  2. QR factorization is a key tool in mathematics, computer science, operations research, and engineering. This paper presents the roundoff-error-free (REF) QR factorization framework comprising integer-preserving versions of the standard and the thin QR factorizations and associated algorithms to compute them. Specifically, the standard REF QR factorization factors a given matrix $$A \in \Z^{m \times n}$$ as $A=QDR$, where $$Q \in \Z^{m \times m}$$ has pairwise orthogonal columns, $$D$$ is a diagonal matrix, and $$R \in \Z^{m \times n}$$ is an upper trapezoidal matrix; notably, the entries of $$Q$$ and $$R$$ are integral, while the entries of $$D$$ are reciprocals of integers. In the thin REF QR factorization, $$Q \in \Z^{m \times n}$$ also has pairwise orthogonal columns, and $$R \in \Z^{n \times n}$$ is also an upper triangular matrix. In contrast to traditional (i.e., floating-point) QR factorizations, every operation used to compute these factors is integral; thus, REF QR is guaranteed to be an exact orthogonal decomposition. Importantly, the bit-length of every entry in the REF QR factorizations (and within the algorithms to compute them) is bounded polynomially. Notable applications of our REF QR factorizations include finding exact least squares or exact basic solutions (i.e., a rational n-dimensional vector $$x$$) to any given full column rank or rank deficient linear system $A x = b$, respectively. In addition, our exact factorizations can be used as a subroutine within exact and/or high-precision quadratic programming. Altogether, REF QR provides a framework to obtain exact orthogonal factorizations of any rational matrix (as any rational/decimal matrix can be easily transformed into an integral matrix). 
    more » « less
  3. ABSTRACT Interpolative decompositions (ID) involve “natural bases” of row and column subsets, or skeletons, of a given matrix that approximately span its row and column spaces. Although finding optimal skeleton subsets is combinatorially hard, classical greedy pivoting algorithms with rank‐revealing properties like column‐pivoted QR (CPQR) often provide good heuristics in practice. To select skeletons efficiently for large matrices, randomized sketching is commonly leveraged as a preprocessing step to reduce the problem dimension while preserving essential information in the matrix. In addition to accelerating computations, randomization via sketching improves robustness against adversarial inputs while relaxing the rank‐revealing assumption on the pivoting scheme. This enables faster skeleton selection based on LU with partial pivoting (LUPP) as a reliable alternative to rank‐revealing pivoting methods like CPQR. However, while coupling sketching with LUPP provides an efficient solution for ID with a given rank, the lack of rank‐revealing properties of LUPP makes it challenging to adaptively determine a suitable rank without prior knowledge of the matrix spectrum. As a remedy, in this work, we introduce an adaptive randomized LUPP algorithm that approximates the desired rank via fast estimation of the residual error. The resulting algorithm is not only adaptive but also parallelizable, attaining much higher practical speed due to the lower communication requirements of LUPP over CPQR. The method has been implemented for both CPUs and GPUs, and the resulting software has been made publicly available. 
    more » « less
  4. null (Ed.)
    This survey describes probabilistic algorithms for linear algebraic computations, such as factorizing matrices and solving linear systems. It focuses on techniques that have a proven track record for real-world problems. The paper treats both the theoretical foundations of the subject and practical computational issues. Topics include norm estimation, matrix approximation by sampling, structured and unstructured random embeddings, linear regression problems, low-rank approximation, subspace iteration and Krylov methods, error estimation and adaptivity, interpolatory and CUR factorizations, Nyström approximation of positive semidefinite matrices, single-view (‘streaming’) algorithms, full rank-revealing factorizations, solvers for linear systems, and approximation of kernel matrices that arise in machine learning and in scientific computing. 
    more » « less
  5. To meet the growing need for extended or exact precision solvers, an efficient framework based on Integer-Preserving Gaussian Elimination (IPGE) has been recently developed, which includes dense/sparse LU/Cholesky factorizations and dense LU/Cholesky factorization updates for column and/or row replacement. This paper discusses our ongoing work developing the sparse LU/Cholesky column/row-replacement update and the sparse rank-l update/downdate. We first present some basic background for the exact factorization framework based on IPGE. Then we give our proposed algorithms along with some implementation and data-structure details. Finally, we provide some experimental results showcasing the performance of our update algorithms. Specifically, we show that updating these exact factorizations can typically be 10x to 100x faster than (re-)factorizing the matrices from scratch. 
    more » « less