- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- The International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
- Page Range or eLocation-ID:
- 3933 to 3937
- Sponsoring Org:
- National Science Foundation
More Like this
We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
Tucker decomposition is a low-rank tensor approximation that generalizes a truncated matrix singular value decomposition (SVD). Existing parallel software has shown that Tucker decomposition is particularly effective at compressing terabyte-sized multidimensional scientific simulation datasets, computing reduced representations that satisfy a specified approximation error. The general approach is to get a low-rank approximation of the input data by performing a sequence of matrix SVDs of tensor unfoldings, which tend to be short-fat matrices. In the existing approach, the SVD is performed by computing the eigendecomposition of the Gram matrix of the unfolding. This method sacrifices some numerical stability in exchange for lower computation costs and easier parallelization. We propose using a more numerically stable though more computationally expensive way to compute the SVD by pre- processing with a QR decomposition step and computing an SVD of only the small triangular factor. The more numerically stable approach allows us to achieve the same accuracy with half the working precision (for example, single rather than double precision). We demonstrate that our method scales as well as the existing approach, and the use of lower precision leads to an overall reduction in running time of up to a factor of 2 when using 10smore »
GPU-DFC: A GPU-Based Parallel Algorithm for Computing Dynamic-Functional Connectivity of Big fMRI DataStudying dynamic-functional connectivity (DFC) using fMRI data of the brain gives much richer information to neuroscientists than studying the brain as a static entity. Mining of dynamic connectivity graphs from these brain studies can be used to classify diseased versus healthy brains. However, constructing and mining dynamic-functional connectivity graphs of the brain can be time consuming due to size of fMRI data. In this paper, we propose a highly scalable GPU-based parallel algorithm called GPU-DFC for computing dynamic-functional connectivity of fMRI data both at region and voxel level. Our algorithm exploits sparsification of correlation matrix and stores them in CSR format. Further reduction in the correlation matrix is achieved by parallel decomposition techniques. Our GPU-DFC algorithm achieves 2 times speed-up for computing dynamic correlations compared to state-of-the-art GPU-based techniques and more than 40 times compared to a sequential CPU version. In terms of storage, our proposed matrix decomposition technique reduces the size of correlation matrices more than 100 times. Reconstructed values from decomposed matrices show comparable results as compared to the correlations with original data. The implemented code is available as GPL license on GitHub portal of our lab (https://github.com/pcdslab/GPU-DFC).
We study the joint low-rank factorization of the matrices X=[A B]G and Y=[A C]H, in which the columns of the shared factor matrix A correspond to vectorized rank-one matrices, the unshared factors B and C have full column rank, and the matrices G and H have full row rank. The objective is to find the shared factor A, given only X and Y. We first explain that if the matrix [A B C] has full column rank, then a basis for the column space of the shared factor matrix A can be obtained from the null space of the matrix [X Y]. This in turn implies that the problem of finding the shared factor matrix A boils down to a basic Canonical Polyadic Decomposition (CPD) problem that in many cases can directly be solved by means of an eigenvalue decomposition. Next, we explain that by taking the rank-one constraint of the columns of the shared factor matrix A into account when computing the null space of the matrix [X Y], more relaxed identifiability conditions can be obtained that do not require that [A B C] has full column rank. The benefit of the unconstrained null space approach is that itmore »
For a reduced rank multivariate stochastic regression model of rank r*, the regression coefficient matrix can be expressed as a sum of r* unit rank matrices each of which is proportional to the outer product of the left and right singular vectors. For improving predictive accuracy and facilitating interpretation, it is often desirable that these left and right singular vectors be sparse or enjoy some smoothness property. We propose a regularized reduced rank regression approach for solving this problem. Computation algorithms and regularization parameter selection methods are developed, and the properties of the new method are explored both theoretically and by simulation. In particular, the regularization method proposed is shown to be selection consistent and asymptotically normal and to enjoy the oracle property. We apply the proposed model to perform biclustering analysis with microarray gene expression data.