Search for: All records

Award ID contains: 1642385


Total Resources: 11

Resource Type
Conference Paper: 7
Conference Proceeding: 0
Dataset: 0
Journal Article: 4
Workshop Report: 0

Availability
Full Text / Resource Available: 11
Citation Only: 0



PLANC: Parallel Low-rank Approximation with Nonnegativity Constraints

https://doi.org/10.1145/3432185

Eswar, Srinivas ; Hayashi, Koby ; Ballard, Grey ; Kannan, Ramakrishnan ; Matheson, Michael A. ; Park, Haesun ( June 2021 , ACM Transactions on Mathematical Software)
We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
Full Text Available
Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization

https://doi.org/10.1109/HiPC50609.2020.00028

Manning, Lawton ; Ballard, Grey ; Kannan, Ramakrishnan ; Park, Haesun ( December 2020 , 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC))
Full Text Available
Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization

https://doi.org/10.1109/SC41405.2020.00078

Eswar, Srinivas ; Hayashi, Koby ; Ballard, Grey ; Kannan, Ramakrishnan ; Vuduc, Richard ; Park, Haesun ( November 2020 , SC20: International Conference for High Performance Computing, Networking, Storage and Analysis)
Full Text Available
General Memory-Independent Lower Bound for MTTKRP

https://doi.org/10.1137/1.9781611976137.1

Ballard, Grey ; Rouse, Kathryn ( January 2020 , SIAM Conference on Parallel Processing for Scientific Computing)

Our goal is to establish lower bounds on the communication required to perform the Matricized-Tensor Times Khatri-Rao Product (MTTKRP) computation on a distributed-memory parallel machine. MTTKRP is the bottleneck computation within algorithms for computing the CP tensor decomposition, which is an approximation by a sum of rank-one tensors and frequently used in multidimensional data analysis. The main result of this paper is a communication lower bound that generalizes previous results, tightening the bound so that it is attainable even when the tensor dimensions vary (the tensor is not cubical) and when the number of processors is small relative to the tensor dimensions. The attainability of the bound proves that the algorithm that attains it, which is based on a block distribution of the tensor and communicating only factor matrices, is communication optimal. The proof technique utilizes an established inequality that relates computations to data access as well as a novel approach based on convex optimization.
Full Text Available
Dynamic Functional Magnetic Resonance Imaging Connectivity Tensor Decomposition: A New Approach to Analyze and Interpret Dynamic Brain Connectivity

https://doi.org/10.1089/brain.2018.0605

Mokhtari, Fatemeh ; Laurienti, Paul J. ; Rejeski, W. Jack ; Ballard, Grey ( February 2019 , Brain Connectivity)

Full Text Available
Parallel Nonnegative CP Decomposition of Dense Tensors

https://doi.org/10.1109/HiPC.2018.00012

Ballard, Grey ; Hayashi, Koby ; Kannan, Ramakrishnan ( December 2018 , 25th IEEE International Conference on High Performance Computing)

The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensors that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the Matricized-Tensor Times Khatri-Rao Product (MTTKRP) bottleneck subcomputation. The algorithm is computation efficient, using dimension trees to avoid redundant computation across MTTKRPs within the alternating method. Our approach is also communication efficient, using a data distribution and parallel algorithm across a multidimensional processor grid that can be tuned to minimize communication. We benchmark our software on synthetic as well as hyperspectral image and neuroscience dynamic functional connectivity data, demonstrating that our algorithm scales well to 100s of nodes (up to 4096 cores) and is faster and more general than the currently available parallel software.
Full Text Available
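
The abstract above mentions avoiding redundant computation across MTTKRPs within the alternating method. One way to picture that reuse for a 3-way tensor is sketched below: the contraction of the tensor with the last factor matrix is computed once and then shared by the mode-0 and mode-1 MTTKRPs. This is a minimal sequential sketch, not the paper's implementation; the nested-list layout and all function names are assumptions made for illustration.

```python
def contract_last_mode(X, C):
    """P[i][j][r] = sum_k X[i][j][k] * C[k][r] (hypothetical helper).

    X is an I x J x K nested list, C is K x R. The result P does not depend
    on the other two factor matrices, so it can be reused by both MTTKRPs below.
    """
    R = len(C[0])
    return [[[sum(x * C[k][r] for k, x in enumerate(fiber)) for r in range(R)]
             for fiber in slab] for slab in X]


def mttkrp_mode0_from_partial(P, B):
    """M0[i][r] = sum_j P[i][j][r] * B[j][r], with B of shape J x R."""
    R = len(B[0])
    return [[sum(P_i[j][r] * B[j][r] for j in range(len(B))) for r in range(R)]
            for P_i in P]


def mttkrp_mode1_from_partial(P, A):
    """M1[j][r] = sum_i P[i][j][r] * A[i][r], with A of shape I x R."""
    J, R = len(P[0]), len(A[0])
    return [[sum(P[i][j][r] * A[i][r] for i in range(len(A))) for r in range(R)]
            for j in range(J)]


# Small illustrative inputs (made up for this sketch).
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 2 x 2 x 2 tensor
A = [[1, 2], [3, 4]]                       # I x R
B = [[1, 0], [0, 1]]                       # J x R
C = [[1, 1], [2, 0]]                       # K x R

P = contract_last_mode(X, C)               # computed once
M0 = mttkrp_mode0_from_partial(P, B)       # reused here
M1 = mttkrp_mode1_from_partial(P, A)       # and here
```

The expensive pass over all tensor entries happens only in `contract_last_mode`; each MTTKRP then works on the smaller intermediate.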
Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product

https://doi.org/10.1109/IPDPS.2018.00065

Ballard, Grey ; Knight, Nicholas ; Rouse, Kathryn ( May 2018 , 2018 IEEE International Parallel and Distributed Processing Symposium)

The matricized-tensor times Khatri-Rao product (MTTKRP) computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel algorithms that attain the lower bounds and are therefore communication optimal. In particular, we show that the structure of the computation allows for less communication than the straightforward approach of casting the computation as a matrix multiplication operation.
Full Text Available
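
To make the MTTKRP computation named in the abstract above concrete, here is a minimal sketch for a dense 3-way tensor stored as nested lists. It computes the mode-0 MTTKRP in two ways: directly from the definition, and by casting it as a matrix multiplication against an explicitly formed Khatri-Rao product. The function names, the row-major unfolding convention, and the tiny inputs are assumptions for illustration; the sketch runs in a single process and does not model communication costs.

```python
def mttkrp_mode0(X, B, C):
    """Direct mode-0 MTTKRP: M[i][r] = sum_{j,k} X[i][j][k] * B[j][r] * C[k][r].

    X is I x J x K, B is J x R, C is K x R; the result is I x R.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                x = X[i][j][k]
                for r in range(R):
                    M[i][r] += x * B[j][r] * C[k][r]
    return M


def khatri_rao(B, C):
    """Column-wise Kronecker product; row j*K + k holds B[j][r] * C[k][r]."""
    R = len(B[0])
    return [[b_row[r] * c_row[r] for r in range(R)] for b_row in B for c_row in C]


def mttkrp_via_matmul(X, B, C):
    """Same result as (row-major mode-0 unfolding of X) times khatri_rao(B, C)."""
    KR = khatri_rao(B, C)                                   # (J*K) x R
    X0 = [[x for row in slab for x in row] for slab in X]   # I x (J*K)
    R = len(B[0])
    return [[sum(a * KR[c][r] for c, a in enumerate(row)) for r in range(R)]
            for row in X0]


# Small illustrative inputs (made up for this sketch).
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
B = [[1, 0], [0, 1]]
C = [[1, 1], [2, 0]]
print(mttkrp_mode0(X, B, C))
print(mttkrp_via_matmul(X, B, C))
```

Both functions return the same matrix. The direct version never materializes the (J*K) x R Khatri-Rao product, which hints at why exploiting the structure can require less data movement than the plain matrix-multiplication formulation.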
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

https://doi.org/10.1109/TKDE.2017.2767592

Kannan, Ramakrishnan ; Ballard, Grey ; Park, Haesun ( March 2018 , IEEE Transactions on Knowledge and Data Engineering)

Full Text Available
Shared-memory parallelization of MTTKRP for dense tensors

https://doi.org/10.1145/3178487.3178522

Hayashi, Koby ; Ballard, Grey ; Jiang, Yujie ; Tobia, Michael J. ( January 2018 , 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this work, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors. The algorithms cast nearly all of the computation as matrix operations in order to use optimized BLAS subroutines, and they avoid reordering tensor entries in memory. We use our parallel implementation to compute a CP decomposition of a neuroimaging data set and achieve a speedup of up to 7.4X over existing parallel software.
Full Text Available
Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization

https://doi.org/10.1145/3225058.3225127

Kaya, Oguz ; Kannan, Ramakrishnan ; Ballard, Grey ( January 2018 , 47th International Conference on Parallel Processing)

Non-negative matrix factorization (NMF), the problem of finding two non-negative low-rank factors whose product approximates an input matrix, is a useful tool for many data mining and scientific applications such as topic modeling in text mining and unmixing in microscopy. In this paper, we focus on scaling algorithms for NMF to very large sparse datasets and massively parallel machines by employing effective algorithms, communication patterns, and partitioning schemes that leverage the sparsity of the input matrix. We consider two previous works developed for related problems, one that uses a fine-grained partitioning strategy using a point-to-point communication pattern and one that uses a Cartesian, or checkerboard, partitioning strategy using a collective-based communication pattern. We show that a combination of the previous approaches balances the demands of the various computations within NMF algorithms and achieves high efficiency and scalability. From the experiments, we see that our proposed strategy runs up to 10x faster than the state of the art on real-world datasets.
Full Text Available
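
As a concrete picture of the NMF problem defined in the abstract above, the following minimal sketch factors a small dense matrix with the classic multiplicative update rules. This is a swapped-in textbook technique chosen for brevity, not the paper's sparse, partitioned parallel method; the function names, the `eps` safeguard, and the example matrix are assumptions for illustration.

```python
import random


def matmul(P, Q):
    """Dense matrix product of nested lists."""
    Qt = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def transpose(P):
    return [list(col) for col in zip(*P)]


def frobenius_error(A, W, H):
    """Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf_multiplicative(A, rank, iters=200, seed=0, eps=1e-12):
    """Approximate nonnegative A (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Strictly positive random start so the multiplicative updates can move.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise; eps avoids division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Small illustrative input (made up for this sketch).
A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
W, H = nmf_multiplicative(A, rank=2)
print(frobenius_error(A, W, H))
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the factors stay nonnegative throughout without any explicit projection step.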
