skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization
Non-negative matrix factorization (NMF), the problem of finding two non-negative low-rank factors whose product approximates an input matrix, is a useful tool for many data mining and scientific applications such as topic modeling in text mining and unmixing in microscopy. In this paper, we focus on scaling algorithms for NMF to very large sparse datasets and massively parallel machines by employing effective algorithms, communication patterns, and partitioning schemes that leverage the sparsity of the input matrix. We consider two previous works developed for related problems,one that uses a fine-grained partitioning strategy using a point-to-point communication pattern and one that uses a Cartesian, or checkerboard, partitioning strategy using a collective-based communication pattern. We show that a combination of the previous approaches balances the demands of the various computations within NMF algorithms and achieves high efficiency and scalability. From the experiments, we see that our proposed strategy runs up to 10x faster than the state of the art on real-world datasets.  more » « less
Award ID(s):
1642385
PAR ID:
10078537
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
47th International Conference on Parallel Processing
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In the non-negative matrix factorization (NMF) problem, the input is an m×n matrix M with non-negative entries and the goal is to factorize it as M ≈ AW. The m × k matrix A and the k × n matrix W are both constrained to have non-negative entries. This is in contrast to singular value decomposition, where the matrices A and W can have negative entries but must satisfy the orthogonality constraint: the columns of A are orthogonal and the rows of W are also orthogonal. The orthogonal non-negative matrix factorization (ONMF) problem imposes both the non-negativity and the orthogonality constraints, and previous work showed that it leads to better performances than NMF on many clustering tasks. We give the first constant-factor approximation algorithm for ONMF when one or both of A and W are subject to the orthogonality constraint. We also show an interesting connection to the correlation clustering problem on bipartite graphs. Our experiments on synthetic and real-world data show that our algorithm achieves similar or smaller errors compared to previous ONMF algorithms while ensuring perfect orthogonality (many previous algorithms do not satisfy the hard orthogonality constraint). 
    more » « less
  2. This article reports the study of algorithms for non-negative matrix factorization (NMF) in various applications involving smoothly varying data such as time or temperature series diffraction data on a dense grid of points. Utilizing the continual nature of the data, a fast two-stage algorithm is developed for highly efficient and accurate NMF. In the first stage, an alternating non-negative least-squares framework is used in combination with the active set method with a warm-start strategy for the solution of subproblems. In the second stage, an interior point method is adopted to accelerate the local convergence. The convergence of the proposed algorithm is proved. The new algorithm is compared with some existing algorithms in benchmark tests using both real-world data and synthetic data. The results demonstrate the advantage of the algorithm in finding high-precision solutions. 
    more » « less
  3. ABSTRACT Non‐negative Matrix Factorization (NMF) is an effective algorithm for multivariate data analysis, including applications to feature selection, pattern recognition, and computer vision. Its variant, Semi‐Nonnegative Matrix Factorization (SNF), extends the ability of NMF to render parts‐based data representations to include mixed‐sign data. Graph Regularized SNF builds upon this paradigm by adding a graph regularization term to preserve the local geometrical structure of the data space. Despite their successes, SNF‐related algorithms to date still suffer from instability caused by the Frobenius norm due to the effects of outliers and noise. In this paper, we present a new SNF algorithm that utilizes the noise‐insensitive norm. We provide monotonic convergence analysis of the SNF algorithm. In addition, we conduct numerical experiments on three benchmark mixed‐sign datasets as well as several randomized mixed‐sign matrices to demonstrate the performance superiority of SNF over conventional SNF algorithms under the influence of Gaussian noise at different levels. 
    more » « less
  4. We analyzed the essays that were written on various topics in an introductory physics course using two unsupervised machine learning algorithms. One of them was Latent Dirichlet Allocation (LDA). This algorithm is used for extracting abstract topics from a collection of text documents. The other algorithm was Non-negative Matrix Factorization (NMF). It is used for similar purposes but also in other domains such as image recognition. We applied these two algorithms to the dataset that consisted of N=683 student essays. Although there were some built-in, important differences between LDA and NMF, they both found similar topics in our data by large. This offers instructors a promising and productive way of accessing useful information about their students' written work, especially in large-enrollment classes. 
    more » « less
  5. Plant leaf waxes and their isotopic composition are important tracers of ecological, environmental, and climate variability, with strong preservation potential in sedimentary archives. However, they represent an integrated, and often complicated, signal of vegetation and hydrology within a watershed. Here, we report a new approach for examining complex mixtures of n-alkanes in sediments and their isotope values: non-negative matrix factorization (NMF). NMF identifies the endmembers in a mixture from the integrated n-alkane data and provides quantitative information on the relative importance of those endmembers across samples. We apply this approach to a synthetic dataset and two previously published datasets to illustrate its uses. Our application of NMF to re-analyse previously published data reveals new insights into past climate and ecological change. We demonstrate that NMF allows a user to 1) identify potential mixing problems, 2) evaluate which specific compounds in a mixture carry the isotope signal that can best address a given scientific objective, 3) determine compound concentrations after excluding contributions from particular endmember sources, and 4) calculate isotope values of different sources. NMF provides a quantitative approach for evaluating the influence of endmember mixing on molecular concentrations and isotope values within a dataset. The re-analysis of two published datasets reveals new quantitative insight into Holocene Arctic climate and Neogene vegetation change. 
    more » « less