We consider algorithms with access to an unknown matrix M ε F n×d via matrix-vector products , namely, the algorithm chooses vectors v 1 , ⃛ , v q , and observes Mv 1 , ⃛ , Mv q . Here the v i can be randomized as well as chosen adaptively as a function of Mv 1 , ⃛ , Mv i-1 . Motivated by applications of sketching in distributed computation, linear algebra, and streaming models, as well as connections to areas such as communication complexity and property testing, we initiate the study of the number q of queries needed to solve various fundamental problems. We study problems in three broad categories, including linear algebra, statistics problems, and graph problems. For example, we consider the number of queries required to approximate the rank, trace, maximum eigenvalue, and norms of a matrix M; to compute the AND/OR/Parity of each column or row of M, to decide whether there are identical columns or rows in M or whether M is symmetric, diagonal, or unitary; or to compute whether a graph defined by M is connected or triangle-free. We also show separations for algorithms that are allowed to obtain matrix-vector products only by querying vectors on the right, versus algorithms that can query vectors on both the left and the right. We also show separations depending on the underlying field the matrix-vector product occurs in. For graph problems, we show separations depending on the form of the matrix (bipartite adjacency versus signed edge-vertex incidence matrix) to represent the graph. Surprisingly, very few works discuss this fundamental model, and we believe a thorough investigation of problems in this model would be beneficial to a number of different application areas.
more »
« less
Mathematical foundations of the GraphBLAS
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
more »
« less
- PAR ID:
- 10037304
- Date Published:
- Journal Name:
- Proceedings of the IEEE High Performance Extreme Computing Conference
- Page Range / eLocation ID:
- 1 to 9
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We develop algorithms for online topology inference from streaming nodal observations and partial connectivity information; i.e., a priori knowledge on the presence or absence of a few edges may be available as in the link prediction problem. The observations are modeled as stationary graph signals generated by local diffusion dynamics on the unknown network. Said stationarity assumption implies the simultaneous diagonalization of the observations' covariance matrix and the so-called graph shift operator (GSO), here the adjacency matrix of the sought graph. When the GSO eigenvectors are perfectly obtained from the ensemble covariance, we examine the structure of the feasible set of adjacency matrices and its dependency on the prior connectivity information available. In practice one can only form an empirical estimate of the covariance matrix, so we develop an alternating algorithm to find a sparse GSO given its imperfectly estimated eigenvectors. Upon sensing new diffused observations in the streaming setting, we efficiently update eigenvectors and perform only one (or a few) online iteration(s) of the proposed algorithm until a new datum is observed. Numerical tests showcase the effectiveness of the novel batch and online algorithms in recovering real-world graphs.more » « less
-
In machine learning applications, data are often high-dimensional and intricately related. It is often of interest to find the underlying structure and Granger causal relationships among the data and represent these relationships with directed graphs. In this paper, we study multivariate time series, where each series is associated with a node of a graph, and where the objective is to estimate the topology of a sparse graph that reflects how the nodes of the graph affect each other, if at all. We propose a novel fully Bayesian approach that employs a sparsity-encouraging prior on the hyperparameters. The proposed method allows for nonlinear and multiple lag relationships among the time series. The method is based on Gaussian processes, and it treats the entries of the graph adjacency matrix as hyperparameters. It utilizes a modified automatic relevance determination (ARD) kernel and allows for learning the mapping function from selected past data to current data as edges of a graph . We show that the resulting adjacency matrix provides the intrinsic structure of the graph and answers causality-related questions. Numerical tests show that the proposed method has comparable or better performance than state-of-the-art methods.more » « less
-
Graph algorithms can be expressed in terms of linear algebra. GraphBLAS is a library of low-level building blocks for such algorithms that targets algorithm developers. LAGraph builds on top of the GraphBLAS to target users of graph algorithms with high-level algorithms common in network analysis. In this paper, we describe the first release of the LAGraph library, the design decisions behind the library, and performance using the GAP benchmark suite. LAGraph, however, is much more than a library. It is also a project to document and analyze the full range of algorithms enabled by the GraphBLAS. To that end, we have developed a compact and intuitive notation for describing these algorithms. In this paper, we present that notation with examples from the GAP benchmark suite.more » « less
-
Benito, Rosa Maria ; Cherifi, Chantal ; Cherifi, Hocine ; Moro, Esteban ; Rocha, Luis M. (Ed.)To characterize the “average” of a set of graphs, one can compute the sample Fr ́echet mean. We prove the following result: if we use the Hamming distance to compute distances between graphs, then the Fr ́echet mean of an ensemble of inhomogeneous random graphs is obtained by thresholding the expected adjacency matrix: an edge exists between the vertices i and j in the Fr ́echet mean graph if and only if the corresponding entry of the expected adjacency matrix is greater than 1/2. We prove that the result also holds for the sample Fr ́echet mean when the expected adjacency matrix is replaced with the sample mean adjacency matrix. This novel theoretical result has some significant practical consequences; for instance, the Fr ́echet mean of an ensemble of sparse inhomogeneous random graphs is the empty graph.more » « less