Due to the emergence of massive real-world graphs, whose sizes may extend to terabytes, new tools must be developed to enable data scientists to handle such graphs efficiently. These graphs may include social networks, computer networks, and genomes. In this paper, we propose a novel graph package, Arachne, to make large-scale graph analytics more effortless and efficient based on the open-source Arkouda framework. Arkouda has been developed to allow users to perform massively parallel computations on distributed data with an interface similar to NumPy. In this package, we developed a fundamental sparse graph data structure and then built several useful graph algorithms around our data structure to form a basic algorithmic library. Benchmarks and tools were also developed to evaluate and demonstrate the use of our graph algorithms. The graph algorithms we have implemented thus far include breadth-first search (BFS), connected components (CC), k-Truss (KT), Jaccard coefficients (JC), triangle counting (TC), and triangle centrality (TCE). Their corresponding experimental results based on realworld and synthetic graphs are presented. Arachne is organized as an Arkouda extension package and is publicly available on GitHub (https://github.com/Bears-R-Us/arkouda-njit).
more »
« less
Triangle Centrality in Arkouda
There are a wide number of graph centrality metrics. Further, the performance of each can vary widely depending on the type of implementation. In this work we present our implementation of triangle centrality in Arkouda with several different triangle counting methods. Triangle Centrality is a robust metric that captures the centrality of a vertex through both a vertex’s own connectedness and that of its neighbors. Arkouda is an open-source framework for data science at the scale of terabytes and beyond. These methods are compared against each other and another shared memory implementation.
more »
« less
- Award ID(s):
- 2109988
- PAR ID:
- 10385348
- Date Published:
- Journal Name:
- The 26th Annual IEEE High Performance Extreme Computing Conference (HPEC)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Identifying key members in large social network graphs is an important graph analytic. Recently, a new centrality measure called triangle centrality finds members based on the triangle support of vertices in graph. In this paper, we describe our rapid implementation of triangle centrality using Graph-BLAS, an API specification for describing graph algorithms in the language of linear algebra. We use triangle centrality’s algebraic algorithm and easily implement it using the SuiteSparse GraphBLAS library. A set of experiments on large, sparse graph datasets is conducted to verify the implementation.more » « less
-
In graph analytics, a truss is a cohesive subgraph based on the number of triangles supporting each edge. It is widely used for community detection applications such as social networks and security analysis, and the performance of truss analytics highly depends on its triangle counting method. This paper proposes a novel triangle counting kernel named Minimum Search (MS). Minimum Search can select two smaller adjacency lists out of three and uses fine-grained parallelism to improve the performance of triangle counting. Then, two basic algorithms, MS-based triangle counting, and MS-based support updating are developed. Based on the novel triangle counting kernel and the two basic algorithms above, three fundamental parallel truss analytics algorithms are designed and implemented to enable different kinds of graph truss analysis. These truss algorithms include an optimized K-Truss algorithm, a Max-Truss algorithm, and a Truss Decomposition algorithm. Moreover, all proposed algorithms have been implemented in the parallel language Chapel and integrated into an open-source framework, Arkouda. Through Arkouda, data scientists can efficiently conduct graph analysis through an easy-to-use Python interface and handle large-scale graph data in powerful back-end computing resources. Experimental results show that the proposed methods can significantly improve the performance of truss analysis on real-world graphs compared with the existing and widely adopted list intersection-based method. The implemented code is publicly available from GitHub (https://github.com/Bears-R-Us/arkoudanjit). }more » « less
-
The K-Truss of a graph is a cohesive subgraph that has been widely used for community detection in applications such as social networks and security analysis. In this paper, we first propose one optimized triangle search kernel with a few operations that can be used in both triangle counting and triangle search to replace the existing list intersection method. Based on the optimized kernel, three truss analytics algorithms, an optimized K-Truss parallel algorithm, a maximal K-Truss parallel algorithm, and a Truss decomposition parallel algorithm, are developed to efficiently enable different kinds of graph analysis. Moreover, all proposed parallel algorithms have been implemented in the highly-productive parallel language Chapel and integrated into the open-source framework Arkouda. Experimental results compared with the existing list intersection-based method show that for both synthetic and real-world graphs, the proposed method can significantly improve the performance of truss analysis on large graphs. The implemented method is publicly available from GitHub.more » « less