skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Property Graphs in Arachne
Analyzing large-scale graphs poses challenges due to their increasing size and the demand for interactive and user-friendly analytics tools. These graphs arise from various domains, including cybersecurity, social sciences, health sciences, and network sciences, where networks can represent interactions between humans, neurons in the brain, or malicious flows in a network. Exploring these large graphs is crucial for revealing hidden structures and metrics that are not easily computable without parallel computing. Currently, Python users can leverage the open-source Arkouda framework to efficiently execute Pandas and NumPy-related tasks on thousands of cores. To address large-scale graph analysis, Arachne, an extension to Arkouda, enables easy transformation of Arkouda dataframes into graphs. This paper proposes and evaluates three distributable data structures for property graphs, implemented in Chapel, that are integrated into Arachne. Enriching Arachne with support for property graphs will empower data scientists to extend their analysis to new problem domains. Property graphs present additional complexities, requiring efficient storage for extra information on vertices and edges, such as labels, relationships, and properties.  more » « less
Award ID(s):
2109988
PAR ID:
10561948
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-0860-0
Page Range / eLocation ID:
1 to 7
Format(s):
Medium: X
Location:
Boston, MA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Analyzing large-scale graphs poses challenges due to their increasing size and the demand for interactive and user-friendly analytics tools. These graphs arise from various domains, including cybersecurity, social sciences, health sciences, and network sciences, where networks can represent interactions between humans, neurons in the brain, or malicious flows in a network. Exploring these large graphs is crucial for revealing hidden structures and metrics that are not easily computable without parallel computing. Currently, Python users can leverage the open-source Arkouda framework to efficiently execute Pandas and NumPy-related tasks on thousands of cores. To address large-scale graph analysis, Arachne, an extension to Arkouda, enables easy transformation of Arkouda dataframes into graphs. This paper proposes and evaluates three distributable data structures for property graphs, implemented in Chapel, that are integrated into Arachne. Enriching Arachne with support for property graphs will empower data scientists to extend their analysis to new problem domains. Property graphs present additional complexities, requiring efficient storage for extra information on vertices and edges, such as labels, relationships, and properties. 
    more » « less
  2. This paper introduces a novel, parallel, and scalable implementation of the VF2 algorithm for subgraph monomorphism developed in the high-productivity language Chapel. Efficient graph analysis in large and complex network datasets is crucial across numerous scientific domains. We address this need through our enhanced VF2 implementation, widely utilized in subgraph matching, and integrating it into Arachneā€”a Python-accessible, open-source, large-scale graph analysis framework. Leveraging the parallel computing capabilities of modern hardware architectures, our implementation achieves significant performance improvements. Benchmarks on synthetic and real-world datasets, including social, communication, and neuroscience networks, demonstrate speedups of up to 97X on 128 cores, compared to existing Python-based tools like NetworkX and DotMotif, which do not exploit parallelization. Our results on large-scale graphs demonstrate scalability and efficiency, establishing it as a viable tool for subgraph monomorphism, the backbone of numerous graph analytics such as motif counting and enumeration. Arachne, including our VF2 implementation, can be found on GitHub: https://github.com/Bears-R-Us/arkouda-njit. 
    more » « less
  3. Due to the emergence of massive real-world graphs, whose sizes may extend to terabytes, new tools must be developed to enable data scientists to handle such graphs efficiently. These graphs may include social networks, computer networks, and genomes. In this paper, we propose a novel graph package, Arachne, to make large-scale graph analytics more effortless and efficient based on the open-source Arkouda framework. Arkouda has been developed to allow users to perform massively parallel computations on distributed data with an interface similar to NumPy. In this package, we developed a fundamental sparse graph data structure and then built several useful graph algorithms around our data structure to form a basic algorithmic library. Benchmarks and tools were also developed to evaluate and demonstrate the use of our graph algorithms. The graph algorithms we have implemented thus far include breadth-first search (BFS), connected components (CC), k-Truss (KT), Jaccard coefficients (JC), triangle counting (TC), and triangle centrality (TCE). Their corresponding experimental results based on realworld and synthetic graphs are presented. Arachne is organized as an Arkouda extension package and is publicly available on GitHub (https://github.com/Bears-R-Us/arkouda-njit). 
    more » « less
  4. Finding connected components in a graph is a fundamental problem in graph analysis. In this work, we present a novel minimum-mapping based Contour algorithm to efficiently solve the connectivity problem. We prove that the Contour algorithm with two or higher order operators can identify all connected components of an undirected graph within O(log d_max) iterations, with each iteration involving O(m) work, where d_max represents the largest diameter among all components in the given graph, and m is the total number of edges in the graph. Importantly, each iteration is highly parallelizable, making use of the efficient minimum-mapping operator applied to all edges. To further enhance its practical performance, we optimize the Contour algorithm through asynchronous updates, early convergence checking, eliminating atomic operations, and choosing more efficient mapping operators. Our implementation of the Contour algorithm has been integrated into the open-source framework Arachne. Arachne extends Arkouda for large-scale interactive graph analytics, providing a Python API powered by the high-productivity parallel language Chapel. Experimental results on both real-world and synthetic graphs demonstrate the superior performance of our proposed Contour algorithm compared to state-of-the-art large-scale parallel algorithm FastSV and the fastest shared memory algorithm ConnectIt. On average, Contour achieves a speedup of 7.3x and 1.4x compared to FastSV and ConnectIt, respectively. All code for the Contour algorithm and the Arachne framework is publicly available on GitHub {https://github.com/Bears-R-Us/arkouda-njit), ensuring transparency and reproducibility of our work. 
    more » « less
  5. Finding connected components in a graph is a fundamental problem in graph analysis. In this work, we present a novel minimum-mapping based Contour algorithm to efficiently solve the connectivity problem. We prove that the Contour algorithm with two or higher order operators can identify all connected components of an undirected graph within O(log d_max) iterations, with each iteration involving O(m) work, where d_max represents the largest diameter among all components in the given graph, and m is the total number of edges in the graph. Importantly, each iteration is highly parallelizable, making use of the efficient minimum-mapping operator applied to all edges. To further enhance its practical performance, we optimize the Contour algorithm through asynchronous updates, early convergence checking, eliminating atomic operations, and choosing more efficient mapping operators. Our implementation of the Contour algorithm has been integrated into the open-source framework Arachne. Arachne extends Arkouda for large-scale interactive graph analytics, providing a Python API powered by the high-productivity parallel language Chapel. Experimental results on both real-world and synthetic graphs demonstrate the superior performance of our proposed Contour algorithm compared to state-of-the-art large-scale parallel algorithm FastSV and the fastest shared memory algorithm ConnectIt. On average, Contour achieves a speedup of 7.3x and 1.4x compared to FastSV and ConnectIt, respectively. All code for the Contour algorithm and the Arachne framework is publicly available on GitHub {https://github.com/Bears-R-Us/arkouda-njit), ensuring transparency and reproducibility of our work. 
    more » « less