-
Understanding information flow in the brain can be facilitated by arranging neurons in the fly connectome to form a maximally “feedforward” structure. This task is naturally formulated as the Minimum Feedback Arc Set (MFAS) problem—a well-known NP-hard problem that is especially challenging for large-scale graphs. To address this, we propose the Rocket-Crane algorithm, an efficient two-phase method for solving MFAS. In the first phase, we develop a continuous-space optimization method that rapidly generates excellent solutions. In the second phase, we refine these solutions through advanced exploration techniques that integrate randomized and heuristic strategies to effectively escape local minima. Extensive experiments demonstrate that Rocket-Crane outperforms state-of-the-art methods in terms of solution quality, scalability, and computational efficiency. On the primary benchmark—the fly connectome—our method achieved a feedforward arc set with a total forward weight of 35,459,266 (about 85%), the highest among all competing methods. The algorithm is open-source and available on GitHub.

Free, publicly-accessible full text available December 1, 2026.
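
The quantity being optimized is easy to state. Below is a minimal Python sketch (an illustration of the objective only, not of Rocket-Crane): given a vertex ordering, score it by the total weight of arcs that point forward along the ordering; the remaining backward arcs form the feedback arc set.

```python
# A minimal sketch of the MFAS objective (an illustration, not Rocket-Crane
# itself): arcs pointing "forward" along a vertex ordering are kept, and the
# remaining backward arcs form the feedback arc set.

def forward_weight(arcs, order):
    """arcs: iterable of (u, v, weight); order: list of vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(w for u, v, w in arcs if pos[u] < pos[v])

arcs = [("a", "b", 3), ("b", "c", 2), ("c", "a", 1)]  # a toy 3-cycle
# 5 of the 6 weight units run forward; the arc ("c", "a") is the feedback set.
print(forward_weight(arcs, ["a", "b", "c"]))  # 5
```

Maximizing this forward weight over all orderings is the objective that both phases of the algorithm pursue.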
-
Counting and listing triangles in graphs is a fundamental task in network analysis, supporting applications such as community detection, clustering coefficient computation, k-truss decomposition, and triangle centrality. We introduce the cover-edge set, a novel concept that eliminates unnecessary edges during triangle enumeration, thereby improving efficiency. This compact cover-edge set is rapidly constructed using a breadth-first search (BFS) strategy. Using this concept, we develop both sequential and parallel triangle-counting algorithms and conduct comprehensive comparisons with state-of-the-art methods. We also design a benchmarking framework to evaluate our sequential and parallel algorithms in a systematic and reproducible manner. Extensive experiments on the latest Intel Xeon 8480+ processor reveal clear performance differences among algorithms, demonstrate the benefits of various optimization strategies, and show how graph characteristics, such as diameter and degree distribution, affect algorithm performance. Our source code is available on GitHub.

Free, publicly-accessible full text available November 1, 2026.
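
For context, the sketch below shows a standard edge-iterator baseline of the sort the cover-edge approach improves upon (the cover-edge construction itself is not reproduced here): each triangle is found once by intersecting the neighbor sets of an edge's endpoints.

```python
# A standard edge-iterator triangle-counting baseline (not the paper's
# cover-edge method): visit each undirected edge (u, v) once and count common
# neighbors w with w > v so that every triangle is counted exactly once.

from collections import defaultdict

def triangle_count(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # each undirected edge handled once
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count

print(triangle_count([(0, 1), (1, 2), (0, 2), (2, 3)]))  # 1 triangle: 0-1-2
```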
-
Community detection plays a central role in uncovering mesoscale structures in networks. However, existing methods often suffer from disconnected or weakly connected clusters, undermining interpretability and robustness. Well-Connected Clusters (WCC) and Connectivity Modifier (CM) are post-processing techniques that improve the accuracy of many clustering methods; they are, however, computationally prohibitive on massive graphs. In this work, we present optimized parallel implementations of WCC and CM using the HPE Chapel programming language. First, we design fast and efficient parallel algorithms that leverage Chapel’s parallel constructs to achieve substantial performance improvements and scalability on modern multicore architectures. Second, we integrate this software into Arkouda/Arachne, an open-source, high-performance framework for large-scale graph analytics. Our implementations uniquely enable well-connected community detection on massive graphs with more than 2 billion edges, providing a practical solution for connectivity-preserving clustering at web scale. For example, our implementations of WCC and CM enable community detection on the OpenAlex dataset, with over 2 billion edges, in minutes using 128 cores, a result previously infeasible to compute.

Free, publicly-accessible full text available October 1, 2026.
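
A sequential sketch of the underlying idea, assuming a networkx graph (the Chapel implementations are parallel and do considerably more than this): any cluster whose induced subgraph is disconnected is split into its connected components.

```python
# A toy illustration of connectivity-preserving post-processing (not the
# Chapel WCC/CM code): split every cluster into the connected components of
# its induced subgraph, so no reported cluster is internally disconnected.

import networkx as nx

def split_disconnected(G, clusters):
    """clusters: dict of cluster id -> node set; returns a refined dict."""
    refined, next_id = {}, 0
    for nodes in clusters.values():
        for comp in nx.connected_components(G.subgraph(nodes)):
            refined[next_id] = set(comp)
            next_id += 1
    return refined

G = nx.Graph([(0, 1), (2, 3)])
print(split_disconnected(G, {0: {0, 1, 2, 3}}))  # {0: {0, 1}, 1: {2, 3}}
```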
-
In the 1980s, high-performance computing (HPC) became another tool for the open (non-defense) science and engineering research communities. However, HPC came with a high price tag; the first Cray-2 machines, released in 1985, cost between $12 million and $17 million, according to the Computer History Museum, and were largely available only at government research labs or through national supercomputing centers. In the 1990s, with demand for HPC increasing due to vast datasets, more complex modeling, and the growing computational needs of scientific applications, researchers began experimenting with building HPC machines from clusters of servers running the Linux operating system. By the late 1990s, two approaches to Linux-based parallel computing had emerged: the personal computer cluster methodology that became known as Beowulf and the Roadrunner architecture aimed at a more cost-effective supercomputer. Beowulf attracted attention because of its low cost and the greater accessibility that came with it; Roadrunner took a different approach. While still affordable compared to vector processors and other commercially available supercomputers, Roadrunner integrated its commodity components with specialized networking technology. Furthermore, these systems initially served different purposes: Beowulf focused on providing affordable parallel workstations for individual researchers at NASA, whereas Roadrunner set out to provide a multiuser system that could compete with the commercial supercomputers that dominated the market at the time. This paper analyzes the technical decisions, performance implications, and long-term influence of both approaches. Through this analysis, we can start to judge the impact of both Roadrunner and Beowulf on the development of Linux-based supercomputers.

Free, publicly-accessible full text available September 19, 2026.
-
The rise of graph data in various fields calls for efficient and scalable community detection algorithms. In this paper, we present parallel implementations of two widely used algorithms: Label Propagation and Louvain, specifically designed to leverage the capabilities of Arachne, which is a Python-accessible open-source framework for large-scale graph analysis. Our implementations achieve substantial speedups over existing Python-based tools like NetworkX and igraph, which lack efficient parallelization, and are competitive with parallel frameworks such as NetworKit. Experimental results show that Arachne-based methods outperform these baselines, achieving speedups of up to 710x over NetworkX, 75x over igraph, and 12x over NetworKit. Additionally, we analyze the scalability of our implementation under varying thread counts, demonstrating how different phases contribute to overall performance gains of the parallel Louvain algorithm. Arachne, including our community detection implementation, is open-source and available at https://github.com/Bears-R-Us/arkouda-njit.

Free, publicly-accessible full text available September 15, 2026.
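
As a reference point, here is a minimal single-threaded label-propagation sketch (the Arachne implementations are parallel and heavily optimized): each node repeatedly adopts the most common label among its neighbors until no label changes.

```python
# A minimal label-propagation sketch (illustrative only): nodes iteratively
# adopt the most frequent label among their neighbors; densely connected
# regions converge to a shared label that identifies their community.

from collections import Counter, defaultdict

def label_propagation(edges, max_iters=100):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iters):
        changed = False
        for v in adj:
            best = Counter(labels[u] for u in adj[v]).most_common(1)[0][0]
            if best != labels[v]:
                labels[v], changed = best, True
        if not changed:
            break
    return labels

print(label_propagation([(0, 1), (1, 2), (3, 4), (4, 5)]))  # two communities
```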
-
Large Language Models (LLMs) are increasingly used to automate software development, yet most prior evaluations focus on functional correctness or high-level languages such as Python. As one of the first systematic explorations of LLM-assisted software performance engineering, we present a comprehensive study of LLMs’ ability to generate efficient C implementations of graph-analysis routines—code that must satisfy stringent runtime and memory constraints. This emerging field of LLM-assisted algorithm engineering holds significant promise, as these models may possess the capability to design novel approaches that improve existing algorithms and their implementations. Eight state-of-the-art models (OpenAI ChatGPT o3 and o4-mini-high, Anthropic Claude 4 Sonnet and Sonnet Extended, Google Gemini 2.5 Flash and Pro, xAI Grok 3-Think, and DeepSeek DeepThink R1) are benchmarked using two distinct approaches. The first approach evaluates the ability of LLMs to generate algorithms that outperform existing benchmarks. The second approach assesses their capability to generate graph algorithms for integration into performance-critical systems. The results show that Claude Sonnet 4 Extended achieves superior performance in ready-to-use code generation and efficiency, outperforming human-written baselines in triangle counting. Although our findings demonstrate that contemporary LLMs excel in optimizing and integrating established algorithms, the potential for these models to eventually invent transformative algorithmic techniques represents a compelling frontier for future research. We provide prompts, generated code, and measurement scripts to promote reproducible research in this rapidly evolving domain. All of the source code is available on GitHub at https://github.com/Bader-Research/LLM-triangle-counting/.

Free, publicly-accessible full text available September 15, 2026.
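
A rough sketch of the kind of timing harness such a comparison relies on (the actual measurement scripts live in the linked repository, and the paper benchmarks C code; this Python version and its names are illustrative only):

```python
# An illustrative best-of-N timing harness (not the paper's measurement
# scripts): run a candidate triangle-counting function several times and
# keep the fastest wall-clock time to reduce noise from system jitter.

import time

def benchmark(count_triangles, edges, repeats=5):
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = count_triangles(edges)
        best = min(best, time.perf_counter() - start)
    return result, best  # (triangle count, best wall time in seconds)
```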
-
Subgraph isomorphism algorithms face significant scalability bottlenecks on large-scale property graphs due to inefficient vertex-by-vertex search that requires extensive exploration of early search tree levels where pruning is minimal. We present HiPerMotif, a hybrid parallel algorithm that overcomes these limitations through edge-centric initialization. HiPerMotif first reorders pattern graphs to prioritize high-connectivity vertices, then systematically identifies and validates all possible first-edge mappings before injecting these pre-validated partial states directly at search depth 2. This approach eliminates costly early exploration while enabling natural parallelization over independent edge candidates. Comprehensive evaluation against state-of-the-art baselines (VF2-PS, VF3P, Glasgow) demonstrates up to 66x speedup on real-world networks and successful processing of massive datasets like the 150M-edge H01 human connectome that cause existing methods to fail due to memory constraints. Implemented in the open-source Arkouda/Arachne framework, HiPerMotif enables previously intractable large-scale network analysis in computational neuroscience and related domains.

Free, publicly-accessible full text available September 15, 2026.
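
The core idea of edge-centric initialization can be shown in a few lines (a simplification, not HiPerMotif itself): every data edge that can host the pattern's first edge becomes a pre-validated partial mapping at depth 2, and these seeds can then be explored independently, which is what makes the approach easy to parallelize.

```python
# A simplified illustration of edge-centric seeding (not HiPerMotif itself):
# map the pattern's first (directed) edge onto every data edge, producing
# depth-2 partial states that a search procedure would then extend.

def seed_first_edge(pattern_edges, data_edges):
    p0, p1 = pattern_edges[0]
    return [{p0: u, p1: v} for u, v in data_edges]

pattern = [("a", "b"), ("b", "c")]
data = [(0, 1), (1, 2), (2, 0)]
print(seed_first_edge(pattern, data))  # three independent depth-2 seeds
```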
-
For fast processing of increasingly large graphs, triangle counting, a common building block of graph processing algorithms, is often performed on GPUs. However, applying massive parallelism to triangle counting is challenging due to the algorithm’s inherent irregular access patterns and workload imbalance. In this work, we propose WeTriC, a novel wedge-parallel triangle counting algorithm for GPUs that improves load balancing and efficiency by applying finer-grained parallelism through a lightweight static mapping of wedges to threads. Our theoretical analysis compares different parallelization granularities, while optimizations enhance caching, reduce work per intersection, and minimize overhead. Performance experiments indicate that WeTriC yields 5.63x and 4.69x speedups over optimized vertex-parallel and edge-parallel binary-search triangle counting algorithms, respectively. Furthermore, we show that WeTriC consistently outperforms the state of the art (on average, 2.86x faster than Trust and 2.32x faster than GroupTC).

Free, publicly-accessible full text available August 22, 2026.
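
A sequential sketch of the wedge decomposition that WeTriC parallelizes (in the GPU algorithm, wedges are statically mapped to threads): a triangle is exactly a wedge u-v-w, centered at v, whose endpoints are joined by the closing edge (u, w).

```python
# A sequential illustration of wedge-based triangle counting (the GPU version
# assigns wedges to threads): enumerate wedges at each center vertex and test
# whether the closing edge exists; each triangle closes once per center.

from collections import defaultdict
from itertools import combinations

def count_by_wedges(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    closed = 0
    for v in adj:  # v is the wedge center
        for u, w in combinations(sorted(adj[v]), 2):
            if w in adj[u]:
                closed += 1
    return closed // 3  # every triangle has three possible centers

print(count_by_wedges([(0, 1), (1, 2), (0, 2)]))  # 1
```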
-
This research presents an enhanced Graph Attention Convolutional Neural Network (GAT) tailored for the analysis of open-source package vulnerability remediation. By meticulously examining control flow graphs and implementing node centrality metrics—specifically, degree, norm, and closeness centrality—our methodology identifies and evaluates changes resulting from vulnerability fixes in nodes, thereby predicting the ramifications of dependency upgrades on application workflows. Empirical testing on diverse datasets reveals that our model challenges established paradigms in software security, showcasing its efficacy in delivering comprehensive insights into code vulnerabilities and contributing to advancements in cybersecurity practices. This study delineates a strategic framework for the development of sustainable monitoring systems and the effective remediation of vulnerabilities in open-source software.

Free, publicly-accessible full text available August 14, 2026.
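
A minimal sketch of the feature-extraction step, assuming networkx and using illustrative names (the paper's "norm" centrality is not reproduced here): per-node centrality scores computed on a control flow graph, suitable as input features for a graph attention model.

```python
# An illustrative feature-extraction step (names are hypothetical, not the
# paper's code): compute degree and closeness centrality for each node of a
# control flow graph so they can serve as node features for a GAT model.

import networkx as nx

def centrality_features(cfg):
    deg = nx.degree_centrality(cfg)
    clo = nx.closeness_centrality(cfg)
    return {n: [deg[n], clo[n]] for n in cfg.nodes}

cfg = nx.DiGraph([("entry", "check"), ("check", "patched"),
                  ("check", "exit"), ("patched", "exit")])
print(centrality_features(cfg))
```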
-
Connectomics, a subfield of neuroscience, reconstructs structural and functional brain maps at synapse-level resolution. These complex spatial maps consist of tree-like neurons interconnected by synapses. Motif analysis is a widely used method for identifying recurring subgraph patterns in connectomes; these motifs thus potentially represent fundamental units of information processing. However, existing computational tools often oversimplify neurons as mere nodes in a graph, disregarding their intricate morphologies. In this paper, we introduce MoMo, a novel interactive visualization framework for analyzing neuron morphology-aware motifs in large connectome graphs. First, we propose an advanced graph data structure that integrates both neuronal morphology and synaptic connectivity. This enables highly efficient, parallel subgraph isomorphism searches, allowing for interactive morphological motif queries. Second, we develop a sketch-based interface that facilitates the intuitive exploration of morphology-based motifs within our new data structure. Users can conduct interactive motif searches on state-of-the-art connectomes and visualize results as interactive 3D renderings. We present a detailed goal and task analysis for motif exploration in connectomes, incorporating neuron morphology. Finally, we evaluate MoMo through case studies with four domain experts, who assess the tool’s usefulness and effectiveness in motif exploration and its relevance to real-world neuroscience research. The source code for MoMo is available here.

Free, publicly-accessible full text available July 3, 2026.
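
As a toy illustration of an attribute-aware motif query, the sketch below uses networkx's matcher as a stand-in for MoMo's far richer morphology-aware data structure and parallel search: nodes carry a morphology-derived attribute, and matches must respect it during subgraph isomorphism.

```python
# A toy attribute-aware motif query (a stand-in for MoMo, whose data structure
# encodes full neuron morphology): require subgraph matches to preserve a
# per-node "kind" attribute in addition to the connectivity pattern.

import networkx as nx
from networkx.algorithms.isomorphism import DiGraphMatcher, categorical_node_match

connectome = nx.DiGraph()
connectome.add_nodes_from([(0, {"kind": "axon"}), (1, {"kind": "dendrite"}),
                           (2, {"kind": "axon"}), (3, {"kind": "dendrite"})])
connectome.add_edges_from([(0, 1), (2, 3), (1, 2)])

motif = nx.DiGraph()
motif.add_nodes_from([("a", {"kind": "axon"}), ("d", {"kind": "dendrite"})])
motif.add_edge("a", "d")

matcher = DiGraphMatcher(connectome, motif,
                         node_match=categorical_node_match("kind", None))
print(list(matcher.subgraph_isomorphisms_iter()))  # {0: 'a', 1: 'd'}, {2: 'a', 3: 'd'}
```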
