<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>ParaDyMS: Parallel Dynamic Motif Counting at Scale</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>05/19/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10632183</idno>
					<idno type="doi">10.1109/CCGRID64434.2025.00046</idno>
					
					<author>Ali Khan</author><author>Nigel Tan</author><author>Jack Marquez</author><author>Michela Taufer</author><author>Sanjukta Bhowmick</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Motifs in graphs (or networks) are subgraphs induced by a small set of vertices such as triangles and cliques. The frequency of motifs is used to compare and align networks across various domains, such as biology, epidemiology, and social sciences. Recent advances have made it feasible to solve the computational challenge of counting motifs in networks with over a billion edges. However, these algorithms apply only for static networks where the structure remains unchanged. In reality, networks dynamically evolve, and understanding how motifs change with the dynamic nature of these networks remains an unsolved challenge despite its potential to provide essential insights into the system. We present ParaDyMS (Parallel Dynamic Motif Counting at Scale), the first parallel algorithm for updating motif counts in fully dynamic networks using batched updates. Our algorithm updates the frequencies of motifs only in the modified parts of the network instead of recomputing them from scratch. We provide proof of the algorithm's correctness and complexity and empirically compare its execution time with another state-ofthe-art static algorithm on shared memory and GPUs using realworld networks. Our results show that our algorithm is highly scalable and can significantly reduce the time to compute motifs by more than 90% in the best case and, on average, by 69%.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Graphs, or networks, are used to model and analyze relationships within complex systems, providing critical insights into distributed computing challenges such as biological data analysis <ref type="bibr">[1]</ref> and social network dynamics <ref type="bibr">[2]</ref> as well as task scheduling, resource optimization, and detection of nondeterminism in high-performance computing (HPC) executions <ref type="bibr">[3]</ref>. Entities such as genes, individuals, or computing nodes are depicted as vertices, with edges representing their interactions. Motifs, also termed graphlets <ref type="bibr">[2]</ref>, are subgraphs induced by a small set of vertices and serve as fundamental building blocks in the analysis of large-scale graphs. Figure <ref type="figure">1</ref> illustrates the pool of subgraphs of vertices of size k, 3 &#8804; k &#8804; 4.</p><p>This material is based upon work supported by the National Science Foundation under Grant <ref type="bibr">No. 1900765, 1900888</ref>, and 2341588, as well as computational resources from Texas Advanced Computing Center (TACC) The University of Texas at Austin</p><p>The number of motifs in a network is key to understanding repetitions and structural similarities, providing insight into the underlying properties of distributed systems, such as regulatory relationships in gene expression data, social network community dynamics, and communication patterns in HPC workflows <ref type="bibr">[4]</ref>. Accurately and efficiently counting motifs is a critical challenge in graph analysis, especially for scaling applications across distributed and parallel platforms. To exacerbate the challenge, one has to consider that the interactions in real-world networks evolve with time, and this change is modeled as vertex and edge additions or deletions. As the structure of the network changes, recounting the motifs can be expensive, especially for extremely large networks <ref type="bibr">[5]</ref>. Updating these counts efficiently, instead of recalculating them entirely, is a complex challenge. Current algorithms primarily cater to static graphs with fixed structures, lacking efficacy in managing continuous changes in dynamic networks <ref type="bibr">[6]</ref>. Consequently, there is an urgent need for scalable algorithms for motif counting in evolving networks.</p><p>To address this need, we present ParaDyMS (Parallel Dynamic Motif Counting at Scale), an efficient algorithm tailored for motif counting in fully dynamic networks (i.e., both adding and deleting edges). Our algorithm updates motif frequencies only in the regions where the network is altered, enhancing efficiency instead of recalculating over the entire graph. Alongside presenting our algorithm, we provide validation of its correctness and complexity, supported by empirical tests on shared-memory systems and GPUs with real-world datasets. Our findings indicate that our algorithms scale effectively and outperform current algorithms for static networks on average by 69% and in the best case more than 90%, offering a much-needed solution for dynamic network motif counting.</p><p>Computing Motifs in Dynamic Networks in Parallel. Real-world networks are dynamic (i.e., vertices and edges are added or removed). For instance, gene correlation networks alter with aging, while social networks evolve with behavior. When analyzing these networks, updating motif counts for only the altered sections is more efficient than recalculating them entirely. This is particularly useful in network alignment, where the structural similarities between the motifs can be compared by updating their graphlet degree vectors (GDV) <ref type="bibr">[7]</ref>, the compact representation of graphlet counts. The capability of calculating motifs in dynamic networks allows one to assess the impact of individual changes. A significant change in the network due to edge addition or removal notably modifies the GDV. Consequently, localized updates enable the measurement of each edge's structural contribution. Further, given the large size of many real-world graphs, parallel computation of these motifs is needed.</p><p>Challenges in Dynamic Update of Motifs. While updating motifs in dynamic networks presents significant advantages, such as avoiding full recomputation, developing general algorithms for this purpose remains an open challenge. Most existing techniques are for static graphs and fail to fully exploit the unique characteristics of dynamic networks. Exceptions, such as triangle update methods <ref type="bibr">[8]</ref>, highlight the limited progress in this area. Moreover, the few algorithms available for dynamic motif updates do not take advantage of parallelism and are, therefore, suitable for small networks.</p><p>Despite the intuitive expectation that updating motifs in a dynamic network should be computationally faster than recomputing them, this is not always the case. In fact, updating operations can be more computationally demanding due to the complexity of efficiently managing changes. This difficulty arises from three key challenges that must be addressed when developing algorithms for dynamic networks:</p><p>&#8226; Efficiently Identifying Affected Motifs. Determining which motifs are impacted by each network change without redundant computation is a key challenge in dynamic network analysis. Updates often focus on counting motifs locally, near the altered edges. Although static algorithms excel at efficiently counting smaller subgraphs, such as triangles, and extending these counts to larger motifs using combinatorial methods <ref type="bibr">[9]</ref>- <ref type="bibr">[11]</ref>, they lack mechanisms to isolate and address localized changes effectively. Moreover, dynamic updates require tracking and adjusting for defunct motifs, such as those invalidated by structural changes (e.g., a wedge becoming a triangle), which static approaches do not consider. Handling multiple changes per motif, where several edges within the same motif are modified, further complicates efficient motif identification without redundant calculations. &#8226; Handling Large-Scale Networks. The scale of real-world dynamic graphs, often comprising millions of edges, poses significant challenges for motif analysis. Static algorithms, while efficient for fixed networks, struggle to scale in dynamic settings where updates occur frequently and unpredictably. Algorithms for dynamic networks must be designed to handle this complexity, ensuring both computational efficiency and memory scalability to process large graphs while maintaining accurate and timely motif counts <ref type="bibr">[8]</ref>. This is especially critical as the size and density of networks continue to grow in domains such as social media, biological systems, and transportation networks.</p><p>&#8226; Parallelization for Performance. Leveraging modern computational resources is essential for scaling motif updates in dynamic networks. Parallel algorithms can distribute the workload of motif updating across multiple processing units, significantly reducing runtime. However, designing parallel methods for dynamic networks introduces new challenges, such as synchronizing updates across threads, avoiding redundant computations, and ensuring consistency in the evolving network structure.</p><p>Existing approaches for counting parallel motifs in static graphs <ref type="bibr">[11]</ref> provide a foundation but require adaptation to meet the demands of dynamic network processing. Our Contributions: Efficient Dynamic Motif Update.</p><p>&#8226; Scalable Algorithm for Dynamic Motif Counting. We introduce ParaDyMs, an innovative parallel algorithm for motif updating in evolving graphs. Our algorithm focuses on detecting and updating connections between distance k -2 neighbors of the end points of the modified edges (u, v). We demonstrate the method for motifs of size 3 and 4, and illustrate its extension to motifs of size k. &#8226; Results on Correctness and Complexity. We present proof that our algorithm can correctly compute the frequency of the motifs of size 3 and 4. We show that the asymptotic complexity of our algorithm is equal to a state-of-the-art static algorithm. &#8226; Empirical Results on Performance. We empirically demonstrate on real-world networks that our algorithm is scalable and can achieve a computational speed on average 12x faster than the state-of-the-art static algorithm <ref type="bibr">[9]</ref>. In best case the speedup is as much as 124. <ref type="foot">1</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. UPDATING MOTIF COUNTS IN DYNAMIC GRAPHS</head><p>We propose a parallel algorithm to streamline motif updates, ensuring that the analysis remains efficient and adaptable as networks evolve. Our algorithm addresses the challenges of edge changes and enhances the ability to analyze large, dynamic networks in the cloud and in HPC.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Mathematical Notations and Definitions.</head><p>Let G = (V, E) represent a graph with V as the vertex set and E as the edge set. The global frequency of a motif H x in G is denoted by f (H x , G) for each H x &#8712; H, where H consists of motifs of size up to k and every H x is an induced connected motif. An induced motif includes all edges among its vertices. For instance, counting induced triangles (H 2 in Fig. <ref type="figure">1</ref>) excludes wedge counts (H 1 in Fig. <ref type="figure">1</ref>) because a triangle induces a wedge. In connected motifs, any two vertices are connected by a path.</p><p>We define an updated graph as the original graph augmented by a set of edge modifications. Let W represent the modified edges, divided into Ins for added edges and Del for removed edges. The updated graph G &#8242; = (V, E &#8242; ) is obtained using</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Motif Computation via Edge Modifications.</head><p>Motif computation in dynamic networks involves tracking the structural changes induced by edge insertions and deletions. These modifications impact the motifs in the network, as adding an edge can create new motifs while removing existing ones. For instance, consider the example in Figure <ref type="figure">1</ref> with nodes u, v, and q forming a wedge (H 1 ). Adding edge (q, u) creates a triangle (H 2 ), increasing the triangle count by one and eliminating the wedge, reducing its count by one. In contrast, deleting edge (q, u) in the triangle (H 2 ) reverses this: the triangle count decreases by one, while the wedge count increases by one.</p><p>To determine the number and types of motifs added or removed in an evolving graph G, we examine the motifs linked to a specific edge (u, v). Considering motifs with a maximum size of k, the largest motif is a path of length k -1. If (u, v) is a terminal edge of the path, the most distant vertex involved in the motif is positioned k -2 edges away from u or v. As a result, motifs linked to the edge (u, v) can be effectively identified within a localized subgraph. This subgraph is constructed from the vertices u, v, and their reachable neighbors within a distance of k -2.</p><p>Within the localized subgraph, we can define motif-specific counts affected by the edge (u, v). Let H add(u,v) x be the number of motifs of type H x containing the edge (u, v), and let H minus(u,v) x be the number of motifs of type H x containing the vertices u and v, but not the edge (u, v). Then the number of motifs of type H x in the updated graph for inserted edges, (u, v) &#8712; Ins, is given by:</p><p>For deleted edges, (u, v) &#8712; Del, the signs are reversed;</p><p>Based on Equations 1 and 2, we can add updates to the set of all edges changed and all types of motifs. The problem of updating motifs, now transforms to computing H add(u,v) x and H minus(u,v) x for all changed edges and all motifs.</p><p>C. Our Algorithm.</p><p>The algorithm for updating motifs in dynamic networks operates through a two-step method. Initially, STEP I involves mapping integers to neighbors to represent local structures around altered edges, facilitating efficient motif detection. Subsequently, STEP II evaluates shifts in motif counts to determine the effects of adding or removing edges on motif frequencies. This approach accommodates motifs with mixed and overlapping edge updates. The process is elaborated here for motifs up to size 4. STEP I: Assigning Integer Labels to Neighbors for Efficient Motif Identification. This step assigns integer labels to vertices in the local neighborhood of a modified edge. These labels identify the distance of neighboring vertices (e.g., direct or second-hop neighbors) relative to the modified edge. This mapping simplifies the identification of potential motifs by encoding the local structure in a computationally efficient way.</p><p>Based on the method to identify triangles in <ref type="bibr">[9]</ref>, we assign distinct integers to each neighbor of distance 1 and distance 2 of u and v, depending on whether the edges linking the vertices and neighbors belong to the original set of edges E or the updated set of edges W , as shown in Table <ref type="table">I</ref>. Consider X (u,v) (p) as the mapping of a vertex p relative to edge (u, v). Initially, mappings for all network vertices are zero and are adjusted with edge modifications as per Table <ref type="table">I</ref>. Algorithm 1 details the procedure for mapping X values.</p><p>Let K(i) be the number of vertices mapped to value i in X. Let p and q be two neighbors of u and v that satisfy one of the conditions in Table <ref type="table">I</ref>. For every edge between (p, q) where the X mappings of both p and q are nonzero, we update M G (ij) and M Q (ij), which gives the types of connections between distance 2 neighbors in the original G and the graph induced by the updated edges Q, respectively. The step for computing the values of M G (ij) , M Q (ij), and K(i) is in Algorithm 2.</p><p>TABLE I: Integer mapping of distance 1 and distance 2 neighbors of the changed edge set. Type of Existing or Integer Id connection Changed Edge Distance 1 Neighbors: Single Vertex Node only connected to u Existing Edge 1 Node only connected to u Changed Edge 2 Node only connected to v Existing Edge 3 Node only connected to v Changed Edge 4 Distance 1 Neighbors: Both Vertices Connected to u and v Existing Edge connects u Changed Edge connects v 5 Connected to u and v Existing Edge connects v Changed Edge connects u 6 Connected to u and v Both are Changed Edges 7 Connected to u and v Both are Existing Edges 8 Distance 2 Neighbors Distance 2 neighbor of Either 9</p><p>STEP II: Computing Changes in Motif Counts due to Modifications. In this step, the algorithm assesses the impact of adding or removing an edge on motif counts. For example, adding an edge can form new motifs, such as transforming a wedge into a triangle, can increase the count of some motifs Algorithm 1 The inputs are, G, the original graph and, Q, the graph induced by the set of changed edges W . The algorithm examines each vertex connected to a recently modified edge, e = (u, v) &#8712; W , and updates the value of X[z] to reflect the classification of the distance-1 neighbor z &#8712; N G'Q (e), i.e. z is a neighbor of the end point(s) of e in either G or Q or both.</p><p>Initialize Array X to zero 3:</p><p>for all z &#8712; N Q (u) \ {v} do 6:</p><p>12:</p><p>else 13:</p><p>14:</p><p>else 20:</p><p>The inputs are, G, the original graph and, Q, the graph induced by the set of changed edges W . The algorithm computes all distance-2 motifs incident to an updated edge (u, v). K [i] counts the distance 1 neighbors where X(z) = i. M G [i, j] counts the number of distance 2 neighbors where X(p) = i and X(q) = j, and edge (p, q) in G. M Q [i, j] counts the number of distance 2 neighbors in Q.</p><p>for all a &#8712; N G'Q (e) \ {u, v} do 5:</p><p>for all i = 1..X.length do 14:</p><p>and decrease the count of others. Conversely, deleting an edge decomposes motifs that depend on it, reverting to simpler forms. The algorithm employs the integer mapping scheme described above to monitor these modifications, updating only affected motifs without recalculating the entire graph.</p><p>Given the values of M G (ij) M Q (ij) and K(i), we can compute H add ( u,v) x and H minus ( u,v) x for H x of size 4 or less in constant time. For simplicity, when it is clear which edge is being used, we write these terms as H add x and H minus x . When evaluating k-size motifs, it suffices to adjust only their counts. Smaller motifs up to size k -1 are inherently included within k-size motifs and were accounted for during the processing of their respective dimensions. For example, as shown in Figure <ref type="figure">2</ref>(B) let the edges (p, q), (q, v), (v, p) and (u, v) form a triangle with a tail (H 5 ). If edge (u, v) is deleted, we are left with a triangle. However, the triangle has already been counted when processing motifs of size 3, and we do not need to add it to the count again. On the other hand, if the edge (p, v) is deleted, then a new 4-node motif, a 3-linear path (H 4 ), is formed. We have to increase the count of this motif. In both cases, the frequency of tailed triangles is decreased by 1.</p><p>To compute updates H add x and H minus x , we consider u and v, the endpoints of edge (u, v), and vertices p and q, which are distance-1 or distance-2 neighbors of u or v in graph G or in updated graph G &#8242; = G + Q, to create the following functions;</p><p>We now demonstrate the computation of updates utilizing these functions for motifs of sizes 3 and 4. For clarity, we denote X(p) as i and X(q) as j.</p><p>Motifs of Size 3. Motifs of size 3, wedges and triangles, are given by the values of K(i). A wedge is formed for every neighbor p, where</p><p>For each neighboring vertex p with 5 &#8804; i &#8804; 8, a triangle is created, leading to H add 2 = K(i) for 5 &#8804; i &#8804; 8. In cases where 5 &#8804; i &#8804; 6, a wedge is transformed into a triangle, resulting in H minus 1 = K(i) for 5 &#8804; i &#8804; 6. Motifs of Size 4. Different cases for motifs of size 4 are described below. Figure <ref type="figure">2</ref> provides examples of these cases.</p><p>1) Vertices p and q are both connected to the same vertex.</p><p>Here, either p, q are both connected either to u or to v.</p><p>is given by function <ref type="bibr">(2)</ref>. See example in Figure <ref type="figure">2</ref> (F). Motifs of Size k. When counting motifs of size k, we can follow the same philosophy of finding base motifs of size k -2 with at least one changed edge and then enumerating the different ways in which the remaining two vertices p and q can be connected to the base motif. Given a base motif of size r = k -2, the number of combinations by which p (or q) are connected to the base motif is; i=r i=1 2 i r i +1. This value is obtained as follows: Vertex p q) can have at most r connections to the vertices of the base motif. The number of ways i connections can be formed out of r options is r i . Each connection is an existing or updated edge, which adds a factor of 2 i . We add an extra 1 for the case where p is a distance-2 neighbor of all vertices in the base motif.</p><p>As k increases, the number of base motifs increases combinatorially. Thus for larger values of k counting the motifs on dynamic or even static graphs quickly becomes infeasible. This is not surprising, as finding subgraphs in larger graphs is NP-complete. However, for smaller values of k this method provides a blue print to update motifs on dynamic graphs.</p><p>Motifs with Multiple and Mixed Updated Edges. Motifs may include multiple update edges, including a mix of inserts and deletes. We address the complexity of such cases by ensuring that overlapping updates are handled efficiently.</p><p>In case of multiple edges, assigning edge priority using vertex identifiers can potentially eliminate recomputation, yet offers minimal performance gains. This is due to the necessity of traversing all updated edges in a motif to determine priority.</p><p>In a more straightforward approach, we normalize H add and H minus by the count of updated edges in the motif. For instance, in Figure <ref type="figure">2</ref>(R), H add is divided by three, corresponding to the updated edges (u, v), (p, u), and (q, v). As the update function operates once per updated edge, the sum of contributions to H add 8 from these edges equals 1. The updated edges can contain both insertion and deletion. In such cases, we first apply the updates for only deleted edges. In the next step, we update for only inserted edges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Proof of Correctness and Complexity</head><p>We prove that our updating algorithm provides the same frequency of motifs of size k = 3, 4 as that obtained by the static algorithm. We observe that in our algorithm, motifs are computed only when an edge is updated, and each updated edge causes either a new motif to be formed or an old motif to be defunct, or both. Further, inserting an edge has exactly the equal and opposite effect as deleting an edge. Thus, it is sufficient to show correctness for only insertions.</p><p>For motifs of size 3; insertion of an edge can either add a wedge or add a triangle while losing the wedge that it closes. These are the exact changes listed in our algorithm.</p><p>For motifs of size 4, we have exactly 5 cases by which p and q can be connected to the inserted edge (u, v). If both p and q are connected to u and v, then they can be connected with one edge or two edges, and they can be connected to either one vertex or two vertices. The total number of possibility is (2x2=4). The fifth case is when either p or q is a distance-2 neighbor and not connected to either u or v. All these cases are considered in Step II for updating motifs of size 4. For each case, two options are considered, when p and q are connected, and when they are not. Thus, all the new motifs are covered.</p><p>A motif is lost or defunct if it had 4 vertices before the update. For case 1, there is no defunct motifs. For cases 2 and 5 there is one each, Figure <ref type="figure">2</ref> (D) and (F). For case 3, if a tailed triangle is formed, then the defunct motif would be a line, Figure <ref type="figure">2 (G)</ref>. If a diamond is formed, the tailed triangle would be defunct Figure <ref type="figure">2 (H)</ref>. The tailed triangle can break to smaller motifs of a paw Figure <ref type="figure">2</ref> (I) or a line Figure <ref type="figure">2 (J)</ref>.</p><p>For case 4, forming the diamond loses a square, Figure <ref type="figure">2  (K)</ref>. From the square, a line is lost Figure <ref type="figure">2 (L</ref> and <ref type="figure">M</ref>). From the clique, all other size 4 motifs can be made defunct including the diamond Figure <ref type="figure">2</ref> (N), the tailed triangle Figure <ref type="figure">2</ref> (O,P), the paw Figure <ref type="figure">2</ref> (Q) and the line Figure <ref type="figure">2 (R)</ref>.</p><p>Thus our algorithm considers all cases of insertion (or deletion) so the motif count is the same as the static recomputing algorithm.</p><p>Complexity:</p><p>Step 1, for each updated edge, all distance-2 neighbors of u and v are considered. If the average degree of the graph is d avg , then the number of neighbors accessed is d 2 avg . If the number of updated edges is |W |, then the complexity of Step 1 is O(|W |d 2 avg ). For Step II, computing each H add x and H minus x takes constant operations. There are six motifs, so complexity of both steps together is O(|W |d 2 avg ). A static algorithm is also considered up to distance-2 neighbors. The complexity of this algorithm is O(|E|d 2 avg ), where |E| is the number of edges. As |W | &#8594; |E|, the complexity of the static and dynamic methods converge.</p><p>The algorithm is trivially parallelizable, with each edge being assigned to a processor. If there are P processors, the parallel complexity withh be O((|W |/P )d 2 avg ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Implementation Details</head><p>We implement our algorithm for multi-core shared-memory systems and GPUs using the Kokkos C++ performance portability ecosystem. Kokkos maps a single C++ source code to various backends such as OpenMP and CUDA with performance comparable to that of a native implementation. We develop a suite of parallelization strategies and optimizations to improve performance across multiple architectures.</p><p>We leverage hierarchical parallelism to accelerate motif counting. Algorithm 1 and Algorithm 2 are applied to each changed edge independently by a team of threads in the outer loop. Within each Algorithm, we parallelize the integer map update in the inner loop using the team of threads.</p><p>Parallelizing the algorithm hierarchically requires temporary data structures for each team of threads. Each team of threads needs an integer map and an array of the neighbors for each vertex. The size of the integer map and neighbor array depends on the number of vertices and the maximum degree, which can lead to high memory usage for large, dense graphs. The high memory use limits the number of thread teams running concurrently without running out of memory. Running fewer teams concurrently can result in more computation resources being idle. Increasing the number of threads in each team may help increase resource utilization for edges with many neighbors, but still leave large parts of the GPU idle when the number of neighbors is few. Thus, minimizing memory usage is vital for maximizing the performance on massively parallel architectures like GPUs. To mitigate the high memory use, we store neighbors in a contiguous array whose size is determined by the maximum vertex degree.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. EXPERIMENTS</head><p>Our analysis of ParaDyMS on shared-memory and GPUbased platforms allows us to explore how graph characteristics and hardware configurations impact performance. We compare our results on the CPU with the only state-of-the-art parallel algorithm for global motif counting on static networks, PGD <ref type="bibr">[9]</ref>. We assess the scalability and suitability of ParaDyMS with both OpenMP and Nvidia GPUs for networks under different update scenarios, providing insights into optimizing computational resources for dynamic network workloads.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Experimental Settings</head><p>We evaluate the efficiency of ParaDyMS in dynamic networks for k = 3, 4 motifs, across eight different graphs characterized by vertices, edges, triangles, degrees, and triangle-toedge ratios (Table <ref type="table">II</ref>). We compute the results with networks with synthetic updates, which allows us to control the total number of updated edges as well as evaluate different ratios of the number of inserted edges to the number of deleted edges.</p><p>TABLE II: Table of graphs with number of vertices (|V |), number of edges (|E|), and number of triangles (|T |), average and maximum degree, and ratio of triangles to edges. |V | |E| |T | |Deg| |T |/|E| Avg (Max) in-2004 1.6M 2.6M 578M 32 (11K) 222.30 soc-orkut 3M 106M 1.6B 70 (27K) 150.51 soc-flickr 514K 3.2M 176M 23 (15K) 55.09 livejournal 4M 27.9M 250.7M 13 (2.7K) 8.98 wikiTalk 2.4M 4.6M 27.6M 3 (100K) 6 soc-pokec 1.6M 22.3M 97.7M 27 (15K) 4.3 dblp-coauthor 1.3M 5.4M 36.6M 8(2K) 6.77 road-usa 23.9M 28.9M 1.3M 2 (9) 0.04</p><p>We compare the ParaDyMS's runtime against a stateof-the-art static algorithm for edge-based motif counting, PGD <ref type="bibr">[9]</ref> since it is the only algorithm that performs global counting of the set of motifs of size k = 3, 4 in a single pass in parallel. We use the open-source code provided by the authors (see <ref type="url">https://github.com/nkahmed/PGD</ref>), compiled with GCC 11.2 and OpenMP 4.5 using the default build script. We run with default parameters, only specifying the number of threads and the location of the graph to use.</p><p>We perform the tests on two platforms from the Texas Advanced Computing Center (TACC): shared-memory experiments are performed on the Stampede3 cluster, featuring dual Intel Xeon CPU MAX 9480 processors ("Sapphire Rapids HBM") with 112 Cores and 128 GB of HBM2e RAM, leveraging OpenMP; GPU experiments are performed on an Nvidia A100 GPU node from the Lonestar6 cluster, utilizing CUDA 11.4 for implementation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Shared-Memory Experiments</head><p>We perform shared-memory experiments using OpenMP for parallelization. We compare the execution time of ParaDyMS and PGD by generating a set of updates and testing the time to compute the frequency of motifs. The inserted edges are randomly selected from distinct, initially unconnected vertices. The deleted edges are randomly chosen from the edges of the original graph. We test three types of updates:</p><p>(1) 100% insertions, (2) 50% insertions and 50% deletions, and (3) 100% deletions. A set of 1M updates is used, except for graphs with fewer than 5M edges, where the update size is adjusted to 10% of the edge count.</p><p>Figure <ref type="figure">3</ref> shows the performance of ParaDyMS and PGD for the six initial datasets listed in Table <ref type="table">II</ref>. In particular, ParaDyMS consistently outperforms PGD in terms of efficiency for these datasets. Specifically, graphs such as soc-orkut and pokec demonstrate enhancements of up to 22 times, highlighting the capability of ParaDyMS to exploit high-degree nodes and dense graph structures efficiently. The datasets flickr and wikitalk also show significant performance improvements for insert-dominated updates (100% insertions) and balanced updates (50% insertions, 50% deletions). However, deletions for these datasets show performance comparable to that of PGD, as the computation cost for deletions is higher due to the need to update a larger number of triangles.</p><p>In contrast, for the last two graphs, dblp-coauthor and road-usa, ParaDyMS exhibits lower performance relative to PGD (Figure <ref type="figure">4</ref>). These graphs have fewer triangles with low average and maximal degrees, which reduce the parallelism benefits of ParaDyMS. Specifically, the road-usa dataset highlights significant challenges due to its sparse nature, low triangle-to-edge ratio, and high vertex count, which limit the algorithm's efficiency and scalability. In such cases, PGD performs more efficiently, particularly for static computations that do not benefit as much from dynamic motif updates. This emphasizes the dependency of ParaDyMS's performance on the graph structure, and the nature of updates applied.</p><p>In summary, networks with high triangle density and maximum degree exhibit remarkable improvements with ParaDyMS, surpassing PGD by an order of magnitude for</p><p>1 2 4 8 16 32 64 112 Number of Threads 0 1 10 100 1000 10000 Execution Time (sec) 787 1116 1817 3206 4978 8783 17293 35632 33 32 58 117 199 388 776 1562 10 9 16 36 53 106 212 424 Algorithm ParaDyMS PGD Insert-Delete Ratio 0-100 50-50 100-0 (a) Orkut</p><p>1 2 4 8 16 32 64 112 Number of Threads 0 1 10 100 Execution Time (sec) 17 20 31 56 80 142 267 549 3 4 9 13 23 45 89 179 1 1 3 7 9 18 36 71 Algorithm ParaDyMS PGD Insert-Delete Ratio 0-100 50-50 100-0 (b) Pokec 1 2 4 8 16 32 64 112 Number of Threads 0 1 10 100 Execution Time (sec) 13 14 22 41 57 101 188 362 6 6 9 18 30 61 123 250 Algorithm ParaDyMS PGD Insert-Delete Ratio 0-100 50-50 100-0 (c) Livejournal 1 2 4 8 16 32 64 112 Number of Threads 0 1 10 100 Execution Time (sec) 6 10 20 39 57 114 232 478 4 6 12 23 44 88 173 350 0 0 1 1 2 4 7 14 Algorithm ParaDyMS PGD Insert-Delete Ratio 0-100 50-50 100-0 (d) Flickr</p><p>1 2 4 8 16 32 64 112 Number of Threads 0 1 10 100 1000 Execution Time (sec) 8 13 27 52 82 165 337 703 4 8 15 30 59 117 236 466 1 1 2 4 7 14 28 53 Algorithm ParaDyMS PGD Insert-Delete Ratio 0-100 50-50 100-0 (e) In2004</p><p>1 2 4 8 16 32 64 112 Number of Threads 0 1 10 100 1000 Execution Time (sec) 38 61 99 174 261 507 1039 2186 15 23 46 92 172 338 696 1351 1 1 2 5 9 17 35 71 Algorithm ParaDyMS PGD Insert-Delete Ratio 0-100 50-50 100-0 (f) Wikitalk  Execution times are compared across thread counts with 0%-100%, 50%-50%, and 100-0 insert-delete ratios. complete insertions. Networks like road-usa underperform due to minimal triangle density. Insertions are faster than deletions, as fewer triangles require modifications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. GPU Experiments</head><p>We perform the GPU experiments to evaluate the performance of ParaDyMS, as no spectral relationship allows for a direct comparison with other methods. ParaDyMS demonstrates similar or better performance on a single Nvidia A100 GPU compared to a dual-socket Sapphire Rapids node for most datasets. Figure <ref type="figure">5</ref> illustrates the runtime comparison of ParaDyMS on an Nvidia A100 GPU and a dual-socket Intel Sapphire Rapids (SPR) CPU across multiple datasets with varying insert-delete ratios. For graphs with high triangle density, such as Orkut, Flickr, and in-2024 the GPU shows competitive performance, particularly in scenarios dominated by insertions (100%-0% ratio). In the case of RoadUSA, a graph with low triangle-to-edge ratios, SPR outperforms the GPU due to its capacity to handle sparse memory access patterns more efficiently. The balanced insert-delete ratio (50-50) shows consistent performance across platforms, with GPUs exhibiting slightly lower variance in runtime. The findings underscore GPUs' effectiveness for graphs with numerous triangles and frequent insertions, while CPUs remain pertinent for sparser graphs with complex memory needs. Our analysis illustrates the significance of customizing computational resources to the graphs' traits and update patterns.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Sensitivity of Edge Selection</head><p>We examine GPUs' effectiveness for dynamic motifcounting tasks when reliable performance is critical. We assess ParaDyMS's stability using three random edge update sets from the Orkut dataset. Results show minimal performance variance, with GPUs having less variability than CPUs. Figure <ref type="figure">6</ref> presents the execution time of ParaDyMS for the Orkut data on A100 and SPR with edge updates tested across insert-delete ratios: 0-100, 50-50, and 100-0. The boxplots display the performance variability of the algorithm on both platforms. Updates dominated by deletions (0%-100%) exhibit higher execution times on both platforms, reflecting the additional computational cost as more triangles need to be modified. Insertion-dominated updates (100%-0%) show the lowest execution times, as fewer motifs require recalculation.</p><p>The A100 GPU shows minimal performance variation across different insert-delete ratios, demonstrating stable behavior regardless of update patterns or edge selection. This stability is due to the GPU's capability to effectively parallelize motif updates, reducing the influence of edge selection randomness. In comparison, the SPR platform experiences greater execution time variation due to the algorithm's dynamic nature and the competition for CPU threads during balanced updates, which heightens the variability from random edge selection. The findings suggest that ParaDyMs generally remains stable with random edge selection updates. The GPU's capability to ensure reduced variance highlights its appropriateness for dynamic workloads that demand consistent execution times. Although the SPR platform occasionally performs similarly, it exhibits greater sensitivity to random edge selection.</p><p>IV. RELATED RESEARCH Motif counting research predominantly focuses on smaller motifs like triangles and cliques <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref>. In contrast, parallel methods for subgraph isomorphism <ref type="bibr">[14]</ref> identify larger subgraphs specified by users. However, these techniques require several executions to enumerate all the k = 3, 4 motifs. Static motif counting algorithms calculate either the global frequency of motifs within a network or the local count, which involves determining the number of motifs linked to each vertex. Local motif counting demands more time and space compared to global counting. The Parallel Graphlet Decomposition (PGD) library <ref type="bibr">[9]</ref> introduces equations to count edge-based motifs of k = 4 with OpenMP support for parallelism. 0-100 50-50 100-0 Insert-Delete Ratio 10 20 30 Execution Time (sec) Platform A100 SPR Fig. 6: Performance comparison of ParaDyMS on the Orkut dataset with different random edge selection updates across three insert-delete ratios (0%-100%, 50%-50%, 100%-0%) on Nvidia A100 GPU and Intel Sapphire Rapids (SPR).</p><p>ESCAPE <ref type="bibr">[11]</ref> calculates k &#8804; 5 global counts using smaller vertex-based motifs and exploits directed graph orientations to reduce the search space but is serial. EVOKE <ref type="bibr">[10]</ref> extends to local orbit-based vertex counts but is slower than ESCAPE. Few decomposition-based motif census methods utilize GPUs <ref type="bibr">[15]</ref>, shared <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>, or distributed memory <ref type="bibr">[16]</ref>. <ref type="bibr">Ribeiro et al. (2019)</ref>  <ref type="bibr">[6]</ref> review various algorithms for static motif counting. Dynamic motif counting algorithms tackle distinct challenges: some skip certain size k motifs <ref type="bibr">[17]</ref>, others focus on subgraph mining <ref type="bibr">[18]</ref>, and some provide approximate counts <ref type="bibr">[19]</ref>.</p><p>V. CONCLUSION</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>For soc-orkut with 1M inserts with OpenMP with 64 threads</p></note>
		</body>
		</text>
</TEI>
