<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>HERO: A Hierarchical Set Partitioning and Join Framework for Speeding up the Set Intersection Over Graphs</title></titleStmt>
			<publicationStmt>
				<publisher>Association for Computing Machinery</publisher>
				<date>03/26/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10533452</idno>
					<idno type="doi"></idno>
					
					<author>B Yang</author><author>W Zheng</author><author>X Lian</author><author>Y Cai</author><author>Sean X Wang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[As one of the most primitive operators in graph algorithms, such as the triangle counting, maximal clique enumeration, and subgraph listing, a set intersection operator returns common vertices between any two given sets of vertices in data graphs. It is therefore very important to accelerate the set intersection, which will benefit a bunch of tasks that take it as a built-in block. Existing works on the set intersection usually followed the merge intersection or galloping-search framework, and most optimization research focused on how to leverage the SIMD hardware instructions. In this paper, we propose a novel multi-level set intersection framework, namely hierarchical set partitioning and join (HERO), by using our well-designed set intersection bitmap tree (SIB-tree) index, which is independent of SIMD instructions and completely orthogonal to the merge intersection framework. We recursively decompose the set intersection task into small-sized subtasks and solve each subtask using bitmap and boolean AND operations. To sufficiently achieve the acceleration brought by our proposed intersection approach, we formulate a graph reordering problem, prove its NP-hardness, and then develop a heuristic algorithm to tackle this problem. Extensive experiments on real-world graphs have been conducted to confirm the efficiency and effectiveness of our HERO approach. The speedup over classic merge intersection achieves up to 188x and 176x for triangle counting and maximal clique enumeration, respectively.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>In a wide spectrum of real-world applications such as social network analysis <ref type="bibr">[7,</ref><ref type="bibr">25]</ref>, friend recommendation <ref type="bibr">[4,</ref><ref type="bibr">15,</ref><ref type="bibr">17]</ref>, and community search/detection <ref type="bibr">[9,</ref><ref type="bibr">28,</ref><ref type="bibr">29]</ref>, a set intersection problem, which obtains common elements between any two sets, has been one of the most fundamental and important operators in graph algorithms/tasks such as the triangle counting <ref type="bibr">[8,</ref><ref type="bibr">27]</ref>, maximal clique enumeration <ref type="bibr">[5]</ref>, and subgraph listing <ref type="bibr">[24,</ref><ref type="bibr">30]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Motivation Example</head><p>Below, we give a motivation example of the friend recommendation in social networks.</p><p>Example 1.1 (Friend Recommendation in Social Networks). Consider an example of social network, &#119866;, in Figure <ref type="figure">1</ref>, where each vertex is a user and each edge between two vertices indicates the friend relationship between the two users. One important function on the social network platform (e.g., Twitter or Facebook) is to help users find and connect their friends. Intuitively, if two users have many common friends on social networks, they are more likely to know each other in reality. Therefore, in this case, we need to obtain a set of common friends between these two users and recommend them to add each other as friends (if this set is large).</p><p>Figure <ref type="figure">1</ref> illustrates two users &#119906; 1 and &#119906; 2 who are not connected in social network &#119866;, where their 1hop neighbor sets are given by N (&#119906; 1 ) = {&#119907; 3 , &#119907; 4 , &#119907; 5 , &#119907; 7 , &#119907; 9 , &#119907; 22 } and N (&#119906; 2 ) = {&#119907; 3 , &#119907; 10 , &#119907; 11 , &#119907; 17 , &#119907; 18 , &#119907; 24 }, respectively. To find their common friends, we need to conduct the set intersection operator, N (&#119906; 1 ) &#8745; N (&#119906; 2 ), between their 1-hop neighbor sets, and obtain common friends (i.e., {&#119907; 3 } in this example). &#9632;</p><p>Due to the large scale of social networks &#119866; in Example 1, in the worst case, there are quadratic pairs of user vertices (or set intersections) that need to be checked. Therefore, it is rather important, yet challenging, to optimize the cost of performing each set intersection operator, which inspires our work in this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Existing Methods and Limitations</head><p>Existing works on the set intersection <ref type="bibr">[14,</ref><ref type="bibr">16,</ref><ref type="bibr">23]</ref> follows a widely-used framework, the merge intersection, which assumes two sorted sets &#119860; and &#119861; (w.l.o.g. in the ascending order) and scans to merge both lists at the same time. The time complexity of this merge intersection method is O (|&#119860;| + |&#119861;|) in the worst case, where |&#119860;| and |&#119861;| are the numbers of elements in &#119860; and &#119861;, respectively. Another classic approach to perform the set intersection is the galloping search <ref type="bibr">[10]</ref>. Let us assume that |&#119860;| &lt; |&#119861;|. For each element &#119906; in &#119860;, the galloping search determines whether &#119906; is contained in &#119861; by using a binary search. The time complexity of this method is O (|&#119860;| log |&#119861;|), thereby it is more efficient when |&#119860;| &#8810; |&#119861;| holds.</p><p>A bunch of algorithms have been developed powered by SIMD <ref type="bibr">[1,</ref><ref type="bibr">14,</ref><ref type="bibr">16]</ref>. For example, QFilter <ref type="bibr">[14]</ref> improves the merge intersection by leveraging the SIMD instructions and a proposed bytechecking filter. However, due to the variability in available SIMD support across different processor architectures, the SIMD instruction sets are architecture-specific, leading to the inconvenience of implementing algorithms. In the worst-case scenario, some processors may not have SIMD instructions at all, making the SIMD-based algorithms even unable to work. Recently, a reducingmerging framework <ref type="bibr">[31]</ref> has been devised to enhance the performance of merge intersection. The basic idea is to reduce the input sets as much as possible before conducting intersection. The corresponding time cost is O (log |&#119860;| +log |&#119861;| + |&#119860; &#8242; | + |&#119861; &#8242; |), where &#119860; &#8242; and &#119861; &#8242; refer to subsets retrieved from &#119860; and &#119861; with range code reduction, respectively. The reducing-merging framework highly depends on the reducing performance, however, it remains an open problem to maximize the reducing ability.</p><p>Although advanced techniques such as SIMD <ref type="bibr">[14]</ref> and set reduction <ref type="bibr">[31]</ref> have been proposed to boost the set intersection task, they all take merge intersection or galloping-search paradigms as the backbone. In contrast, we focus on the following question: Can we develop a novel and more powerful framework to make set intersections on graphs even faster in practice?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Our Contributions</head><p>In order to speed up the intersection &#119860; &#8745; &#119861; between two sets &#119860; and &#119861;, in this paper, we design a novel multi-level set intersection framework, namely hierarchical set partitioning and join (HERO), which consists of offline set decomposition and online set intersection stages. The intuition behind our HERO framework is to decompose the task of the set intersection into several intersection subtasks with smaller subsets and avoid the computation cost if one of the two subsets to intersect in subtasks is empty.</p><p>Specifically, in the offline set decomposition phase, we divide the universal set into disjoint partitions and project each set &#119860; (or &#119861;) onto these partitions (called set partitioning), leading to projected subsets. Then, we take the IDs of non-empty (projected) subsets and form a new partition ID set. We recursively project this new partition ID set into partition ID subsets. This way, we can offline build a hierarchical structure of (element or partition ID) subsets for the original set &#119860; (or &#119861;).</p><p>In the online set intersection phase, we will start from the top of the hierarchical structures for sets &#119860; and &#119861; to perform the join over partition ID sets (called partition ID join). Then, we can traverse hierarchical structures of &#119860; and &#119861; to the next-level (sub)set intersection, based on the partition IDs in the join results, in a recursive top-down manner.</p><p>Under the HERO framework, we also propose an effective index, namely set intersection bitmap tree (SIB-tree) for the set intersection in the graph. We notice that the bitwise operators, supported by most computers, enable bit-level parallel comparisons, implying a great potential to accelerate the set intersection task. To the end, it is necessary to represent each set using a bitmap where each bit denotes whether or not a vertex appears in the set. However, maintaining bitmaps for sets of neighbors for all the vertices results in the overhead O (|&#119881; | 2 ), which is impractical for large-scale graphs. Moreover, due to the size limit of an operand register (namely word length), the bitwise operators (such as boolean AND operation) on over-width bitmaps cannot be completed by a single instruction. To handle the problems, we construct the SIB-tree, that is, plugging the bitwise operators into the HERO framework by decomposing the original set intersection into small-sized subtasks such that each subtask can be efficiently solved by a single instruction. The SIB-tree can facilitate the multi-level set intersection efficiently.</p><p>In fact, many different SIB-trees can be built based on distinct set partitionings, and thus affect the intersection performance. We find that any particular set partitioning can be achieved by applying the graph reorder technique. Therefore, we formulate a novel graph reordering task and develop an efficient heuristic algorithm to optimize the performance of our proposed HERO approach, by minimizing the total size of the SIB-tree for all vertices in the graph.</p><p>In summary, we make the following contributions in this paper.</p><p>(1) A novel set intersection framework: We propose a novel hierarchical set partitioning and join (HERO) framework in Section 3 that decomposes the set intersection task into small-sized subtasks and filters out unnecessary comparisons. (2) A well-designed index for the set intersection: We build an effective set intersection bitmap tree (SIB-tree) index for set intersection in the graph in Section 4, which can support efficient accomplishment of (sub)set intersection (sub)tasks via boolean AND operations over bitmaps. (3) Graph ordering optimization: We formulate a novel graph reordering problem and propose a heuristic algorithm to optimize the performance of the proposed HERO approach in Section 5. (4) Remarkable empirical studies: We have conducted extensive experiments in Section 6 to evaluate the proposed HERO approach. The results show that the speedup of our approach over the classic merge intersection achieves up to 188x and 176x for triangle counting and maximal clique enumeration, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">PROBLEM DEFINITION</head><p>In this section, we formally define several concepts used for the set intersection problem.</p><p>Set Over the Graph. A graph is denoted as &#119866; = (&#119881; , &#119864;), where &#119881; and &#119864; represent the sets of vertices and edges, respectively. We define a (vertex) set in the graph &#119866; below.</p><p>Definition 2.1 (Set in the Graph). Given a graph &#119866; = (&#119881; , &#119864;), a set &#119860; in the graph refers to a subset of &#119881; , i.e., &#119860; &#8838; &#119881; . One example of the set &#119860; in graph &#119866; can be a set of neighbors of a vertex &#119906; &#8712; &#119881; . In this paper, we assume that each vertex &#119906; is assigned with a unique integer ID, and all vertices in a set &#119860; in graph &#119866; are sorted (e.g., in ascending order). Set Intersection Problem. Next, we define the set intersection over the graph as follows.</p><p>Definition 2.2 (Set Intersection in the Graph). Given two sets &#119860; and &#119861; in the graph &#119866;, the set intersection task returns a set, &#119860; &#8745; &#119861;, of common vertices between two sets &#119860; and &#119861;, i.e., &#119860;</p><p>As discussed above, two classical methods, merge intersection and galloping search, can be used to handle the set intersection problem in graphs. However, they often fail to fully leverage the intrinsic relationships among vertices, which can be improved. For instance, the merge intersection often involves unnecessary comparisons, leading to inefficiency. Furthermore, due to the limitation of comparing only one pair of elements per machine instruction, they are unable to fully exploit parallel advantages offered by bit-level computations. To address these limitations, we develop a novel HERO framework, which effectively filters out unnecessary comparisons and seamlessly integrates efficient boolean AND operations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">THE HIERARCHICAL SET PARTITIONING AND JOIN FRAMEWORK</head><p>In Section 3.1, we introduce a two-level intersection method that decomposes set intersection into two subtasks of intersection to be performed in two stages. We then expand the two-level method to a multi-level intersection framework by recursively abstracting set intersection to a higher-level task in Section 3.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Set Partitioning and Join</head><p>Set Partitioning: Let &#119878; be a universal set that contains all possible elements in the domain of set elements (e.g., all the vertex IDs in the graph). Given a positive integer &#119899;, our two-level set partitioning and join approach partitions the universal set &#119878; into &#119899; disjoint subsets &#119878; 1 , &#119878; 2 , . . . , and &#119878; &#119899; . We also call such subsets, &#119878; &#119894; , space partitions of the set &#119878;, where &#119894; is the partition ID of the subset &#119878; &#119894; . Definition 3.1 (Subset Projection). Given a set &#119860; &#8838; &#119878; and a subset &#119878; &#119894; of &#119878;, the subset projection, &#119860; &#119894; , of &#119860; over &#119878; &#119894; contains common vertices between &#119860; and &#119878; &#119894; , i.e., &#119860; &#119894; = &#119860; &#8745; &#119878; &#119894; .</p><p>Problem Reduction Based on the Set Partitioning: Given any two sets &#119860; &#8838; &#119878; and &#119861; &#8838; &#119878; to be intersected, it holds that:</p><p>where &#119860; &#119894; and &#119861; &#119894; (for 1 &#8804; &#119894; &#8804; &#119899;) are the subsets projected from &#119860; and &#119861; onto the partition &#119878; &#119894; , respectively, that is, &#119860; &#119894; = &#119860; &#8745; &#119878; &#119894; and &#119861; &#119894; = &#119861; &#8745; &#119878; &#119894; . Note that, when &#119860; &#119894; = &#8709; or &#119861; &#119894; = &#8709; holds, we have &#119860; &#119894; &#8745; &#119861; &#119894; = &#8709;. Therefore, in this case, we can actually avoid the set intersection cost of the &#119894;-th subtask &#119860; &#119894; &#8745; &#119861; &#119894; , if either &#119860; &#119894; or &#119861; &#119894; is empty. Partition ID Join: Based on the observation above, we can record whether or not subsets &#119860; &#119894; (or &#119861; &#119894; ) are empty, by keeping their partition IDs &#119894; in a so-called support set defined below (i.e., partition ID set) Sup &#119860; (or Sup &#119861; ) if &#119860; &#119894; (or &#119861; &#119894; ) are not empty. Then, we can join the two support sets Sup &#119860; and Sup &#119861; to obtain a list of subset pairs (or partition IDs) that need to perform the actual set intersection (i.e., subtasks).</p><p>Below we give the definition of the support set.</p><p>Definition 3.2 (Support Set). Given two sets &#119860; and &#119861; to be intersected, their support sets, Sup &#119860; and Sup &#119861; , over the space partition &#119878; 1 , &#119878; 2 , . . . , &#119878; &#119899; of the universal set &#119878; are defined as Sup &#119860; = {&#119894; | &#119860; &#119894; &#8800; &#8709;, for 1 &#8804; &#119894; &#8804; &#119899;} and Sup &#119861; = {&#119894; | &#119861; &#119894; &#8800; &#8709;, for 1 &#8804; &#119894; &#8804; &#119899;}, where &#119860; &#119894; and &#119861; &#119894; are subset projections for &#119860; and &#119861; over &#119878; &#119894; , respectively.</p><p>Further Problem Reduction Based on the Partition ID Join: By using the concept of the support set (as given in Definition 3.2), we can rewrite the set intersection task in Equation (1) as:</p><p>Two-Level Intersection (Set Partitioning and Join): From Equation (2), we can: 1) first conduct the partition ID join, Sup &#119860; &#8745; Sup &#119861; , over the two support sets (Level 2, partition ID join), and 2) then perform the set intersection, &#119860; &#119894; &#8745; &#119861; &#119894; , only on those non-empty subsets &#119860; &#119894; and &#119861; &#119894; (Level 1, subset intersection), where partition IDs &#119894; are in the join result Sup &#119860; &#8745; Sup &#119861; . The above two steps exactly correspond to two levels of set intersection join, respectively. In summary, our principle is that the original task of the set intersection is decomposed into smaller subtasks of set intersections on two levels. We will later discuss in Section 3.2 how to generalize this principle to that of multiple levels in a hierarchical structure.</p><p>Example 3.3. Assume that we have a universal set &#119878; = {&#119907; &#8712; Z | 1 &#8804; &#119907; &#8804; 27}, which can be partitioned into 9 disjoint subsets, &#119878; 1 &#8764; &#119878; 9 , where</p><p>Given two sets &#119860; = {1, 4, 5, 7, 9, 22} and &#119861; = {1, 2, 10, 17, 18,24}, as shown in Figure <ref type="figure">2</ref>(a), we first compute subset projections, &#119860; &#119894; , of &#119860; over 9 partitions &#119878; &#119894; . That is, we have &#119860; 1 = {1}, &#119860; 2 = {4, 5}, &#119860; 3 = {7, 9}, and &#119860; 8 = {22} (note: other subsets &#119860; &#119894; = &#8709; for &#119894; &#8800; 1, 2, 3, 8). Similarly, for set &#119861;, we have non-empty subset projections &#119861; 1 = {1, 2}, &#119861; 4 = {10}, &#119861; 6 = {17, 18}, and &#119861; 8 = {24}.</p><p>Hence, the support set of &#119860; is Sup &#119860; = {1, 2, 3, 8}, whereas that of &#119861; is Sup &#119861; = {1, 4, 6, 8}.</p><p>In order to do the set intersection task &#119860; &#8745; &#119861;, as illustrated in Figure <ref type="figure">2</ref>(b), we first compute the support set intersection Sup &#119860; &#8745; Sup &#119861; = {1, 8} on the support set level (Level 2), and then perform the intersection of subsets for those partition IDs appearing in Sup &#119860; &#8745; Sup &#119861; (i.e., 1-st and 8-th partitions), that is,</p><p>Superiority of Two-Level Set Partitioning and Join: We illustrate the superiority of the two-level set intersection join mentioned above, by using Example 3.3. Figure <ref type="figure">3</ref> shows the computation costs of merge intersection <ref type="bibr">[12]</ref> and our two-level set intersection frameworks, where each line segment represents one comparison between two elements in the sets. We can see that our framework only takes 8 (=6+2) comparisons, which is lower than 10 comparisons of the merge intersection algorithm. This is because the set intersection over some partitions (e.g., Partitions 2, 3, 4, and 6) can be avoided, and thus their computation costs can be saved.</p><p>From Figure <ref type="figure">3</ref>, we can see that the major cost of our two-level set partitioning and join is now on Level 2 (i.e., 6 out of 8). It inspires us to further reduce the cost on this support set level, by introducing hierarchical set partitioning and join in the next section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">The Hierarchical Set Partitioning and Join (HERO) Framework</head><p>In this section, we generalize our idea of set intersection partitioning and join (as mentioned in Section 3.1) from two levels to multiple levels in a hierarchical structure. Algorithm 1 shows our hierarchical set partitioning and join (HERO) framework for the set intersection operator. It has two stages, i.e., offline set decomposition (lines 1-8) and online set intersection stages (lines 9-13). Offline Set Decomposition: Given a set intersection task &#119860;&#8745;&#119861;, we consider an &#8462;-level intersection framework. Specifically, similar to the two-level case discussed in Section 3.1, on Level 1, we decompose the universal set &#119878; into &#119899; 1 disjoint partitions &#119878; 1  1 , &#119878; 1 2 , . . . , and &#119878; 1 &#119899; 1 , and obtain projected subsets {&#119860; 1 &#119894; } &#119899; 1 &#119894;=1 and {&#119861; 1 &#119894; } &#119899; 1 &#119894;=1 from sets &#119860; and &#119861;, respectively. For ease of presentation, we use the superscript to denote the level number.</p><p>Then, on Level 2, we maintain two support sets, Sup</p><p>1 &#119860; and Sup 1 &#119861; , containing partition IDs of nonempty projected subsets &#119860; 1 &#119894; and &#119861; 1 &#119894; . The universal set on this level becomes &#119878; 2 = { &#119895; &#8712; Z|1 &#8804; &#119895; &#8804; &#119899; 1 }. In order to enable the intersection between support sets, Sup 1 &#119860; &#8745; Sup 1 &#119861; , similar to Level 1, we recursively divide the new universal set into at most &#119899; 2 partitions &#119878; 2 1 , &#119878; 2 2 , ..., and &#119878; 2 &#119899; 2 , and project Sup 1 &#119860; and Sup 1 &#119861; onto these partitions, leading to &#119860; 2 &#119894; and &#119861; 2 &#119894; , where</p><p>In general, on Level &#119897; (2 &lt; &#119897; &#8804; &#8462;), we maintain two support sets Sup &#119897; -1 &#119860; and Sup &#119897; -1 &#119861; , containing partition IDs of non-empty projected subsets &#119860; &#119897; -1 &#119894; and &#119861; &#119897; -1 &#119894; . The universal set in this level should be &#119878; &#119897; = { &#119895; &#8712; Z|1 &#8804; &#119895; &#8804; &#119899; &#119897; -1 } and we divide it into &#119899; &#119897; partitions &#119878; &#119897; 1 , &#119878; &#119897; 2 , . . . , and &#119878; &#119897; &#119899; &#119897; . The support sets Sup &#119897; -1 &#119860; and Sup &#119897; -1 &#119861; will be projected onto these partitions, leading to the projected subsets &#119860; &#119897; &#119894; and &#119861; &#119897; &#119894; , where</p><p>Online Set Intersection: In contrast to the offline set decomposition, the online set intersection operates following a top-down paradigm. Initially, &#119877; &#8462; is calculated directly by intersecting the two support sets, Sup &#8462; &#119860; &#8745; Sup &#8462; &#119861; , (line 8). The filtering result &#119877; &#119897; determines which subsets in the &#119897;-th level subtasks need to be joined, where &#119897; ranges from &#8462; to 1. That is, for any element &#119894; in &#119877; &#119897; , we Algorithm 1: The Hierarchical Set Partitioning and Join (HERO) Framework Input: a universal set &#119878;, set &#119860;, set &#119861;, and level &#8462; Output: the intersection result &#119877; = &#119860; &#8745; &#119861; // offline set decomposition 1 Initialize &#119878; 1 as &#119878;, Sup 0 &#119860; as &#119860;, Sup 0 &#119861; as &#119861;</p><p>Details of the invoked procedure 12 Procedure Project(&#119883;, {&#119885; &#119894; } &#119899; &#119894;=1 ) 13 for &#119894; = 1 to &#119899; do 14 &#119883; &#119894; &#8592; &#119885; &#119894; &#8745; &#119883; 15 return {&#119883; &#119894; } &#119899; &#119894;=1 16 Procedure Join (&#119877;, {&#119883; &#119894; } &#119899; &#119894;=1 , {&#119884; &#119894; } &#119899; &#119894;=1 ) 17 &#119877; &#8242; &#8592; &#8709; 18 for &#119894; &#8712; &#119877; do 19 &#119877; &#8242; &#8592; &#119877; &#8242; &#8746; Intersect(&#119883; &#119894; , &#119884; &#119894; ) 20 return &#119877; &#8242; .</p><p>need to solve the subtask Intersection(&#119860; &#119897; &#119894; , &#119861; &#119897; &#119894; ) and add the results into &#119877; &#119897; -1 , while other subtasks can be safely skipped (lines <ref type="bibr">[17]</ref><ref type="bibr">[18]</ref><ref type="bibr">[19]</ref>. The above process also holds for &#119897; = 1, where the &#119877; &#119897; -1 (i.e., &#119877; 0 ) denotes result of &#119860; &#8745; &#119861;. The involved procedure Intersection(&#119883;, &#119884; ) is used to complete the subtask of computing the intersection between two sets &#119883; and &#119884; . Notice that any particular algorithm for set intersection, such as the merge intersection, can be applied.</p><p>Example 3.4. Following the settings of &#119878;, &#119860;, &#119861;, and the space partition of &#119878; in Example 3.3, we can easily have the following results.</p><p>&#8226; Sup 1 &#119860; = {1, 2, 3, 8} and Sup 1 &#119861; = {1, 4, 6, 8};</p><p>, and &#119861; 1 &#119897; = &#8709; for any &#119897; &#8713; Sup 1 &#119861; . To deal with the support set intersection of Sup 1  &#119860; and Sup 1 &#119861; , we define the 2nd level space partition as &#119878; 2 1 = {1, 2, 3}, &#119878; 2 2 = {4, 5, 6} and &#119878; 2 3 = {7, 8, 9}, where the new universal set is {&#119894; &#8712; Z|1 &#8804; &#119894; &#8804; 9}. Then, as illustrated in Figure <ref type="figure">4</ref>(a), we project the support set Sup 1  &#119860; and Sup</p><p>1 &#119861; into the partitions, and the non-empty projections should be &#119860; 2 1 = {1, 2, 3}, &#119860; 2 3 = {8}, &#119861; 2 1 ={1}, &#119861; 2 2 = {4, 6} and &#119861; 2 3 = {8}. Correspondingly, the support set should be Sup 2 &#119860; = {1, 3} and Sup 2 &#119861; = {1, 2, 3}. With the above pre-computation, we can execute the multi-layer intersection as follows, which is also illustrated in Figure <ref type="figure">4(b)</ref>.</p><p>(1)</p><p>1 8 = {1}. &#9632; Time Complexity. The complexity of online set intersection in our HERO framework is highly related to a particular algorithm used in the subtask Intersection(&#8226;, &#8226;). Assume that the complexity of procedure Intersection(&#119860;, &#119861;) is O (&#119891; (|&#119860;|, |&#119861;|)), where |&#119860;| and |&#119861;| are the size of sets &#119860; and &#119861;, respectively. The complexity of hierarchical online set intersection comprises of the cost of all the subtasks involved, i.e., O ( &#8462; &#119897;=1 &#119894; &#8712;&#119877; &#8462; &#119891; (|&#119860; &#119897; &#119894; |, |&#119861; &#119897; &#119894;</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">BITMAP-BASED SET INTERSECTION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Set Intersection Using Boolean AND Over Bitmaps</head><p>In the HERO framework (as illustrated in Algorithm 1), one of the most fundamental and important operators is the subset intersections in subtasks (i.e., join two subsets to obtain their common elements). While we can plug in any existing set intersection method, in this paper, we use bitmap synopses to encode subsets and apply bit AND operator to enable the subset intersection.</p><p>Bitmap: Let &#119889; be the size of the set element domain. We define the bitmap, &#119861;&#119872; (&#119885; ), for a subset &#119885; as follows.</p><p>Definition 4.1 (Bitmap for a Subset, &#119861;&#119872; (&#119885; )). Given a set &#119885; , a bitmap, &#119861;&#119872; (&#119885; ), of set &#119885; is a bit vector containing &#119889; bits. If there exists an element &#119911; &#8712; &#119885; such that &#119891; (&#119911;) = &#119901;&#119900;&#119904;, then &#119861;&#119872; (&#119885; ) [&#119901;&#119900;&#119904;] = 1 holds; otherwise, we have &#119861;&#119872; (&#119885; ) [&#119901;&#119900;&#119904;] = 0, where &#119891; (&#119911;) is a hash function that maps an element &#119911; to an integer &#119901;&#119900;&#119904; within the range [0, &#119889;).</p><p>Boolean AND Over Bitmaps: Given two subsets &#119883; &#119894; and &#119884; &#119894; , we can obtain their intersection &#119883; &#119894; &#8745; &#119884; &#119894; (i.e., subtask) by using Boolean AND over their bitmaps, that is, &#119861;&#119872; (&#119883; &#119894; ) &#8743; &#119861;&#119872; (&#119884; &#119894; ). Then, we can obtain those non-zero bit positions, &#119901;&#119900;&#119904;, in the bitmap AND result, and their corresponding elements are common elements between subsets &#119883; &#119894; and &#119884; &#119894; . Discussions on the Advantages of Using Bitmaps for HERO:</p><p>Boolean AND operators are strictly constrained by the size of an operand register, also known as the word length. Thus, for the universal set &#119878; of large size (i.e., large element domain size), it is not efficient to directly perform Boolean AND operators over bitmaps of large size &#119889;. Assuming the word length is 64 and the graph contains more than a million vertices, it implies that more than 10,000 words are needed to encode a set using bitmaps, which leads to costly boolean AND operations to conduct a set intersection task. There are several methods that adopt the bitmap, such as the compressed bitmap method Roaring <ref type="bibr">[6]</ref> and the SIMD-based method QFilter <ref type="bibr">[14]</ref>. However, these methods have to conduct merge operations that perform vertex-wise comparisons during the intersection procedure. In other words, the set intersection partially benefits from the advantage of bitmaps (i.e., supporting bit-wise parallel comparisons).</p><p>In contrast, our HERO framework recursively decomposes the set intersection task into smallsized subtasks, until each subtask can be efficiently handled by a single boolean AND instruction. Hence, our HERO is scalable for the intersection over sets of large domain sizes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">SIB-Tree Structure</head><p>In this subsection, we discuss how to design a set intersection bitmap tree (SIB-tree) structure to facilitate our proposed HERO framework for the set intersection. Ordered Partitioning. Before introducing the details of the SIB-tree, we would like to unify the space partitioning in a natural way at first. To sufficiently leverage the advantage of the boolean AND operator, for subsets in varying-level space partitions, it is necessary to maintain a shared size of &#119908; which refers to the machine's word length. Given a universal set &#119878; &#119897; , it can be simply partitioned into</p><p>We call this partitioning strategy an ordered partitioning. Clearly, the ordered partitioning is sensitive to the element orderings, and we will discuss the impact of various orderings in Section 5.</p><p>Let us recall the offline set decomposition in the HERO framework. Subset projections are performed from the &#119897;-th level to ( &#119895; + 1)-th level recursively. It naturally produces a tree structure, where each node represents a non-empty subset projection as shown in Figure <ref type="figure">5</ref>. Following the tree, we can build an SIB-tree for each set to facilitate the boolean AND operation.</p><p>Definition 4.2 (Set Intersection Bitmap Tree, SIB-tree). Given a set &#119860;, its SIB-tree is denoted by &#119879; (&#119860;), where each node &#119879; &#119897; &#119894; (&#119860;) represents a non-empty subset projection &#119860; &#119897; &#119894; in a bitmap format and contains two attributes: base value and bitmap value.</p><p>&#8226; The base value, denoted by &#119879; &#119897; &#119894; (&#119860;).base, is set to &#119894;, which implies that the elements of &#119860; &#119897; &#119894; fall in the range (&#119908;</p><p>&#8226; (&#119894; -1), &#119908; &#8226; &#119894;]. &#8226; The bitmap value, denoted by &#119879; &#119897; &#119894; (&#119860;).&#119887;&#119894;&#119905;, is a &#119908;-bit bitmap whose &#119896;-th bit indicates whether or not the element &#119908; &#8226; (&#119894; -1) + &#119896; is involved in &#119860; &#119897; &#119894; . &#8226; The root node of the SIB-tree &#119879; (&#119860;) is &#119879; &#8462; 1 (&#119860;), where &#8462; = &#8968;log &#119908; |&#119878; |&#8969;. The &#8462;-th level universal set &#119878; &#8462; has no more than &#119908; elements, such that the corresponding &#8462;-th level subtask can be solved by a single boolean AND instruction. &#8226; The nodes {&#119879; 1 &#119894; (&#119860;)} &#119899; 1 &#119894;=1 constitute the leaf nodes. &#8226; For each node &#119879; &#119897; &#119894; (&#119860;), its child nodes are {&#119879; &#119897; -1 &#119906; (&#119860;)} &#119906; &#8712;&#119860; &#119897; &#119894; . It is clear that the &#119897;-th layer nodes of the SIB-tree &#119879; (&#119860;) represent the &#119897;-th level non-empty subset projections {&#119860; &#119897; &#119894; } &#119894; &#8712;Sup &#119897; &#119860;</p><p>. Furthermore, the base value of a node tells which subset projection it represents in the corresponding level. Therefore, according to the HERO framework, the boolean AND operators only occur between nodes from two SIB-trees that locate on the same level and share the same base value.</p><p>Example 4.3. We follow the setting of &#119878;, &#119860;, &#119861;, and the space partition of &#119878; in Example 3.4. We first consider the bottom layer of &#119879; (&#119860;). According to the Example 3.4, the non-empty 1st-level subset projections of &#119860; are &#119860; 1 1 = {1}, &#119860; 1 2 = {4, 5}, &#119860; 1 3 = {7, 9}, &#119860; 1 8 = {22}. Hence, there are 4 nodes &#119879; 1  1 (&#119860;), &#119879; 1 2 (&#119860;), &#119879; 1 3 (&#119860;), and &#119879; 1 8 (&#119860;) in the bottom layer of &#119879; (&#119860;). The corresponding pairs of base value and bitmap value are (1, 001), (2, 011), (3, 101), and (8, 001), respectively. We then consider the 2nd layer. The non-empty 2nd-level subset projections of &#119860; are &#119860; 2 1 = {1, 2, 3}, and &#119860; 2 3 = {8}. Thus, the 2nd layer nodes are &#119879; 2 1 (&#119860;) and &#119879; 2 3 (&#119860;), together with the pairs of base value and bitmap value</p><p>(1, 111), and (3, 010), respectively. The 3rd level has only one subset projection &#119860; 3 1 = {1, 3}, which refers to the root node &#119879; 3 1 (&#119860;) with the base and bitmap value (1, 101). We deliver an illustration of &#119879; (&#119860;) in Figure <ref type="figure">5</ref>(a). The construction of &#119879; (&#119861;) is similar, so we skip the description and just show its illustration in Figure <ref type="figure">5(b)</ref>. &#9632; SIB-Tree Construction: Given a set &#119860;, we construct &#119879; (&#119860;) in a bottom-up manner, progressing layer by layer. To construct the lowest layer (leaf nodes) of &#119879; (&#119860;), it is necessary to compute subset projections &#119860; 1 &#119894; = &#119860; &#8745; &#119878; 1 &#119894; , where &#119894; ranges from 1 to &#119899; 1 . For each non-empty subset projection &#119860; 1 &#119894; , a leaf node &#119879; 1 &#119894; (&#119860;) should be generated based on the SIB-tree definition. To establish the &#119897;-th layer of &#119879; (&#119860;), where 2 &#8804; &#119895; &#8804; &#119889;, we need to follow the subsequent steps.</p><p>(1) Collect the base value of the nodes in the ( &#119895; -1)-th layer, where each node corresponds to a non-empty subset projection. Thus, all the base values in this layer constitute the support set Sup</p><p>&#119897; -1 &#119860; . (2) Project Sup &#119897; -1 &#119860; into the &#119897;-th level space partitions {&#119878; &#119897; &#119894; } &#119899; &#119897; &#119894;=1 , leading to the subset projections {&#119860; &#119897; &#119894; } &#119899; &#119897; &#119894;=1 . (3) For each non-empty subset projection &#119860; &#119897; &#119894; , a new node &#119879; &#119897; &#119894; (&#119860;) is created. For each &#119890; &#8712; &#119860; &#119897; &#119894; , we set &#119879; &#119897;</p><p>&#119894; (&#119860;) as the parent node of &#119879; &#119897; -1 &#119890; (&#119860;). Offline Complexity: For a wide range of graph analytics algorithms, the set intersection operations are conducted on the neighborhoods of several vertices. Therefore, we can construct an SIB-tree for the neighbors of each vertex during the offline stage. Since the total number of leaf nodes is no more than |&#119864;|, and the height of the SIB-tree is bounded by O (log |&#119864;|), the space complexity of such an offline computation is bounded by O (|&#119864;| log |&#119864;|). Furthermore, each SIB-tree node will be traversed at most once during the SIB-tree construction. Hence, the time complexity of the offline computation is also bounded by O (|&#119864;| log |&#119864;|).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">SIB-Tree Based Intersection</head><p>Algorithm 2 shows how to calculate the set intersection using the SIB-tree structure in a depth-first manner. The algorithm employs a recursive approach, utilizing a helper function that takes a pair of vertices, denoted as (&#119907; &#119886; , &#119907; &#119887; ), from &#119879; (&#119860;) and &#119879; (&#119861;) respectively, as input. These vertices share the same base value and depth (line 4).</p><p>Helper Function: The helper function first performs a boolean AND operation on the bitmap values of &#119907; &#119886; and &#119907; &#119887; (lines 5-6). Then, an iterator is created based on the shared base value of &#119907; &#119886; and &#119907; &#119887; , along with the result of the boolean AND operation (lines <ref type="bibr">[15]</ref><ref type="bibr">[16]</ref><ref type="bibr">[17]</ref><ref type="bibr">[18]</ref><ref type="bibr">[19]</ref><ref type="bibr">[20]</ref>. The subsequent steps depend on whether the input vertices &#119907; &#119886; and &#119907; &#119887; are leaf vertices. If they are, each element in the iterator is directly added to the intersection result (lines 8-9). However, if they are not leaf vertices, the helper function iterates through each element in the iterator. If the iterator is empty, it backtracks to the last uncompleted subtask. For each element, it identifies the child vertices of &#119907; &#119886; and &#119907; &#119887; (referred to as child &#119886; and child &#119887; , respectively) whose base value matches the given element from the iterator (lines 12-13). Finally, the helper function recursively executes itself with child &#119886; and child &#119887; as input parameters (line 14).</p><p>Iterator Function: A base value and a bitmap value actually encode a set given a word length &#119908;. This iterator function is used to enumerate the elements contained in the set it encodes. The process is depicted in Algorithm 2 (lines 15-20). More specifically, in each iteration, the minimum element contained in the encoded set is computed by line 18, and then the bitmap value is updated by setting the first "1" bit to "0" (line <ref type="bibr">19)</ref>.</p><p>Auxiliary Functions: Algorithm 2 includes two functions that require further clarification. The first function is FindChildNode(&#8226;). It takes a vertex &#119907; from the SIB-tree &#119879; and a query &#119901;&#119900;&#119904; as inputs.</p><p>Algorithm 2: SIB-Tree Based Intersection Input: &#119879; (&#119860;),&#119879; (&#119861;), word length &#119908;, and tree height &#8462;. Output: the result of intersection &#119877; = &#119860; &#8745; &#119861;. 1 res &#8592; &#8709; 2 Helper (&#119879; &#8462; 1 (&#119860;),&#119879; &#8462; 1 (&#119861;), res) 3 return res 4 Function Helper(&#119907; &#119886; , &#119907; &#119887; , res): 5 &#119887;&#119894;&#119905; &#119886; &#8592; &#119907; &#119886; .&#119887;&#119894;&#119905;, &#119887;&#119894;&#119905; &#119887; &#8592; &#119907; &#119887; .&#119887;&#119894;&#119905; 6 &#119887;&#119894;&#119905; &#8592; bit a &amp; bit b 7 base &#8592; &#119907; &#119886; .base 8 if &#119907; &#119886; and &#119907; &#119887; are leaf vertices then 9 res.extend(Iterator(base, &#119887;&#119894;&#119905;, &#119908;)) 10 else 11 for &#119894; in Iterator (base, &#119887;&#119894;&#119905;, &#119908;) do 12 child &#119886; &#8592;FindChildNode (&#119907; &#119886; , &#119894;) 13 child &#119887; &#8592;FindChildNode (&#119907; &#119887; , &#119894;) 14 Helper (child &#119886; , child &#119887; , res) 15 Function Iterator(base, &#119887;&#119894;&#119905;, &#119908;): It returns a child vertex of &#119907; whose base value matches &#119901;&#119900;&#119904;. The specific implementation details of FindChildNode(&#8226;) are dependent on the structure of the SIB-tree. The second function is Ffs(&#8226;), which is the acronym for "Find First Significant". Taking a bitmap as the input, the Ffs(&#8226;) function finds its lowest significant bit (i.e., the first "1" bit) and returns the corresponding index. With the help of Ffs(&#8226;) function, we can construct an iterator to enumerate the elements contained in the set encoded by a base value and bitmap value pair as discussed above. The GCC compiler provides efficient implementations of Ffs(&#8226;), such as the __builtin_ffs(&#8226;) for 32-bit unsigned integers and the __builtin_ffsll(&#8226;) for 64-bit unsigned integers. Discussion on On-the-fly Intersection: If the sets for intersection are given on the fly, we have to construct the SIB-trees online. However, as long as the SIB-tree index is constructed, the subsequent intersection calculation can be accelerated. Especially, for the sets that are frequently involved in the intersection tasks, it will be beneficial if the corresponding SIB-trees are constructed.</p><p>Trade-off Between Online and Offline Computations. As mentioned in Section 4.2, we would like to construct SIB-trees for the neighbors of all vertices offline if it is allowed. However, if the offline computation resource (either time or space) is limited, there is an alternative to construct SIB-trees only for a subset of vertices. Intuitively, vertices that are frequently involved in the graph set intersection tasks are recommended to be selected.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">GRAPH REORDERING</head><p>Section 4.2 recursively partitions/projects (support) sets onto subsets (partition ID subsets) by using the ordered partitioning as the space partition. Different space partitioning/projections may lead to different subsets, which in turn result in distinct SIB-tree performance. We find that different orderings of graph vertices may lead to different space partitions. Therefore, our basic idea of the graph reordering is to assign each vertex in the data graph &#119866; a new vertex ID, which equivalently reorders those vertices and obtains a better partitioning/projection strategy for the SIB-tree.</p><p>Example 5.1. We follow the settings of &#119878;, &#119860;, and &#119861; in Example 3.4. The reorder rule is illustrated at the top of Figure <ref type="figure">6</ref>. Each vertex in &#119878; will be assigned a new ID, and the reordered set will be sorted according to the new IDs. Sets &#119860; and &#119861; are {1, 2, 3, 4, 5, 6} and {1, 7, 8, 9, 10, 11} after reordering, respectively. We also present the SIB-trees of reordered sets &#119860; and &#119861; in the Figure <ref type="figure">6</ref>. The reordered SIB-trees of both sets &#119860; and &#119861; contain fewer tree nodes than the original SIB-trees in Figure <ref type="figure">5 (4 v</ref>.s. 7, and 6 v.s. 8). Furthermore, the intersection task &#119860; &#8745; &#119861; needs to handle only 3 subtasks based on new SIB-trees, which gains a significant improvement over the original order that has 5 subtasks.&#9632; In fact, arbitrary partitioning can be exploited to build the SIB-tree only if it can produce equalsized partitions &#119878; 1 , &#119878; 2 , . . . , and &#119878; &#119899; . Any such a partitioning can obtained by performing the graph reordering and ordered partitioning. More specifically, we derive the vertex ID assignment rule &#120601; : Z &#8594; Z, where for each vertex &#119906;'s ID &#119906;.&#119868; &#119863; &#8712; &#119878; &#119894; it holds that &#120601; (&#119906;) = &#119908; &#8226; (&#119894; -1) + |{&#119907; &#8712; &#119878; &#119894; |&#119907; .&#119868; &#119863; &#8804; &#119906;.&#119868;&#119863; }|. We can find that {&#120601; (&#119907;)|&#119907; &#8712; &#119878; &#119894; } = { &#119895; &#8712; Z|&#119908; &#8226; (&#119894; -1) &lt; &#119895; &#8804; &#119908; &#8226; &#119894;}. In other words, if we conduct the ordered partitioning according to new vertex IDs, the vertices contained in any &#119878; &#119894; will still be contained in the same partition. Hence, the new partitioning results are equivalent to the original partitions.</p><p>In this section, we aim to develop a novel graph reordering method to optimize the performance of the proposed HERO approach. As discussed in Section 4.3, the time complexity of Algorithm 2 is bounded by the size of the SIB-tree as each vertex in the SIB-tree will be traversed at most once. Thus, the objective for the graph reordering task is to minimize the total size of the SIB-tree for all vertices &#119907; in the graph &#119866; = (&#119881; , &#119864;). This target serves as a starting point in our approach</p><p>1 2 4 5 7 9 10 17 18 22 24 1 7 2 3 4 5 8 9 10 6 11 111 000 000 000 000 000 000 110 000 000 100 1 3 1 2 1 1 2 1 000 000 000 000 000 000 100 110 000 110 1 3 1 2 1 1 4 1 5 1 000 2 2 111 001 111 011</p><p>SIB-tree of reordered set A SIB-tree of reordered set B</p><p>3 12 6 13 8 14 10 11 12 13 8 15 16 17 14 15 16 19 18 19 20 21 20 21 22 23 23 24 25 25 26 26 27 27 Original ID New ID Reordered set A = 1,2,3,4,5,6 Reordered set B = 1,7,8,9,10,11 and can be mathematically formulated as follows:</p><p>where |&#119879; (N (&#119907;))| is the size of the SIB-tree &#119879; (N (&#119907;)). For simplicity, we would like to abbreviate &#119879; (N (&#119907;)) as &#119879; (&#119907;). Please note that the SIB-tree of the set which contains a single element &#119907; is denoted as &#119879; ({&#119907; }). Thus, the abbreviation will not lead to ambiguity.</p><p>Although we can consider different graph orders that lead to distinct SIB-trees and different intersection performance, it is computationally expensive to determine the optimal graph order (e.g., if we construct an SIB-tree for each graph order). To address this problem, we propose a hierarchical balanced graph partitioning (HBGP) strategy, which helps us compute the size of the SIB-tree index without constructing it. Next, we first introduce the definition of HBGP in Section 5.1. Then, we will describe how to compute the size of the SIB-tree index through HBGP, followed by the hardness analysis in Section 5.2. Finally, we develop an efficient heuristic algorithm to solve the HBGP problem in Section 5.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Hierarchical Balanced Graph Partitioning</head><p>Balanced Graph Partitioning: Given a graph &#119866; = (&#119881; , &#119864;) and an integer &#119899;, the classical balanced graph partitioning (BGP) problem asks to split the vertices &#119881; into &#119899; disjoint and equal-sized subsets &#119875; 1 , . . . , &#119875; &#119899; such that the edge cut is minimized, where the edge cut is the number of the edges whose two vertices belong to different subsets <ref type="bibr">[2]</ref>. The BGP is known as an NP-complete problem <ref type="bibr">[2]</ref>.</p><p>In BGP, the optimization target is to minimize the number of crossing edges, which can be reformulated as:</p><p>By denoting the cost of a subset &#119875; &#119894; as cost(&#119875; &#119894; ) =</p><p>Algorithm 3: General Greedy Paradigm for BGP Input: Graph &#119866; = (&#119881; , &#119864;), number of partitions &#119899; Output: The partition result P.</p><p>12 return P we can rewrite the cost function of BGP as a general form</p><p>where P = {&#119875; 1 , . . . , &#119875; &#119899; }. Therefore, by defining different partitioning costs, it is easy to derive a wide class of generalized BGP. Note that we say the partitioning cost satisfies the locality if the cost of a partition is determined only by itself and is independent of other partitions, such that the cost of a partition can be computed during the construction procedure. Thus, the generalized BGP can be approximately solved by using a general greedy paradigm, as shown in Algorithm 3.</p><p>It sequentially constructs the subsets following the greedy heuristic. Each subset &#119875; &#119894; is initialized as an empty set (line 2), and a candidate set &#119862; is initialized as &#119881; (line 3). The construction of each subset &#119875; &#119894; is an iterative procedure (lines 5-11). In each iteration, a vertex in the candidate set &#119862; is selected (line 9) and inserted into the subset (line 11), such that the cost of the subset after insertion is maximized. Meanwhile, the vertex will be removed from the candidate set &#119862; once it has been selected (line 10).</p><p>Hierarchical Balanced Graph Partitioning: The graph partitioning problems above just divide the vertices through a unified granularity, however, it is required to partition the graph hierarchically in some cases. The hierarchy can be comprehended from two different perspectives. The first one is from top-down, that is, first dividing the graph into coarse-grained partitions and then dividing each partition into finer-grained partitions recursively. The second one is from bottom-up, that is, first dividing the graph into fine-grained partitions and then merging the partitions into coarser-grained partitions recursively. The partitioning result of HBGP is a balanced tree, where each node refers to a subset of the vertices, i.e., the union of the subsets referred to by its child nodes.</p><p>Definition 5.2 (Hierarchical Balanced graph partitioning problem, HBGP). Given a graph &#119866; = (&#119881; , &#119864;) and a width parameter &#119908;, a &#8462;-level HBGP aims at partitioning &#119881; into &#8462; groups of subsets</p><p>(2) For &#119897; = 1 to &#8462;, {&#119875;</p><p>&#119897; &#119894; } &#119899; &#119897; &#119894;=1 is a group of disjoint subsets of |&#119881; |, where each subset &#119875; &#119897; &#119894; has no more than &#119908; &#8226; &#119895; vertices. (3) For &#119897; = 2 to &#8462;, each subset in {&#119875; &#119897; &#119894; } &#119899; &#119897; &#119894;=1 is the union of &#119908; subsets in {&#119875; &#119897; -1 &#119894; } &#119899; &#119897; -1 &#119894;=1 at most. (4) The total cost L (P) = &#8462; &#119897;=1 &#119899; &#119897; &#119894;=1 cost(&#119875; &#119897; &#119894; ) is minimized, where cost(&#119875; &#119897; &#119894; ) is defined as | &#119906; &#8712;&#119875; &#119897; &#119894; N (&#119906;)|. Similar to the BGP, it is convenient to represent a result of HBGP by reordering the graph. That is, for each subset &#119875; &#119897; &#119894; , we always assign the ID of its vertices from 1 + &#119908; &#119889;+1-&#119895; &#8226; (&#119894; -1) to &#119908; &#119889;+1-&#119895; &#8226; &#119894;. 5.2 Computing SIB-Tree Size via HBGP Given a hierarchical partitioning P, we can build a bipartite graph B = (B &#119897; , B &#119903; , &#119864; B ) as follows. For each vertex &#119906; in &#119866;, it will be contained in the left part B &#119897; . While for each subset &#119875; &#119897; &#119894; in P, we abstract it as a vertex in the right part B &#119903; . For each pair of vertices (&#119906;, &#119875; &#119897; &#119894; ), we add an edge between them if and only if the intersection of the corresponding subset &#119875; &#119897; &#119894; and N (&#119906;) is non-empty. Let &#119879; &#119897; &#119894; (&#119906;) &#8712; &#119879; (&#119906;) denote that there exists a node in the &#119897;-th layer of SIB-tree &#119879; (&#119906;) such that its base value equals &#119894;. Lemma 5.3. Given a graph &#119866; = (&#119881; , &#119864;), a hierarchical partitioning P and the corresponding bipartite graph &#119860; = (B &#119897; , B &#119903; , &#119864; B ), for each &#119906; &#8712; B &#119897; and &#119901; &#119897; &#119894; &#8712; B &#119903; , it holds that (&#119906;, &#119901; &#119897; &#119894; ) &#8712; &#119864; B if and only if &#119879; &#119897; &#119894; (&#119906;) &#8712; &#119879; (&#119906;).</p><p>Proof. We just give a proof sketch of mathematical induction. For each &#119879; &#119897; &#119894; (&#119906;) &#8712; &#119879; (&#119906;), the target is to prove that N (&#119906;) contains at least one vertex in the &#119875; &#119897; &#119894; . It is trivial for &#119897; = 1. For the case &#119897; &#8805; 2, the key idea lies in that &#119879; &#119897; &#119894; (&#119906;) &#8712; &#119879; (&#119906;) is equivalent to &#119879; &#119897; -1 &#119894; &#8242; &#8712; &#119879; (&#119906;) for at least one of the &#119908; (&#119894; -1) &lt; &#119894; &#8242; &#8804; &#119908;&#119894;. Then, combining the proved case of &#119897; -1, the correctness of case &#119897; is achieved. &#9633; </p><p>Proof. The proof can be achieved by counting the edges in the bipartite graph B in two different ways, i.e., summing up the degrees of all vertices in the left and right parts. Lemma 5.3 implies that the sum of degrees of the left part equals L (&#119866;). According to the definition of edges in B, the sum of degrees of the right part equals L (P). Hence, the theorem is proved. &#9633; Theorem 5.4 implies that optimizing the graph ordering is as difficult as solving the HBGP problem. which is NP-hard as shown next.</p><p>Theorem 5.5. The hierarchical graph partitioning problem cannot be solved in polynomial time unless P = NP.</p><p>Proof. The proof can be achieved by reducing from the 3-partition problem, which is known as an NP-complete problem <ref type="bibr">[13]</ref>, to the HBGP problem. For each 3-partition problem instance &#119875;1, we can construct a corresponding HBGP instance &#119875;2. Then, determining whether &#119875; 1 has feasible solutions or not according to the solution of &#119875;2 will be given, and its correctness will be proved. In this way, we can prove that HBGP is not easier than the 3-partition problem, which implies that HBGP is NP. &#9633; Due to space limitations, we will provide complete proofs of lemmas/theorems in a technical report after the paper publication (for the double-blind reason) and omit them here. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Heuristic Algorithm</head><p>Since the hierarchical balanced graph partitioning problem is intractable, we propose an efficient greedy heuristic algorithm. Based on the HBGP definition, a naive heuristic idea is to divide the HBGP problem into several BGP problems and solve them orderly, either top-down or bottom-up. In this paper, we adopt the top-down heuristic. The reason behind this is that the top-down heuristic prioritizes minimizing the number of high-level nodes in the SIB-tree. Consequently, space pruning is more likely to occur at the high-level intersection subtasks. The pseudocode of the heuristic algorithm is outlined in Algorithm 4, where the cost function in Line 15 is defined as</p><p>For the procedure Reorder(left, right, &#119908;), we would like to store the set of candidate vertices &#119862; in a heap. Hence, each greedy selection (Line 13 and Line 15 in Algorithm 4) equals popping an element from the heap. The time complexity of a pop operation is O (log |&#119862; |). However, for each vertex &#119907; which is still in the heap, we need to update the cost(&#119875; &#8746; {&#119907; }) according to the Equation ( <ref type="formula">8</ref>) such that we can correctly pop the next element. The fact is that the update procedure dominates the time complexity of Algorithm 4. Consequently, we need to design an efficient way for the update. Consider line 15 in Algorithm 4, an equivalent way to make the greedy selection is computing the increment of the cost. That is, by denoting &#916;(&#119907; |&#119875;) = cost(&#119875; &#8746; {&#119907; })cost(&#119875;), we can easily in C++ and compiled by GCC v7.3.0 with -O3 option. Anonymous source code and data are available at: <ref type="url">https:// anonymous.4open.science/ r/ HEROFramework-126E/</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Accelerating Set Intersections</head><p>Following the previous research <ref type="bibr">[31]</ref>, we use two groups of query sets for global intersection and local intersection for each graph. The global intersection group contains 100, 00 pairs of randomly selected vertices to compute their common neighbors. The local intersection group contains 10, 000 pairs of randomly selected adjacent vertices to compute common neighbors. The results of global intersection and local intersection are presented in Figure <ref type="figure">7</ref> and Table <ref type="table">3</ref>. Speedup of our approach. To evaluate the performance of the proposed SIB method, we compare it with the merge intersection and three state-of-the-art algorithms, that is, the compressed bitmap method Roaring <ref type="bibr">[6]</ref>, the SIMD-based method QFilter <ref type="bibr">[14]</ref>, and the reducing-merging method Range <ref type="bibr">[31]</ref>. We report the speedups compared to the merge intersection as shown in Figure <ref type="figure">7</ref>, where the elapsed time of the merge intersection is taken as the baseline with speedup=1.0x. We find that the SIB method consistently outperforms three baselines on 8 of 10 graphs for both global intersection and local intersection tasks. Specifically, on the graph web-BerkStan, the SIB method achieves significant speedups, up to 181.00x and 83.33x on global and local intersection tasks, respectively. Averaged on 10 graphs, the SIB method accelerates the global and local intersection tasks by 26.50x and 16.17x, respectively. We show the running time of the merge intersection algorithm in Table <ref type="table">4</ref> (the columns "GI" and "LI" refer to global and local intersections, respectively). The running time of other algorithms can be easily inferred through the corresponding speedups.  The number of comparisons. Table <ref type="table">3</ref> reports how many comparisons each intersection method demands on the global and local intersection tasks. For the merge intersection method, one comparison just refers to comparing a pair of vertices. For our proposed SIB method, one comparison refers to invoking the boolean AND operator once. Although the specific comparison operations are different, we find that there is a significant gap in the number of comparisons. For both global intersection and local intersection tasks, the merge intersection takes 10x-100x comparisons more than our proposed SIB, which may explain why the SIB method is much faster than the merge intersection method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Accelerating Graph Algorithms</head><p>In this section, we evaluate the performance of SIB method through three graph algorithms, that is triangle counting (TC), maximal clique enumeration (MC), and subgraph listing (SL). For each task, we report the speedups compared with the merge intersection method, where the merge intersection is taken as the baseline with speedup= 1.0x. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4">Evaluating Different Graph Orderings</head><p>In this section, we explore how different graph orderings benefit the SIB approach. In addition to the proposed ordering HBGP, we adopt 6 existing graph orderings, including the original order, BFSR <ref type="bibr">[3]</ref>, DFS <ref type="bibr">[26]</ref>, Hybrid <ref type="bibr">[1]</ref>, MLOGGAPA <ref type="bibr">[11]</ref>, and SlashBurn <ref type="bibr">[22]</ref> as competitors, since they are often used for the graph set intersection in previous studies like QFilter <ref type="bibr">[14]</ref> and Range <ref type="bibr">[31]</ref>.</p><p>We report the performance of the ordering through two metrics, that is, the size of the SIB-tree index and the speedup of SIB method on TC, MC, and SL algorithms. Note it is extremely expensive to enumerate MC on the whole graph, we enumerate MC on 1% vertices for each graph in this section.</p><p>Effect on the size of SIB-tree. The size information of the SIB-tree index is depicted in Figure <ref type="figure">9</ref>(a). For simplicity of comparison, we scale the data by the size of the SIB-tree index in original order. We first consider the size of the SIB-tree index, which evaluates how well the ordering solves the graph reorder problem proposed in Section 5. As shown in Figure <ref type="figure">9</ref>(a), our proposed HBGP-based ordering always constructs the SIB-tree index with minimal size among all the compared graph orderings. We also find that the performance of each order becomes pretty similar when the average degree of the graph is small. This phenomenon is intuitive because the small degree implies a corresponding small optimization space. Effect on the speedups. As for the speedups of three graph algorithms, we report each ordering's average speedup over 10 graphs in Table <ref type="table">6</ref>. For detailed speedup on each graph, the results are shown in Figures 9(b), 9(c), and 9(d), respectively. Since the final target of reordering the graph is to improve the performance of SIB method, we then focus on the speedups brought by different orderings. According to the average speedups shown in Table <ref type="table">6</ref>, our proposed HBGP-based ordering beats all the ordering baselines in the TC and MC tasks. As for the SL task, our proposed HBGP ordering performs second only to the MLOGGAPA ordering. More specifically, the HBGP-based ordering performs the best on all 10 graph datasets in the TC task. Similarly, except for the MLOGGAPA multi-level intersection framework using bitmap and boolean AND operation. Consequently, the intersection algorithm can totally get rid of the merge intersection framework.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Proc. ACM Manag. Data, Vol. 2, No. 1 (SIGMOD), Article 29. Publication date: February 2024.</p></note>
		</body>
		</text>
</TEI>
