skip to main content


Title: NWGraph: A Library of Generic Graph Algorithms and Data Structures in C++20
The C++ Standard Library is a valuable collection of generic algorithms and data structures that improves the usability and reliability of C++ software. Graph algorithms and data structures are notably absent from the standard library, and previous attempts to fill this gap have not gained widespread adoption. In this paper we show that the richness of graph algorithms and data structures can in fact be captured by straightforward composition of existing C++ mechanisms. Generic programming is algorithm-oriented. Accordingly, we apply a systematic approach to analyzing a broad set of graph algorithms, "lift" unnecessary constraints from them, and organize the resulting set of minimal common type requirements, i.e., concepts, for defining their interfaces. By using the newly available ranges and concepts in C++20, the type requirements for generic graph algorithms can be succinctly expressed. The generic algorithms and data structures resulting from our analysis are realized in NWGraph, a modern, composable, and extensible C++ library.  more » « less
Award ID(s):
1716828
NSF-PAR ID:
10381434
Author(s) / Creator(s):
; ; ; ; ; ;
Editor(s):
Ali, Karim; Vitek, Jan
Date Published:
Journal Name:
Leibniz international proceedings in informatics
Volume:
222
ISSN:
1868-8969
Page Range / eLocation ID:
31:1--31:28
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this article, we show how a single function,join, can be used to implement parallelbalanced binary search trees(BSTs) simply and efficiently. Based onjoin, our approach applies to multiple balanced tree data structures, and a variety of functions for ordered sets and maps. We describe our technique as an algorithmic framework calledjoin-based algorithms. We show that thejoinfunction fully captures what is needed for rebalancing trees for a variety of tree algorithms, as long as the balancing scheme satisfies certain properties, which we refer to asjoinabletrees. We discuss four balancing schemes that are joinable: AVL trees, red-black trees, weight-balanced trees, and treaps. We present a variety of tree algorithms that apply to joinable trees, includinginsert,delete,union,intersection,difference,split,range,filter, and so on, most of them also parallel. These algorithms are generic across balancing schemes. Many algorithms are optimal in the comparison model, and we provide a general proof to show the efficiency in work for joinable trees. The algorithms are highly parallel, all with polylogarithmic span (parallel dependence). Specifically, the set-set operationsunion,intersection, anddifferencehave work\( O(m\log (\frac{n}{m}+1)) \)and polylogarithmic span for input set sizes\( n \)and\( m\le n \).

    We implemented and tested our algorithms on the four balancing schemes. In general, all four schemes have quite similar performance, but the weight-balanced tree slightly outperforms the others. They have the same speedup characteristics, getting around 73\( \times \)speedup on 72 cores (144 hyperthreads). Experimental results also show that our implementation outperforms existing parallel implementations, and our sequential version achieves close or much better performance than the sequential merging algorithm in C++ Standard Template Library (STL) on various input sizes.

     
    more » « less
  2. In many applications of graph processing, the input data is often generated from an underlying geometric point data set. However, existing high-performance graph processing frameworks assume that the input data is given as a graph. Therefore, to use these frameworks, the user must write or use external programs based on computational geometry algorithms to convert their point data set to a graph, which requires more programming effort and can also lead to performance degradation. In this paper, we present our ongoing work on the Geo- Graph framework for shared-memory multicore machines, which seamlessly supports routines for parallel geometric graph construction and parallel graph processing within the same environment. GeoGraph supports graph construction based on k-nearest neighbors, Delaunay triangulation, and b-skeleton graphs. It can then pass these generated graphs to over 25 graph algorithms. GeoGraph contains highperformance parallel primitives and algorithms implemented in C++, and includes a Python interface. We present four examples of using GeoGraph, and some experimental results showing good parallel speedups and improvements over the Higra library. We conclude with a vision of future directions for research in bridging graph and geometric data processing. 
    more » « less
  3. null (Ed.)
    The Twitter-Based Knowledge Graph for Researchers project is an effort to construct a knowledge graph of computation-based tasks and corresponding outputs. It will be utilized by subject matter experts, statisticians, and developers. A knowledge graph is a directed graph of knowledge accumulated from a variety of sources. For our application, Subject Matter Experts (SMEs) are experts in their respective non-computer science fields, but are not necessarily experienced with running heavy computation on datasets. As a result, they find it difficult to generate workflows for their projects involving Twitter data and advanced analysis. Workflow management systems and libraries that facilitate computation are only practical when the users of these systems understand what analysis they need to perform. Our goal is to bridge this gap in understanding. Our queryable knowledge graph will generate a visual workflow for these experts and researchers to achieve their project goals. After meeting with our client, we established two primary deliverables. First, we needed to create an ontology of all Twitter-related information that an SME might want to answer. Secondly, we needed to build a knowledge graph based on this ontology and produce a set of APIs to trigger a set of network algorithms based on the information queried to the graph. An ontology is simply the class structure/schema for the graph. Throughout future meetings, we established some more specific additional requirements. Most importantly, the client stressed that users should be able to bring their own data and add it to our knowledge graph. As more research is completed and new technologies are released, it will be important to be able to edit and add to the knowledge graph. Next, we must be able to provide metrics about the data itself. These metrics will be useful for both our own work, and future research surrounding graph search problems and search optimization. Additionally, our system should provide users with information regarding the original domain that the algorithms and workflows were run against. That way they can choose the best workflow for their data. The project team first conducted a literature review, reading reports from the CS5604 Information Retrieval courses in 2016 and 2017 to extract information related to Twitter data and algorithms. This information was used to construct our raw ontology in Google Sheets, which contained a set of dataset-algorithm-dataset tuples. The raw ontology was then converted into nodes and edges csv files for building the knowledge graph. After implementing our original solution on a CentOS virtual machine hosted by the Virginia Tech Department of Computer Science, we transitioned our solution to Grakn, an open-source knowledge graph database that supports hypergraph functionality. When finalizing our workflow paths, we noted some nodes depended on completion of two or more inputs, representing an ”AND” edge. This phenomenon is modeled as a hyperedge with Grakn, initiating our transition from Neo4J to Grakn. Currently, our system supports queries through the console, where a user can type a Graql statement to retrieve information about data in the graph, from relationships to entities to derived rules. The user can also interact with the data via Grakn's data visualizer: Workbase. The user can enter Graql queries to visualize connections within the knowledge graph. 
    more » « less
  4. Abstract—Materials Genomics initiative has the goal of rapidly synthesizing materials with a given set of desired properties using data science techniques. An important step in this direction is the ability to predict the outcomes of complex chemical reactions. Some graph-based feature learning algorithms have been proposed recently. However, the comprehensive relationship between atoms or structures is not learned properly and not explainable, and multiple graphs cannot be handled. In this paper, chemical reaction processes are formulated as translation processes. Both atoms and edges are mapped to vectors represent- ing the structural information. We employ the graph convolution layers to learn meaningful information of atom graphs, and further employ its variations, message passing networks (MPNN) and edge attention graph convolution network (EAGCN) to learn edge representations. Particularly, multi-view EAGCN groups and maps edges to a set of representations for the properties of the chemical bond between atoms from multiple views. Each bond is viewed from its atom type, bond type, distance and neighbor environment. The final node and edge representations are mapped to a sequence defined by the SMILES of the molecule and then fed to a decoder model with attention. To make full usage of multi-view information, we propose multi-view attention model to handle self correlation inside each atom or edge, and mutual correlation between edges and atoms, both of which are important in chemical reaction processes. We have evaluated our method on the standard benchmark datasets (that have been used by all the prior works), and the results show that edge embedding with multi-view attention achieves superior accuracy compared to existing techniques. 
    more » « less
  5. This research investigates the design of structurally performant, lightweight architectural elements produced through concrete 3D printing (C3DP). Traditionally, concrete requires dense and sturdy formwork, whose production adds significantly to the total cost and results in massive and heavy parts after demolding. C3DP offers the unique opportunity to both eliminate the need for formwork and to create lighter parts by introducing internal voids and cavities. The advent of additive manufacturing in a broad range of scales, materials, industries, and applications, led to increased interest and intense research into different types of porous structures, their geometry, and structural performance under various boundary conditions. Precise control over the sparse distribution of material allows not only for parts with similar strength at reduced mass but even for modifications of mechanical properties, like turning brittle materials into elastic or shock-absorbent ones. While with powder-based additive manufacturing processes like metal 3D printing, truss-based lattices have become very popular for the light-weighting of parts or to provide tissue growth scaffolds for medical implants, their geometry – a sparse space frame resulting in numerous individual contour islands and accentuated overhangs – cannot as easily be produced by C3DP, which is based on a continuous material extrusion. Alternative types of micro-structures, so-called triply periodic minimal surfaces (TPMS), are better suited for this process as they are, as their name suggests, consisting of one continuous surface dividing space into two separate but interwoven subspaces. TPMS are therefore very popular for the efficient design of heat exchangers. We develop and present a continuous and integrated workflow, in which the architectural elements and their structural requirements are designed through transitioning back and forth between the force and the form diagram using 3D graphic statics [1]. The members and their topology from the abstract graph of the conceptual form diagram are seamlessly connected to the volumetric modeling (VM) framework, responsible for the definition of the part geometry [2]. VM represents form assigned distance functions (SDF) and can easily handle complex topologies and flawless Boolean operations of not only the outer shell geometry but also the internal micro-structural infill patterns (Fig. 1, a). In an iterative feedback loop, the infill can be further optimized to leave the material only along certain internal stress trajectories (force flows). This functional grading controlling the relative density is done based on the FE analysis results. The stress distribution is thereby defined as a three-dimensional field (Fig. 1, b). Its values can factor into the SDF equation and be used to modify the wavelength (periodicity) of the TPMS, the local thickness of the surface shell, the solid to void fraction by shifting the threshold iso-value or even the alignment and orientation of the unit cells (Fig. 1, c). They can be arranged in an orthogonal, polar- or even spherical coordinate system to optimally adapt to structural necessities. The TPMS pattern can also gradually transition from one type into another type along the gradient of a spatial function. 
    more » « less