skip to main content

Title: GraphTrans: A Software System for Network Conversions for Simulation, Structural Analysis, and Graph Operations
Network representations of socio-physical systems are ubiquitous, examples being social (media) networks and infrastructure networks like power transmission andwater systems. The many software tools that analyze and visualize networks, and carry out simulations on them, require different graph formats. Consequently, it is important to develop software for converting graphs that are represented in a given source format into a required representation in a destination format. For network-based computations, graph conversion is a key capability that facilitates interoperability among software tools. This paper describes such a system called GraphTrans to convert graphs among different formats. This system is part of a new cyberinfrastructure for network science called net.science. We present the GraphTrans system design and implementation, results from a performance evaluation, and a case study to demonstrate its utility.
Authors:
; ; ; ;
Award ID(s):
1916670
Publication Date:
NSF-PAR ID:
10310253
Journal Name:
Winter Simulation Conference
Sponsoring Org:
National Science Foundation
More Like this
  1. It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learningmore »approaches face great challenges: the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keywords to ontology concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of the descriptions from the moment of publication. In this presentation, we will introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components: a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. Fig. 1 shows the system diagram of the platform. The presentation will consist of: a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire system of the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists. Participants will use the platform to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and the quantities of logical conflicts that may result from the terms added by the users and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge. We look forward to a discussion on how the tools (Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public) described in our presentation can contribute to producing and publishing FAIR data in taxonomic studies.« less
  2. We describe a software system called ExecutionManager (abbreviated EM) that controls the execution of third-party software (TPS) for analyzing networks. Based on a configuration file that contains a specification for the execution of each TPS, the system launches any number of stand-alone TPS codes, if the projected execution time and the graph size are within user-imposed limits. A system capability is to estimate the running time of a TPS code on a given network through regression analysis, to support execution decision-making by EM. We demonstrate the usefulness of EM in generating network structure parameters and distributions, and in extracting meta-datamore »information from these results. We evaluate its performance on directed and undirected, simple and multi-edge graphs that range in size over seven orders of magnitude in numbers of edges, up to 1.5 billion edges. The software system is part of a cyberinfrastructure called net.science for network science.« less
  3. Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation, and data movements. One goal of data engineering is to transform data from original data to vector/matrix/tensor formats accepted by deep learning and machine learning applications. There are many structures such as tables, graphs, and trees to represent data in these data engineering phases. Among them, tables are a versatile and commonly used format to load and process data. In this paper, we present a distributed Pythonmore »API based on table abstraction for representing and processing data. Unlike existing state-of-the-art data engineering tools written purely in Python, our solution adopts high performance compute kernels in C++, with an in-memory table representation with Cython-based Python bindings. In the core system, we use MPI for distributed memory computations with a data-parallel approach for processing large datasets in HPC clusters.« less
  4. Intrusion detection systems are a commonly deployed defense that examines network traffic, host operations, or both to detect attacks. However, more attacks bypass IDS defenses each year, and with the sophistication of attacks increasing as well, we must examine new perspectives for intrusion detection. Current intrusion detection systems focus on known attacks and/or vulnerabilities, limiting their ability to identify new attacks, and lack the visibility into all system components necessary to confirm attacks accurately, particularly programs. To change the landscape of intrusion detection, we propose that future IDSs track how attacks evolve across system layers by adapting the concept ofmore »attack graphs. Attack graphs were proposed to study how multi-stage attacks could be launched by exploiting known vulnerabilities. Instead of constructing attacks reactively, we propose to apply attack graphs proactively to detect sequences of events that fulfill the requirements for vulnerability exploitation. Using this insight, we examine how to generate modular attack graphs automatically that relate adversary accessibility for each component, called its attack surface, to flaws that provide adversaries with permissions that create threats, called attack states, and exploit operations from those threats, called attack actions. We evaluate the proposed approach by applying it to two case studies: (1) attacks on file retrieval, such as TOCTTOU attacks, and (2) attacks propagated among processes, such as attacks on Shellshock vulnerabilities. In these case studies, we demonstrate how to leverage existing tools to compute attack graphs automatically and assess the effectiveness of these tools for building complete attack graphs. While we identify some research areas, we also find several reasons why attack graphs can provide a valuable foundation for improving future intrusion detection systems.« less
  5. In recent years, smart grid communications (SGC) has evolved to use new technologies not only for data delivery but also for enhanced smart grid (SG) security and reliability. Software Defined Networks (SDN) has proved to be a reliable and efficient architecture for handling diverse communication systems due to their ability to divide responsibilities of the network using control plane and data plane. This paper presents a graph learning approach for detecting and identifying Distributed Denial of Service (DDoS) attacks in SDN-SGC systems (GLASS). GLASS is a two phase framework that (1) detects if SDN-SGC is under DDoS attack using supervisedmore »graph deep learning and then (2) identifies the compromised entities using unsupervised learning methods. Network performance statistics are used for modeling SDN-SGC graphs, which train Graph Convolutional Neural Networks (GCN) to extract latent representations caused by DDoS attacks. Finally, spectral clustering is used to identify compromised entities. The experimental results, obtained by analysis of an IEEE 118-bus system, show the average throughput for compromised entities is able to maintain 84% of normal traffic level with GLASS, compared to achieving only 4% of normal throughput caused by DDoS attacks on compromised entities without the GLASS framework.« less