skip to main content

Title: ExecutionManager: A Software System to Control Execution of Third-Party Software That Performs Network Computations
We describe a software system called ExecutionManager (abbreviated EM) that controls the execution of third-party software (TPS) for analyzing networks. Based on a configuration file that contains a specification for the execution of each TPS, the system launches any number of stand-alone TPS codes, if the projected execution time and the graph size are within user-imposed limits. A system capability is to estimate the running time of a TPS code on a given network through regression analysis, to support execution decision-making by EM. We demonstrate the usefulness of EM in generating network structure parameters and distributions, and in extracting meta-data information from these results. We evaluate its performance on directed and undirected, simple and multi-edge graphs that range in size over seven orders of magnitude in numbers of edges, up to 1.5 billion edges. The software system is part of a cyberinfrastructure called for network science.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Winter Simulation Conference
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents Rolis, a new speedy and fault-tolerant replicated multi-core transactional database system. Rolis's aim is to mask the high cost of replication by ensuring that cores are always doing useful work and not waiting for each other or for other replicas. Rolis achieves this by not mixing the multi-core concurrency control with multi-machine replication, as is traditionally done by systems that use Paxos to replicate the transaction commit protocol. Instead, Rolis takes an "execute-replicate-replay" approach. Rolis first speculatively executes the transaction on the leader machine, and then replicates the per-thread transaction log to the followers using a novel protocol that leverages independent Paxos instances to avoid coordination, while still allowing followers to safely replay. The execution, replication, and replay are carefully designed to be scalable and have nearly zero coordination overhead across cores. Our evaluation shows that Rolis can achieve 1.03M TPS (transactions per second) on the TPC-C workload, using a 3-replica setup where each server has 32 cores. This throughput result is orders of magnitude higher than traditional software approaches we tested (e.g., 2PL), and is comparable to state-of-the-art, fault-tolerant, in-memory storage systems built using kernel bypass and advanced networking hardware, even though Rolis runs on commodity machines. 
    more » « less
  2. Wedescribeasoftwaresystemcalled ExecutionManager (abbreviated EM) thatcontrolstheexecutionof third-party software(TPS)foranalyzingnetworks.Basedonaconfigurationfilethatcontainsaspecification for theexecutionofeachTPS,thesystemlaunchesanynumberofstand-aloneTPScodes,iftheprojected executiontimeandthegraphsizearewithinuser-imposedlimits.Asystemcapabilityistoestimate the runningtimeofaTPScodeonagivennetworkthroughregressionanalysis,tosupportexecution decision-making by EM. Wedemonstratetheusefulnessof EM in generatingnetworkstructureparameters and distributions,andinextractingmeta-datainformationfromtheseresults.Weevaluateitsperformance on directedandundirected,simpleandmulti-edgegraphsthatrangeinsizeoversevenordersofmagnitude in numbersofedges,upto1.5billionedges.Thesoftwaresystemispartofacyberinfrastructurecalled for networkscience. 
    more » « less
  3. Bitslicing is a software implementation technique that treats an N-bit processor datapath as N parallel single-bit datapaths. Bitslicing is particularly useful to implement data-parallel algorithms, algorithms that apply the same operation sequence to every element of a vector. Indeed, a bit-wise processor instruction applies the same logical operation to every single-bit slice. A second benefit of bitsliced execution is that the natural spatial redundancy of bitsliced software can support countermeasures against fault attacks. A k-redundant program on an N-bit processor then runs as N/k parallel redundant slices. In this contribution, we combine these two benefits of bitslicing to implement a fault countermeasure for the number-theoretic transform (NTT). The NTT eiciently implements a polynomial multiplication. The internal symmetry of the NTT algorithm lends itself to a data-parallel implementation, and hence it is a good candidate for the redundantly bitsliced implementation. We implement a redundantly bitsliced NTT on an advanced 667MHz ARM Cortex-A9 processor, and study the fault coverage for the protected NTT under optimized electromagnetic fault injection (EMFI). Our work brings two major contributions. First, we show for the irst time how to develop a redundantly bitsliced version of the NTT. We integrate the protected NTT into a full Dilithium signature sequence. Second, we demonstrate an EMFI analysis on a prototype implementation of the Dilithium signature sequence on ARM Cortex-M9. We perform a detailed EM fault-injection parameter search to optimize the location, intensity and timing of injected EM pulses. We demonstrate that, under optimized fault injection parameters, about 10% of the injected faults become potentially exploitable. However, the redundantly bitsliced NTT design is able to catch the majority of these potentially exploitable faults, even when the remainder of the Dilithium algorithm as well as the control low is left unprotected. To our knowledge, this is the irst demonstration of a bitslice-redundant design of the NTT that offers distributed fault detection throughout the execution of the algorithm. 
    more » « less
  4. As data analytics applications become increasingly important in a wide range of domains, the ability to develop large-scale and sustainable platforms and software infrastructure to support these applications has significant potential to drive research and innovation in both science and business domains. This paper characterizes performance and power-related behavior trends and tradeoffs of the two predominant frameworks for Big Data analytics (i.e., Apache Hadoop and Spark) for a range of representative applications. It also evaluates system design knobs, such as storage and network technologies and power capping techniques. Experimental results from empirical executions provide meaningful data points for exploring the potential of software-defined infrastructure for Big Data processing systems through simulation. The results provide better understanding of the design space to build multi-criteria application-centric models as well as show significant advantages of software-defined infrastructure in terms of execution time, energy and cost. It motivates further research focused on in-memory processing formulations regarding systems with deeper memory hierarchies and software-defined infrastructure. 
    more » « less
  5. Runtimes and applications that rely heavily on asynchronous event notifications suffer when such notifications must traverse several layers of processing in software. Many of these layers necessarily exist in order to support a general-purpose, portable kernel architecture, but they introduce considerable overheads for demanding, high-performance parallel runtimes and applications. Other overheads can arise from a mismatched event programming or system call interface. Whatever the case, the average latency and variance in latency of commonly used software mechanisms for event notifications is abysmal compared to the capabilities of the hardware, which can exhibit orders of magnitude lower latency. We leverage the flexibility and freedom of the previously proposed Hybrid Runtime (HRT) model to explore the construction of low-latency, asynchronous software events uninhibited by interfaces and execution models commonly imposed by general-purpose OSes. We propose several mechanisms in a system we call Nemo which employs kernel mode-only features to accelerate event notifications by up to 4,000 times and we provide a detailed evaluation of our implementation using extensive microbenchmarks. We carry out our evaluation both on a modern x64 server and the Intel Xeon Phi. Finally, we propose a small addition to existing interrupt controllers (APICs) that could push the limit of asynchronous events closer to the latency of the hardware cache coherence network. 
    more » « less