Title: Dynamic Sizing of Continuously Divisible Jobs for Heterogeneous Resources
Many scientific applications operate on large datasets that can be partitioned and operated on concurrently. The existing approaches for concurrent execution generally rely on statically partitioned data. This static partitioning can lock performance in a sub-optimal configuration, leading to higher execution time and an inability to respond to dynamic resources. We present the Continuously Divisible Job abstraction, which allows statically defined applications to have their component tasks dynamically sized in response to system behavior. The Continuously Divisible Job abstraction defines a simple interface that dictates how work can be recursively divided, executed, and merged. Implementing this abstraction allows scientific applications to leverage dynamic job coordinators for execution. We also propose the Virtual File abstraction, which allows read-only subsets of large files to be treated as separate files. In exploring the Continuously Divisible Job abstraction, two applications were implemented using the Continuously Divisible Job interface: a bioinformatics application and a high-energy physics event analysis. These were tested using an abstract job interface and several job coordinators. Comparing these against a previous static partitioning implementation, we show comparable or better performance without having to make static decisions or implement complex dynamic application handling.
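The abstract describes a divide/execute/merge interface without giving its signature, so the following Python sketch is only illustrative: the class name ContinuouslyDivisibleJob comes from the paper's terminology, but the method names (divide, execute, merge) and the toy word-count job are assumptions, not the paper's actual API.

```python
from abc import ABC, abstractmethod

class ContinuouslyDivisibleJob(ABC):
    """Sketch of a divide/execute/merge interface; method names are
    assumptions based on the abstract, not the paper's actual API."""

    @abstractmethod
    def divide(self, n):
        """Split this job into smaller jobs over disjoint subsets of the data."""

    @abstractmethod
    def execute(self):
        """Run this job on its share of the data; return a partial result."""

    @abstractmethod
    def merge(self, results):
        """Combine partial results from previously divided sub-jobs."""

class WordCountJob(ContinuouslyDivisibleJob):
    """Toy job: count words across a list of lines."""

    def __init__(self, lines):
        self.lines = lines

    def divide(self, n):
        step = max(1, len(self.lines) // n)
        return [WordCountJob(self.lines[i:i + step])
                for i in range(0, len(self.lines), step)]

    def execute(self):
        return sum(len(line.split()) for line in self.lines)

    def merge(self, results):
        return sum(results)

job = WordCountJob(["the quick brown fox", "jumps over", "the lazy dog"])
parts = job.divide(2)                            # a coordinator chooses the split
print(job.merge([p.execute() for p in parts]))   # 9
```

In this picture, a dynamic job coordinator would call divide repeatedly until the pieces match currently available resources, dispatch the execute calls concurrently, and fold results back with merge; that is what lets task sizing respond to system behavior at runtime rather than being fixed up front.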
Award ID(s):
1642409
PAR ID:
10210592
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE International Conference on e-Science
Page Range / eLocation ID:
178 to 187
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Faceted execution is a linguistic paradigm for dynamic information-flow control with the distinguishing feature that program values may be faceted. Such values represent multiple versions or facets at once, for different security labels. This enables policy-agnostic programming: a paradigm permitting expressive privacy policies to be declared, independent of program logic. Although faceted execution prevents information leakage at runtime, it does not guarantee the absence of failure due to policy violations. By contrast with static mechanisms (such as security type systems), dynamic information-flow control permits arbitrarily expressive and dynamic privacy policies but imposes significant runtime overhead and delays discovery of any possible violations. In this paper, we present two different abstract interpretations for faceted execution in the presence of first-class policies. We first present an abstraction which allows one to reason statically about the shape of facets at each program point. This abstraction is useful for statically proving the absence of runtime errors and eliminating runtime checks related to facets. Reasoning statically about the contents of faceted values, however, is complicated by the presence of first-class security labels, especially because abstract labels may conflate more than one runtime label. To address these issues, we also develop a more precise abstraction that relies on an analysis tracking singleton heap abstractions. We present an implementation of our coarse abstraction in Racket and demonstrate its performance on several sample programs. We conclude by showing how our precise domain can be used to verify information-flow properties.
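To make the idea of faceted values concrete, here is a minimal Python sketch. The Facet class, its observe method, and the label-set check are hypothetical simplifications for illustration; the paper's semantics (first-class policies, abstract interpretation of facet shapes) is far richer.

```python
# Minimal sketch of a faceted value: a single value that presents
# different facets to observers with different security labels.
# Facet and observe are illustrative assumptions, not the paper's
# actual formalism.

class Facet:
    def __init__(self, label, private_view, public_view):
        self.label = label              # label guarding the private facet
        self.private_view = private_view
        self.public_view = public_view

    def observe(self, authorized_labels):
        """Observers holding the guarding label see the private facet;
        all other observers see the public one."""
        if self.label in authorized_labels:
            return self.private_view
        return self.public_view

ssn = Facet("alice", private_view="123-45-6789", public_view="<redacted>")
print(ssn.observe({"alice"}))   # 123-45-6789
print(ssn.observe({"bob"}))     # <redacted>
```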
  2. Research on transaction processing has made significant progress towards improving the performance of main-memory multicore OLTP systems under low contention. However, these systems struggle on workloads with many conflicts. Partitioned databases (and variants) perform well on high-contention workloads that are statically partitionable, but time-varying workloads often make them impractical. Towards addressing this, we propose Strife, a novel transaction processing scheme that clusters transactions together dynamically and executes most of them without any concurrency control. Strife executes transactions in batches, where each batch is partitioned into disjoint clusters without any cross-cluster conflicts and a small set of residuals. The clusters are then executed in parallel with no concurrency control, followed by the residuals, which are executed separately with concurrency control. Strife uses a fast dynamic clustering algorithm that exploits a combination of random sampling and a concurrent union-find data structure to partition the batch online, before executing it. Strife outperforms lock-based and optimistic protocols by up to 2× on high-contention workloads. While Strife incurs about 50% overhead relative to partitioned systems in the statically partitionable case, it performs 2× better when such static partitioning is not possible and adapts to dynamically varying workloads.
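The clustering step above can be sketched with a few lines of code. The union-find below is the standard sequential structure with path halving, not Strife's concurrent version, and the random sampling and residual handling are omitted; it only shows how transactions touching a common record end up in the same cluster.

```python
# Sketch of Strife-style batch clustering: transactions that share a
# record key are unioned into one cluster; disjoint clusters can then
# run in parallel without concurrency control.

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def cluster(batch):
    """batch: list of transactions, each given as a set of record keys."""
    parent = {}
    for txn in batch:
        for key in txn:
            parent.setdefault(key, key)
        first = next(iter(txn))
        for key in txn:                 # union all keys a txn touches
            parent[find(parent, key)] = find(parent, first)
    clusters = {}
    for i, txn in enumerate(batch):
        root = find(parent, next(iter(txn)))
        clusters.setdefault(root, []).append(i)
    return list(clusters.values())

batch = [{"a", "b"}, {"b", "c"}, {"x"}, {"y", "z"}]
print(cluster(batch))   # [[0, 1], [2], [3]]: txns 0 and 1 conflict on "b"
```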
  3. Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully exposed before the real execution. Without an understanding of the program execution, statically pre-trained models fail to comprehensively capture the dynamic code properties, such as the branch coverage and the runtime variable values, and they are consequently less effective at code understanding tasks, such as retrieving semantic clones and detecting software vulnerabilities. To close the gap between the static nature of language models and the dynamic characteristics of programs, we introduce TRACED, an execution-aware pre-training strategy for source code. Specifically, we pre-train code language models with a combination of source code, executable inputs, and corresponding execution traces. Our goal is to teach code models the complicated execution logic during the pre-training, enabling the model to statically estimate the dynamic code properties without repeatedly executing code during task-specific fine-tuning. To illustrate the effectiveness of our proposed approach, we fine-tune and evaluate TRACED on three downstream tasks: static execution estimation, clone retrieval, and vulnerability detection. The empirical results show that TRACED relatively improves the statically pre-trained code models by 12.4% for complete execution path prediction and by 25.2% for runtime variable value predictions. TRACED also significantly outperforms statically pre-trained models in clone retrieval and vulnerability detection across four public benchmarks. 
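The pre-training inputs described above pair source code with executable inputs and execution traces. The sketch below shows one plausible way to capture such a trace in Python using sys.settrace; the trace format (line number plus a snapshot of local variables) is an illustrative assumption, not TRACED's actual encoding, and the model itself is not shown.

```python
import sys

# Capture a per-line execution trace (line number, local variables)
# for one function call. A (source, input, trace) triple like this is
# the kind of pre-training example the abstract describes; the exact
# format here is assumed for illustration.

def trace_values(fn, *args):
    events = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            events.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, events

def absdiff(a, b):
    d = a - b
    if d < 0:
        d = -d
    return d

result, trace = trace_values(absdiff, 3, 7)
print(result)                    # 4
for lineno, local_vars in trace:
    print(lineno, local_vars)    # runtime variable values per executed line
```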
  4. Hardware enclaves are designed to execute small pieces of sensitive code or to operate on sensitive data, in isolation from larger, less trusted systems. Partitioning a large, legacy application requires significant effort. Partitioning an application written in a managed language, such as Java, is more challenging because of mutable language characteristics, extensive code reachability in class libraries, and the inevitability of using a heavyweight runtime. Civet is a framework for partitioning Java applications into enclaves. Civet reduces the number of lines of code in the enclave and uses language-level defenses, including deep type checks and dynamic taint-tracking, to harden the enclave interface. Civet also contributes a partitioned Java runtime design, including a garbage collection design optimized for the peculiarities of enclaves. Civet is efficient for data-intensive workloads; partitioning a Hadoop mapper reduces the enclave overhead from 10× to 16–22% without taint-tracking, or 70–80% with taint-tracking.
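Dynamic taint-tracking at a trust boundary, one of the interface defenses named above, can be illustrated generically. The sketch below is a toy Python analogue under assumed names (Tainted, sanitize, enclave_entry); Civet's real mechanism operates on Java code inside an enclave.

```python
# Toy illustration of taint-tracking at a trust boundary: values that
# cross in from untrusted code are wrapped, and must be validated
# before use. Names are hypothetical; this is not Civet's Java API.

class Tainted:
    """Marks a value that arrived from outside the trusted partition."""
    def __init__(self, value):
        self.value = value

def sanitize(x):
    if isinstance(x, Tainted):
        # Validate before the value can influence sensitive computation.
        if not str(x.value).isalnum():
            raise ValueError("rejected untrusted input")
        return x.value
    return x

def enclave_entry(arg):
    arg = sanitize(arg)              # hardened interface: untaint on entry
    return "processed " + arg

print(enclave_entry(Tainted("record42")))   # processed record42
```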
  5. The proliferation of GPS-enabled devices has led to the development of numerous location-based services. These services need to process massive amounts of streamed spatial data in real time. The current scale of spatial data cannot be handled using centralized systems, which has led to the development of distributed spatial streaming systems. Existing systems use static spatial partitioning to distribute the workload. In contrast, real-time streamed spatial data follows non-uniform spatial distributions that are continuously changing over time. Distributed spatial streaming systems need to react to changes in the distribution of spatial data and queries. This article introduces SWARM, a lightweight adaptivity protocol that continuously monitors the data and query workloads across the distributed processes of the spatial data streaming system and redistributes and rebalances the workloads as soon as performance bottlenecks are detected. SWARM is able to handle multiple query-execution and data-persistence models. A distributed streaming system can directly use SWARM to adaptively rebalance the system's workload among its machines with minimal changes to the original code of the underlying spatial application. Extensive experimental evaluation using real and synthetic datasets illustrates that, on average, SWARM achieves a 2× improvement in throughput over a static grid partitioning that is determined based on observing a limited history of the data and query workloads. Moreover, SWARM reduces execution latency on average by 4× compared with the static approach.
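As a rough illustration of the adaptive rebalancing described above, the sketch below recomputes per-worker load and moves the hottest cells off overloaded workers. The threshold, the cell/worker data structures, and the greedy policy are illustrative assumptions; SWARM's actual protocol also accounts for query workloads and multiple execution and persistence models.

```python
# Simplified sketch of adaptive workload rebalancing: recompute
# per-worker load from per-cell load, then greedily reassign hot
# cells from overloaded workers to the least-loaded worker.
# Thresholds and structures are assumed for illustration.

def rebalance(load_per_cell, assignment, workers, threshold=1.25):
    """assignment maps cell -> worker; mutated in place and returned."""
    worker_load = {w: 0.0 for w in workers}
    for cell, w in assignment.items():
        worker_load[w] += load_per_cell[cell]
    avg = sum(worker_load.values()) / len(workers)
    for cell, w in sorted(assignment.items(),
                          key=lambda kv: -load_per_cell[kv[0]]):
        if worker_load[w] > threshold * avg:        # w is overloaded
            target = min(worker_load, key=worker_load.get)
            assignment[cell] = target
            worker_load[w] -= load_per_cell[cell]
            worker_load[target] += load_per_cell[cell]
    return assignment

cells = {"c1": 5.0, "c2": 4.0, "c3": 1.0, "c4": 2.0}
assign = {"c1": "w1", "c2": "w1", "c3": "w2", "c4": "w2"}
print(rebalance(cells, assign, ["w1", "w2"]))
# -> {'c1': 'w2', 'c2': 'w1', 'c3': 'w2', 'c4': 'w1'} (loads 6.0 vs 6.0)
```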