Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available February 1, 2026
-
Workflow systems provide a convenient way for users to write large-scale applications by composing independent tasks into large graphs that can be executed concurrently on high-performance clus- ters. In many newer workflow systems, tasks are often expressed as a combination of function invocations in a high-level language. Because necessary code and data are not statically known prior to execution, they must be moved into the cluster at runtime. An obvious way of doing this is to translate function invocations into self-contained executable programs and run them as usual, but this brings a hefty performance penalty: a function invocation now needs to piggyback its context with extra code and data to a remote node, and the remote node needs to take extra time to reconstruct the invocation’s context before executing it, both detrimental to lightweight short-running functions. A better solution for workflow systems is to treat functions and invocations as first-class abstractions: subsequent invocations of the same function on a worker node should only pay for the cost of context setup once and reuse the context between different invocations. The remaining problems lie in discovering, distributing, and retaining the reusable context among workers. In this paper, we discuss the rationale and design requirement of these mechanisms to support context reuse, and implement them in TaskVine, a data- intensive distributed framework and execution engine. Our results from executing a large-scale neural network inference application and a molecular design application show that treating functions and invocations as first-class abstractions reduces the execution time of the applications by 94.5% and 26.9%, respectively.more » « lessFree, publicly-accessible full text available June 3, 2025
-
Users running dynamic workflows in distributed systems usually have inadequate expertise to correctly size the allocation of resources (cores, memory, disk) to each task due to the difficulty in uncovering the obscure yet important correlation between tasks and their resource consumption. Thus, users typically pay little attention to this problem of allocation sizing and either simply apply an error-prone upper bound of resource allocation to all tasks, or delegate this responsibility to underlying distributed systems, resulting in substantial waste from allocated yet unused resources. In this paper, we will first show that tasks performing different work may have significantly different resource consumption. We will then show that exploiting the heterogeneity of tasks is a desirable way to reveal and predict the relationship between tasks and their resource consumption, reduce waste from resource misallocation, increase tasks' consumption efficiency, and incentivize users' cooperation. We have developed two info-aware allocation strategies capitalizing on this characteristic and will show their effectiveness through simulations on two modern applications with dynamic workflows and five synthetic datasets of resource consumption. Our results show that info-aware strategies can cut down up to 98.7% of the total waste incurred by a best-effort strategy, and increase the efficiency in resource consumption of each task on average anywhere up to 93.9%.more » « less