skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: PONCHO: Dynamic Package Synthesis for Distributed and Serverless Python Applications
An increasing number of distributed applications operate by dispatching function invocations across the nodes of a distributed system. To operate correctly, the code and data dependencies of the function must be distributed along with the invocations in some way. When translating applications to work on large scale distributed systems, managing these dependencies becomes challenging: delivery must be scalable to thousands of nodes; the dependencies must be consistent across the system; and the method must be usable by an unprivileged developer. As a solution, in this paper we present PONCHO, which is a lightweight Python based toolkit which allows users to discover, package, and deploy dependencies as an integral part of distributed applications. PONCHO encapsulates a set of commands to be executed within an environment. PONCHO offers a lightweight solution to create and manage environments increasing the portability of scientific applications as well as reproducibility. In this paper, we evaluate PONCHO with real-world applications in the fields of physics, computational chemistry, and hyperparameter optimization, We observe the challenges that arise when creating and distributing an environment and measure the overheads that emerge as a result.  more » « less
Award ID(s):
1931348
PAR ID:
10356915
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
High Performance Serverless Workshop at HPDC
Page Range / eLocation ID:
8 to 14
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Workflow systems provide a convenient way for users to write large-scale applications by composing independent tasks into large graphs that can be executed concurrently on high-performance clus- ters. In many newer workflow systems, tasks are often expressed as a combination of function invocations in a high-level language. Because necessary code and data are not statically known prior to execution, they must be moved into the cluster at runtime. An obvious way of doing this is to translate function invocations into self-contained executable programs and run them as usual, but this brings a hefty performance penalty: a function invocation now needs to piggyback its context with extra code and data to a remote node, and the remote node needs to take extra time to reconstruct the invocation’s context before executing it, both detrimental to lightweight short-running functions. A better solution for workflow systems is to treat functions and invocations as first-class abstractions: subsequent invocations of the same function on a worker node should only pay for the cost of context setup once and reuse the context between different invocations. The remaining problems lie in discovering, distributing, and retaining the reusable context among workers. In this paper, we discuss the rationale and design requirement of these mechanisms to support context reuse, and implement them in TaskVine, a data- intensive distributed framework and execution engine. Our results from executing a large-scale neural network inference application and a molecular design application show that treating functions and invocations as first-class abstractions reduces the execution time of the applications by 94.5% and 26.9%, respectively. 
    more » « less
  2. The rapid growth in technology and wide use of internet has increased smart applications such as intelligent transportation control system, and Internet of Things, which heavily rely on an efficient and reliable connectivity network. To overcome high bandwidth work load on the network, as well as minimize latency for real-time applications, the computation can be moved from the central cloud to a distributed edge cloud. The edge computing benefits various smart applications that uses distributed network for data analytics and services. Different from the existing cloud management solutions, edge computing needs to move cloud management services towards distributed heterogeneous edge nodes for multi-tenant user applications. However, existing cloud management services do not offer remote deployment of multi-tenant user applications on the cloud of edge nodes. In this paper, we propose a practical edge cloud software framework for deploying multi-tenant distributed smart applications. Having multiple distributed end nodes, auto discovery of all active end nodes is required for deploying multi-tenant user applications. However, existing cloud solutions require either private network or fixed IP address, which is not achievable for the distributed edge nodes. Most of the edge nodes connected through the public internet without fixed IP, and some of them even connect through IEEE 802.15 based sensor networks. We propose to build a software platform to manage the distributed edge nodes as well as support services to deploy and launch isolated, multi-tenant user applications through a lightweight container. We propose an architectural solution to remotely access edge cloud management services through intermittent internet connections. We open sourced our whole set of software solutions, and analyzed the major performance metrics of the edge cloud platform. 
    more » « less
  3. Edge application’s distributed nature presents significant challenges for developers in orchestrating and managing the multitenant applications. In this paper, we propose a practical edge cloud software framework for deploying multitenant distributed smart applications. Here we exploit commodity, a low cost embedded board to form distributed edge clusters. The cluster of geo-distributed and wireless edge nodes not only power multitenant IoT applications that are closer to the data source and the user, but also enable developers to remotely deploy and orchestrate application containers over the cloud. Specifically, we propose building a software platform to manage the distributed edge nodes along with support services to deploy and launch isolated and multitenant user applications through a lightweight container. In particular, we propose an architectural solution to improve the resilience of edge cloud services through peer collaborated service migration when the failures happen or when resources are overburdened. We focus on giving the developers a single point control of the infrastructure over the intermittent and lossy wide area networks (WANs) and enabling the remote deployment of multitenant applications. 
    more » « less
  4. The Kaczmarz algorithm is an iterative method for solving systems of linear equations. We introduce a randomized Kaczmarz algorithm for solving systems of linear equations in a distributed environment, i.e., the equations within the system are distributed over multiple nodes within a network. The modification we introduce is designed for a network with a tree structure that allows for passage of solution estimates between the nodes in the network. We demonstrate that the algorithm converges to the solution, or the solution of minimal norm, when the system is consistent. We also prove convergence rates of the randomized algorithm that depend on the spectral data of the coefficient matrix and the random control probability distribution. In addition, we demonstrate that the randomized algorithm can be used to identify anomalies in the system of equations when the measurements are perturbed by large, sparse noise. 
    more » « less
  5. The Kaczmarz algorithm is an iterative method for solving systems of linear equations. We introduce a modified Kaczmarz algorithm for solving systems of linear equations in a distributed environment, i.e., the equations within the system are distributed over multiple nodes within a network. The modification we introduce is designed for a network with a tree structure that allows for passage of solution estimates between the nodes in the network. We prove that the modified algorithm converges under no additional assumptions on the equations. We demonstrate that the algorithm converges to the solution, or the solution of minimal norm, when the system is consistent. We also demonstrate that in the case of an inconsistent system of equations, the modified relaxed Kaczmarz algorithm converges to a weighted least squares solution as the relaxation parameter approaches 0. 
    more » « less