Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which utilize advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect throughput of an ensemble of Nowcast workflows with improved turnaround times. Additionally, we find that using our network-centric platform powered by advanced layer2 networking techniques results in faster, more reliable data throughput, makes cloud resources easier to provision, and the workflows easier to configure for operational use and automation.
more »
« less
Graph Representation of Computer Network Resources for Precise Allocations
Distributed networked systems form an essential resource for computation and applications ranging from commercial, military, scientific, and research communities. Allocation of resources on a given infrastructure is realized through various mapping systems that are tailored towards specific use cases of the requesting applications. While HPC system requests demand compute resources heavy on processor and memory, cloud applications may demand distributed web services that are composed of networked processing and some memory. All resource requests allocate on the infrastructure with some form of network connectivity. However, during mapping of resources, the features and topology constraints of network components are typically handled indirectly through abstractions of user requests. This paper is on a novel graph representation that enables precise mapping methods for distributed networked systems. The proposed graph representations are demonstrated to allocate specific network components and adjacency requirements of a requested graph on a given infrastructure. Furthermore, we report on application of business policy requirements that resulted in increased utilization and a gradual decrease in idle node count as requests are mapped using our proposed methods.
more »
« less
- Award ID(s):
- 2018472
- PAR ID:
- 10366229
- Date Published:
- Journal Name:
- ICCCN
- Page Range / eLocation ID:
- 1 to 10
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Several recent studies have investigated the virtual machine (VM) provisioning problem for requests with time constraints (deadlines) in cloud systems. These studies typically assumed that a request is associated with a single execution time when running on VMs with a given resource demand. In this paper, we consider modern applications that are normally implemented with generic frameworks that allow them to execute with various numbers of threads on VMs with different resource demands. For such applications, it is possible for the users to specify multiple execution options (MEOs) for a request where each execution option is represented by a certain number of VMs with some resources to run the application and its corresponding execution time. We investigate the problem of virtual machine provisioning for such time-sensitive requests with MEOs in resource-constrained clouds. By incorporating the MEOs of requests, we propose several novel and flexible VM provisioning schemes that carefully balance resource usage efficiency, input workloads and request deadlines with the objective of achieving higher resource utilization and system benefits. We evaluated the proposed MEO-aware schemes on various workloads with both benchmark requests and synthetic requests. The results show that our MEO-aware algorithms outperform the state-of-the-art schemes that consider only a single execution option of requests by serving up to 38% more requests and achieving up to 27% more benefits.more » « less
-
Several recent studies have investigated the virtual machine (VM) provisioning problem for requests with time constraints (deadlines) in cloud systems. These studies typically assumed that a request is associated with a single execution time when running on VMs with a given resource demand. In this paper, we consider modern applications that are normally implemented with generic frameworks that allow them to execute with various numbers of threads on VMs with different resource demands. For such applications, it is possible for the users to specify multiple execution options (MEOs) for a request where each execution option is represented by a certain number of VMs with some resources to run the application and its corresponding execution time. We investigate the problem of virtual machine provisioning for such time-sensitive requests with MEOs in resource-constrained clouds. By incorporating the MEOs of requests, we propose several novel and flexible VM provisioning schemes that carefully balance resource usage efficiency, input workloads and request deadlines with the objective of achieving higher resource utilization and system benefits. We evaluated the proposed MEO-aware schemes on various workloads with both benchmark requests and synthetic requests. The results show that our MEO-aware algorithms outperform the state-of-the-art schemes that consider only a single execution option of requests by serving up to 38% more requests and achieving up to 27% more benefits.more » « less
-
Network resource reservation systems are being developed and deployed, driven by the demand and substantial benefits of providing performance predictability for modern distributed applications. However, existing systems suffer limitations: They either are inefficient in finding the optimal resource reservation, or cause private information (e.g., from the network infrastructure) to be exposed (e.g., to the user). In this paper, we design BoxOpt, a novel system that leverages efficient oracle construction techniques in optimization and learning theory to automatically, and swiftly learn the optimal resource reservations without exchanging any private information between the network and the user. We implement a prototype of BoxOpt and demonstrate its efficiency and efficacy via extensive experiments using real network topology and trace. Results show that (1) BoxOpt has a 100% correctness ratio, and (2) for 95% of requests, BoxOpt learns the optimal resource reservation within 13 seconds.more » « less
-
d. Many of the infrastructure sectors that are considered to be crucial by the Department of Homeland Security include networked systems (physical and temporal) that function to move some commodity like electricity, people, or even communication from one location of importance to another. The costs associated with these flows make up the price of the network’s normal functionality. These networks have limited capacities, which cause the marginal cost of a unit of flow across an edge to increase as congestion builds. In order to limit the expense of a network’s normal demand we aim to increase the resilience of the system and specifically the resilience of the arc capacities. Divisions of critical infrastructure have faced difficulties in recent years as inadequate resources have been available for needed upgrades and repairs. Without being able to determine future factors that cause damage both minor and extreme to the networks, officials must decide how to best allocate the limited funds now so that these essential systems can withstand the heavy weight of society’s reliance. We model these resource allocation decisions using a two-stage stochastic program (SP) for the purpose of network protection. Starting with a general form for a basic two-stage SP, we enforce assumptions that specify characteristics key to this type of decision model. The second stage objective—which represents the price of the network’s routine functionality—is nonlinear, as it reflects the increasing marginal cost per unit of additional flow across an arc. After the model has been designed properly to reflect the network protection problem, we are left with a nonconvex, nonlinear, nonseparable risk-neutral program. This research focuses on key reformulation techniques that transform the problematic model into one that is convex, separable, and much more solvable. Our approach focuses on using perspective functions to convexify the feasibility set of the second stage and second order conic constraints to represent nonlinear constraints in a form that better allows the use of computational solvers. Once these methods have been applied to the risk-neutral model we introduce a risk measure into the first stage that allows us to control the balance between an efficient, solvable model and the need to hedge against extreme events. Using Benders cuts that exploit linear separability, we give a decomposition and solution algorithm for the general network model. The innovations included in this formulation are then implemented on a transportation network with given flow demandmore » « less
An official website of the United States government

