Mobile Edge Cloud endures limited computational resources as compared to back-end cloud. Semi-Markov Decision Process (SMDP) basedMulti-Resource Allocation (MRA) work [6] introduces optimal resource allocation for mobile requests in the resource constrained edge cloud environments. In this study, we scale existing SMDPMRA work for real-world scenarios. First, we structure the policy tables in a two dimensional matrix such that columns represent states of the system and rows for the actions. Second, we propose an index based search technique over structured policy tables. Simulation results demonstrate that our approach outperforms the legacy method and retrieves an optimal action from the policy tables in the order of microseconds, which meets the delay criteria of real-time applications in edge cloud based systems.
more »
« less
MIRAS: Model-based Reinforcement Learning for Microservice Resource Allocation over Scientific Workflows
Microservice, an architectural design that decomposes applications into loosely coupled services, is adopted in modern software design, including cloud-based scientific workflow processing. The microservice design makes scientific workflow systems more modular, more flexible, and easier to develop. However, cloud deployment of microservice workflow execution systems doesn't come for free, and proper resource management decisions have to be made in order to achieve certain performance objective (e.g., response time) within constraint operation cost. Nevertheless, effective online resource allocation decisions are hard to achieve due to dynamic workloads and the complicated interactions of microservices in each workflow. In this paper, we propose an adaptive resource allocation approach for microservice workflow system based on recent advances in reinforcement learning. Our approach (1) assumes little prior knowledge of the microservice workflow system and does not require any elaborately designed model or crafted representative simulator of the underlying system, and (2) avoids high sample complexity which is a common drawback of model-free reinforcement learning when applied to real-world scenarios. We show that our proposed approach automatically achieves effective policy for resource allocation with limited number of time-consuming interactions with the microservice workflow system. We perform extensive evaluations to validate the effectiveness of our approach and demonstrate that it outperforms existing resource allocation approaches with read-world emulated workflows.
more »
« less
- Award ID(s):
- 1827126
- PAR ID:
- 10192135
- Date Published:
- Journal Name:
- 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)
- Page Range / eLocation ID:
- 122 to 132
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Edge Cloud (EC) is poised to brace massive machine type communication (mMTC) for 5G and IoT by providing compute and network resources at the edge. Yet, the EC being regionally domestic with a smaller scale, faces the challenges of bandwidth and computational throughput. Resource management techniques are considered necessary to achieve efficient resource allocation objectives. Software Defined Network (SDN) enabled EC architecture is emerging as a potential solution that enables dynamic bandwidth allocation and task scheduling for latency sensitive and diverse mobile applications in the EC environment. This study proposes a novel Heuristic Reinforcement Learning (HRL) based flowlevel dynamic bandwidth allocation framework and validates it through end-to-end implementation using OpenFlow meter feature. OpenFlow meter provides granular control and allows demand-based flow management to meet the diverse QoS requirements germane to IoT traffics. The proposed framework is then evaluated by emulating an EC scenario based on real NSF COSMOS testbed topology at The City College of New York. A specific heuristic reinforcement learning with linear-annealing technique and a pruning principle are proposed and compared with the baseline approach. Our proposed strategy performs consistently in both Mininet and hardware OpenFlow switches based environments. The performance evaluation considers key metrics associated with real-time applications: throughput, end-to-end delay, packet loss rate, and overall system cost for bandwidth allocation. Furthermore, our proposed linear annealing method achieves faster convergence rate and better reward in terms of system cost, and the proposed pruning principle remarkably reduces control traffic in the network.more » « less
-
Constructing and executing reproducible workflows is fundamental to performing research in a variety of scientific domains. Many of the current commercial and open source solutions for workflow en- gineering impose constraints—either technical or budgetary—upon researchers, requiring them to use their limited funding on expensive cloud platforms or spend valuable time acquiring knowledge of software systems and processes outside of their domain expertise. Even though many commercial solutions offer free-tier services, they often do not meet the resource and architectural requirements (memory, data storage, compute time, networking, etc) for researchers to run their workflows effectively at scale. Tapis Workflows abstracts away the complexities of workflow creation and execution behind a web-based API with a simplified workflow model comprised of only pipelines and tasks. This paper will de- tail how Tapis Workflows approaches workflow management by exploring its domain model, the technologies used, application architecture, design patterns, how organizations are leveraging Tapis Workflows to solve unique problems in their scientific workflows, and this projects’s vision for a simple, open source, extensible, and easily deployable workflow engine.more » « less
-
Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: having samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem – allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.more » « less
-
Significant obstacles exist in scientific domains including genetics, climate modeling, and astronomy due to the management, preprocess, and training on complicated data for deep learning. Even while several large-scale solutions offer distributed execution environments, open-source alternatives that integrate scalable runtime tools, deep learning and data frameworks on high-performance computing platforms remain crucial for accessibility and flexibility. In this paper, we introduce Deep Radical-Cylon(RC), a heterogeneous runtime system that combines data engineering, deep learning frameworks, and workflow engines across several HPC environments, including cloud and supercomputing infrastructures. Deep RC supports heterogeneous systems with accelerators, allows the usage of communication libraries like MPI, GLOO and NCCL across multi-node setups, and facilitates parallel and distributed deep learning pipelines by utilizing Radical Pilot as a task execution framework. By attaining an end-to-end pipeline including preprocessing, model training, and postprocessing with 11 neural forecasting models (PyTorch) and hydrology models (TensorFlow) under identical resource conditions, the system reduces 3.28 and 75.9 seconds, respectively. The design of Deep RC guarantees the smooth integration of scalable data frameworks, such as Cylon, with deep learning processes, exhibiting strong performance on cloud platforms and scientific HPC systems. By offering a flexible, high-performance solution for resource-intensive applications, this method closes the gap between data preprocessing, model training, and postprocessing.more » « less
An official website of the United States government

