Computation graphs are Directed Acyclic Graphs (DAGs) whose nodes correspond to mathematical operations; they are widely used as abstractions in optimizations of neural networks. The device placement problem aims to identify optimal allocations of those nodes to a set of (potentially heterogeneous) devices. Existing approaches rely on two types of architectures known as grouper-placer and encoder-placer, respectively. In this work, we bridge the gap between encoder-placer and grouper-placer techniques and propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit. The framework consists of five steps, including graph coarsening, node representation learning and policy optimization. It facilitates end-to-end training and takes into account the DAG nature of the computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, enabling graph representation learning and joint, personalized graph partitioning, using an unspecified number of groups. To train the entire framework, we use reinforcement learning with the execution time of the placement as a reward. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. The robustness of the proposed framework is also highlighted through an ablation study. The suggested placements improve the inference speed for the benchmark models by up to over CPU execution and by up to compared to other commonly used baselines.
GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing
Careful placement of a distributed computational application within a target device cluster is critical for achieving low application completion time. The problem is challenging due to its NP-hardness and combinatorial nature. In recent years, learning-based approaches have been proposed to learn a placement policy that can be applied to unseen applications, motivated by the problem of placing a neural network across cloud servers. These approaches, however, generally assume the device cluster is fixed, which is not the case in mobile or edge computing settings, where heterogeneous devices move in and out of range for a particular application. To address the challenge of scaling to different-sized device clusters and adapting to the addition of new devices, we propose a new learning approach called GiPH, which learns policies that generalize to dynamic device clusters via 1) a novel graph representation gpNet that efficiently encodes the information needed for choosing a good placement, and 2) a scalable graph neural network (GNN) that learns a summary of the gpNet information. GiPH turns the placement problem into that of finding a sequence of placement improvements, learning a policy for selecting this sequence that scales to problems of arbitrary size. We evaluate GiPH with a wide range of task graphs and device clusters and show that our learned policy rapidly finds good placements for new problem instances. GiPH finds placements that achieve up to 30.5% better makespan, searching up to 3× faster than other search-based placement policies.
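To make the idea of "a sequence of placement improvements" concrete, the following is a minimal sketch in Python. It is not GiPH: it replaces the learned policy and the gpNet representation with a plain greedy hill-climbing search, and the cost model (a single uniform communication delay `comm`, one task at a time per device, tasks run in one fixed topological order) is an assumption made only for illustration. All names (`makespan`, `improve`, `preds`, `work`, `speed`) are hypothetical.

```python
from graphlib import TopologicalSorter


def makespan(preds, work, speed, comm, placement):
    """Estimate completion time of a task DAG under a placement.

    preds:     task -> list of predecessor tasks (every task is a key)
    work:      task -> amount of computation
    speed:     device -> processing speed (work units per time unit)
    comm:      assumed fixed delay when an edge crosses devices
    placement: task -> device
    """
    finish = {}
    device_free = {d: 0.0 for d in speed}
    # Visit tasks so that predecessors are always scheduled first.
    for task in TopologicalSorter(preds).static_order():
        dev = placement[task]
        # A task is ready once all inputs have arrived on its device.
        ready = max(
            (finish[p] + (comm if placement[p] != dev else 0.0)
             for p in preds.get(task, ())),
            default=0.0,
        )
        start = max(ready, device_free[dev])  # devices run one task at a time
        finish[task] = start + work[task] / speed[dev]
        device_free[dev] = finish[task]
    return max(finish.values())


def improve(preds, work, speed, comm, placement):
    """Greedy stand-in for a learned policy: repeatedly apply the single
    task relocation that lowers the makespan the most, until none helps."""
    placement = dict(placement)
    best = makespan(preds, work, speed, comm, placement)
    while True:
        move = None
        for task in placement:
            for dev in speed:
                if dev == placement[task]:
                    continue
                trial = {**placement, task: dev}
                t = makespan(preds, work, speed, comm, trial)
                if t < best:
                    best, move = t, (task, dev)
        if move is None:
            return placement, best
        placement[move[0]] = move[1]


# Toy instance: a diamond-shaped task graph and two devices.
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
work = {"a": 2.0, "b": 4.0, "c": 4.0, "d": 2.0}
speed = {"cpu": 1.0, "gpu": 2.0}
start = {t: "cpu" for t in preds}
final_placement, final_time = improve(preds, work, speed, 1.0, start)
```

Because each step is a single relocation evaluated against the current placement, the same loop applies unchanged when devices are added to or removed from `speed`, which is the setting the abstract targets. A learned policy would replace the exhaustive scan over moves with a scored choice.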
- Award ID(s): 1645578
- PAR ID: 10492333
- Publisher / Repository: MLSys
- Date Published:
- Journal Name: MLSys
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Model predictive control (MPC) provides a useful means for controlling systems with constraints, but suffers from the computational burden of repeatedly solving an optimization problem in real time. Offline (explicit) solutions for MPC attempt to alleviate real time computational challenges using either multiparametric programming or machine learning. The multiparametric approaches are typically applied to linear or quadratic MPC problems, while learning-based approaches can be more flexible and are less memory-intensive. Existing learning-based approaches offer significant speedups, but the challenge becomes ensuring constraint satisfaction while maintaining good performance. In this paper, we provide a neural network parameterization of MPC policies that explicitly encodes the constraints of the problem. By exploring the interior of the MPC feasible set in an unsupervised learning paradigm, the neural network finds better policies faster than projection-based methods and exhibits substantially shorter solve times. We use the proposed policy to solve a robust MPC problem, and demonstrate the performance and computational gains on a standard test system.
- models, it is difficult to fit and train a complete copy of the model on a single computational device with limited capability. Therefore, large neural networks are usually trained on a mixture of devices, including multiple CPUs and GPUs, whose computational speed and efficiency are drastically affected by how these models are partitioned and placed on the devices. In this paper, we propose Mars, a novel design to find efficient placements for large models. Mars leverages a self-supervised graph neural network pre-training framework to generate node representations for operations, which capture the topological properties of the computational graph. Then, a sequence-to-sequence neural network is applied to split large models into small segments so that Mars can predict the placements sequentially. Novel optimizations have been applied in the placer design to minimize the time needed to train the agent for placing very large models. We deployed and evaluated Mars on benchmarks involving Inception-V3, GNMT, and BERT models. Extensive experimental results show that Mars can achieve up to 27.2% and 2.7% speedup in per-step training time over the state of the art for GNMT and BERT models, respectively. We also show that with self-supervised graph neural network pretraining, our design achieves the fastest speed in discovering the optimal placement for Inception-V3.
- A crucial challenge for data-parallel clusters is achieving high application-level communication efficiency for structured traffic flows (a.k.a. Coflows) from distributed data processing applications. A range of recent works focus on designing network scheduling algorithms with predetermined Coflow placement, i.e. the endpoints of subflows within a Coflow are preset. However, the underlying Coflow placement problem and its decisive impact on scheduling efficiency have long been overlooked. It is hard to find good placements for Coflows. At the intra-Coflow level, constituent flows are related and therefore their placement decisions are dependent. Thus, strategies extended from flow-by-flow placement are sub-optimal because they neglect the inter-flow relationship in a Coflow. At the inter-Coflow level, placing a new Coflow may introduce contention with existing Coflows, which changes communication efficiency. This paper is the first to study the Coflow placement problem with careful consideration of the inter-flow relationship in Coflows. We formulate the Coflow placement problem and propose a Coflow placement algorithm. Under realistic traffic in various settings, our algorithm reduces the average completion time for Coflows by up to 26%.
- We consider the problem of jammer placement to partition a wireless network, where the network nodes and jammers are located in the real plane. In previous research, we found optimal and suboptimal jammer placements by reducing the search space for the jammers to the locations of the network nodes. In this paper, we develop techniques to find optimal jammer placements over all possible jammer placements in the real plane. Our approach finds a set of candidate jammer locations (CJLs) such that a jammer-placement solution using the CJLs achieves the minimum possible cardinality among all possible jammer placements in the real plane. The CJLs can be used directly with the optimal and fast, suboptimal algorithms for jammer placement from our previous work.
