<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Efficient Distribution of Quantum Circuits</title></titleStmt>
			<publicationStmt>
				<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>
				<date>01/01/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10352366</idno>
					<idno type="doi">10.4230/LIPIcs.DISC.2021.41</idno>
					
					<author>Ranjani G_Sundaram</author><author>Himanshu Gupta</author><author>C R Ramakrishnan</author><author>Seth Gilbert</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Quantum computing hardware is improving in robustness, but individual computers still have small number of qubits (for storing quantum information). Computations needing a large number of qubits can only be performed by distributing them over a network of smaller quantum computers. In this paper, we consider the problem of distributing a quantum computation, represented as a quantum circuit, over a homogeneous network of quantum computers, minimizing the number of communication operations needed to complete every step of the computation. We propose a two-step solution: dividing the given circuit’s qubits among the computers in the network, and scheduling communication operations, called migrations, to share quantum information among the computers to ensure that every operation can be performed locally. While the first step is an intractable problem, we present a polynomial-time solution for the second step in a special setting, and a O(log n)-approximate solution in the general setting. We provide empirical results which show that our two-step solution outperforms existing heuristic for this problem by a significant margin (up to 90%, in some cases).]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Motivation and Driving Problem. Over the recent past, there has been a steady increase in the capacity of quantum computing hardware. A crucial obstacle to achieving the full potential of quantum computing is the limited number of qubits, which are basic stores of quantum information, in any single quantum computer (QC). Distributing a large quantum computation over a network of QCs is a way to overcome this obstacle <ref type="bibr">[7,</ref><ref type="bibr">5]</ref>. Distributing a computation among a network of QCs will be important to realize the promise and power of quantum computation. Quantum circuits form a useful abstraction between higher-level quantum programs written in languages such as Qiskit <ref type="bibr">[14]</ref> and Quipper <ref type="bibr">[15]</ref>, and lower-level computing hardware. In this paper, we consider the problem of optimally distributing a given quantum circuit for evaluation over a network of QCs. This distribution involves mapping the qubits in the circuit to individual QCs such that the communication needed to perform operations whose operands span different QCs is minimized. The distribution problem is novel to quantum computing in two important ways:</p><p>1. Quantum circuits have an explicit linear structure that makes it possible to determine the cost of a distribution before the circuit is evaluated. In contrast, classical programs have conditionals and loops where data dependencies cannot be predicted before execution. 2. Entanglement, which is a phenomenon unique to quantum computing, enables modes of communication that permit efficient sharing of information between operations.</p><p>The above differences enable novel solutions to the problem of distributing quantum computation (see &#167;2 for an elaboration of these key concepts).</p><p>Efficient techniques for executing a single operation with operands that span different QCs in a network have been studied before <ref type="bibr">[11,</ref><ref type="bibr">18,</ref><ref type="bibr">19]</ref>. More recently, <ref type="bibr">[8]</ref> posed the problem of finding an optimal distribution of a quantum circuit under a naive communication model in terms of balanced graph partitioning, and solved it using well-known heuristics for that problem <ref type="bibr">[12]</ref>. A formulation with a refined communication model that exploits quantum entanglement was considered in <ref type="bibr">[3]</ref>, where the optimization problem was shown to be intractable and solved using hypergraph partitioning heuristics <ref type="bibr">[1]</ref>. A more detailed comparison with related work is in &#167;2.3.</p><p>Our Approach and Contributions. We pose the problem of optimal distribution of quantum circuit evaluation in two steps (see overview in &#167;3). In the first step, we partition the qubits of the circuit using ordinary balanced graph partitioning, but with the edge weights determined by accurately estimating the cost of placing the corresponding qubits in different partitions. In the second step, we solve the problem of optimal placement of operations for a given partition. The main contributions of the paper are:</p><p>For an appropriately formulated DQC problem of optimizing distributed evaluation of quantum circuits, we develop an efficient heuristic that outperforms the prior work by a significant margin (up to 90%) across a wide range of parameters (see &#167;6).</p><p>Our algorithm for the DQC problem is a two-step heuristic. For the second step of covering gates using minimum number of communication primitives, we design an optimal algorithm for a specialized setting (see &#167;4), and an O(log n)-approximation algorithm for the general setting (see &#167;5).</p><p>We defer the proofs of all lemmas and theorems to Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background and Related Works</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Quantum Computation and Communication</head><p>This paper's technical development can be followed with a high-level understanding of three of quantum computing/communication concepts: the structure of quantum circuits, properties of certain quantum gates, and characteristics of quantum communication that are directly used in our development. In the following, we give an overview of these three concepts. For a more thorough review of this area, the reader is referred to standard sources such as <ref type="bibr">[13]</ref>.</p><p>Quantum Circuits. Quantum computation is typically abstracted as a circuit, where horizontal "wires" represent qubits which carry quantum data, and operations on the qubits performed by vertical "gates" connecting the operand wires. Quantum computers (QCs) evaluate a circuit by applying the gates in the left-to-right order, so this circuit can also be understood as a sequence of machine-level instructions (gates) over fixed number of data cells (qubits). Analogous to classical Boolean circuits, there are several universal gate sets for quantum computation: any quantum computation can be expressed by a circuit consisting only of gates from a universal gate set. In particular, a special binary gate called CZ along with the set of all possible unary gates forms a universal gate set; we make use of this universal gate set in this paper. Fig. <ref type="figure">1</ref> shows the pictorial representation of an example circuit, consisting only of unary gates (boxes) and CZ gates (vertical connectors). Quantum circuits also comprise of measurement operations which yield classical bits as results, and conditional operations that are guarded by classical bit values. For the purposes of this paper, these operations can be treated as unary operations.</p><p>Quantum Communication. If a given quantum circuit is to be evaluated in a distributed fashion over a network of QCs, we have to first distribute the qubits over the QCs. But such a distribution may induce gates in the circuit to span different QCs. To execute such non-local gates, we need to bring all operands' values into a single QC via quantum communication. However, physical transmission of arbitrary qubits can incur irreparable communication errors, as the No Cloning Theorem <ref type="bibr">[17,</ref><ref type="bibr">9]</ref> proscribes making independent copies of arbitrary qubits. An alternative approach to communicate qubits is via teleportation <ref type="bibr">[4]</ref>, which requires an a priori distribution of maximally-entangled pair (MEP) of qubits (e.g., Bell Pair) over the two nodes. With an MEP distributed over nodes A and B, teleportation of a qubit state from A to B can be accomplished using classical communication and local gate operations, while consuming/destroying the MEP.</p><p>Creating "Linked Copies" of a Qubit via Cat-Entanglement. Another means of communicating qubit states is by creating linked copies of a qubit across QCs, via cat-entanglement operations <ref type="bibr">[11,</ref><ref type="bibr">19]</ref> which, like teleportation, require a Bell Pair to be shared a priori. These linked copies are particularly useful in efficient distributed evaluation of circuits involving only CZ and unary gates, as follows. CZ gates have two special properties: (i) we can freely replace an operand q with a linked copy of q; and (ii) CZ operations commute with cat-entanglement operation. As a consequence, the linked copies act like shared memories that continue to remain "in sync" across QCs as long as only CZ operations are executed on them. Unlike CZ gates, unary gates destroy the cat-entanglements in general. 1 Before applying a unary operation on q, we have to disentangle any linked copies via a dual operation called cat-disentanglement which doesn't require a Bell Pair.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Our Model of Distributed Evaluation of Quantum Circuits</head><p>Our model for distributed evaluation of quantum circuits is as follows.</p><p>1. Without loss of generality, we assume, that the given (centralized) circuit is composed only of unary and CZ gates. Circuits using other gate sets can be rewritten to this form, with only a polynomial expansion in size. 2. We assume that memory for Bell pairs used in cat-entanglement (called ebits) are distinct from that used for computation qubits, as in prior works <ref type="bibr">[3]</ref>.</p><p>1 An important exception are phase-shift gates which also preserve cat-entanglement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D I S C 2 0 2 1</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>41:4</head><p>Efficient Distribution of Quantum Circuits 3. Our network model consists of homogeneous set of QCs: the memory capacity for computation/data qubits is nearly the same in all QCs. Thus, when distributing an input circuit, we partition the number of qubits almost-uniformly across the given QCs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4.</head><p>We consider only a "static" assignment of qubits to QCs, i.e., each qubit has a home where all the unary operations are performed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5.</head><p>Due to the advantages listed in &#167;2.1, we use cat-entanglement as the only mechanism to communicate qubits.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>6.</head><p>Gates in the given quantum circuit are executed in the order they appear in the circuit: no further optimizations such as gate reordering are done during distributed evaluation.</p><p>Cost Model and Objective. Let us assume that the given (centralized) quantum circuit is already optimized in the sense that successive gates on different operands are executed in parallel. For such an optimized circuit, if we disallow any gate reordering optimization (#6 above), it is easy to see that all distributed evaluation schemes would incur the same computational cost/time. Thus, our optimization objective in distribution of a given quantum circuit is to minimize the communication cost, i.e., the number of cat-entanglement operations needed to evaluate the gates. As mentioned above, cat-entanglements require a priori shared MEPs, which is an expensive resource -generating them can incur substantial latency due to the stochastic nature of underlying processes <ref type="bibr">[10,</ref><ref type="bibr">16]</ref>. Note that cat-disentanglement operations only require local operations and classical communication, and furthermore, only done following prior cat-entanglements. Hence, their cost can be ignored (or folded into the cost of cat-entanglements).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Related Work</head><p>The problem of implementing non-local gates in distributed quantum circuits was considered in <ref type="bibr">[11]</ref>. That paper showed necessary and sufficient communications needed to implement several gates non-locally. In particular, they used the idea of cat-entanglement to share linked copies across QCs. Such sharing was further explored in <ref type="bibr">[19]</ref>, which also introduced the terms "cat-entanglement" and "cat-disentanglement". The paper generalized sharing to the multi-partite case, by essentially composing cat-entanglement operations. Both <ref type="bibr">[11]</ref> and <ref type="bibr">[19]</ref> focus on optimal implementation of given non-local gates. In contrast, <ref type="bibr">[8]</ref> and <ref type="bibr">[3]</ref> focus on optimizing the assignment of qubits to QCs in order to minimize communication costs. A teleportation-based model is used in <ref type="bibr">[8]</ref>: non-local operands of a gate are teleported to a common QC for the gate operation, and then teleported back to their original locations. The paper poses the optimization problem in terms of balanced k-partition of a weighted graph with qubits as vertices. The paper uses the graph partitioning algorithm of <ref type="bibr">[12]</ref> to derive good partitions.</p><p>The closest work to ours is that of <ref type="bibr">[3]</ref>, where cat-entanglement operations are used to share linked copies of qubits. The optimal assignment problem is posed in terms of balanced hypergraph partitioning, with hyperedges capturing the set of gates that can be executed with linked copies. The assignment problem is shown to be intractable via reduction from the hypergraph balanced k-min-cut problem. In contrast, we have developed a two-step heuristic, which uses a simpler min-cut problem in the first step and a coverage problem in the second step that can be solved optimally or near-optimally. We compare our work empirically with theirs in &#167;6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">DQC Problem Formulation and High-Level Algorithm</head><p>In this section, we formally define the problem of efficient distribution of quantum circuits, and give a high-level description of our proposed algorithm.</p><p>Quantum Circuits. We use q 1 , q 2 , . . . , q n to denote qubits. Quantum circuits are composed of gates. Based on &#167;2, we consider the universal gate set with (binary) CZ and unary gates.</p><p>Since we only use one type of binary gate and the type of unary gate does not play a role in our context, we need to only represent the operands of each gate and not the gate itself. We use the below notion of an abstract quantum circuit that elides the gate information.</p><p>. .&#10217; where each g k is one of: (q i , q j ): the pair of operands for a CZ gate that occurs as the k-th gate. q i : the operand of an unary gate that occurs as the k-th gate. Throughout the article, we thus represent binary gates in a circuit as triplets (q i , q j , t), and unary gates as pairs (q i , t), where t is the index/timestamp of the gate in the circuit.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Distributed Quantum Circuit (DQC) Problem</head><p>Our goal is to determine an efficient distribution of a given quantum circuit, over a given network of QCs. Efficient distribution essentially entails two tasks: distributing the qubits over the distributed QCs, and then executing the gates efficiently over the distributed QCs. We represent the distribution of qubits over the distributed QCs as a partition of the qubits. We also define a concept of migrations which represent the cat-entanglement operations; each migration may facilitate execution of one or more gates, which is captured via a notion of coverage of gates by migrations.</p><p>Informally, the DQC problem is to (i) partition the given circuit across distributed QCs, and (ii) for the given partition, determine a small set of migrations that are sufficient to execute/cover all the non-local gates in the circuit. The overall goal of the DQC problem is to minimize the number of cat-entanglements (or migrations, as we represent them as). Below, we formalize the notions of partitions, migrations, and execution of gates by migrations (via a notion of coverage), before defining the DQC problem formally.</p><p>Partitioning. Distributing a quantum circuit first entails assigning qubits to quantum computers in the network. In this paper, we implicitly assume homogeneous quantum computers, and thus, we consider only near-balanced partitions of the qubits across QCs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#9654; Definition 3.2 (Partition). Given integer k &gt; 0 and real &#957; &gt; 1, a balanced (k, &#957;)-partition of a finite set S divides S into k components i.e., disjoint subsets of S, such that each component is of size at most &#957; &#8226; |S|</head><p>k . A partition can be represented by a total function &#960; that maps S to a set of labels P = {p 1 , p 2 , . . . , p k }.</p><p>For our DQC problem, we consider (k, &#957;)-balanced partitions of the set of qubits Q using the set of QCs, P = {p 1 , p 2 , . . . , p k }, as the partition labels. Note that such a partition assigns qubits to QCs. Throughout this paper, we refer to quantum computers p i as partitions, and &#960;(q) as the home-partition of q. In addition, we refer to a quantum circuit with an already given partition as a partitioned circuit.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D I S C 2 0 2 1</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>41:6</head><p>Efficient Distribution of Quantum Circuits Migrations. A migration essentially represents the cat-entanglement operation to create a linked copy of a qubit in a partition other than its home-partition. We allow migrations of a qubit q i to occur only initially (i.e., t = 0) or right after a unary operation on q i . This is without loss of generality, because, since cat-entanglement commutes with CZ , we can move any migration to the most recent unary operation or t = 0. Also, disentanglement operations are implicit -done, if needed, immediately before any unary operations, so are not represented.</p><p>&#9654; Definition 3.3 (Migration). Given a set of qubits Q, abstracted circuit C and partition P , a migration is a triple (q i , p k , j) where q i &#8712; Q, p k &#8712; P such that p k &#824; = &#960;(q i ) (i.e. not the home-partition of q i ), and either j = 0 or (q i , j) &#8712; C, i.e., j is the index of an unary gate on q i in C.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Coverage of Gate by Migration(s).</head><p>Consider a binary gate (q i , q j , t), where &#960;(q i ) &#824; = &#960;(q j ) i.e., the operands are in different partitions. We refer to such gates as non-local gates. To execute such a non-local binary gate, we can: (i) Migrate q i to the home partition of q j , (ii) Migrate q j to the home partition of q i , or (iii) Migrate both q i and q j to a third partition where the gate operation can then be performed. In each of the above cases, since unary gates do not commute with migrations, we need to ensure that there are no unary gate operations on the migrated qubit between the migration and the binary gate operation being covered. This notion is formalized below, in terms of coverage of a gate via migrations.</p><p>&#9654; Definition 3.4 (Coverage). For a given partition &#960;, a binary gate (q i , q j , t) can be covered by a single or a pair of migrations in one of the following ways. 1. By (q i , &#960;(q j ), t &#8242; ), if t &#8242; &lt; t and there are no unary gates on q i between t &#8242; and t; or 2. By (q j , &#960;(q i ), t &#8242; ), if t &#8242; &lt; t and there are no unary gates on q j between t &#8242; and t; or 3. By the pair {(q i , p k , t &#8242; ), (q j , p k , t &#8242;&#8242; )}, if t &#8242; , t &#8242;&#8242; &lt; t and there are no unary gates on q i between t &#8242; and t and on q j between t &#8242;&#8242; and t. Note that the third condition above allows a gate (q i , q j , t) to be executed at a partition other than the home-partitions of either of the operand qubits (i.e., by migrating both the operand qubits to a third partition. DQC Problem Formulation. Given a quantum circuit, the number k of distributed QCs, and a partitioning factor &#957; &#8805; 1, the DQC problem is to (k, &#957;)-partition the qubits into the distributed QCs and determine the set of migrations that cover the non-local gates of the partitioned circuit, such that the total number of migrations required is minimized. The DQC problem is known to be NP-hard by a reduction from the hypergraph min-cut problem <ref type="bibr">[3]</ref>. DQC Example. Consider the running example of our quantum circuit from Fig. <ref type="figure">1</ref>. Fig. <ref type="figure">2a</ref> shows the distribution of qubits into three partitions: p 1 = {q 1 }, p 2 = {q 2 , q 3 } and p 3 = {q 4 , q 5 }. The non-local binary gates are represented by red vertical lines, and the local ones by green lines. Let t i (i &#8805; 1) be the timestamp associated with the i-th unary gate. The set of all possible migrations for this partitioned circuit is: {(q 1 , p 2 , 0), (q 1 , p 3 , 0), (q 2 , p 1 , 0), (q 2 , p 3 , 0), (q 3 , p 1 , 0), (q 3 , p 3 , 0), (q 4 , p 1 , 0), (q 4 , p 2 , 0), (q 5 , p 1 , 0), (q 5 , p 2 , 0), (q 1 , p 2 , t 1 ), (q 1 , p 3 , t 1 ), (q 3 , p 1 , t 2 ), (q 3 , p 3 , t 2 ), (q 2 , p 1 , t 3 ), (q 2 , p 3 , t 3 ), (q 5 , p 1 , t 4 ), (q 5 , p 2 , t 4 )}. A example of a set of migrations that can cover all the non-local gates is: {(q 1 , p 2 , 0), (q 4 , p 2 , 0), (q 5 , p 2 , 0), (q 1 , p 2 , t 1 ), (q 5 , p 2 , t 4 )}. This set of migrations execute all the non-local gates in partition p 2 .  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Two-Step DQC Algorithm</head><p>In a recent work, Martinez and Heunen <ref type="bibr">[3]</ref> reduce the DQC problem to a balanced k-min-cut of an appropriate hypergraph, and thus, use a hypergraph-partitioning heuristic to solve the problem. Here, we propose a use natural two-step procedure -in effect, reducing the DQC problem to a sequence of two simpler problems; in fact, we show that the second step can be solved near-optimally and even optimally in a special setting. In particular, our proposed algorithm consists of the following two steps: (i) First, we compute a "good" (k, &#957;)-partition of the qubits for distribution over the k QCs; (ii) Then, we determine a minimal set of migrations required to cover all the non-local gates of the partitioned circuit.</p><p>Step 1: Partitioning Qubits into QCs/Partitions. To compute a partition that will intuitively yield a small set of migrations in the second step, we compute the partitioning of the qubits by solving a balanced k-min-cut problem over an edge-weighted graph over qubits as vertices. In particular, given a circuit C, we define a weighted graph G = (V, E) where V is the set of qubits in C and E = {(q i , q j )|(q i , q j , t) &#8712; C}. The weight on each edge intuitively represents the cost of keeping the qubits in different partitions; at its simplest, the weight of an edge (q i , q j ) can be the number of binary gates of the type (q i , q j , _) in C for some t. A more refined weight function is defined at the end of this section.</p><p>In essence, in the first step, we partition the qubits by computing a balanced min-k-cut of the of the above weighted graph. The balanced min-k-cut problem is NP-hard even for k = 2, with no known approximation algorithms. Therefore, we consider an approximate version of the problem, viz., (k, &#957;)-balanced graph partitioning for &#957; &gt; 1 as defined in Def. 3.2. There are several works that address the (k, &#957;)-balanced min-cut problem.</p><p>Notably, <ref type="bibr">[2]</ref> gives a O(log 2 n)-approximation algorithm for &#957; = 1 + &#1013; (&#1013; is arbitrarily small) when k is a constant. In our evaluation, we use a third-party graph partitioning algorithm called KaHyPar <ref type="bibr">[1]</ref>.</p><p>Step 2: Selecting Migrations. Given a partitioned circuit, we then find the smallest set of migrations that cover all the non-local gates. This minimization problem can be solved optimally in polynomial time when restrict each gate to be executed only in the home-partition of one of its operands ( &#167; 4). For the general case, we provide an O(log N )-approximation algorithm (where N is the number of binary/CZ gates in the circuit C).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D I S C 2 0 2 1</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>41:8</head><p>Efficient Distribution of Quantum Circuits Refined Weight Function in Step 1. The total number of CZ gates between two qubits q i and q j , originally proposed as the weights of edges in Step 1 above, is actually just an upper bound on the number of migrations needed to cover those gates. A more accurate cost will be the actual/optimal number of migrations needed to cover (q i , q j , _) gates if q i and q j were in separate partitions. We use the above intuition to construct an induced circuit C &#8242; (i,j)</p><p>consisting only of gates on q i and q j . We distribute C &#8242; (i,j) such that the two qubits q i and q j are in different partitions and compute the minimum number of migrations needed to cover all the binary gates in C &#8242; (i,j) . The minimum number of migrations in C &#8242; (i,j) can be computed using the optimal MS-HC algorithm described in &#167;4, and we use this minimum value as the weight on the edge (q i , q j ) of the weighted graph used in Step 1. See Fig. <ref type="figure">2b</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Optimal Selection of Migrations under Home-Coverage</head><p>In this section, given a quantum circuit and a partitioning of its qubits to computers/partitions, we design an efficient algorithm to compute the optimal set of migrations needed to execute the partitioned circuit. The algorithm developed in this section is used as a subroutine in the Two-Step DQC Algorithm presented in &#167;3.</p><p>Here, we will assume that a gate can only be executed at the home-partition of one of its operand-qubits; this restriction helps minimize execution memory needed at individual partitions, and in addition, simplifies the migration-selection problem sufficiently that we can design an optimal algorithm for the problem of selection of migrations to cover all gates. We will relax this assumption in the next section where we describe an algorithm for the migration-selection problem in the more general setting.</p><p>We start with formally incorporating the above assumption in our problem formulation by restricting the original definition (Def. 3.4) of coverage of gates by migrations.</p><p>Home Coverage of Gates by Migrations. Given a quantum circuit, recall the definition (Def. 3.4) of coverage of gates by migrations, for a given partition &#960;. Therein, we defined that a gate can be covered by migration(s) in many ways. To impose the condition that each gate be executed only at a home-partition of its operand qubits, we restrict the definition of coverage by allowing a migration to cover a gate only if it migrates one of the operand qubits to the home-partition of the other qubits. We define the notion of home-coverage below.</p><p>&#9654; Definition 4.1 (Home-Coverage). For a given partition &#960;, a binary gate gate (q i , q j , t) can be home-covered by a migration in one of the following ways. 1. By (q i , &#960;(q j ), t &#8242; ), if t &#8242; &lt; t and there are no unary gates on q i between t &#8242; and t; or 2. By (q j , &#960;(q i ), t &#8242; ), if t &#8242; &lt; t and there are no unary gates on q j between t &#8242; and t.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Memory Conservation by Home-Coverage.</head><p>Restricting execution of gates at home partitions of its operands is one way to implicitly minimize/constrain the size of execution memory needed. For example, consider a circuit with n qubits {q 1 , q 2 , . . . , q m } , no unary gates, and n -1 binary CZ-gates of the type (q i , q i+1 , t i ) for some t i 's. Suppose that each qubit is distributed to a different partition. In this case, a potential set of migrations to cover the gates could be to migrate all qubits to a single partition and execute all the gates there. However, this would require n -1 units of additional memory to hold all the migrated qubits. In contrast, the alternate solution restricted by home-coverage would migrate qubit q i to &#960;(q i+1 ) for each i, requiring only one additional memory at each partition. In addition, as mentioned above, restricting the coverage of gates to home-coverage allows design of an optimal algorithm for selection of migrations, and thus, an efficient overall solution for the DQC problem as observed in our evaluations. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>MS-HC Problem. Migration Selection under Home-Coverage. Given a quantum circuit C</head><p>and a partitioning of C's qubits to given partitions/QCs, the MS-HC problem is to determine the smallest set of migrations that can execute the gates of the partitioned circuit in the home partitions of one of its operands. More formally, let Q be the set of qubits in a given circuit C, P be the set of given partitions, &#960; be the partitioning function, and M be the set of all potential migrations (as per Def. 3.3). The MS-HC problem is to select a smallest set of migrations M such that each gate in C is home-covered by some migration in M .</p><p>We start with illustrating the MS-HC problem via an example. Then, we design an optimal algorithm for the MS-HC problem. Recall our running example of a quantum circuit from Example 2a. As shown in Fig. <ref type="figure">3</ref>, one solution to the MS-HC problem for the given partition of the qubits is {(q 1 , p 2 , 0), (q 4 , p 2 , 0), (q 1 , p 2 , t 1 ), (q 1 , p 3 , t 1 ), (q 3 , p 3 , t 2 )}, where as before t i is the timestamp of the i-th unary gate. This solution can be easily verified as optimal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Optimal MDS-HC Algorithm</head><p>We start with a high-level description and intuition of the proposed optimal algorithm, followed by the proofs of two key claims needed to design the algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>High-level.</head><p>The MS-HC problem can be easily formulated as the well-known set-cover problem. However, MS-HC is in fact a very special case of the set-cover, allowing us to design a polynomial-time optimal algorithm. In particular, first, we show that each gate (element) is home-covered by exactly two migrations (sets), and thus the MS-HC problem can be formulated as a vertex-cover problem in an appropriate graph G HC , where migrations are represented as nodes and gates as edges. Second, we show that the graph G HC is actually a bipartite graph, by showing that it has no odd length cycles. Thus, the MS-HC problem reduces to the vertex-cover problem in the bipartite graph G HC , and thus, solvable optimally in polynomial time.</p><p>Each Gate is Home-Covered by Two Migrations. Recall from Def. 3.3 that a migration is a triplet (q, P, t), where the qubit q is being migrated to partition P at time t, and we only allow migrations with a timestamp t of either 0 or corresponding to some unary gate on qubit q. The below lemma shows that each gate is home-covered by exactly two such migrations.</p><p>D I S C 2 0 2 1 41:10 Efficient Distribution of Quantum Circuits &#9654; Lemma 4.2</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>. For a given partitioned quantum circuit, a binary gate in the circuit is home-covered by exactly two migrations.</head><p>It is easy to see that the above lemma holds for the partitioned circuit of our running example. See Figure <ref type="figure">5</ref> in Appendix A.</p><p>Bipartite Graph over Migrations. Based on the intuition from Fig. <ref type="figure">5</ref>, we can construct a graph G HC (V = M, E = T ) where T is the set of all non-local gates (i.e., gates with different home-partitions of the operands) in the given partitioned circuit and M is the set of all migrations as per Def. 3.3 (i.e., with a timestamp of 0 or a unary gate on the migrated qubit). More formally, an edge/migration (q i , q j , t) connects the unique vertices (q i , &#960;(q j ), t i ) and (q j , &#960;(q i ), t j ) where t i and t j are as defined in Lemma 4.2. Based on this graph representation of home-coverage of gates by migrations, we can solve the MS-HC problem by computing the optimal vertex-cover of edges in G HC , since a vertex cover of G HC corresponds exactly to a set of migrations that home-cover all the given gates. Below, we prove that, the G HC graph is actually a bipartite graph -which allows us to compute the optimal vertex-cover of G HC in polynomial time.</p><p>&#9654; Lemma 4.3. For a given partitioned circuit, we claim that the G HC graph, as defined above, has no odd-length cycles, and is thus a bipartite graph.</p><p>Optimal Algorithm for MS-HC. Based on the above two lemmas, we can now solve the MS-HC problem by computing an optimal vertex cover of the bipartite graph G HC for a given partitioned circuit. The pseudo-code of the overall algorithm is given in Appendix A.4.</p><p>&#9654; Theorem 4.4. Given a partitioned quantum circuit, Algorithm 2 returns an optimal set of migrations that home-cover all the non-local gates of the given partitioned circuit.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Near-Optimal Selection of Migrations under General Coverage</head><p>In this section, we relax the assumption made in the previous section, i.e., allow execution of non-local gates of a partitioned circuit in partitions different from any of the home-partitions of its operands. We do this by using the original notion of coverage defined in Def. 3.4, where a non-local gate may also be executed by migrating both of its operand qubits to a third partition. In such a general setting, for a given partitioned circuit, we consider the problem of selecting a minimum number of migrations to cover all non-local gates, and present a polynomial-time approximation algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>MS-GC Problem: Selection of Migrations to Cover</head><p>Gates. Given a partitioned quantum circuit, the MS-GC problem is to determine a set a migrations of minimum size that covers all the non-local binary gates of the given partitioned circuit.</p><p>The above MS-GC problem generalizes the set-cover problem in some sense, as it allows an element to be covered by a pair of sets (i.e., an element is covered only if both the sets are selected). However, the MS-GC problem also has a special structure, which makes proving its intractability non-trivial; however, we conjecture the MS-GC problem to be NP-hard. In either case, since the objective function is not submodular, the simple greedy algorithm that iteratively selects the migration with most "benefit" does not offer a performance guarantee. Below, we will design a polynomial-time approximation algorithm for the MS-GC problem. We start with a definition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>R. G Sundaram, H. Gupta, and C. R. Ramakrishnan</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>41:11</head><p>&#9654; Definition 5.1 (Same-Partition Set (of Migrations)). A set of migrations M is called a same-partition set if for every pair of migrations (q 1 , p 1 , t 1 ) and (q 2 , p 2 , t 2 ) in M , we have p 1 = p 2 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Algorithm G*: Approximation Algorithm for the MS-GC Problem</head><p>Our proposed algorithm G * is a greedy algorithm that picks, at each iteration, a samepartition set of migrations with highest benefit-density (as defined below), until all the non-local gates of the given partitioned circuit are covered by the picked set of migrations. Note that, at each stage, the G * algorithm may pick more than one migration. We will later develop a procedure (Algorithm 1) to select a same-partition set with highest benefit-density, which form a single iteration of the G * algorithm. Below, we first show that G * will deliver an O(log N )-approximate solution to the MS-GC problem, where N is the total number of non-local gates in the given partitioned circuit. We now define the notion of benefit and benefit-density of a set of migrations.</p><p>&#9654; Definition 5.2 (Benefit; Benefit-Density). Consider a stage/iteration in the G * algorithm, where a set of migrations have already been selected. For a set X of migrations, we define its benefit B(X) as the number of (non-local) gates covered by X that have not been covered yet by the set of migrations already selected in previous iterations. We define the benefit-density of X as B(X)/|X|). &#9654; Theorem 5.3. For the MS-GC problem, the G * algorithm delivers a solution with at most |O| ln N number of migrations, where O is the optimal solution and N is the total number of non-local gates in the given partitioned circuit.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Dividing a Set into Same-Partition Subsets</head><p>We now show that a set of migrations can be divided into a disjoint collection of samepartition subsets such that the benefit of the original set is at most the summation of the benefit of the subsets. We start with a formal definition and on observation.</p><p>&#9654; Definition 5.4 (Benefit Graph). The benefit-graph, denoted by B M (V, E), for a set of migrations M at a certain stage of G * is defined as follows. The set of vertices V (B M ) is the set of migrations in M , and each node/migration m in V (B M ) is assigned a node-weight of B(m). Consider a pair of migrations m 1 , m 2 in M . The pair of vertices (m 1 , m 2 ) are connected by an edge in B M if and only if B({m 1 , m 2 }) is non-zero, in which case we also assign a weight of B({m 1 , m 2 }) to the edge (m 1 , m 2 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#9654; Lemma 5.5. At any stage of the</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">G* Iteration: Selecting an Optimal Same-Partition Set</head><p>We now design an algorithm to select the same-partition set of migrations with highest benefit-density, at a given stage/iteration of the G * algorithm. Let us consider a stage of the G * algorithm where a set of migrations have already been selected. Our goal is to pick a set M of same-partition migrations from the remaining migrations such that B(M)/|M| is maximized, where B(M) is the benefit of M at the given stage of the G * algorithm. Our overall algorithm of selecting an optimal M consists of the following steps. D I S C 2 0 2 1 41:12 Efficient Distribution of Quantum Circuits 1. For each given partition p i , let M i be the set of remaining (i.e., not yet selected by G * ) migrations that migrate a qubit to p i . Note that each M i is a maximal set of same-partition migrations. 2. For each M i , we find a set M i &#8838; M i with highest benefit-density, as described later.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3.</head><p>From the above M i 's, We pick the M i with highest B(M i )/|M i | as the optimal M. It is easy to see that the above returns an optimal same-partition set of migrations, as any same-partition set of migrations must be a subset of M i for some i. Selecting a Subset of M i with Highest Benefit-Density. We now discuss how to select an optimal subset within a given M i . We start with a lemma, which will help reduce the problem to that of selecting an optimal induced subgraph in the benefit graph of M i .</p><p>&#9654; Lemma 5.6. Let M be a same-partition set of migrations. Then,</p><p>The above lemma implies that finding the best subset within a given M i is equivalent to finding an induced subgraph H in the benefit-graph of M i that has the highest density of weight, i.e., maximum value of ( e&#8712;E(H) w(e) + v&#8712;V (H) w(v))/|V (H)|. This is a weighted generalization of the well-studied densest subgraph problem which can be solved optimally in polynomial time. We discuss this below.</p><p>Weighted Densest Subgraph Problem. Given an unweighted graph G, the densest subgraph problem is to select an induced subgraph H in G such that |E(H)|/|V (H)| is maximized. This problem can be solved optimally in polynomial time <ref type="bibr">[6]</ref>. We are interested in the weighted generalization of this problem referred to as the weighted densest subgraph problem, wherein vertices and edges have (positive) weights associated, and the goal is to select an induced subgraph H with maximum ( e&#8712;E(H) w(e) + v&#8712;V (H) w(v))/|V (H)|. The simple LP-based optimal algorithm as well as the more time-efficient 2-approximation algorithm for the unweighted version given in <ref type="bibr">[6]</ref> can be both easily generalized to our weighted version of the problem. We briefly give both the algorithms, and defer the performance guarantee proofs (as they are simple generalizations of the proofs for the unweighted case in <ref type="bibr">[6]</ref>).</p><p>LP-based Optimal Algorithm. The weighted densest subgraph problem can be easily represented as an ILP; the corresponding LP is as follows. Here, w i is the weight of vertex i, and w ij is the weight of edge (i, j) in the given graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Maximize</head><p>The optimal solution is obtained by: (i) Solving the above fractional LP; let the solution be {y i , x ij }. (ii) Picking a threshold &#964; appropriately and setting y i = &#8970;y i /&#964; &#8971;. In (ii), we pick a threshold the LP objective; we do so by exhaustively trying all y i values as the threshold. It can be shown by a simple generalization of the proof for the unweighted version <ref type="bibr">[6]</ref> that the above process returns an optimal solution for the weighted densest problem.</p><p>2-Approximate Greedy Algorithm. A simple greedy algorithm to solve the weighted densest subgraph problem is to iteratively remove a vertex with the lowest sum of node weight and weight of incident edges, keep track of the resulting n subgraphs over n interations, and picking the best among them. This greedy algorithm can be easily shown to be 2-approximate, by a simple generalization of the proof for the unweighted version <ref type="bibr">[6]</ref>.</p><p>Overall G* Iteration. Algorithm 1 gives the pseudo-code of the overall algorithm for a single G * iteration, which selects the same-partition set with highest benefit-density.</p><p>Algorithm 1 Single Iteration of G * Algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Input:</head><p>The set of remaining migrations M , at a certain stage of G * Algorithm. Output: A set of same-partition migrations M in M , with the highest benefit-density.</p><p>1: Let M 1 , M 2 , . . . , M k be the disjoint and maximal same-partition subsets of M corresponding to each of the partitions p 1 , p 2 , . . . , p k . /* For each M i , find a subset of M i with highest benefit-density */. 2: for all i &#8804; k do 3:</p><p>Construct the benefit-graph B(M i ) of M i .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4:</head><p>Find the weighted densest subgraph H i in B(M i ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5:</head><p>M i &#8592; Vertices in H i 6: end for 7: M &#8592; The M i with highest benefit-density. 8: return M.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Evaluation</head><p>We now evaluate our algorithms empirically over random quantum circuits. The goal of our empirical study is to evaluate the performance of proposed techniques in terms of number of migrations incurred, i.e., the number of cat-entanglements, and compare them with the prior approach from <ref type="bibr">[3]</ref>.</p><p>Algorithms Compared. We compare our techniques with the algorithm proposed in <ref type="bibr">[3]</ref>, which is based on computing a min-cut in an appropriate hypergraph; we refer to this algorithm as Martinez-19. Our algorithms are all based on the Two-Step Algorithm of &#167;3, with just different subroutines (from &#167;4-5) for the second step. For the first-step of partitioning the qubits using a simple weighted graph, we use a third-party solver KaHyPar <ref type="bibr">[1]</ref> and the refined weighted scheme discussed in &#167;4. For the second-step, we use three different schemes and refer to the overall DQC algorithms as follows: (i) Home-Cover algorithm uses Algorithm 2 in the second step, (ii) G * _LP, G * _Approx, G * _Simple algorithms use Algorithm G * in the second step, with the LP-based, 2-approximation greedy, and simple greedy algorithms respectively for solving the weighted densest subgraph problem. The simple greedy algorithm to compute the weighted densest subgraph simply removes one vertex at a time to improve the weighted-density of the remaining graph, and stops when the remaining graph's weighted density can't be improved by removing any vertex. All our algorithms use the refined weights described in &#167;3.2; usage of refined weights yielded a performance advantage of around 6% compared to using the simple weights.</p><p>Random Circuit Inputs and Parameter Values. We run our simulations over randomly generate quantum circuits. To generate quantum circuit instances, we vary the following parameters: number of qubits, total number of gates per qubit, fraction of gates that are binary (CZ) gates, and number of given partitions. We use an imbalance-factor &#957; of 1.1 for all the experiments. In the below plots, we vary one of the parameters and fix the remaining three to their default values. The default value for number of qubits is 50, total number of gates per qubit is 50, and number of partitions is 10. For the fraction of binary gates, we use two default values: 50% and 80%, as this parameter has a strong impact on performance of algorithms and in practice, the fraction of binary gates can be high. <ref type="foot">2</ref> Each data point in the below experiments is obtained from an average of 5 different random instances.</p><p>Evaluation Results for Varying Parameters. We present our results in Figures <ref type="figure">4b-4f</ref>.</p><p>In general, we observe that all the three G * -based algorithms performing similarly and significantly outperforming the Martinez-19, across all our experiments. In particular, the G * -based algorithms perform up to 90% better (i.e., incurring merely 20% of the migrations/cat-entanglements used by Martinez-19; see Fig. <ref type="figure">4a</ref>). The Home-Cover algorithm, which implicitly conserves execution memory usage at any single partition, also performs up to 80% better than Martinez-19. Among the G * -based algorithms, surprisingly the G * _Simple performs slightly better than the other two; note that, this does not contradict the optimality of LP-based algorithm for the densest weighted subgraph, since densest weighted subgraph solution is only used within an iteration of the G * algorithm.</p><p>Figure <ref type="figure">4a</ref> plots results for varying fraction of binary (CZ) gates in the circuit. We observe that the performance of all our algorithms, unlike Martinez-19, improves significantly with increasing fraction of binary gates; this can be attributed to the fact that higher fraction increases the number of gates covered by a single migration, and our algorithms are able to take better advantage of it. Figures <ref type="figure">4b</ref> and <ref type="figure">4c</ref> show the performance of various algorithms for increasing number of qubits, with 50% and 80% CZ gates respectively, while using default values for other parameters. The performance of our algorithms in comparison to Martinez-19 improves with increasing number of qubits. Figures <ref type="figure">4d</ref> and <ref type="figure">4e</ref> plot results for varying number of partitions. Here, we observe that performance of Home-Cover wrt G * -based algorithms worsens with increase in partitions; this may be due to the fact that Home-Cover's restriction of home-partition execution becomes more pronounced with increase in number of partitions. Finally, Figures <ref type="figure">4f</ref> and <ref type="figure">4g</ref> plot the performance of algorithms for varying number of gates per qubit. We discuss execution memory usage in Appendix A.8.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions</head><p>In this paper we considered the problem to distribute a quantum circuit over a network of QCs, such that the communication cost minimized. We presented a two-step heuristic that outperforms prior techniques. In future work, we plan to look at various generalizations of the problem, e.g., for nodes with non-uniform capacities, links with non-uniform communication costs, constraining the execution memory at each partition, allowing multiple modes of communications (e.g., teleportation as well as cat-entanglement), allowing for unary gates to be executed at any partition (i.e., allowing for dynamic home-partition of a qubit). In addition, we plan to investigate better heuristics for the first step of our algorithm.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.4 Optimal MS-HC Algorithm Psuedo-Code Algorithm 2</head><p>Input: A partitioned quantum circuit. Let T be the set of non-local gates. Output: An optimal set of migrations that home-covers all the non-local gates T . Proof. Let M 1 , M 2 , . . . , M k be the sets of migrations picked by the Greedy Algorithm over k iterations. Let a i is the number of gates covered by M i , that weren't previous covered by M 1 to M i sets. Thus, the overall greedy solution covers (a 1 + a 2 . . . + a k ) gates, which is equal to N , the total number of gates in the circuit (as the greedy algorithm only terminates when all the gates have been covered). Let be optimal solution be O; here, O is a set of migrations that cover all the N gates in the circuit.</p><p>Consider a stage when the Greedy Algorithm has already selected M 1 , M 2 , . . . , M i-1 sets of migrations. We can observe the following.</p><p>The total number of gates already covered by the greedy sets M 1 to M i-1 is i-1 j=1 a j . Thus, the number of gates covered by the optimal solution that have not been yet covered by the greedy sets M 1 to M i-1 is at least N -i-1 j=1 a j . This follows from "monotonicity" of the coverage function; in particular, from the fact that the M 1 to M i-1 sets and the optimal solution together still cover N gates. By Lemma 5.5, the optimal set O can be divided into disjoint same-partition subsets of migrations </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>E.g., the Quantum Fourier Transform (QFT) circuit has has O(n 2 ) controlled-phase gates and O(n) other gates; since the controlled-phase gates can be treated like CZ gates, the fraction of binary gates in a QFT circuit can be arbitrarily high. Also, when we convert binary gates in an arbitrary circuit to CZ gates, the fraction of binary gates is expected to be 50% as a binary gate yields a CZ gates flanked by unary gates (and long runs of unary gates can be considered as a single unary gate, in our context).</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="19" xml:id="foot_1"><p>A. Yimsiriwattana and S. J. Lomonaco Jr. Generalized GHZ states and distributed quantum computing. AMS Cont. Math., 381(131),</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2005" xml:id="foot_2"><p/></note>
		</body>
		</text>
</TEI>
