
Automated window-based partitioning of quantum circuits


Published 4 January 2021 © 2021 IOP Publishing Ltd
Citation: Eesa Nikahd et al 2021 Phys. Scr. 96 035102, DOI 10.1088/1402-4896/abd57c

1402-4896/96/3/035102

Abstract

Developing a scalable quantum computer as a single processing unit is challenging due to technology limitations. A solution to deal with this challenge is distributed quantum computing where several distant quantum processing units are used to perform the computation. The main design issue of this approach is costly communication between the processing units. Focused on this issue, in this paper, an efficient partitioning approach is proposed which combines both gate and qubit teleportation concepts in an efficient manner to minimize the communication. Experimental results show the proposed approach on average reduces the communication cost by about 29.5% in comparison with the best approaches in the literature.


1. Introduction

As feature sizes in VLSI technology reach 7 nm and beyond, quantum effects must be handled [1]. However, quantum effects that are managed in a controlled manner can also be exploited as a feature. Feynman originally suggested [2] using these effects to efficiently solve problems that are intractable on classical computers and called the resulting device a quantum computer. The circuit model of quantum computation resembles the circuit model of conventional computing, which is built from a discrete set of gates. Sequences of one- and two-qubit operations constitute the fundamental logic for evolving a quantum state. However, quantum computing has some unique characteristics, such as superposition, entanglement, and the inability to copy arbitrary quantum states [3].

Although many challenges hinder the realization of a practical quantum system, we believe that the design space for a future quantum computer should be explored now, because doing so helps to organize the plethora of proposed quantum technologies, fault tolerance methods, and other realization choices. However, when both hardware and architecture parameters are considered, the design space grows significantly. Therefore, to manage the design space complexity, a CAD flow is needed to streamline the design process and to make the design of large quantum circuits feasible.

A homogeneous organization may be acceptable for a small-scale quantum computer, but building a practical full-scale quantum system requires distributed computation with the necessary communications [4]. Distributed quantum architectures [4–8] are organized as quantum processing units (QPUs) connected by interfaces such as photonic networks. In such architectures, communications have considerably more latency and are more error-prone than local operations [4, 5]. For example, in the MUSIQC hardware proposed in [5], the communication latency between QPUs is on the order of milliseconds while other operations are on the order of microseconds. Thus, minimizing the communications between QPUs has a significant effect on the final latency. The communications between QPUs can decrease dramatically if the qubits of a circuit are effectively partitioned among the QPUs. Focused on this issue, in this paper, we propose a partitioning approach based on a windowing strategy that distributes qubits among QPUs so that the communication cost is minimized.

The remainder of this paper is organized as follows: section 2 contains some basic concepts in the field. An overview of the prior work is presented in section 3. Section 4 discusses the proposed approach in detail. Experimental results are discussed in section 5, and finally section 6 concludes the paper.

2. Background

In this section, some terms and concepts are explained to give a better understanding of the proposed approach.

Using teleportation to transmit data qubits from a source to a destination is known as data teleportation or teledata [9]. This procedure is summarized in figure 1. Performing a multi-qubit gate using teleportation, without placing the target qubits next to each other, is known as gate teleportation or telegate [10]. Figure 2 shows the circuits to perform CNOT and CZ gates remotely, based on the telegate mechanism.

Figure 1. Teleportation circuit to transmit qubit q from partition Pi to Pj using a single EPR pair shared between the partitions Pi and Pj.

Figure 2. Telegate-based application of (a) CNOT and (b) CZ gates on qubits q1 and q2 using an EPR pair shared between partitions Pi and Pj.

The teledata and telegate concepts lead to two partitioning approaches, namely gate partitioning and qubit partitioning, respectively. In gate partitioning, a quantum circuit is partitioned according to its gates: the approach decides to which partition (QPU) each gate must be assigned. If the qubits of that gate are not in the chosen partition, they are transmitted to it using teledata. On the other hand, the qubit partitioning approach partitions the qubits of a circuit and decides where each qubit should be placed. To apply a multi-qubit gate whose qubits are in different partitions, the gate is applied remotely via telegate; otherwise, the gate can be performed locally.
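The difference between the two cost models can be sketched in a few lines of Python. This is only an illustration: the function names `count_telegates` and `count_teledata`, the four-gate circuit, and the per-gate target partitions are assumptions made for the example, and partition capacity is ignored.

```python
def count_telegates(gates, partition_of):
    """Qubit partitioning: a gate is remote when its qubits sit in different partitions."""
    return sum(1 for a, b in gates if partition_of[a] != partition_of[b])


def count_teledata(gate_schedule, initial_partition_of):
    """Gate partitioning: every qubit missing from the gate's partition is teleported there."""
    location = dict(initial_partition_of)
    moves = 0
    for (a, b), target in gate_schedule:
        for q in (a, b):
            if location[q] != target:
                location[q] = target  # the qubit stays where it was moved
                moves += 1
    return moves


# Hypothetical two-qubit gates, in program order.
gates = [("q1", "q2"), ("q2", "q3"), ("q1", "q2"), ("q3", "q4")]
placement = {"q1": 0, "q2": 0, "q3": 1, "q4": 1}

# Qubit partitioning: only the (q2, q3) gate crosses the partition boundary.
telegates = count_telegates(gates, placement)

# Gate partitioning with an assumed target partition per gate: q2 is moved
# forward for the second gate and back again for the third one.
schedule = list(zip(gates, [0, 1, 0, 1]))
teledatas = count_teledata(schedule, placement)
```

For this toy circuit the fixed placement needs one telegate, while the gate-by-gate schedule needs two teledata operations because q2 travels forward and back.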

3. Related work

The quantum circuit design flow, like its classical counterpart, can be divided into two main processes: synthesis and physical design. The physical design process maps the gate-level netlist generated by the synthesis process onto a physical layout. Several studies have addressed the automation of different steps of the physical design process. Some researchers [6, 7, 11–16] worked on the entire physical design flow and proposed techniques for each of its steps, while others [17–20] proposed techniques for scheduling a quantum circuit on a layout. Mohammadzadeh et al [21–23] introduced the physical synthesis concept for quantum circuits and proposed some practical physical synthesis techniques [21, 23–25]. Since the main focus of this paper is on the partitioning step, the rest of this section reviews the partitioning techniques in more detail.

Squash 2 [26] partitions quantum circuits based on the gate partitioning approach by utilizing METIS [27] as the partitioning tool. Moghadam et al [28] apply a min-cut placement-aware partitioning approach [29] to divide a quantum dataflow graph of a circuit into smaller manageable parts. Wang et al [30] modified a graph partitioning algorithm presented in [31] to find the minimum cut of the qubit interaction graph. Ahsan et al [7, 32] used an efficient graph-theoretic algorithm [33] to assign qubits to QPUs. They first generate the adjacency matrix P of an N-qubit circuit where P[i][j] is the total number of interactions between qubits qi and qj. Then P is converted into its corresponding Laplacian matrix L as below:

$L[i][j]=\begin{cases}\sum_{k\ne i}P[i][k], & i=j\\ -P[i][j], & i\ne j\end{cases}$

Eigenvalues of L are computed and the eigenvector V2 corresponding to the second smallest eigenvalue is selected. Sorting V2 can determine the best order of qubits in a line in such a way that the weighted sum of the distances between the qubits is minimized. As the last step, this line of qubits is broken into some partitions and assigned to the QPUs.
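The spectral ordering step described above can be sketched as follows. This is a minimal pure-Python illustration and not the implementation of [7, 32]: the function name `fiedler_order`, the shifted power iteration used here in place of a full eigensolver, and the 6-qubit interaction matrix are all assumptions made for the example. It assumes a connected interaction graph with a zero diagonal.

```python
def fiedler_order(P, iterations=500):
    """Order qubits by the eigenvector of the second smallest Laplacian eigenvalue."""
    n = len(P)
    degree = [sum(row) for row in P]
    # Laplacian L = D - P, with D the diagonal matrix of weighted degrees.
    L = [[(degree[i] if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
    shift = 2 * max(degree)  # upper bound on the largest eigenvalue of L
    v = [i - (n - 1) / 2 for i in range(n)]  # start vector orthogonal to all-ones
    for _ in range(iterations):
        # Power iteration on (shift * I - L) ...
        w = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        # ... while projecting out the all-ones vector (eigenvalue 0 of L).
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return sorted(range(n), key=lambda i: v[i])


# Hypothetical interaction counts: qubits {0, 2, 4} and {1, 3, 5} interact
# heavily inside each group, with one weak interaction between 4 and 5.
P = [[0] * 6 for _ in range(6)]
for a, b, w in [(0, 2, 5), (0, 4, 5), (2, 4, 5),
                (1, 3, 5), (1, 5, 5), (3, 5, 5), (4, 5, 1)]:
    P[a][b] = P[b][a] = w

order = fiedler_order(P)
partitions = [order[:3], order[3:]]  # break the line into two QPUs of three qubits
```

In this example the line places each heavily interacting group on one side, with qubits 4 and 5 adjacent in the middle, so cutting the line in half separates the two groups.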

Mohammadzadeh and Sargaran [6] proposed the SAQIP architecture and used the multilevel k-way hypergraph partitioning algorithm introduced in [34] to partition the qubits into QPUs. In [35], the authors call METIS [27] iteratively to separate the qubits and place them on a 2D nearest-neighbor architecture. Zomorodi-Moghadam et al [36] proposed a gate partitioning procedure to minimize the number of teleportation operations. In that work, an additional exhaustive search is applied to decide how each two-qubit quantum gate should be implemented. This makes the runtime grow exponentially with the number of partitions and gates, which renders the method impractical. However, in a more recent work, they reduced the complexity of their method by proposing a genetic algorithm to solve the partitioning problem more efficiently [37].

Some approaches have been proposed for quantum physical design automation on single-processor but topologically constrained architectures. Childs et al [38] used the token swapping framework [39] and a 4-approximation algorithm [40] to insert a minimal sequence of SWAP gates into the circuit and transform an input quantum circuit into a hardware-compliant one. Chakrabarti et al [41] proposed a balanced graph partitioning technique that finds a global ordering of qubit lines to achieve the Linear-Nearest-Neighbor architecture with a minimum number of SWAP gates by using pmetis [42], an existing multilevel graph partitioning tool. The minimum linear arrangement problem [43] employed in [44] tries to insert a minimum number of SWAP gates in different parts of an interaction graph. A reverse traversal technique was proposed in [45] to choose the initial mapping with consideration of the whole circuit. It takes the following gates and previous mappings into account to reduce the overhead of 2-qubit gates and movements. The authors of [46] proposed an efficient heuristic method for logical to physical qubit mapping for linear devices. They transform the mapping problem into an undirected graph representation and then apply a spectral graph theory-based approach to place the logical qubits. All these approaches focus on mapping a given circuit onto a topologically constrained architecture, while our architecture is a distributed one with fewer constraints.

4. Our proposed approach: window-based quantum circuit partitioning (WQCP)

The main drawback of qubit partitioning approaches is that they convert a circuit into an untimed qubit interaction graph and try to partition it. Although these approaches try to assign strongly interacting qubits to the same partition, ignoring timing information makes them inefficient. In other words, since qubits may interact in different parts of a circuit, the partitioning methods that ignore timing information do not generate good results. Therefore, a better solution is to partition the circuit based on the information in local connectivity patterns and to change the partitions of qubits when their connectivity pattern changes as the circuit proceeds. On the other hand, the main drawback of gate partitioning methods is that they force the qubits of multi-qubit gates to be transported to the same partition to apply the gate. However, allowing remote application of multi-qubit gates can avoid unnecessary forward and backward transfers.

For example, in figure 3, the goal is to partition each circuit into two parts, each consisting of three qubits. The optimal communication costs of the circuit in figure 3(a) using gate and qubit partitioning approaches are 2 and 4, respectively. On the other hand, those achieved for the circuit in figure 3(b) using gate and qubit partitioning approaches are 4 and 2, respectively. Therefore, gate partitioning generates a better result than qubit partitioning for the circuit of figure 3(a) while for the circuit of figure 3(b), qubit partitioning outperforms gate partitioning. The superiority of one method over the other thus depends on the interaction pattern of the qubits of the circuit. Considering this observation, combining the two partitioning approaches can improve the communication cost by mitigating the drawbacks of each.

Figure 3. Gate partitioning (qubit partitioning) leads to lower communication cost than qubit partitioning (gate partitioning) for the circuit a (b).

Focusing on this issue, we propose a hybrid partitioning approach, called WQCP, which combines both telegate and teledata ideas in an efficient manner to minimize the communication cost. The pseudo-code of the proposed algorithm is given in algorithm 1. In the first step, single-qubit gates are removed from the circuit because single-qubit gates can be applied without any communication regardless of the partition the target qubit is assigned to. Then, the resulting circuit is levelized and a weighted window of length LW is moved along the circuit from the first level to the last one, level by level. Let LC, GL, and CL be the number of levels of the circuit, the set of gates in level L, and the sub-circuit contained in the window of length LW beginning at L, respectively. For each level L, 1 ≤ L ≤ LC, CL is partitioned based on the qubit partitioning approach. This operation is denoted by subPartitioning(CL). It determines the partition of each qubit for applying the gates of GL; the resulting partitioning is denoted by PL. For each gate g ∈ GL, if its qubits are placed in different partitions, the gate is applied remotely by telegate. Otherwise, g is applied locally. For two successive levels, L − 1 and L, if subPartitioning(CL) changes the partition of a qubit with respect to its previous partition, which is determined by subPartitioning(CL−1), that qubit is transported to the new partition using teledata. In the rest of this section, the subPartitioning algorithm is explained, followed by an example.

Algorithm 1. WQCP

Input: A quantum circuit (Cin), window length (LW)
Output: The partitioned circuit, total communication cost (number of teledatas and telegates)
$C={rmvSingleQubitGates}({C}_{in});$ //Remove single-qubit gates from the circuit Cin
${L}_{C}={levelize}(C);$ //Levelize the circuit C and return the total number of levels of C
${nTD}=0;$ //Initialize the number of teledatas with zero
${nTG}=0;$ //Initialize the number of telegates with zero
for all $L\in {Levels}=\{1,2,\ldots ,{L}_{C}\}$ do
    ${C}_{L}={getWindow}(C,L,{L}_{W});$ //Get the sub-circuit covered by the window of length LW starting from level L
    ${P}_{L}={subPartitioning}({C}_{L});$ //Partition CL based on the qubit partitioning approach
    ${nTD}+\,={countTD}({P}_{L},{P}_{L-1});$ //Return the number of teledatas by comparing PL with ${P}_{L-1}$
    ${G}_{L}={getGatesAtLevel}(L);$ //Return the gates of level L
    ${nTG}+\,={countTG}({G}_{L},{P}_{L});$ //Return the number of gates of GL which must be applied remotely by telegate
end for
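The loop of algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C++/CPLEX implementation: the function names, the as-soon-as-possible levelization, the exhaustive search that stands in for the ILP-based subPartitioning (described in section 4.1), the cost of one EPR pair per remote gate, and the 4-qubit circuit are all hypothetical choices that are only practical for very small inputs.

```python
from itertools import product


def levelize(gates, n_qubits):
    """Place each two-qubit gate one level after the latest earlier gate on its qubits."""
    ready = [0] * n_qubits
    levels = []
    for a, b in gates:
        lvl = max(ready[a], ready[b])
        if lvl == len(levels):
            levels.append([])
        levels[lvl].append((a, b))
        ready[a] = ready[b] = lvl + 1
    return levels


def sub_partitioning(window, weights, prev, n_qubits, n_parts, capacity, w_p):
    """Exhaustive stand-in for the ILP: weighted cut plus migration penalty."""
    best, best_cost = None, None
    for assign in product(range(n_parts), repeat=n_qubits):
        if any(assign.count(p) > capacity for p in range(n_parts)):
            continue  # capacity constraint violated
        cost = sum(w for level, w in zip(window, weights)
                   for a, b in level if assign[a] != assign[b])
        if prev is not None:
            cost += w_p * sum(1 for q in range(n_qubits) if assign[q] != prev[q])
        if best_cost is None or cost < best_cost:
            best, best_cost = assign, cost
    return best


def wqcp(gates, n_qubits, n_parts, capacity, weights, w_p):
    """Return (number of teledatas, number of telegates)."""
    two_qubit = [g for g in gates if len(g) == 2]  # drop single-qubit gates
    levels = levelize(two_qubit, n_qubits)
    n_td = n_tg = 0
    prev = None
    for start in range(len(levels)):
        window = levels[start:start + len(weights)]
        part = sub_partitioning(window, weights, prev, n_qubits, n_parts, capacity, w_p)
        if prev is not None:  # the initial placement needs no teledata
            n_td += sum(1 for q in range(n_qubits) if part[q] != prev[q])
        n_tg += sum(1 for a, b in levels[start] if part[a] != part[b])
        prev = part
    return n_td, n_tg


# Hypothetical 4-qubit circuit; (0,) is a single-qubit gate and is ignored.
gates = [(0,), (0, 1), (2, 3), (1, 2), (0, 1), (2, 3)]
result = wqcp(gates, n_qubits=4, n_parts=2, capacity=2, weights=[3, 2, 1], w_p=2)
```

For this circuit the sketch keeps qubits {0, 1} and {2, 3} together throughout and applies the single (1, 2) gate by telegate, because moving qubits back and forth would cost more than one remote gate.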

4.1. SubPartitioning(CL) algorithm

The subPartitioning function implements a min-cut partitioning algorithm [27, 47, 48] which takes a sub-circuit CL as input and partitions its qubits. To do so, the sub-circuit CL is modeled as a weighted graph whose vertices are qubits and whose edges are weighted according to the number of gates applied to the corresponding qubits. The weights of the edges are calculated as follows. Interactions between qubits in different levels of CL have different importance in our approach. This is because subPartitioning(CL) determines the partition of each qubit only for applying the gates of GL, and subPartitioning(CL+1) may change the location of qubits for the gates of the next level. Therefore, a weighted window is used to reflect this difference in the importance of interactions between the qubits in different levels. To this end, the weight of a window is defined as W = {wk}, where 1 ≤ k ≤ LW and wk represents the importance of the interactions between the qubits in the kth level of the sub-circuit CL. The weight of each edge between two vertices qi and qj, denoted by E(qi, qj), is defined as the weighted sum of interactions between qubits qi and qj in CL. This weight can be formulated as:

$E({q}_{i},{q}_{j})=\sum_{k=1}^{{L}_{W}}{w}_{k}\,{g}_{{ij}}^{k}\qquad (1)$

where ${g}_{{ij}}^{k}$ is zero if no gate is applied to qubits qi and qj in the kth level of the sub-circuit CL; otherwise it is equal to the number of EPR pairs needed to apply the gate by telegate.
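As an illustration of the weighted sum in equation (1), the following Python sketch accumulates the edge weights of one window. The function name `edge_weights`, the window contents, and the default of one EPR pair per remote gate (as for the CNOT and CZ circuits of figure 2) are assumptions made for the example.

```python
from collections import defaultdict


def edge_weights(window, weights, epr_pairs_per_gate=1):
    """Sum w_k * g_ij^k over the levels k of one window."""
    E = defaultdict(int)
    for level, w_k in zip(window, weights):
        for a, b in level:
            # An undirected edge: (qi, qj) and (qj, qi) share one entry.
            E[tuple(sorted((a, b)))] += w_k * epr_pairs_per_gate
    return dict(E)


# Hypothetical window of length 3: q0 and q1 interact in levels 1 and 3,
# q1 and q2 interact in level 2.
window = [[("q0", "q1")], [("q2", "q1")], [("q1", "q0")]]
E = edge_weights(window, weights=[3, 2, 1])
```

Here the two q0-q1 gates contribute 3 + 1 = 4 to the same edge, while the level-2 gate gives E(q1, q2) = 2, so interactions closer to the start of the window dominate.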

In addition to the interactions of the qubits in CL, subPartitioning(CL) should consider the previous partition of each qubit, which is determined by subPartitioning(CL−1). To do so, a dummy vertex pi is added to the graph for each partition i, and each qubit is connected to the vertex of its previous partition with weight wp, where wp represents how strongly a qubit tends to stay in its previous partition. We formulate subPartitioning(CL) as an ILP problem. The input parameters and variables of the ILP model are given in table 1.

Table 1. Input parameters and variables of our ILP model.

| Type | Symbol | Description |
|---|---|---|
| NUCC | N | Number of qubits in the sub-circuit CL |
| | M | Total number of partitions |
| | capacity | Maximum number of qubits which can be assigned to each partition simultaneously |
| | wp | Determines how much qubits tend to stay in their previous partitions |
| | adjMat[N][N] | An N × N matrix where adjMat[i][j] is equal to E(qi, qj) according to equation (1) |
| | prevParts[M][N] | An M × N binary matrix where prevParts[p][i] is equal to 1 if and only if the previous partition of qubit qi is partition p (the previous partition of a qubit is determined by subPartitioning(CL−1)) |

Our model minimizes the following cost function:

It consists of two terms. The first term is the weighted sum of cuts and the second term is the cost incurred by migrating qubits from their previous partitions.

The constraints of the ILP model are listed below:

  • Total number of qubits assigned to each partition must not exceed its capacity:
  • Each qubit must be assigned to only one partition:
  • If two qubits qi and qj are assigned to different partitions, cut[i][j] must be set to 1:
    Suppose that qubits qi and qj are assigned to different partitions pi and pj, respectively. For p = pi the left hand side of the above inequality is equal to 1, which forces cut[i][j] to be set to one. On the other hand, when qi and qj are in the same partition, the left hand side of the inequality is at most zero for all partitions. In this case, the cost function forces cut[i][j] to be zero to minimize the cost.
  • If the partitioning algorithm changes the partition of a qubit qi , migrate[i] must be set to 1:
    Suppose that the partition of qubit qi has been changed from pprev to pi. Therefore, for p = pi the left hand side of the above inequality is equal to 1, which forces migrate[i] to be set to 1. On the other hand, when qi stays in its previous partition, the left hand side of the inequality is zero for all partitions. In this case, the cost function forces migrate[i] to be zero to minimize the cost.
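The cost function and the four constraints described above can be written compactly as follows. This is a sketch consistent with the descriptions rather than the authors' exact formulation: the binary assignment variable $x_{p,i}$ (equal to 1 if qubit $q_i$ is placed in partition $p$) is a name assumed here, and restricting the cut term to pairs $i<j$ is likewise an assumed convention. The symbols cut, migrate, adjMat, prevParts, capacity and $w_p$ are those used in the text and in table 1.

```latex
\begin{aligned}
\min\quad & \sum_{i=1}^{N}\sum_{j=i+1}^{N} \mathrm{adjMat}[i][j]\,\mathrm{cut}[i][j]
            \;+\; w_p \sum_{i=1}^{N} \mathrm{migrate}[i] \\
\text{s.t.}\quad
 & \sum_{i=1}^{N} x_{p,i} \le \mathrm{capacity} && \forall p \\
 & \sum_{p=1}^{M} x_{p,i} = 1 && \forall i \\
 & x_{p,i} - x_{p,j} \le \mathrm{cut}[i][j] && \forall p,\ \forall i<j \\
 & x_{p,i} - \mathrm{prevParts}[p][i] \le \mathrm{migrate}[i] && \forall p,\ \forall i \\
 & x_{p,i},\ \mathrm{cut}[i][j],\ \mathrm{migrate}[i] \in \{0,1\}
\end{aligned}
```

With this form, the two explanatory paragraphs above can be checked directly: if $q_i$ is in $p_i$ and $q_j$ is elsewhere, then $x_{p_i,i} - x_{p_i,j} = 1$, and if $q_i$ has left its previous partition, then $x_{p_i,i} - \mathrm{prevParts}[p_i][i] = 1$.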

4.2. An example

In this section, the proposed approach is explained by an example. Figure 4 shows a quantum circuit consisting of 6 qubits and 25 gates. Our algorithm partitions the circuit into two parts, each containing three qubits. Let the window length be LW = 3, the window weight be W = {w1, w2, w3} = {3, 2, 1}, and wp = 2. In the first step, the algorithm removes single-qubit gates from the circuit and then levelizes it, as shown in figure 5. In the next step, the window is laid on the circuit starting from the first level. Figures 6 and 7 show all steps of the algorithm. In the second column, the sub-circuit CL is depicted. The corresponding interaction graph of CL and PL, i.e., the output of subPartitioning(CL), are shown in the third column. The last column contains the gates of level L (GL), the qubits that should be teleported using teledata, and the gates that should be applied remotely by telegate, respectively.

Figure 4. An example circuit to partition into two parts.

Figure 5. The levelized circuit after removing single-qubit gates.

Figure 6. Steps 1 to 5 of our approach to partition the example circuit of figure 4.

Figure 7. Steps 6 to 9 of our approach to partition the example circuit of figure 4.

For the graph of C1, the gate g1 in the first level generates an edge between vertices q2 and q3 with the weight of 3 (w1). Similarly, there are edges E(q4, q6) = 2 and E(q3, q6) = 1 corresponding to gates g4 and g6 at levels 2 and 3, respectively. It should be noted that there is no edge between qubit and partition vertices in the graph of C1 because no qubit has been assigned to any partition yet. subPartitioning(C1) partitions the qubits of C1 into two parts p1 = {q1, q2, q3} and p2 = {q4, q5, q6}, denoted by P1 = {p1, p1, p1, p2, p2, p2} in figures 6 and 7. P1 is the initial partitioning of qubits and thus no teledata is required. In the next step, it is determined how the gates of G1, i.e., g1 and g2, should be applied. Both g1 and g2 can be applied locally because their qubits are assigned to the same partition. For the next levels, each qubit vertex is connected to the vertex corresponding to its previous partition with weight 2 (wp). In step 3, since q3 and q6 are assigned to different partitions, g6 must be applied remotely by telegate. In step 8, subPartitioning(C8) partitions the qubits of the circuit into p1 = {q1, q2, q5} and p2 = {q4, q3, q6}; compared with the previous partitioning P7, q3 and q5 have changed partitions. Therefore, these qubits are transported using two teledata operations.
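The first step of this example can be checked with a small exhaustive search. The sketch below is an illustration only: it uses just the three edges of C1 that are named in the text (any other edges of the graph in figure 6 are not listed here and are therefore omitted), it replaces the ILP with brute-force enumeration, and the names `cut_cost` and `best_bipartitions` are assumptions.

```python
from itertools import combinations

qubits = ["q1", "q2", "q3", "q4", "q5", "q6"]
# Only the edges of C1 named in the text: weights w1 = 3, w2 = 2, w3 = 1.
edges = {("q2", "q3"): 3, ("q4", "q6"): 2, ("q3", "q6"): 1}


def cut_cost(part_one):
    """Total weight of the edges whose endpoints fall in different parts."""
    return sum(w for (a, b), w in edges.items() if (a in part_one) != (b in part_one))


def best_bipartitions():
    """Enumerate 3/3 splits; q1 is pinned to the first part to skip mirror images."""
    costs = {c: cut_cost(set(c)) for c in combinations(qubits, 3) if "q1" in c}
    best = min(costs.values())
    return best, [list(c) for c, cost in costs.items() if cost == best]


best_cost, optimal_parts = best_bipartitions()
```

With these three edges alone, the minimum cut weight is 1 (only the edge between q3 and q6 is cut), and the split p1 = {q1, q2, q3}, p2 = {q4, q5, q6} reported in the text is one of the optimal solutions.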

Our hybrid approach needs 5 communication operations including 2 teledata operations and 3 telegate ones while the best solution achieved by qubit partitioning approach or gate partitioning approach requires 6 communication operations. This example shows the superiority of our hybrid approach over both qubit partitioning and gate partitioning approaches.

5. Experimental results

Our approach (WQCP) was implemented in C++ and CPLEX [49] was used as the ILP solver. It was run on a Core i7 CPU operating at 2.4 GHz with 8 GB of memory.

Single-qubit gates and a two-qubit Clifford gate such as CNOT or CZ make a universal gate set for quantum computation. Therefore, the set of CNOT, CZ, and single-qubit gates was chosen as the gate library. To evaluate the performance of WQCP, it was applied to some benchmark circuits from [50] (the first nine circuits in the tables), RevLib [51] (the circuits from 10 to 15), some quantum error-correction encoding circuits [52] (the circuits from 16 to 25), and n-qubit quantum Fourier transform circuits (QFT) [53] where n ∈ {16, 32, 64, 128, 256}.

These circuits may include some gates out of the gate library that are synthesized into the gates of the library based on the method proposed in [54].

The window length LW can strongly affect the result. Table 2 compares the communication counts obtained by WQCP for different window lengths, where the window weight is W = {LW, LW − 1, ..., 1} and wp = LW − 1. The number of qubits, the multi-qubit depth of each benchmark and the number of partitions are shown in the second column. The third column contains the type of teleportation, which can be teledata or telegate. The best obtained results are marked in bold. It is worth noting that the window weight W and wp were chosen experimentally and different results may be achieved by changing them.
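The weight setting used for table 2 is simple to generate. In the short sketch below, the helper name `default_window` is an assumption; the rule itself (W = {LW, LW − 1, ..., 1} and wp = LW − 1) is the one stated above.

```python
def default_window(L_W):
    """Return (W, w_p) for a window of length L_W, following the rule stated above."""
    weights = list(range(L_W, 0, -1))  # L_W, L_W - 1, ..., 1
    w_p = L_W - 1
    return weights, w_p


weights, w_p = default_window(3)
```

For LW = 3 this gives W = {3, 2, 1} and wp = 2, which are the values used in the example of section 4.2.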

Table 2. Experimental results (number of teleportations) obtained by WQCP for the benchmark circuits and different window lengths LW .

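| # | Benchmark | # qubits, depth, # parts (top to bottom) | TP type | LW = 5 | LW = 6 | LW = 7 | LW = 8 | LW = 9 | LW = 12 | LW = 15 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2of5-D1 | 6 | Telegate | 11 | 16 | 31 | 31 | 30 | 30 | 30 |
| | | 98 | Teledata | 38 | 30 | 10 | 10 | 10 | 10 | 10 |
| | | 2 | Total | 49 | 46 | 41 | 41 | **40** | **40** | **40** |
| 2 | 2-4dec | 6 | Telegate | 9 | 9 | 9 | 9 | 9 | 8 | 9 |
| | | 19 | Teledata | 6 | 2 | 2 | 2 | 4 | 4 | 4 |
| | | 3 | Total | 15 | **11** | **11** | **11** | 13 | 12 | 13 |
| 3 | 6sym | 10 | Telegate | 5 | 4 | 5 | 4 | 6 | 9 | 8 |
| | | 42 | Teledata | 18 | 16 | 14 | 12 | 12 | 10 | 8 |
| | | 2 | Total | 23 | 20 | 19 | **16** | 18 | 19 | **16** |
| 4 | 9sym | 12 | Telegate | 5 | 17 | 20 | 24 | 29 | 20 | 24 |
| | | 71 | Teledata | 31 | 20 | 20 | 16 | 16 | 16 | 17 |
| | | 3 | Total | **36** | 37 | 40 | 40 | 45 | **36** | 41 |
| 5 | Ham15-D3 | 15 | Telegate | 43 | 40 | 56 | 51 | 59 | 56 | 57 |
| | | 177 | Teledata | 70 | 63 | 49 | 54 | 43 | 48 | 44 |
| | | 4 | Total | 113 | 103 | 105 | 105 | 102 | 104 | **101** |
| 6 | Cycle17-3 | 20 | Telegate | 246 | 255 | 291 | 589 | 83 | 702 | 740 |
| | | 8561 | Teledata | 2261 | 2217 | 2118 | 1722 | 1657 | 1380 | 1288 |
| | | 3 | Total | 2507 | 2472 | 2409 | 2311 | 2340 | 2082 | **2028** |
| 7 | 8bitadder | 24 | Telegate | 11 | 21 | 19 | 26 | 22 | 46 | 41 |
| | | 106 | Teledata | 87 | 73 | 65 | 58 | 69 | 42 | 48 |
| | | 6 | Total | 98 | 94 | **84** | **84** | 91 | 88 | 89 |
| 8 | Hwb50 | 56 | Telegate | 325 | 386 | 317 | 287 | 325 | 293 | 400 |
| | | 4994 | Teledata | 1319 | 1380 | 1162 | 1232 | 1057 | 1008 | 936 |
| | | 5 | Total | 1644 | 1766 | 1479 | 1519 | 1382 | **1301** | 1336 |
| 9 | Hwb100 | 107 | Telegate | 1075 | 676 | 982 | 1122 | 597 | 753 | 673 |
| | | 15 923 | Teledata | 3446 | 3151 | 3117 | 2805 | 2295 | 2029 | 1840 |
| | | 7 | Total | 4521 | 3827 | 4099 | 3927 | 2892 | 2782 | **2513** |
| 10 | rd32_272 | 5 | Telegate | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | 5 | Teledata | 4 | 6 | 6 | 6 | 6 | 6 | 6 |
| | | 2 | Total | **6** | 7 | 7 | 7 | 7 | 7 | 7 |
| 11 | ham7_106 | 7 | Telegate | 26 | 26 | 26 | 26 | 26 | 26 | 27 |
| | | 38 | Teledata | 4 | 4 | 4 | 4 | 4 | 7 | 6 |
| | | 4 | Total | **30** | **30** | **30** | **30** | **30** | 33 | 33 |
| 12 | rd53_139 | 8 | Telegate | 6 | 3 | 4 | 2 | 3 | 6 | 9 |
| | | 8 | Teledata | 8 | 10 | 10 | 12 | 10 | 6 | 4 |
| | | 2 | Total | 14 | 13 | 14 | 14 | 13 | **12** | 13 |
| 13 | rd53_311 | 13 | Telegate | 5 | 5 | 2 | 3 | 3 | 4 | 2 |
| | | 19 | Teledata | 19 | 22 | 23 | 21 | 20 | 18 | 20 |
| | | 3 | Total | 24 | 27 | 25 | 24 | 23 | **22** | **22** |
| 14 | parity_247 | 17 | Telegate | 1 | 1 | 3 | 2 | 4 | 3 | 4 |
| | | 16 | Teledata | 3 | 4 | 3 | 4 | 3 | 4 | 3 |
| | | 3 | Total | **4** | 5 | 6 | 6 | 7 | 7 | 7 |
| 15 | adder16_174 | 49 | Telegate | 2 | 2 | 2 | 2 | 2 | 3 | 3 |
| | | 19 | Teledata | 9 | 7 | 10 | 11 | 11 | 8 | 7 |
| | | 3 | Total | 11 | **9** | 12 | 13 | 13 | 11 | 10 |
| 16 | [[10,3,3]] | 10 | Telegate | 4 | 5 | 5 | 5 | 6 | 7 | 6 |
| | | 25 | Teledata | 10 | 8 | 8 | 8 | 8 | 8 | 8 |
| | | 2 | Total | 14 | **13** | **13** | **13** | 14 | 15 | 14 |
| 17 | [[16,3,5]] | 16 | Telegate | 17 | 16 | 18 | 27 | 28 | 30 | 34 |
| | | 43 | Teledata | 34 | 38 | 33 | 21 | 21 | 19 | 17 |
| | | 4 | Total | 51 | 54 | 51 | **48** | 49 | 49 | 51 |
| 18 | [[21,1,7]] | 21 | Telegate | 5 | 9 | 14 | 20 | 26 | 32 | 23 |
| | | 58 | Teledata | 51 | 51 | 46 | 42 | 32 | 28 | 37 |
| | | 3 | Total | **56** | 60 | 60 | 62 | 58 | 60 | 60 |
| 19 | [[24,3,7]] | 24 | Telegate | 13 | 17 | 34 | 35 | 38 | 37 | 50 |
| | | 84 | Teledata | 88 | 88 | 69 | 71 | 66 | 66 | 57 |
| | | 4 | Total | **101** | 105 | 103 | 106 | 104 | 103 | 107 |
| 20 | [[25,1,9]] | 25 | Telegate | 19 | 22 | 21 | 21 | 22 | 26 | 43 |
| | | 83 | Teledata | 79 | 70 | 68 | 68 | 66 | 67 | 53 |
| | | 5 | Total | 98 | 92 | 89 | 89 | **88** | 93 | 96 |
| 21 | [[27,1,9]] | 27 | Telegate | 17 | 12 | 31 | 44 | 35 | 46 | 45 |
| | | 110 | Teledata | 89 | 86 | 77 | 67 | 76 | 68 | 69 |
| | | 4 | Total | 106 | **98** | 108 | 111 | 111 | 114 | 114 |
| 22 | [[31,11,6]] | 31 | Telegate | 18 | 28 | 42 | 39 | 45 | 60 | 61 |
| | | 149 | Teledata | 127 | 129 | 110 | 99 | 105 | 101 | 94 |
| | | 4 | Total | 145 | 157 | 152 | **138** | 150 | 161 | 155 |
| 23 | [[33,1,9]] | 33 | Telegate | 9 | 19 | 32 | 36 | 52 | 56 | 55 |
| | | 153 | Teledata | 132 | 119 | 122 | 124 | 102 | 103 | 104 |
| | | 5 | Total | 141 | **138** | 154 | 160 | 154 | 159 | 159 |
| 24 | [[35,1,10]] | 35 | Telegate | 10 | 22 | 21 | 31 | 43 | 48 | 62 |
| | | 126 | Teledata | 114 | 111 | 99 | 100 | 100 | 102 | 105 |
| | | 4 | Total | 124 | 133 | **120** | 131 | 143 | 150 | 167 |
| 25 | [[40,3,10]] | 40 | Telegate | 23 | 34 | 40 | 50 | 53 | 59 | 67 |
| | | 172 | Teledata | 166 | 150 | 153 | 136 | 149 | 138 | 122 |
| | | 4 | Total | 189 | **184** | 193 | 186 | 202 | 197 | 189 |
| 26 | QFT16 | 16 | Telegate | 0 | 0 | 1 | 14 | 15 | 18 | 18 |
| | | 56 | Teledata | 23 | 23 | 23 | 25 | 26 | 22 | 28 |
| | | 3 | Total | **23** | **23** | 24 | 39 | 41 | 40 | 46 |
| 27 | QFT32 | 32 | Telegate | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 120 | Teledata | 48 | 48 | 48 | 48 | 48 | 48 | 48 |
| | | 4 | Total | **48** | **48** | **48** | **48** | **48** | **48** | **48** |
| 28 | QFT64 | 64 | Telegate | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 248 | Teledata | 106 | 106 | 106 | 106 | 106 | 106 | 106 |
| | | 6 | Total | **106** | **106** | **106** | **106** | **106** | **106** | **106** |
| 29 | QFT128 | 128 | Telegate | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 504 | Teledata | 224 | 224 | 224 | 224 | 224 | 224 | 224 |
| | | 8 | Total | **224** | **224** | **224** | **224** | **224** | **224** | **224** |
| 30 | QFT256 | 256 | Telegate | 0 | 0 | NA | NA | NA | NA | NA |
| | | 1016 | Teledata | 468 | 468 | NA | NA | NA | NA | NA |
| | | 12 | Total | **468** | **468** | NA | NA | NA | NA | NA |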

Table 3 compares the best results obtained by WQCP with the qubit and gate partitioning approaches. Two different algorithms based on the qubit partitioning approach are considered. In the first one, denoted by QPILP, the qubit partitioning is modeled using ILP and solved by the CPLEX solver. Although this method produces the optimal results of qubit partitioning, it is not scalable. The method proposed by Ahsan et al [7, 32], denoted by QPGTA, is considered as the second algorithm for comparison. To implement the gate partitioning approach (GP), WQCP was used with the window weight w1 set to a very large number compared to the other weights, i.e. W\{w1} and wp. In this way, WQCP is forced to apply each multi-qubit gate locally without any telegate operation. Table 3 shows that WQCP decreases the communication cost, on average, by 37.6% and 21.4% in comparison to the qubit and gate partitioning approaches, respectively. For the QFT circuits, the best partitioning approach is gate partitioning because of their particular structure, and WQCP produces the same results as GP for these circuits.
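The improvement columns of table 3 appear consistent with the relative reduction of the total communication count. The following sketch recomputes them for the first benchmark; the function name `improvement` is an assumption, and the exact rounding rule used for the table is not stated in the text.

```python
def improvement(baseline_total, wqcp_total):
    """Relative reduction of the total communication count, in percent."""
    return 100.0 * (baseline_total - wqcp_total) / baseline_total


# Totals for 2of5-D1 from table 3: GP = 62, QPILP = 50, QPGTA = 50, WQCP = 40.
vs_gp = improvement(62, 40)     # about 35.5, reported as 35
vs_qpilp = improvement(50, 40)  # 20.0, reported as 20
```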

Table 3. The partitioning results achieved by WQCP for the benchmark circuits compared with the gate and qubit partitioning approaches.

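| # | Benchmark | TP type | GP | QPILP | QPGTA | WQCP | Improvement over GP (%) | Improvement over QPILP (%) | Improvement over QPGTA (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2of5-D1 | Telegate | 0 | 50 | 50 | 30 | | | |
| | | Teledata | 62 | 0 | 0 | 10 | | | |
| | | Total | 62 | 50 | 50 | 40 | 35 | 20 | 20 |
| 2 | 2-4dec | Telegate | 0 | 13 | 18 | 9 | | | |
| | | Teledata | 27 | 0 | 0 | 2 | | | |
| | | Total | 27 | 13 | 18 | 11 | 59 | 15 | 38 |
| 3 | 6sym | Telegate | 0 | 22 | 22 | 4 | | | |
| | | Teledata | 28 | 0 | 0 | 12 | | | |
| | | Total | 28 | 22 | 22 | 16 | 42 | 27 | 27 |
| 4 | 9sym | Telegate | 0 | 57 | 72 | 5 | | | |
| | | Teledata | 43 | 0 | 0 | 31 | | | |
| | | Total | 43 | 57 | 72 | 36 | 16 | 37 | 50 |
| 5 | Ham15-D3 | Telegate | 0 | 126 | 156 | 57 | | | |
| | | Teledata | 155 | 0 | 0 | 44 | | | |
| | | Total | 155 | 126 | 156 | 101 | 35 | 20 | 35 |
| 6 | Cycle17-3 | Telegate | 0 | 3979 | 4323 | 740 | | | |
| | | Teledata | 2372 | 0 | 0 | 1288 | | | |
| | | Total | 2372 | 3979 | 4323 | 2028 | 15 | 49 | 53 |
| 7 | 8bitadder | Telegate | 0 | NA | 147 | 19 | | | |
| | | Teledata | 103 | NA | 0 | 65 | | | |
| | | Total | 103 | NA | 147 | 84 | 18 | NA | 43 |
| 8 | Hwb50 | Telegate | 0 | NA | 3247 | 293 | | | |
| | | Teledata | 2323 | NA | 0 | 1008 | | | |
| | | Total | 2323 | NA | 3247 | 1301 | 44 | NA | 60 |
| 9 | Hwb100 | Telegate | 0 | NA | 4995 | 673 | | | |
| | | Teledata | 3825 | NA | 0 | 1840 | | | |
| | | Total | 3825 | NA | 4995 | 2513 | 34 | NA | 50 |
| 10 | rd32_272 | Telegate | 0 | 10 | 10 | 2 | | | |
| | | Teledata | 8 | 0 | 0 | 4 | | | |
| | | Total | 8 | 10 | 10 | 6 | 25 | 40 | 40 |
| 11 | ham7_106 | Telegate | 0 | 32 | 32 | 26 | | | |
| | | Teledata | 71 | 0 | 0 | 4 | | | |
| | | Total | 71 | 32 | 32 | 30 | 57 | 6 | 6 |
| 12 | rd53_139 | Telegate | 0 | 17 | 17 | 6 | | | |
| | | Teledata | 16 | 0 | 0 | 6 | | | |
| | | Total | 16 | 17 | 17 | 12 | 25 | 29 | 29 |
| 13 | rd53_311 | Telegate | 0 | 41 | 41 | 4 | | | |
| | | Teledata | 26 | 0 | 0 | 18 | | | |
| | | Total | 26 | 41 | 41 | 22 | 15 | 46 | 46 |
| 14 | parity_247 | Telegate | 0 | 11 | 11 | 1 | | | |
| | | Teledata | 5 | 0 | 0 | 3 | | | |
| | | Total | 5 | 11 | 11 | 4 | 20 | 63 | 63 |
| 15 | adder16_174 | Telegate | 0 | 17 | 17 | 2 | | | |
| | | Teledata | 13 | 0 | 0 | 7 | | | |
| | | Total | 13 | 17 | 17 | 9 | 30 | 47 | 47 |
| 16 | [[10,3,3]] | Telegate | 0 | 16 | 18 | 5 | | | |
| | | Teledata | 18 | 0 | 0 | 8 | | | |
| | | Total | 18 | 16 | 18 | 13 | 27 | 18 | 27 |
| 17 | [[16,3,5]] | Telegate | 0 | NA | 55 | 27 | | | |
| | | Teledata | 69 | NA | 0 | 21 | | | |
| | | Total | 69 | NA | 55 | 48 | 30 | NA | 12 |
| 18 | [[21,1,7]] | Telegate | 0 | NA | 80 | 5 | | | |
| | | Teledata | 65 | NA | 0 | 51 | | | |
| | | Total | 65 | NA | 80 | 56 | 14 | NA | 30 |
| 19 | [[24,3,7]] | Telegate | 0 | NA | 140 | 13 | | | |
| | | Teledata | 117 | NA | 0 | 88 | | | |
| | | Total | 117 | NA | 140 | 101 | 13 | NA | 28 |
| 20 | [[25,1,9]] | Telegate | 0 | NA | 111 | 22 | | | |
| | | Teledata | 117 | NA | 0 | 66 | | | |
| | | Total | 117 | NA | 111 | 88 | 24 | NA | 20 |
| 21 | [[27,1,9]] | Telegate | 0 | NA | 163 | 12 | | | |
| | | Teledata | 116 | NA | 0 | 86 | | | |
| | | Total | 116 | NA | 163 | 98 | 15 | NA | 40 |
| 22 | [[31,11,6]] | Telegate | 0 | NA | 231 | 39 | | | |
| | | Teledata | 172 | NA | 0 | 99 | | | |
| | | Total | 172 | NA | 231 | 138 | 20 | NA | 40 |
| 23 | [[33,1,9]] | Telegate | 0 | NA | 220 | 19 | | | |
| | | Teledata | 157 | NA | 0 | 119 | | | |
| | | Total | 157 | NA | 220 | 138 | 12 | NA | 37 |
| 24 | [[35,1,10]] | Telegate | 0 | NA | 267 | 21 | | | |
| | | Teledata | 132 | NA | 0 | 99 | | | |
| | | Total | 132 | NA | 267 | 120 | 9 | NA | 55 |
| 25 | [[40,3,10]] | Telegate | 0 | NA | 328 | 34 | | | |
| | | Teledata | 196 | NA | 0 | 150 | | | |
| | | Total | 196 | NA | 328 | 184 | 6 | NA | 44 |
| 26 | QFT16 | Telegate | 0 | 87 | 94 | 0 | | | |
| | | Teledata | 23 | 0 | 0 | 23 | | | |
| | | Total | 23 | 87 | 94 | 23 | 0 | 73 | 75 |
| 27 | QFT32 | Telegate | 0 | NA | 165 | 0 | | | |
| | | Teledata | 48 | NA | 0 | 48 | | | |
| | | Total | 48 | NA | 165 | 48 | 0 | NA | 70 |
| 28 | QFT64 | Telegate | 0 | NA | 275 | 0 | | | |
| | | Teledata | 106 | NA | 0 | 106 | | | |
| | | Total | 106 | NA | 275 | 106 | 0 | NA | 61 |
| 29 | QFT128 | Telegate | 0 | NA | 385 | 0 | | | |
| | | Teledata | 224 | NA | 0 | 224 | | | |
| | | Total | 224 | NA | 385 | 224 | 0 | NA | 41 |
| 30 | QFT256 | Telegate | 0 | NA | 605 | 0 | | | |
| | | Teledata | 468 | NA | 0 | 468 | | | |
| | | Total | 468 | NA | 605 | 468 | 0 | NA | 22 |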

The runtime of our WQCP approach in comparison to the previous approaches (GP, QPILP, and QPGTA) is reported in table 4. Since GP is implemented with the same WQCP procedure, the runtimes of the two are approximately the same. QPILP is faster than ours for small circuits because the ILP qubit partitioning is run only once in QPILP while WQCP calls it for each level of a circuit. However, as the size of the circuits increases, QPILP's resource overhead grows exponentially and it fails to generate an output because the memory runs out. Finally, QPGTA is the fastest approach. Its runtime is less than one second for all benchmark circuits.

Table 4. The runtimes of WQCP (in milliseconds) for the benchmark circuits compared with the gate and qubit partitioning approaches.

Benchmark GP QPILP QPGTA WQCP
2of5-D1912230537500
2-4dec225912241762
6sym295615843774
9sym837960567231
Ham15-D318 631140 087620 710
Cycle17-3717 058148 22612871 793
8bitadder11 8671490621 051
Hwb50758 887NA20865 777
Hwb1004 235 861NA394 266 630
rd32_272144514041392
ham7_106578821556580
rd53_139257321862266
rd53_311660052057343
parity_247151016651938
adder16_17499782095159724
[[10,3,3]]195933471835
[[16,3,5]]9188NA510 802
[[21,1,7]]7491NA67294
[[24,3,7]]18 235NA714 239
[[25,1,9]]17 441NA1422 624
[[27,1,9]]20 741NA1426 925
[[31,11,6]]26 014NA1138 533
[[33,1,9]]27 008NA1027 712
[[35,1,10]]28 471NA1224 308
[[40,3,10]]41 794NA1434 374
QFT1611 352179 462125268
QFT3216 216NA1815 491
QFT6434 099NA3737 288
QFT128131 541NA57150 314
QFT2561.386e6NA1561.391e6

The runtime of WQCP grows with the number of qubits, the number of partitions, the window size, and the multi-qubit depth of a circuit. Since our approach calls the ILP solver only for each window, the runtime of our tool is manageable and, as can be seen, it is applicable to large circuits such as Hwb100 and QFT256. Although the number of qubits in a window is the same as the total number of qubits in the circuit, the adjacency matrix is sparse for a sub-circuit contained in a window. This feature enables our approach to partition large circuits while other approaches that use the ILP solver without a windowing strategy fail to produce an output. Moreover, techniques such as hierarchical partitioning, or a faster algorithm than ILP to implement subPartitioning(CL), can accelerate WQCP, probably at the cost of decreasing the quality of results.

6. Conclusion

The main challenge of distributed quantum computing is costly communications between processing units which may be an order of magnitude more time consuming and error prone than logical operations. In this paper, we proposed an automated window-based partitioning method called WQCP to minimize such communications. The proposed method reduces the communication cost by about 29.5% in comparison with the best approaches reported in the literature.

Although the execution time of WQCP is longer than that of existing approaches, this is not a challenge because it runs offline before the actual computation. Moreover, one may speed up WQCP by utilizing a faster algorithm rather than the ILP one. Furthermore, in this paper the window weights and wp are fixed and set manually based on our experiments, although these weights may strongly affect the results. Automating the window weight setting based on the input circuit can be pursued as future work.

Acknowledgments

The authors acknowledge the financial support by the Iran National Science Foundation (INSF).
