Investigating the effect of circuit cutting in QAOA for the MaxCut problem on NISQ devices

Noisy Intermediate-Scale Quantum (NISQ) devices are restricted by their limited number of qubits and their short decoherence times. An approach addressing these problems is quantum circuit cutting. It decomposes the execution of a large quantum circuit into the execution of multiple smaller quantum circuits with additional classical postprocessing. Since these smaller quantum circuits require fewer qubits and gates, they are more suitable for NISQ devices. To investigate the effect of quantum circuit cutting in a quantum algorithm targeting NISQ devices, we design two experiments using the Quantum Approximate Optimization Algorithm (QAOA) for the Maximum Cut (MaxCut) problem and conduct them on state-of-the-art superconducting devices. Our first experiment studies the influence of circuit cutting on the objective function of QAOA, and the second evaluates the quality of results obtained by the whole algorithm with circuit cutting. The results show that circuit cutting can reduce the effects of noise in QAOA, and therefore, the algorithm yields better solutions on NISQ devices.


Introduction
Quantum computing promises to solve problems that are intractable for classical high-performance computers [1,2]. However, the current Noisy Intermediate-Scale Quantum (NISQ) devices have a limited number of qubits, and computations on them are flawed due to decoherence of the quantum state, inaccuracy of implemented gates, and erroneous measurements [3,4]. These errors accumulate during the computation, and thus, the error rate of a NISQ device restricts the width and depth of executable circuits [4,5]. Due to these deficiencies, NISQ devices demand hybrid algorithms that combine shallow quantum circuits and classical computing to leverage the advantages of both [5]. This includes the class of Variational Quantum Algorithms (VQAs) that repeatedly run a parameterized quantum circuit, the so-called ansatz, on a quantum device and utilize a classical optimizer for parameter optimization [6]. These algorithms can tolerate certain amounts of noise during their computations and are considered a promising approach to achieve quantum advantage on NISQ devices. A prominent VQA for solving combinatorial optimization problems is the Quantum Approximate Optimization Algorithm (QAOA) [7]. Even for circuits of small depth, QAOA cannot generally be simulated efficiently on any classical device [8].
Still, the number of available qubits is too small [9], and the noise of the current NISQ devices is too high [10][11][12] to exploit the potential of QAOA to achieve quantum advantage. Consequently, strategies that reduce the size of quantum circuits, i.e., their width and depth, may help to overcome these challenges. One method to address this is quantum circuit cutting [13][14][15]. The main idea is that a large quantum circuit that requires many qubits can be cut into smaller subcircuits requiring fewer qubits. In a postprocessing step, a classical computer produces the result of the original quantum circuit by combining the results obtained from running the subcircuits. This allows the evaluation of large quantum circuits using small quantum computers and additional classical postprocessing. Moreover, cutting circuits into subcircuits can also reduce their depth [14]. Since shallower circuits are less susceptible to noise, they are better suited for NISQ devices [5]. However, the potential of circuit cutting in QAOA is an open question.
Therefore, we approach this question by investigating to what extent circuit cutting can improve the results of QAOA in the presence of noise. To answer this, we design two experiments that apply circuit cutting in QAOA for the Maximum Cut (MaxCut) problem and execute them on state-of-the-art NISQ devices. The first experiment evaluates how the computed objective function on NISQ devices changes when circuit cutting is applied in QAOA. The second experiment studies how these changes in the objective function influence the approximated solution of QAOA. To this end, we classically optimize the parameters of the QAOA circuits with and without circuit cutting and compare the results achieved on NISQ devices.
The remainder of the paper is structured as follows. Section 2 introduces the relevant background, and Section 3 discusses related work. Based on this, Section 4 motivates the problem and refines the research question. Following this, Section 5 describes the research design, and Section 6 presents the results. Afterward, Section 7 discusses the findings, and Section 8 concludes the work.

Background
This section briefly highlights the fundamentals of QAOA and its application to the MaxCut problem. Afterward, we introduce quantum circuit cutting and present the technique used in the remainder of this work.

QAOA and the MaxCut problem
QAOA enables solving combinatorial optimization problems such as the MaxCut problem [7] and the maximum independent set problem [16]. The goal is to find a bitstring $z = (z_1, \ldots, z_n) \in \{0, 1\}^n$ that maximizes an objective function $C(z)$, which is encoded on the diagonal of the cost Hamiltonian $H_C$ such that $H_C |z\rangle = C(z) |z\rangle$. Thus, the optimization problem can be solved by finding the eigenvector $|z\rangle$ of $H_C$ with maximal eigenvalue $\max_z C(z)$. To this end, QAOA employs an ansatz that alternately applies, $p$ times, the cost operator $U(H_C, \gamma_l) = \exp(-i\gamma_l H_C)$ and the mixing operator $U(H_B, \beta_l) = \exp(-i\beta_l H_B)$ to the initial state $|+\rangle^{\otimes n}$ [7]. Herein, $H_B = \sum_i X_i$, where $X_i$ is the Pauli $X$ matrix applied to the $i$-th qubit. Thus, the result state of the QAOA ansatz is:

$$|\psi(\beta, \gamma)\rangle = U(H_B, \beta_p)\, U(H_C, \gamma_p) \cdots U(H_B, \beta_1)\, U(H_C, \gamma_1)\, |+\rangle^{\otimes n},$$

where $\beta = (\beta_1, \ldots, \beta_p)$ and $\gamma = (\gamma_1, \ldots, \gamma_p)$ are variational parameters. A classical optimizer updates the parameters $\beta, \gamma$ to maximize the expectation value of the observable $H_C$ on the ansatz $|\psi(\beta, \gamma)\rangle$:

$$\langle H_C \rangle_{\beta, \gamma} = \langle \psi(\beta, \gamma) |\, H_C \,| \psi(\beta, \gamma) \rangle.$$

In general, higher values of $p$ can result in better approximations of the optimal solution, but at the cost of more computations [7]. For the sake of simplicity, we shall write $\langle H_C \rangle_{\beta_1, \gamma_1} = \langle H_C \rangle_{(\beta_1), (\gamma_1)}$ for $p = 1$.
A commonly studied problem for QAOA is the MaxCut problem [7,11,12,[17][18][19]. It occurs in many application fields, e.g., solid-state physics and integrated circuit design [20] as well as data clustering [21]. Thus, solving the MaxCut problem efficiently helps speed up the computation of solutions to these problems. A cut of a graph $G = (V, E)$ splits its set of vertices $V$ into two partitions and can be represented as a bitstring $z \in \{0, 1\}^{|V|}$, where each bit assigns one of the vertices to one of the two partitions. The size of the cut is the number of edges crossing these two partitions. A maximum cut is at least as large as any other cut of the graph. Finding such a cut for a given graph is known as the MaxCut problem, a well-known NP-hard problem [22].
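To make the definition of the cut size concrete, the following minimal sketch evaluates it for a bitstring and finds a maximum cut by exhaustive enumeration. The function names and the example graph are illustrative assumptions, not part of this work, and the enumeration is only feasible for very small graphs.

```python
from itertools import product


def cut_size(z, edges):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for v, w in edges if z[v] != z[w])


def brute_force_maxcut(n, edges):
    """Enumerate all 2**n bitstrings and return (size, bitstring) of a maximum cut."""
    best = max(product((0, 1), repeat=n), key=lambda z: cut_size(z, edges))
    return cut_size(best, edges), best


# Hypothetical example graph: a square 0-1-2-3 with the diagonal (0, 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
size, z = brute_force_maxcut(4, edges)
```

Because the example graph contains a triangle, no cut can contain all five edges, so the enumeration returns a cut of size four.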
To solve the MaxCut problem with QAOA [7], the cost Hamiltonian for a graph $G$ is defined as

$$H_C = \frac{1}{2} \sum_{(v, w) \in E} (I - Z_v Z_w),$$

where $Z_v$ is the Pauli $Z$ matrix applied to the $v$-th qubit. The corresponding unitary operator is

$$U(H_C, \gamma_l) = \exp(-i\gamma_l H_C) = \exp(-i\gamma_l |E|/2) \prod_{(v, w) \in E} \exp(i\gamma_l Z_v Z_w / 2),$$

which can be implemented as a product of $R_{ZZ}(-\gamma_l) = \exp(i\gamma_l Z_v Z_w / 2)$ gates up to a global factor $\exp(-i\gamma_l |E|/2)$. The corresponding unitary operator for the Hamiltonian $H_B$ is

$$U(H_B, \beta_l) = \exp(-i\beta_l H_B) = \prod_{v \in V} \exp(-i\beta_l X_v),$$

which is a product of $R_X(2\beta_l) = \exp(-i\beta_l X_v)$ gates. The $R_{ZZ}(\gamma_l)$ gate implements a two-qubit rotation about $Z \otimes Z$ with angle $\gamma_l$, and the $R_X(2\beta_l)$ gate implements a single-qubit rotation about $X$ with angle $2\beta_l$. Thus, the problem instance of the MaxCut problem determines the ansatz structure since each edge of the graph translates to a two-qubit $R_{ZZ}$ gate in the ansatz. However, problem structures of practical interest often cannot be trivially mapped to a planar architecture of current NISQ devices with restricted qubit connectivity [11]. Instead, their quantum circuits have to be adapted by inserting additional SWAP operations to permute qubits and adapt the circuit to the connectivity of the quantum device [23].
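The $p = 1$ ansatz described above can be checked on small graphs with a noise-free state-vector simulation. The following is a minimal sketch under stated assumptions: it uses only the Python standard library, the function names are hypothetical, and bit $v$ of the basis index is taken as the value of qubit $v$. It is not the Qiskit-based implementation used in this work.

```python
import cmath
import math


def cut_size(bits, edges):
    """Eigenvalue C(z) of the MaxCut cost Hamiltonian for basis state z."""
    return sum(1 for v, w in edges if bits[v] != bits[w])


def qaoa_expectation(n, edges, beta, gamma):
    """<H_C> for the p = 1 MaxCut ansatz, simulated without noise."""
    dim = 2 ** n
    # Assumption: bit v of the basis index k encodes the value of qubit v.
    bits = [[(k >> v) & 1 for v in range(n)] for k in range(dim)]
    costs = [cut_size(b, edges) for b in bits]
    # |+>^n followed by the diagonal cost operator exp(-i * gamma * H_C).
    amp = 1 / math.sqrt(dim)
    state = [amp * cmath.exp(-1j * gamma * c) for c in costs]
    # Mixing operator: R_X(2*beta) = cos(beta) I - i sin(beta) X on every qubit.
    c, s = math.cos(beta), math.sin(beta)
    for v in range(n):
        mask = 1 << v
        for k in range(dim):
            if not k & mask:
                a0, a1 = state[k], state[k | mask]
                state[k] = c * a0 - 1j * s * a1
                state[k | mask] = -1j * s * a0 + c * a1
    # Expectation value of the diagonal observable H_C.
    return sum(abs(a) ** 2 * cost for a, cost in zip(state, costs))
```

For a single edge, the parameters $\beta = \pi/8$ and $\gamma = \pi/2$ yield an expectation value of one under these conventions, and for $\gamma = 0$ the state remains uniform, so every edge contributes $1/2$.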

Quantum circuit cutting with gate cuts
Quantum circuit cutting is a set of techniques for splitting a quantum circuit into multiple smaller circuits with fewer qubits and gates such that, with subsequent classical postprocessing, executing the collection of smaller circuits yields the same result as executing the original circuit [14,15]. The primary focus of quantum circuit cutting is on reducing the number of required qubits, but the subcircuits also consist of fewer gates such that cutting may also reduce the circuit depth of the executed circuits. The circuit cutting process consists of three steps: (i) cutting the circuit into a set of smaller subcircuits, (ii) execution of these subcircuits, and (iii) the classical recombination of the subcircuit results. There are two different approaches to cut a circuit: (i) gate cutting replaces two-qubit gates by sets of local operations [15,24] and (ii) wire cutting divides circuit wires carrying quantum information in the form of a qubit into sets of measurement and state-preparation operations [14]. In this work, gate cutting is used, which is presented in Figure 1 and described in more detail below.

[Figure 1: Gate cutting procedure: (1) cut circuit into subcircuits, (2) execute the subcircuits on the QPU, (3) recombine results.]

Let $H^{(1)} \otimes H^{(2)}$ denote a bipartite $n$-qubit quantum system consisting of Hilbert spaces $H^{(1)}$ and $H^{(2)}$. The quantum state of this system can be described by a density operator $\rho$, which is a positive semidefinite Hermitian matrix of size $2^n \times 2^n$ with trace equal to one. Moreover, consider an $n$-qubit gate represented by a unitary $U$. The evolution of the quantum state under the action of gate $U$ can be described by a superoperator $S(U)$, which is a linear operator that acts on the space of density operators. The new state resulting from the application of $U$ on the state described by $\rho$ is given by $S(U)\rho = U \rho U^\dagger$.
A gate cut of the unitary operator $U$ is its quasiprobability decomposition (QPD) into a set of operators $\{(V_i, c_i)\}$ with $V_i = V_i^{(1)} \otimes V_i^{(2)}$ and real coefficients $c_i$ such that

$$S(U) = \sum_i c_i\, S(V_i^{(1)}) \otimes S(V_i^{(2)}),$$

where $V_i^{(1)}$ and $V_i^{(2)}$ are operators acting on subsystem $H^{(1)}$ and $H^{(2)}$, respectively [15]. Each operator $V_i^{(j)}$ has to be physically realizable, that is, its superoperator $S(V_i^{(j)})$ is a completely positive linear map that is trace-nonincreasing, i.e., $0 \le \mathrm{tr}[S(V_i^{(j)})\rho] \le 1$ for any density operator $\rho$ [25]. In particular, this includes non-unitary transformations such as projections. The decomposition of the gate $U$ is depicted in the first step of Figure 1.
Consider for the partition of the quantum system $H^{(1)} \otimes H^{(2)}$ an observable $O = O^{(1)} \otimes O^{(2)}$ that acts independently on each subsystem, and a separable initial state $\rho_0 = \rho_0^{(1)} \otimes \rho_0^{(2)}$. Thus, the subsystems are independent of each other concerning their initialization and measurement. Then, the evolution of the density operator $\rho_0$ according to the unitary $U$ and subsequent measurement with observable $O$ can be reproduced by applying the operations $V_i$ and weighting their measurement with $O$ according to $c_i$:

$$\mathrm{tr}[O\, S(U) \rho_0] = \sum_i c_i\, \underbrace{\mathrm{tr}[O^{(1)} S(V_i^{(1)}) \rho_0^{(1)}]}_{R_i^{(1)}}\; \underbrace{\mathrm{tr}[O^{(2)} S(V_i^{(2)}) \rho_0^{(2)}]}_{R_i^{(2)}}. \tag{9}$$

This allows us to evaluate each of the expectation values $R_i^{(1)}$ and $R_i^{(2)}$ individually and then recombine them to produce the expectation value of the original circuit as shown in steps two and three of Figure 1. The computational overhead of this procedure can be measured by the number of additional shots needed to approximate the expectation value of the original circuit [26]. Although the expectation value of the original circuit, computed from the subcircuit results, remains unchanged, additional shots are necessary due to the increased variance resulting from sampling the subcircuits. This increase in variance is characterized by the factor $\kappa := \sum_i |c_i|$ with $\kappa \ge 1$. Previous studies have demonstrated that achieving a fixed statistical accuracy requires an additional $O(\kappa^2/\epsilon^2)$ shots for estimating the original expectation value within the error $\epsilon$ [27]. It is worth noting that the computational cost of the classical recombination step is limited by the number of shots on the subcircuits, as only the sampled results need to be recombined through multiplication.
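The recombination step and the overhead factor can be illustrated with a few lines of code. This is a minimal sketch: the function names are hypothetical, the coefficients and subcircuit expectation values are made-up numbers rather than measured data, and the shot estimate omits all constant factors.

```python
def recombine(coeffs, r1, r2):
    """Recombined expectation value: sum_i c_i * R_i^(1) * R_i^(2)."""
    return sum(c * a * b for c, a, b in zip(coeffs, r1, r2))


def kappa(coeffs):
    """Overhead factor: the sum of the absolute QPD coefficients."""
    return sum(abs(c) for c in coeffs)


def shot_estimate(coeffs, eps):
    """Order-of-magnitude shot count kappa^2 / eps^2 (constants omitted)."""
    return (kappa(coeffs) / eps) ** 2


# Hypothetical three-term QPD with made-up subcircuit expectation values.
coeffs = [0.75, 0.75, -0.5]
r1 = [1.0, -1.0, 0.2]
r2 = [0.4, 0.4, 0.5]
value = recombine(coeffs, r1, r2)
```

In this toy example the coefficients sum to one while their absolute values sum to $\kappa = 2$, which illustrates why negative coefficients increase the number of shots required for a given accuracy.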
Given cuts for gates $U$ and $\tilde{U}$ with QPDs $\{(V_i, c_i)\}$ and $\{(\tilde{V}_j, \tilde{c}_j)\}$, respectively, the cut of their product $U\tilde{U}$ can be constructed as follows:

$$S(U\tilde{U}) = S(U)\, S(\tilde{U}) = \sum_i \sum_j c_i \tilde{c}_j\, S(V_i)\, S(\tilde{V}_j) = \sum_{i,j} c_i \tilde{c}_j\, S(V_i \tilde{V}_j).$$

Thus, a quantum circuit constructed of multiple subsequent gates can be cut by cutting its individual gates. However, the overhead, measured in terms of the number of additional shots, grows in the worst case exponentially as each cut introduces a multiplicative factor $\kappa$ [26].
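The multiplicative growth of the overhead follows directly from the coefficients of the product decomposition. As a short worked step, writing $\kappa$ and $\tilde{\kappa}$ for the factors of the two individual cuts and $k$ for the number of cuts (symbols introduced here only for illustration):

```latex
\kappa_{U\tilde{U}}
  = \sum_{i,j} \lvert c_i \tilde{c}_j \rvert
  = \Bigl(\sum_i \lvert c_i \rvert\Bigr) \Bigl(\sum_j \lvert \tilde{c}_j \rvert\Bigr)
  = \kappa \, \tilde{\kappa},
\qquad\text{so for } k \text{ cuts with equal factor } \kappa:\quad
\kappa_{\text{total}} = \kappa^{k},
\quad
\text{shots} \in O\!\left(\kappa^{2k} / \epsilon^{2}\right).
```

This holds when the terms of the product decomposition are sampled as they stand; merging terms that happen to coincide can only lower the factor.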

Gate cutting for QAOA's MaxCut ansatz
The only multi-qubit gates in the MaxCut ansatz for QAOA are $R_{ZZ}$ gates. Thus, the ansatz for a given graph can be cut by partitioning its qubits into separate parts and then cutting all $R_{ZZ}$ gates between these qubit partitions. To cut a single $R_{ZZ}$ gate, we use the QPD for cutting arbitrary two-qubit gates introduced by Mitarai and Fujii [15,24]:

$$S(R_{ZZ}(\theta)) = \cos^2(\theta/2)\, S(I \otimes I) + \sin^2(\theta/2)\, S(Z \otimes Z) + \cos(\theta/2) \sin(\theta/2)\, [A \otimes B + B \otimes A] \tag{12}$$

with

$$A = S(R_Z(-\pi/2)) - S(R_Z(\pi/2)) \tag{13}$$

and

$$B = S\!\left(\frac{I - Z}{2}\right) - S\!\left(\frac{I + Z}{2}\right).$$

The two operations forming $B$ are the projections on the states $|1\rangle$ and $|0\rangle$, respectively, i.e., $(I - Z)/2 = |1\rangle\langle 1|$ and $(I + Z)/2 = |0\rangle\langle 0|$. Although these operations are not unitary and thus cannot be directly implemented as a gate of a quantum circuit, they can be realized by a post-selective measurement [15]. By applying this QPD, the circuit with the $R_{ZZ}$ gate can be replaced by circuits that perform at each of the two previously connected qubits one of the following five operations instead: an $I$ gate, a $Z$ gate, an $R_Z(-\pi/2)$ gate, an $R_Z(\pi/2)$ gate, or a measurement for the projections. Since the two qubits are now separable, ten different subcircuits consisting of the five operations for the upper and five for the lower qubit must be executed. Appendix A provides a proof of the cutting formula in Equation (12).
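The decomposition of the $R_{ZZ}$ gate into local operations can be verified numerically. The sketch below assumes the conventions $R_{ZZ}(\theta) = \exp(-i\theta Z \otimes Z/2)$, $R_Z(\phi) = \exp(-i\phi Z/2)$, $A = S(R_Z(-\pi/2)) - S(R_Z(\pi/2))$, and $B = S(|1\rangle\langle 1|) - S(|0\rangle\langle 0|)$, and compares both sides of the decomposition on an arbitrary two-qubit operator. All helper names are hypothetical, and only the Python standard library is used.

```python
import cmath
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def conj_by(v, rho):
    """S(V) rho = V rho V^dagger."""
    return matmul(matmul(v, rho), dagger(v))


def rz(phi):
    return [[cmath.exp(-1j * phi / 2), 0], [0, cmath.exp(1j * phi / 2)]]


def rzz(theta):
    # Diagonal of exp(-i * theta/2 * Z (x) Z) in the basis |00>, |01>, |10>, |11>.
    d = [cmath.exp(-1j * theta / 2 * s) for s in (1, -1, -1, 1)]
    return [[d[i] if i == j else 0 for j in range(4)] for i in range(4)]


I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
P0 = [[1, 0], [0, 0]]  # (I + Z)/2 = |0><0|
P1 = [[0, 0], [0, 1]]  # (I - Z)/2 = |1><1|

# Signed sums of local operations (assumed sign conventions, see lead-in).
A = [(1, rz(-math.pi / 2)), (-1, rz(math.pi / 2))]
B = [(1, P1), (-1, P0)]


def cut_rzz(theta, rho):
    """Right-hand side of the decomposition, applied to a 4x4 operator rho."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    terms = [(c * c, conj_by(kron(I2, I2), rho)), (s * s, conj_by(kron(Z, Z), rho))]
    for ca, va in A:
        for cb, vb in B:
            terms.append((c * s * ca * cb, conj_by(kron(va, vb), rho)))  # A (x) B
            terms.append((c * s * ca * cb, conj_by(kron(vb, va), rho)))  # B (x) A
    return [[sum(w * m[i][j] for w, m in terms) for j in range(4)] for i in range(4)]


# Arbitrary (unnormalized, entangled) test vector; the identity is linear in rho.
psi = [0.6, 0.3 + 0.2j, -0.1j, 0.5 - 0.4j]
rho = [[a * complex(b).conjugate() for b in psi] for a in psi]


def max_deviation(theta):
    direct = conj_by(rzz(theta), rho)
    cut = cut_rzz(theta, rho)
    return max(abs(direct[i][j] - cut[i][j]) for i in range(4) for j in range(4))
```

The check uses an entangled input on purpose: the decomposition is an identity between superoperators, so it must hold for any input operator, not only for product states.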

Related Work
An umbrella term that combines techniques to perform the computation of a large quantum circuit as the execution of smaller subcircuits is circuit knitting [28]. It includes circuit cutting and entanglement forging [29,30], which is a closely related technique.
In circuit cutting experiments, various works have practically demonstrated that circuit cutting can be used to expand the size of executable circuits beyond the physical capabilities of NISQ devices, e.g., for a set of benchmark circuits [31], GHZ circuits [32], and circuits for linear cluster states [33]. Moreover, it has been shown experimentally that in noisy simulations and on NISQ devices, circuit cutting can lead to better, i.e., less noisy, execution results [31][32][33][34]. Circuit cutting can help with gate errors and short decoherence times since smaller circuits are less susceptible to these noise sources; in contrast, readout errors can be detrimental to the circuit cutting procedure since it requires additional measurements [34]. In addition, it was investigated how errors occurring in different subcircuits affect the final recombined result, and thereby, different error propagation characteristics and sensitivities were found for different subcircuits [35].
The cost of circuit cutting grows in the worst case exponentially with the number of cuts [36]. Since the introduction of the first gate-cutting technique [13], various methods and adaptions that lower the cost have been developed [14,15,24,26,36]. Peng et al. [14] present the first wire cut, whose cost is reduced by Lowe et al. [36] via randomized measurements. Mitarai and Fujii [15,24] lower the cost of a gate cut. Moreover, based on gate teleportation [37], Piveteau and Sutter [26] show that extending gate cutting with classical communication between subcircuits can reduce the overhead further, although the overhead is still exponential.
In addition, further improvements and tooling around the cutting methods themselves were developed. A cutting framework that automatically determines optimal cut locations and quickly finds solution states of large quantum circuits was introduced [31,38]. This framework is part of Qiskit's Circuit Knitting Toolbox, which is a collection of tools to decompose quantum circuits [39]. Furthermore, maximum-likelihood tomography applied to wire cutting produces better results by ensuring nonnegativity and normalization for the computed result distribution [40]. This method estimates the result of a circuit even in noise-free experiments with higher fidelity than the execution of the full circuit. Moreover, to lower the number of subcircuits, a machine learning approach applied to gate cutting can approximate the result of the original circuit with a significantly reduced set of subcircuits [41].
Furthermore, there is work that investigates partitioning circuits of VQAs. Saleem et al. [42] apply circuit cutting in a VQA to solve the Maximum Independent Set problem. However, they execute their approach only on a simulator since the complex multi-control gates used result in circuits that are too noise sensitive for current NISQ devices. Moreover, Lowe et al. [36] apply wire cuts in QAOA and perform numerical experiments on a simulator to show the speedup of their method. Neither work investigates the effect of circuit cutting on VQAs running on error-prone NISQ devices. Additionally, it has been demonstrated for a classification task that a restricted ansatz consisting of multiple local circuit partitions can be used to improve the training of a VQA and its robustness against noise [43]. However, this approach does not allow any interaction between these partitions, and therefore, this technique cannot generally implement the exact ansatz defined by QAOA's cost Hamiltonian of a specific problem instance.
Besides methods on the circuit level, decomposition approaches on the algorithmic level exist [18,[44][45][46][47][48][49]. Examples include divide-and-conquer approaches for QAOA [18,44] and quantum local search methods that iteratively optimize local subproblems of a problem [45][46][47][48]. In addition, Ayanzadeh et al. [49] demonstrate for the MaxCut problem that by removing nodes with high connectivity from the graph and accounting for their possible assignments in the objective function, the circuit size can be reduced, and the fidelity of QAOA on NISQ devices can increase.

Motivation and research questions
Due to the already discussed problems of NISQ devices, algorithms for applications on them are an active research field. One promising pathway is the ongoing development of VQAs [6], like the prominent QAOA. Since the introduction of QAOA [7], much work has been done to develop the algorithm further to achieve quantum advantage on NISQ hardware. For example, new objective functions have been introduced [50,51], research on warm-starting QAOA has been done [17,52], and different modifications of QAOA have been presented [53][54][55].
However, noise is still an open problem for QAOA. It limits the advantage of increasing $p$ to improve QAOA solutions in practical experiments on NISQ devices [12]. This issue is particularly pronounced for problem instances whose connectivity does not match the hardware, as the quality of results rapidly declines with increasing problem size due to the added noise from SWAP operations [11]. In general, noise in the computation of VQAs leads to exponentially vanishing gradients in the training landscape [19,56]. This effect is referred to as noise-induced barren plateaus (NIBPs) [10]. Although NIBPs and noise-free barren plateaus [57,58] both lead to vanishing gradients, they differ in the way they affect the training landscape. Noise-free barren plateaus allow the global minimum to reside inside a deep, narrow valley, while NIBPs exponentially flatten the entire landscape [10,56]. Although noise does not change the optimization direction in the QAOA parameter space, and optimal parameters stay nearly the same [19], resolving the exponentially concentrated training landscapes to a fixed precision requires an exponential number of shots [10,59]. Moreover, although gradient-free optimization methods do not use gradient information, they also scale exponentially in the number of shots for ansätze with exponentially small gradients [60].
Since common strategies that address noise-free barren plateaus do not work for NIBPs, two basic strategies for preventing NIBPs remain [10]: (i) lowering the hardware noise level and (ii) reducing the circuit complexity, mainly focusing on the depth but also the width of the circuit. While error mitigation can reduce the effect of hardware noise in VQAs [61,62], a broad class of these techniques cannot remove the connected exponential resource scaling [59].
In this work, we apply quantum circuit cutting to reduce the complexity of the executed circuits. Quantum circuit cutting can be classified as a decomposition technique on the quantum circuit level as it is independent of the algorithm generating the circuits [14,15]. Like other decomposition approaches, it can reduce the number of required qubits and increase the size of solvable problem instances with available quantum devices [31,33]. However, our work focuses not on expanding the problem size but on decreasing circuit complexity. Hence, we aim to decrease the susceptibility to noise during computation and consequently increase the result quality with circuit cutting. Such an examination of circuit cutting on NISQ devices in the context of a VQA, e.g., QAOA, is missing. In particular, there is a lack of research on the extent to which circuit cutting applied in QAOA can help to reduce the influence of noise and whether its application can lead to better results. Thus, to address this issue, we formulate the main research question (RQ) of this work:
Main RQ: To what extent can circuit cutting improve the results of QAOA when executing on NISQ devices?
To answer our main RQ, we refine it into two sub-RQs (SRQs) that need to be answered. First, we focus on what impact cutting the QAOA ansatz has on the objective function. Of particular interest is whether the reduced size of subcircuits can help to reduce the effect of NIBPs. The first SRQ is as follows:
SRQ 1: How does cutting the QAOA ansatz influence its corresponding objective function on NISQ devices?
The second part includes the classical optimization of the parameters for the cut ansatz. We investigate how cutting influences the classical optimization process in QAOA. In particular, we are interested in whether cutting helps to obtain better solutions on NISQ devices. This is formulated in our second SRQ:
SRQ 2: Can QAOA with circuit cutting obtain better solutions on NISQ devices when the entire algorithm, including parameter optimization, is executed?

Research design
To tackle these RQs, we design and conduct two experiments focussing on the MaxCut problem for unweighted graphs. In the following, we introduce our adaptions to the gate cut described in Section 2.3 and afterward give a general overview of the experimental setup that all experiments have in common. Subsequently, we describe in detail the design for each experiment. In the end, we introduce metrics to evaluate the obtained results.

Adaptions to the cut of the R ZZ gate
We adapt the approach described in Section 2.3 to reduce the number of subcircuits. Our adjustment of Equation (13) is as follows:

$$A = S(R_Z(-\pi/2)) - S(R_Z(\pi/2)) = S(I) + S(Z) - 2\, S(R_Z(\pi/2)).$$

The mathematical details are provided in Appendix B. By harnessing this equality, the number of different subcircuits of the $R_{ZZ}$ gate decreases from ten to eight since the two local $R_Z(-\pi/2)$ gates disappear and $A$ can be expressed as a weighted sum of the $I$, $Z$, and $R_Z(\pi/2)$ gates. The results for the $I$ and $Z$ operations can be reused as they are stored in classical memory. Although this substitution reduces the number of subcircuits, it introduces larger factors $c_i$ in the QPD. This does not change the equality, but it leads to a higher variance, and therefore, more shots are needed for convergence [63]. However, this can be a worthwhile trade-off for a small number of cuts as the factors for them remain relatively small. This technique is applied in all our experiments.
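The substitution rests on the identity $S(R_Z(\pi/2)) + S(R_Z(-\pi/2)) = S(I) + S(Z)$, which can be confirmed numerically on a single qubit. The following minimal sketch assumes $R_Z(\phi) = \exp(-i\phi Z/2)$ and the sign convention $A = S(R_Z(-\pi/2)) - S(R_Z(\pi/2))$; the helper names are hypothetical. It also shows that the absolute weights of the terms forming $A$ grow from 2 to 4, which is the source of the larger factors mentioned above.

```python
import cmath
import math


def conj_by(v, rho):
    """S(V) rho = V rho V^dagger for 2x2 matrices given as nested lists."""
    vr = [[sum(v[i][k] * rho[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(vr[i][k] * complex(v[j][k]).conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]


def rz(phi):
    return [[cmath.exp(-1j * phi / 2), 0], [0, cmath.exp(1j * phi / 2)]]


def apply_sum(terms, rho):
    """Apply the weighted sum of superoperators sum_k w_k S(V_k) to rho."""
    out = [[0j, 0j], [0j, 0j]]
    for w, v in terms:
        m = conj_by(v, rho)
        for i in range(2):
            for j in range(2):
                out[i][j] += w * m[i][j]
    return out


I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]

# Original form of A (two rotations) and adapted form (I, Z, one rotation).
A_original = [(1, rz(-math.pi / 2)), (-1, rz(math.pi / 2))]
A_adapted = [(1, I2), (1, Z), (-2, rz(math.pi / 2))]

weight_original = sum(abs(w) for w, _ in A_original)
weight_adapted = sum(abs(w) for w, _ in A_adapted)

# Arbitrary single-qubit density matrix used only for the comparison.
rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]
lhs = apply_sum(A_original, rho)
rhs = apply_sum(A_adapted, rho)
deviation = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```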

Overview of the experimental setup
This section describes the steps that all experiments have in common. At a high level, each experiment consists of three parts as depicted in Figure 2: (i) the generation of the problem instance, the corresponding circuits, and subcircuits, (ii) the execution of the latter on quantum devices, and (iii) the postprocessing including the recombination of subcircuit results and the evaluation of the objective function. Experiments can repeat steps (ii) and (iii) in a variational optimization loop to refine the circuit parameters with a classical optimizer until a termination condition is met. Hereafter, we will call the QAOA ansatz that uses circuit cutting cut-QAOA. As quantum devices for the execution, we use IBM's superconducting hardware. The experiments are implemented with Qiskit [64]. Wherever a classical optimizer is necessary, COBYLA, in its default configuration from Qiskit, is employed. Since the optimizer only supports minimization, we transformed the maximization of QAOA's expectation value in our implementation into a minimization problem by flipping the sign of the classically computed objective function. Thus, the optimal value corresponds to the minimal objective value. In the following, the individual steps are described in more detail.

Problem instance generation, circuit generation, and cutting
First, a problem instance has to be created. This is a crucial step since the edges of the graph map to the $R_{ZZ}$ gates in the QAOA ansatz and therefore, the graph structure determines the number of cuts that are later needed to separate the ansatz. To enable the resulting ansatz to be separable with a specific number of cuts, we first generated two independent connected random subgraphs of equal size with the Erdős-Rényi $G(n, p)$ model [65], i.e., each possible edge in a graph with $n$ vertices is selected with the probability $p$. Here, we enforce that each subgraph is connected by repeating the generation procedure until it yields two connected subgraphs. This is illustrated in step a of the generation in Figure 2. Afterward, we connect the two subgraphs by randomly inserting two edges between them and thereby fixing the locations of two intended cuts (see step b).
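The generation procedure can be sketched in a few lines. This is an illustrative reconstruction using only the Python standard library, not the implementation used in the experiments: the function names, the vertex numbering, and the choice to draw the two connecting edges as distinct vertex pairs are assumptions.

```python
import random


def gnp_edges(vertices, p, rng):
    """Erdős-Rényi G(n, p): keep each possible edge with probability p."""
    return [(v, w) for i, v in enumerate(vertices) for w in vertices[i + 1:]
            if rng.random() < p]


def is_connected(vertices, edges):
    """Depth-first search from the first vertex."""
    adj = {v: set() for v in vertices}
    for v, w in edges:
        adj[v].add(w)
        adj[w].add(v)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def connected_gnp(vertices, p, rng):
    """Repeat the G(n, p) draw until the subgraph is connected (requires p > 0)."""
    while True:
        edges = gnp_edges(vertices, p, rng)
        if is_connected(vertices, edges):
            return edges


def two_cut_instance(n_sub, p, n_cross=2, seed=None):
    """Two connected random subgraphs joined by n_cross random edges."""
    rng = random.Random(seed)
    left = list(range(n_sub))
    right = list(range(n_sub, 2 * n_sub))
    inner = connected_gnp(left, p, rng) + connected_gnp(right, p, rng)
    # Assumption: the connecting edges are distinct left-right vertex pairs.
    cross = rng.sample([(v, w) for v in left for w in right], n_cross)
    return left, right, inner, cross


left, right, inner, cross = two_cut_instance(n_sub=4, p=0.5, seed=1)
```

The connecting edges returned in `cross` are exactly the $R_{ZZ}$ gates that would later be cut, while the edges in `inner` stay within one of the two partitions.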
Following this, we generate an ansatz based on the resulting graph where the parameters can be inserted afterward (see step c). In our experiments, only QAOA ansätze with depth $p = 1$ are considered, meaning that the cost operator and the mixing operator are each applied only once. Although the IBM quantum devices used in our experiments do not natively support $R_{ZZ}$ gates, we use these gates in the circuit generation phase to apply our cutting procedure and leave it to the transpiler to replace the remaining $R_{ZZ}$ gates with native gates, i.e., two CNOT gates and an $R_Z$ rotation.
There are multiple equivalent QAOA ansätze for each problem instance since the $R_{ZZ}$ gates in the cost unitary $U(H_C, \gamma_l)$ commute, and thus they can be arranged in an arbitrary order. Different orderings of the $R_{ZZ}$ gates have no impact on a noise-free device and lead to equivalent results. However, this is different for NISQ devices. Depending on the arrangement of operations, the depth of the circuit may vary. In addition, specific arrangements of operations can be mapped to the restricted connectivity of the quantum device with fewer additional SWAP operations and consequently lead to transpiled circuits of a lower depth. Moreover, independent simultaneous operations on different qubits may affect each other unintendedly and hence are another source of noise [66]. All these issues have to be taken into account for optimal circuit transpilation. However, optimal circuit transpilation is NP-complete, and transpilers rely mostly on heuristics [67]. Moreover, the used Qiskit transpiler does not conduct all these optimizations without providing custom transpiler passes [68]. Therefore, to incorporate the effect of different arrangements of operations in our experiments and compare them to the effect of circuit cutting, we consider this in our experiments manually and generate two equivalent ansätze for each problem instance. One ansatz minimizes the number of simultaneous gates in the circuit by executing the gates of the two graph partitions sequentially, and the other ansatz minimizes the depth of the circuit by executing the gates of the two graph partitions in parallel. They are schematically depicted in Figure 3. The sequential version of the ansatz starts with all $R_{ZZ}$ operations of the first partition. Then we apply the operations that get cut. Finally, we add the $R_{ZZ}$ operations for the second partition. The parallel version of the ansatz applies all $R_{ZZ}$ operations of both graph partitions in parallel and then the gates that get cut in a final step. We use the same arrangement of $R_{ZZ}$ operations within the partitions in both versions as well as in the subcircuits of the cut ansatz. The sequential version results in a deeper circuit with fewer parallel operations. In contrast, the parallel ansatz is shallower but with more parallel operations.
Next, the generated circuit gets cut. To produce its subcircuits that are depicted in step d of Figure 2, we use the following procedure:
(i) Remove all multi-qubit operations between the different qubit partitions corresponding to the two subgraphs such that the circuit decomposes into separable circuit fragments.
(ii) Generate for each of these circuit fragments their respective subcircuits by inserting all possible combinations of $I$, $Z$, $R_Z(\pi/2)$, and measurement at the positions of the removed gates.
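Step (ii) of the procedure above amounts to enumerating the Cartesian product of the four local operations over the cut positions of a fragment. A minimal sketch, with hypothetical names and string labels standing in for the actual gates:

```python
from itertools import product

# Labels for the four local operations inserted at a cut position.
CUT_OPERATIONS = ("I", "Z", "RZ(pi/2)", "measure")


def fragment_variants(n_cuts):
    """All assignments of one local operation to each cut position of a fragment."""
    return list(product(CUT_OPERATIONS, repeat=n_cuts))


variants = fragment_variants(2)
```

Under this enumeration, a fragment with one cut position has four variants, matching the eight subcircuits of a single cut $R_{ZZ}$ gate across both fragments, and a fragment with two cut positions has sixteen.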

[Figure 3: Schematic depiction of the two ansatz variants: the sequential ansatz variant and the parallel ansatz variant.]

Execution
The execution includes not only running the subcircuits for cut-QAOA but also the original QAOA circuits, which will be used later in the evaluation for comparison purposes. But before an experiment's generated circuits and subcircuits can be executed, their parameters must be fixed. The parameters are defined either as hyperparameters of the experiment or are selected by the classical optimizer as part of the variational loop. Afterward, we transpile all circuits for the selected quantum device with Qiskit's internal transpiler in its default configuration [64]. All transpiled circuits get batched into jobs and submitted jointly to the quantum device for execution.
In our experiments, we executed the original QAOA ansatz with the same number of shots as the total number of shots across all subcircuits of cut-QAOA. Consequently, the same number of shots was spent with and without cutting in our experiments. For the different subcircuits of a cut-QAOA ansatz, we evenly distribute the number of shots among them. Another possibility would be to distribute the shots across the subcircuits in proportion to their factor $c_i$ in the QPD. However, we choose not to do so since Qiskit does not allow specifying the number of shots per circuit but only for all circuits in one job.
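The two shot-allocation strategies mentioned above can be contrasted with a small sketch. The function names are hypothetical, the coefficients are made-up numbers, and the proportional variant uses simple rounding, so its allocations need not add up exactly to the total.

```python
def even_shots(total, n_circuits):
    """Even split as used in the experiments; any remainder is dropped here."""
    return [total // n_circuits] * n_circuits


def proportional_shots(total, coeffs):
    """Alternative split in proportion to |c_i| (rounded, may not sum to total)."""
    k = sum(abs(c) for c in coeffs)
    return [round(total * abs(c) / k) for c in coeffs]


# Hypothetical coefficients of a three-term QPD.
allocation_even = even_shots(9000, 3)
allocation_prop = proportional_shots(1000, [0.5, -0.25, 0.25])
```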

Postprocessing
Lastly, postprocessing is performed. Thereby, the results of all subcircuits get processed and recombined according to Equation (9) to reproduce the result of the original circuit. Subsequently, the MaxCut objective value is calculated for all results, i.e., of the original circuits' execution results and the subcircuits' recombined results.

Description of the experiments
In the following, a detailed description of the two experimental setups used for the evaluation is given.

Experiment 1: Objective function.
To investigate RQ 1, we perform an experiment that evaluates the objective function at equally distributed parameter configurations. More precisely, it consists of the following steps:
(i) Generate problem instance and corresponding circuits: For this experiment, we randomly generate problem instances with the method described above that lead to separable circuits with two cuts. We use the two introduced variations of the ansatz.
(ii) Cut circuit into subcircuits: Although we generate and execute both the sequential and the parallel ansatz for later comparison, we cut only one of the two ansätze. This keeps the number of circuits to be executed smaller and allows us to execute more runs of the experiment with different graphs. In contrast to the original circuits, the subcircuits of both ansätze have the same depths and the same number of parallel operations since they have the same arrangement of two-qubit gates by design and differ only in the position of the cut. Therefore, we choose to cut only the sequential variant of the ansatz into the predefined partitions and then generate its parameterized subcircuits.
(iii) Create circuits with parameters: Since the parameters define rotations about the angles 2β and γ, the objective function is periodic, and we can restrict the parameters to (β, γ) ∈ D = [0, π] × [0, 2π]. Therefore, we sample the parameter space D with an equidistant grid of 20 × 40 points, resulting in 800 function evaluations. For each parameter configuration of the grid, the two ansatz variants of the original circuit and the subcircuits with fixed parameters are created.
(iv) Execute circuits: All circuits and subcircuits are transpiled for and executed on the selected quantum device. We extract from the execution not only the aggregated result of all shots but also the result of every single shot. This enables us to analyze how the result behaves with different shot numbers by considering only parts of the shots, e.g., the first 1000 shots from an execution with 10000 shots.
(v) Recombine subcircuits: In classical postprocessing, we recombine the subcircuits with the procedure above.
(vi) Evaluate objective function: The objective values for the results of the original circuits and the cut circuits are computed for different shot numbers.
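Steps (iii) and (iv) can be sketched as follows. The grid excludes the right endpoints of both intervals because the objective function is periodic; this choice is an assumption, as is the format of the per-shot results (a list of bitstrings in execution order). Both helper names are hypothetical.

```python
import math
from collections import Counter


def parameter_grid(num_beta=20, num_gamma=40):
    """Equidistant grid on [0, pi) x [0, 2*pi); endpoints excluded by assumption."""
    betas = [math.pi * i / num_beta for i in range(num_beta)]
    gammas = [2.0 * math.pi * j / num_gamma for j in range(num_gamma)]
    return [(beta, gamma) for beta in betas for gamma in gammas]


def counts_from_first_shots(memory, num_shots):
    """Aggregate only the first num_shots single-shot results into counts."""
    return Counter(memory[:num_shots])


grid = parameter_grid()

# Hypothetical per-shot results of one circuit execution.
memory = ["01", "10", "01", "11", "01", "10"]
first_four = counts_from_first_shots(memory, 4)
```

Reusing a prefix of one long execution in this way yields results for several shot numbers without running the circuits again.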

Experiment 2: QAOA execution.
To investigate RQ 2, we run the entire VQA, including parameter optimization, for QAOA and cut-QAOA ten times on the same problem instance. In each execution, new randomly selected initial parameters are used to reduce the influence of the initial parameters and to be able to reason about them in the later analysis. To allow later evaluation across the two experiments, we conducted this experiment with the same problem instances as the experiment above. Furthermore, to minimize the effect of different calibrations of the quantum devices, this experiment was always performed directly after the previous experiment for each problem instance. In detail, this experiment consists of the following steps:
(i) Generate problem instance: We use the randomly generated problem instance from the experiment above.
(ii) Repeat: We repeat the following steps ten times.

Metrics for the evaluation of the objective functions
In this section, we introduce metrics to assess and quantify the objective functions computed on the quantum devices in our first experiment for both QAOA variants, QAOA and cut-QAOA. To this end, we consider metrics for the objective functions themselves and for their gradients.

First, we assess the mean absolute difference (MAD) in objective value between a QAOA variant on a quantum device and QAOA without cutting on the simulator. The smaller the difference, the closer the objective values computed on the NISQ device are to the noise-free values of the simulator. The MAD between the simulated objective function $\langle H_C \rangle^{\mathrm{SIM}}_{\beta,\gamma}$ and the computation on a NISQ device $\langle H_C \rangle^{\mathrm{QPU}}_{\beta,\gamma}$ for the finite parameter sets $P_\beta \subset [0, \pi]$ and $P_\gamma \subset [0, 2\pi]$ is

$$\frac{1}{|P_\beta|\,|P_\gamma|} \sum_{\beta \in P_\beta} \sum_{\gamma \in P_\gamma} \left| \langle H_C \rangle^{\mathrm{SIM}}_{\beta,\gamma} - \langle H_C \rangle^{\mathrm{QPU}}_{\beta,\gamma} \right|.$$

Secondly, we examine the MAD in objective value between a QAOA variant on a NISQ device and the maximally mixed state (MMS), i.e., the state with density operator $I/2^n$. It measures the concentration of the objective function and is an indicator for NIBPs [10]. The larger the distance from the objective value of the MMS, the more the computed objective values of the QAOA variant deviate from the objective value achieved by randomly picking states with equal probability. The MAD from the objective value of the MMS of n qubits is

$$\frac{1}{|P_\beta|\,|P_\gamma|} \sum_{\beta \in P_\beta} \sum_{\gamma \in P_\gamma} \left| \langle H_C \rangle_{\beta,\gamma} - \frac{1}{2^n} \operatorname{tr}[H_C] \right|.$$

This metric can be calculated for the NISQ device and the simulator. We compute the objective value of the MMS classically by sampling uniformly at random from the solution space and computing the expected objective value for the sample.

Furthermore, we assess the gradients of the computed parameter landscapes, as NIBPs lead to vanishing gradients. The gradient of the objective function is

$$\nabla \langle H_C \rangle_{\beta,\gamma} = \left( \frac{\partial \langle H_C \rangle_{\beta,\gamma}}{\partial \beta},\; \frac{\partial \langle H_C \rangle_{\beta,\gamma}}{\partial \gamma} \right)^{T}.$$

We numerically compute the partial derivatives using second-order accurate central differences [69]. To assess the gradients of the computed parameter landscapes, we consider their size [10] and the mismatch in their direction compared to the simulation. For the size of the gradients, we take into account their average and their variance. The average gradient size is defined as

$$\frac{1}{|P_\beta|\,|P_\gamma|} \sum_{\beta \in P_\beta} \sum_{\gamma \in P_\gamma} \left\| \nabla \langle H_C \rangle_{\beta,\gamma} \right\|_1,$$

where $\|\cdot\|_1$ is the 1-norm. The variance of the gradient size is the variance of $\| \nabla \langle H_C \rangle_{\beta,\gamma} \|_1$ over the same parameter sets. A decreasing variance means that the gradients change less and less across the parameter landscape, which makes it increasingly difficult for a classical optimizer to find an optimum.
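Besides the pure size of the gradients, the gradients must also guide the classical optimizer to the optima. For this, we analyze the direction of the gradients by comparing the gradients from the NISQ device with the simulated gradients. To this end, we employ the average pairwise cosine similarity between those gradients [70,71]. It is defined as the mean, over all parameter pairs $(\beta, \gamma) \in P_\beta \times P_\gamma$, of the cosine similarity between the gradient computed on the NISQ device and the simulated gradient at the same point:

$$\frac{1}{|P_\beta|\,|P_\gamma|} \sum_{\beta \in P_\beta} \sum_{\gamma \in P_\gamma} \frac{\nabla \langle H_C \rangle^{\mathrm{SIM}}_{\beta,\gamma} \cdot \nabla \langle H_C \rangle^{\mathrm{QPU}}_{\beta,\gamma}}{\left\| \nabla \langle H_C \rangle^{\mathrm{SIM}}_{\beta,\gamma} \right\|_2 \left\| \nabla \langle H_C \rangle^{\mathrm{QPU}}_{\beta,\gamma} \right\|_2}.$$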
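The metrics of this section can be evaluated on sampled landscapes with a few lines of NumPy. The following is a sketch under stated assumptions, not the evaluation code of the experiments: the function name and dictionary keys are hypothetical, `np.gradient` with `edge_order=2` uses one-sided second-order differences at the grid boundary instead of exploiting periodicity, grid points with a zero gradient are skipped in the cosine similarity, and the toy landscapes are invented for illustration.

```python
import numpy as np


def landscape_metrics(f_qpu, f_sim, betas, gammas, mms_value):
    """Compare a measured landscape with a simulated one.

    f_qpu and f_sim have shape (len(betas), len(gammas)).
    """
    mad_sim = np.mean(np.abs(f_sim - f_qpu))
    mad_mms = np.mean(np.abs(f_qpu - mms_value))

    # Second-order accurate differences: central inside, one-sided at the edges.
    grad_qpu = np.stack(np.gradient(f_qpu, betas, gammas, edge_order=2), axis=-1)
    grad_sim = np.stack(np.gradient(f_sim, betas, gammas, edge_order=2), axis=-1)

    # Gradient size measured in the 1-norm at every grid point.
    size = np.abs(grad_qpu).sum(axis=-1)

    # Cosine similarity between measured and simulated gradient per grid point.
    dot = np.sum(grad_qpu * grad_sim, axis=-1)
    norms = np.linalg.norm(grad_qpu, axis=-1) * np.linalg.norm(grad_sim, axis=-1)
    valid = norms > 0
    cosine = np.mean(dot[valid] / norms[valid])

    return {
        "mad_sim": float(mad_sim),
        "mad_mms": float(mad_mms),
        "avg_gradient_size": float(size.mean()),
        "var_gradient_size": float(size.var()),
        "cosine_similarity": float(cosine),
    }


# Toy landscapes: the "device" landscape is the simulated one damped towards
# an MMS objective value of 0.
betas = np.linspace(0.0, np.pi, 20, endpoint=False)
gammas = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
B, G = np.meshgrid(betas, gammas, indexing="ij")
f_sim = np.sin(2.0 * B) * np.sin(G)
f_qpu = 0.5 * f_sim
metrics = landscape_metrics(f_qpu, f_sim, betas, gammas, mms_value=0.0)
```

In this toy case the damped landscape keeps the gradient directions of the simulation, so the cosine similarity stays at 1 while the gradient sizes are halved.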

Results
This section presents the results for the two introduced experiments in order to answer our RQs. The differences between the sequential and the parallel ansatz of QAOA are minor compared to the differences to cut-QAOA, and, in direct comparison, the parallel ansatz performs slightly better on average. For the sake of clarity, we therefore show only the results for the parallel ansatz in this section. The code and all data generated during the experiments are publicly available at [72].

Results for experiment 1: Objective function
Following our experiment design, the first question to consider is how cutting the QAOA ansatz influences the corresponding objective function on NISQ devices. In the following, this is first discussed in detail for one example graph, and then the results for the whole data set are presented.

Example graph
In this section, the computed objective functions for an example graph are visualized and evaluated using the metrics introduced for the experiment. Figure 4 shows the ten-node example graph (Figure 4a) with its corresponding noise-free objective function (Figure 4b). The cut of edges (0, 8) and (4, 9) separates the graph into two subgraphs of equal size. The objective function is evaluated on a simulator for p = 1 QAOA layer. The desired minima are shown in shades of red in the colorized parameter map. The higher the objective value, the more the color changes to a blue tone.
Figure 5 visualizes the objective functions as parameter landscapes with 1000 and 10000 shots for both QAOA and cut-QAOA, executed on the NISQ device ibmq_ehningen, a 27-qubit superconducting quantum device by IBM based on the Falcon chip architecture. All parameter landscapes are colorized with the same color scale as used for the simulated result in Figure 4b. It is evident that QAOA and cut-QAOA computed on ibmq_ehningen achieve a smaller range of objective values than the simulated objective function (see Figures 4b and 5). However, the QAOA ansatz achieves a substantially smaller range of objective values than cut-QAOA. As seen in Figure 5, the maxima and minima of cut-QAOA emerge more clearly from the plane compared to QAOA. While with QAOA no noticeable change in the objective function can be seen between 1000 and 10000 shots, with cut-QAOA the objective function becomes noticeably smoother.
In Figure 6, the metrics introduced above are plotted as functions of the number of shots. It can be seen that the objective function of cut-QAOA is closer to the simulated objective function. The MAD between cut-QAOA on ibmq_ehningen and the simulation is significantly lower than for QAOA (Figure 6a). Moreover, the deviation of cut-QAOA from the objective value of the MMS is substantially higher than the deviation of the QAOA ansatz (Figure 6b). Furthermore, the more pronounced minima and maxima in the cut-QAOA objective function result in larger gradients with higher variance (Figures 6c and 6d). The average gradient size of the QAOA ansatz is significantly lower than with cutting, and the variance of gradient size is close to zero, indicating approximately equal size among the gradients. In contrast, the objective function of cut-QAOA has strikingly larger gradients of higher variance. This implies more salient structures in the computed objective function. However, cut-QAOA still does not achieve the same magnitude and variance of gradients as the simulator. Apart from the gradient size, the gradients of the cut-QAOA ansatz have a significantly higher cosine similarity value than those of the QAOA ansatz on the quantum devices (Figure 6e). Consequently, the mismatch in the gradient direction is smaller, and thus, the optimum and the optimization path to the optimum deviate less from the noise-free objective function.
All metrics as functions of the number of shots show the same pattern in Figure 6: they approach a plateau. A reasonable explanation for this phenomenon is shot noise, which refers to statistical sampling errors that occur when estimating the result distribution of a quantum circuit with a finite number of shots [73]. With an increasing number of shots, the sampled result distribution of a quantum circuit converges to its expected distribution. This behavior manifests itself in decreasing values for all metrics as the number of shots increases, except for gradient similarity, where the values increase as they approach a plateau. The closer the curves get to the plateaus, the less shot noise matters, but NISQ errors remain. While QAOA reaches this state with relatively few shots, cut-QAOA needs significantly more shots and thus is considerably more affected by shot noise. This effect also fits with the fact that cut-QAOA must distribute the shots among all subcircuits, and thus, each individual subcircuit receives significantly fewer shots.
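The last point can be made concrete with the standard error of an estimated outcome probability p, which scales as sqrt(p(1 - p)/N) for N shots. The sketch below compares the per-circuit sampling error of the uncut circuit with that of a single subcircuit under an even split of the same budget. It is a simplified illustration: it ignores how the errors of the subcircuits combine during recombination, and the choice p = 0.5 is only a worst-case example.

```python
import math


def standard_error(p, shots):
    """Standard error of a probability estimated from independent shots."""
    return math.sqrt(p * (1.0 - p) / shots)


total_shots = 10000
num_subcircuits = 32

se_uncut = standard_error(0.5, total_shots)
se_subcircuit = standard_error(0.5, total_shots // num_subcircuits)
ratio = se_subcircuit / se_uncut
```

With 312 shots per subcircuit, the sampling error of a single subcircuit is more than five times that of the uncut circuit.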
Evaluation using a set of random graphs
Our data set includes 61 randomly generated graphs as listed in Table 1. The individual graphs can be viewed in the published data set [72]. Figure 7 shows each metric as a function of the number of shots for the random graphs from the data set. For each metric and QAOA variant, the mean value is plotted as a line. The shaded area around the mean represents a percentile interval of 80% of the data. Thus, the highest and lowest 10% of the values are not visualized. To ensure better comparability of the metrics between problem instances, the MAD to MMS, the average gradient size, and the variance of gradient size were normalized by dividing by the metric value of the simulator. The other two metrics were not normalized further since they already incorporate the simulator.

The averaged values of the data set show the same patterns as observed for the example graph. First, all metrics approach a plateau value with an increasing number of shots. As in the example above, cut-QAOA converges significantly slower on average and starts further away from its plateau value than QAOA for all metrics. Both facts indicate that cut-QAOA is more susceptible to shot noise. Moreover, the computed objective function of the cut-QAOA ansatz is closer to the simulated objective function and deviates more from the objective value of the maximally mixed state. Furthermore, its gradients are larger and have a higher variance in their magnitude. Additionally, the gradients of the objective function of cut-QAOA are more similar to the simulated gradients. Although Figure 7 shows only averaged metrics, the individual results of the experiments also follow these patterns and resemble the results of the example graph above (Figure 4a). Moreover, while cut-QAOA performs better regarding the metrics, it tends to have a slightly higher spread in metric values across the problem instances, except for gradient similarity, where the variance for QAOA is significantly larger.

However, these patterns are more or less pronounced depending on the number of nodes in the generated graph. With an increasing number of nodes, and thus larger quantum circuits, the advantage of cut-QAOA in the metrics decreases, and its values approach those of QAOA. A probable reason is that with increasing graph size, the QAOA ansätze grow in depth and width, and so do the subcircuits of cut-QAOA. Thus, each subcircuit is more susceptible to the noise of the NISQ devices, and the combined result of cut-QAOA gets noisier.
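The aggregation used for Figure 7 can be sketched as follows: metric values of all graphs at one shot number are optionally divided by the corresponding simulator values and then summarized by their mean and the 10th and 90th percentiles. The function name, the interpolation method of the percentiles, and the sample values are assumptions for illustration.

```python
from statistics import mean, quantiles


def summarize(metric_values, simulator_values=None):
    """Return (mean, 10th percentile, 90th percentile) of a metric.

    If simulator_values is given, each metric value is first normalized
    by the simulator value of the same problem instance.
    """
    if simulator_values is not None:
        metric_values = [m / s for m, s in zip(metric_values, simulator_values)]
    deciles = quantiles(metric_values, n=10, method="inclusive")
    return mean(metric_values), deciles[0], deciles[-1]


# Hypothetical metric values of eleven graphs at one shot number.
device_values = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
simulator_values = [2.0] * 11
center, lower, upper = summarize(device_values, simulator_values)
```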

Major observations:
• Compared to QAOA without cutting, the objective function of cut-QAOA computed on NISQ devices is closer to the simulated objective function.
• It deviates more from the objective value of the maximally mixed state.
• It has larger gradients with a higher variance that are more similar to the simulated gradients.
• However, cut-QAOA is more affected by shot noise.

Results for experiment 2: QAOA execution
Next, to clarify SRQ 2, we evaluate whether these improvements in the computed objective function lead to better results in QAOA, i.e., lower expectation values and better MaxCut solutions. We extracted the shot results of the ansatz for the optimized parameters $\beta^*, \gamma^*$ for each run of QAOA and cut-QAOA and considered the obtained expectation value $\langle H_C \rangle_{\beta^*,\gamma^*}$ and the most frequently sampled state $z^*$. To compare these results among different MaxCut problem instances, we normalize them with the optimal objective value $C_{\mathrm{opt}}$. This yields the expectation ratio $\langle H_C \rangle_{\beta^*,\gamma^*} / C_{\mathrm{opt}}$ and the approximation ratio, i.e., the objective value of $z^*$ divided by $C_{\mathrm{opt}}$. While QAOA optimizes the expectation value, the most frequently sampled state is usually taken as the solution to the problem. In addition to the solutions computed with the QAOA variants on the quantum devices, we classically draw samples from a uniform distribution over the solutions, i.e., we classically simulate sampling the MMS. This sample serves as a benchmark to show whether a QAOA variant on the quantum device performs better than random sampling. Furthermore, we executed QAOA on a noise-free simulator to obtain the expectation ratio and approximation ratio in an ideal, noise-free scenario. These noise-free solutions provide a reference point for evaluating the impact of noise and imperfections in the executions of the QAOA variants on the NISQ devices.
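Both ratios and the random-sampling benchmark can be sketched in a few lines. The example assumes that the objective value of a bitstring is its cut size and that character i of a bitstring belongs to node i; since both ratios divide two objective values of the same sign, a different sign convention for H_C would not change them. The graph, the counts, and all function names are hypothetical.

```python
import random
from collections import Counter


def cut_size(bitstring, edges):
    """Number of edges cut by the partition encoded in the bitstring."""
    return sum(1 for u, v in edges if bitstring[u] != bitstring[v])


def ratios(counts, edges, c_opt):
    """Return (expectation ratio, approximation ratio) for measured counts."""
    total = sum(counts.values())
    expectation = sum(n * cut_size(b, edges) for b, n in counts.items()) / total
    most_frequent = max(counts, key=counts.get)
    return expectation / c_opt, cut_size(most_frequent, edges) / c_opt


def random_sampling_counts(num_nodes, num_samples, seed=0):
    """Classically sample bitstrings uniformly at random (sampling the MMS)."""
    rng = random.Random(seed)
    samples = (
        "".join(rng.choice("01") for _ in range(num_nodes))
        for _ in range(num_samples)
    )
    return Counter(samples)


# Hypothetical 4-node ring with optimal cut size 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
c_opt = 4
counts = {"0101": 600, "0011": 300, "0000": 100}
expectation_ratio, approximation_ratio = ratios(counts, edges, c_opt)
baseline_ratio, _ = ratios(random_sampling_counts(4, 2000), edges, c_opt)
```

For uniform sampling, every edge is cut with probability 1/2, so the baseline expectation ratio of this toy graph is close to 0.5.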
We conducted this experiment on a subset of the graphs from experiment 1. The data set includes 47 graphs as described in Table 1. The obtained expectation ratios can be seen in Figure 8 for the different QAOA variants and graph sizes. Cut-QAOA achieves the highest expectation ratio, while QAOA performs hardly better than random sampling. However, as the graphs get larger, the advantage of cut-QAOA gets smaller, and it approaches the expectation ratio of QAOA. Considering the results from experiment 1, the computed objective function gets noisier with increasing graph size, and thus, it is less accurate and flattens. Therefore, it is harder for the classical optimizer to find optimal parameters, and even for optimal parameters, the sampled noisy solutions yield a lower expectation ratio in general. The fact that QAOA without cutting is only marginally different from random sampling suggests that noise in its computation prevails, rendering the results impractical. The better expectation values of cut-QAOA also translate to solutions of higher approximation ratio, as can be seen in Figure 9. Cut-QAOA follows the previously established pattern and approaches the approximation ratio of QAOA with increasing graph size.

Major observations:
• QAOA without cutting performs only marginally better than random sampling.
• Cut-QAOA achieves noticeably better expectation values than QAOA, but this advantage decreases with increasing size of the graphs.
• The better expectation values of cut-QAOA translate to solutions with noticeably higher approximation ratios.

Discussion
The observed result that quantum circuit cutting applied in QAOA for the MaxCut problem produces execution results less affected by noise is in line with evaluations of circuit cutting on benchmark circuits [31,33,34]. The reasoning is that there is a range of subcircuit sizes where the execution of the subcircuits is significantly less noisy than the execution of the original circuit. Thus, the classical, noise-free recombination procedure can estimate a less noisy result for the original circuit based on the less noisy subcircuit results. While our work focuses only on the decomposition of the circuit into two fragments, which halves the number of qubits per subcircuit, another work shows that decomposing the circuit into more fragments further enhances the effect [34]. In the worst case, however, the number of subcircuits and the shots required to sample them grow exponentially with the number of cuts, and thus, choosing the number of cuts and fragments is a tradeoff between smaller circuits and additional cost. This additional cost is also manifested in our experiments by the higher shot noise, i.e., more shots are needed for convergence. This observation is consistent with theoretical considerations of circuit cutting in other work [24,26].

In general, the applicability of circuit cutting to suppress errors and tackle NIBPs depends on structural properties of the circuits that determine the number of cuts and therefore the required overhead. That means, to effectively scale cut-QAOA to larger problem instances while keeping the computational overhead manageable, it is crucial to ensure that the number of cuts required to partition the circuit grows slowly compared to the problem size. By doing so, we can delay the exponential increase in overhead, thereby expanding the number of problem instances solvable in an acceptable time. The corresponding circuits of these problem instances should exhibit a specific cluster structure that is characterized by high two-qubit gate connectivity among qubits within the same cluster and low connectivity between clusters. Leveraging this structure allows for efficient circuit partitioning with a small number of cuts relative to the problem size.
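For MaxCut instances, the cluster structure described above can be checked directly on the problem graph: the edges that connect different clusters are the ones that have to be cut. The sketch below counts them for a given partition. Only the two crossing edges (0, 8) and (4, 9) are taken from the example graph; the remaining edges and the assignment of nodes to clusters are hypothetical.

```python
def crossing_edges(edges, cluster_of):
    """Edges whose endpoints are assigned to different clusters."""
    return [(u, v) for u, v in edges if cluster_of[u] != cluster_of[v]]


# Hypothetical ten-node graph with two clusters of five nodes each.
edges = [
    (0, 1), (1, 2), (2, 3), (3, 4), (0, 4),   # inside cluster "A"
    (5, 6), (6, 7), (7, 8), (8, 9), (5, 9),   # inside cluster "B"
    (0, 8), (4, 9),                           # between the clusters
]
cluster_of = {node: "A" if node < 5 else "B" for node in range(10)}

to_cut = crossing_edges(edges, cluster_of)
cut_fraction = len(to_cut) / len(edges)
```

A small fraction of crossing edges indicates that the circuit can be partitioned with few cuts relative to its size.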

Limitations
While our findings offer valuable perspectives and potential avenues for further investigation, their generalizability may be limited due to the influence of various factors, such as the selection of the problem, the problem instances, and the experimental setup.
Even though we focus on solving the MaxCut problem with QAOA in our work, the application of circuit cutting is independent of it since it operates at the circuit level and is detached from the underlying problem and algorithm. Moreover, the experiments were only conducted with a limited set of graphs due to the restricted availability of NISQ devices and execution time on them. All investigated graphs have either eight, ten, or twelve nodes since, for graphs with more nodes, the noise dominates even with cut-QAOA. To control the number of cuts per problem instance in the experiments, each graph in the data set has a particular structure: it consists of two equal-sized subgraphs connected by two edges. The choice of two edges has two reasons. On the one hand, the solution to the MaxCut problem cannot be generated in general by combining the solutions of the two subgraphs. On the other hand, the circuit cutting approach used must execute only 32 subcircuits for the two cuts. Other work focuses on experiments with more than two cuts [33,34] and also on significantly reducing the number of required subcircuits [41]. Regarding the selection of the NISQ devices, the experiments were conducted on multiple devices of IBM's Falcon chip architecture. Furthermore, the calibration of the devices changes periodically and cannot be controlled by the user. This harms the reproducibility of the experiments. To compensate for this, we have included all calibration data of the quantum devices in the data set to improve the traceability of results [72]. Despite all these limitations, we obtained comparable results for the different graphs, NISQ devices, and their calibrations that show the same patterns concerning the application of circuit cutting.

Correlation between objective function metrics and QAOA result
The connection between the quality of the computed objective function and the quality of the result of the algorithm becomes evident when combining the data from the two conducted experiments. We can see a correlation between the introduced metrics of the objective functions and the expectation and approximation ratios of the corresponding QAOA results. QAOA tends to produce better results for objective functions that perform better concerning the metrics. Figure 10 visualizes the correlations between three metrics and the increase in expectation ratio compared to the MMS, i.e., $\left( \langle H_C \rangle_{\beta^*,\gamma^*} - \frac{1}{2^n} \operatorname{tr}[H_C] \right) C_{\mathrm{opt}}^{-1}$. The correlations of the other metrics can be found in the data set [72]. While there are always runs where no improvement was achieved, e.g., due to poor initial parameters and local optima, the advantage in expectation value increases as the metrics improve. This positive correlation supports the relevance of the metrics used, and in addition, it strengthens the assumption that an improved objective function aids the classical optimizer in finding better solutions. Furthermore, these results were obtained although the COBYLA optimizer was employed in its default configuration, i.e., without adjusting its hyperparameters to the optimization task. Further tuning of the optimizer to the underlying problem and noise can lead to better results [75].
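A correlation of this kind can be quantified with a correlation coefficient and a least-squares regression line. The sketch below uses the Pearson coefficient, which is an assumption, since the kind of correlation analysis behind Figure 10 is not specified here. The data points and the function name are hypothetical.

```python
import math


def pearson_and_fit(x, y):
    """Return (Pearson r, slope, intercept) of a least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical data: one metric value and one ratio increase per QAOA run.
metric = [0.1, 0.2, 0.3, 0.4]
ratio_increase = [0.05, 0.09, 0.16, 0.20]
r, slope, intercept = pearson_and_fit(metric, ratio_increase)
```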

Summary and Future Work
In this paper, we applied circuit cutting in QAOA to solve the MaxCut problem on current NISQ devices. To answer SRQ 1, which deals with the influence of circuit cutting on the objective function, our results provide insight that circuit cutting suppresses the effects of noise in the computed objective function on NISQ devices. We observe that objective functions computed with circuit cutting are less affected by NIBPs. Compared to the objective functions computed without circuit cutting, they are closer to the noise-free result, are less concentrated around the objective value of the maximally mixed state, and have larger gradients with more variance that are more similar to the expected gradients. However, their computation with circuit cutting exhibits more shot noise, i.e., more shots are needed for convergence. Regarding SRQ 2, it can be observed in our experiments that QAOA with circuit cutting obtains better solutions on NISQ devices by classical optimization of the cut ansatz than without cutting. It achieves better expectation values and produces better solutions, i.e., most frequently sampled states with a higher approximation ratio. Part of our future work will be to investigate how large the range of problem instances is where advantages can be obtained with circuit cutting on NISQ devices. Small problem instances at the lower end of the range can also be computed with low

Figure 1: Circuit cutting process: A large quantum circuit gets split into smaller subcircuits by replacing non-local with local operations. The smaller subcircuits get executed on one or more quantum devices and produce the subcircuit results that get recombined to produce the result of the initial circuit.

Figure 2: General overview of the evaluation process of the experiments with a refined preprocessing example consisting of problem generation, circuit generation, and circuit cutting.

(a) Choose initial parameters: We randomly choose initial parameters β ∈ [0, π] and γ ∈ [0, 2π].
(b) Execute QAOA: The QAOA is performed twice for the generated problem instance: once for each presented ansatz variant. Each run is started with the chosen initial parameters.
(c) Execute cut-QAOA: Subsequently, the cut-QAOA is executed with the same initial parameters.
Objective value in the parameter landscape for the example graph simulated for p = 1.

Figure 4: (a) Example graph from the data set and (b) its QAOA objective function for the MaxCut problem in noise-free simulation.

Figure 5: Computed objective functions on ibmq_ehningen as parameter landscapes with the same color scale as the simulated objective function in Figure 4b.

Figure 7: Mean value of a metric surrounded with a percentile interval of 80%.

Figure 8: Boxplots of the expectation ratio for each variant with median value. The whiskers extend from the box by 1.5 times the interquartile range, and points outside this range are drawn as outliers [74].

Figure 10: Correlation between metric and the increase in expectation ratio compared to the MMS. The black line visualizes the regression line surrounded by a confidence interval of 0.95 for the regression estimate.

Table 1: Statistics about the data set: (a) graphs and (b) quantum devices.
Figure 6: Metrics for the example graph as functions of the number of shots for QAOA and cut-QAOA. Panel (e): Average pairwise cosine similarity of gradients.