Unbalanced penalization: a new approach to encode inequality constraints of combinatorial problems for quantum optimization algorithms

Solving combinatorial optimization problems of the kind that can be codified by quadratic unconstrained binary optimization (QUBO) is a promising application of quantum computation. Some problems of this class suitable for practical applications such as the traveling salesman problem (TSP), the bin packing problem (BPP), or the knapsack problem (KP) have inequality constraints that require a particular cost function encoding. The common approach is the use of slack variables to represent the inequality constraints in the cost function. However, the use of slack variables considerably increases the number of qubits and operations required to solve these problems using quantum devices. In this work, we present an alternative method that does not require extra slack variables and consists of using an unbalanced penalization function to represent the inequality constraints in the QUBO. This function is characterized by larger penalization when the inequality constraint is not achieved than when it is. We evaluate our approach on the TSP, BPP, and KP, successfully encoding the optimal solution of the original optimization problem near the ground state cost Hamiltonian. Additionally, we employ D-Wave Advantage and D-Wave hybrid solvers to solve the BPP, surpassing the performance of the slack variables approach by achieving solutions for up to 29 items, whereas the slack variables approach only handles up to 11 items. This new approach can be used to solve combinatorial problems with inequality constraints with a reduced number of resources compared to the slack variables approach using quantum annealing or variational quantum algorithms.


I. INTRODUCTION
The exploration of quantum enhancement in combinatorial optimization problems has been extended over several years.There are three main reasons for this; first, many combinatorial optimization problems can be encoded in Hamiltonians, the ground state being the optimal solution of the problem [1,2]; second, they are commonly hard to solve and have practical applications [3]; and third, the quantum algorithms to solve these problems need few resources and can be tested on current state-of-the-art quantum hardware [4,5].
The usual approach for encoding combinatorial optimization problems on quantum processing units (QPUs) is to transform them in their quadratic unconstrained binary optimization (QUBO) representation and obtain the cost Hamiltonian after a change of variables.In the QUBO encoding, the constraints of the combinatorial optimization problems are added to the cost function as penalization terms along with the objective function.
However, we are currently in a noisy intermediatescale quantum (NISQ) stage and QPUs are restricted in terms of qubits number, noise level, and circuit depth [6].The main consequence is that numerous problems solvable by quantum algorithms such as Shor's algorithm [7,8] remain unpractical to test on real quantum hardware.Despite this fact, solving combinatorial optimization problems is possible on current hardware.For example, on gate-based QPUs, Variational Quantum Algorithms (VQA) [9] are applicable even with the above limitations thanks to their short depth and resistance to noise [10].Besides, quantum annealers, using the quantum annealing (QA) principle [11][12][13], are another quantum technology used to solve combinatorial optimization problems [14][15][16].In this respect, QA shows some advantages under certain conditions compared to classical annealing [17][18][19].
Of the aforementioned VQA algorithms, the most studied for combinatorial optimization problems is the Quantum Approximate Optimization algorithm (QAOA) [20].Even if it is still unclear whether QAOA will give any advantage compared with other algorithms [21], its simplicity makes it of great interest for analytical and practical purposes [4].In the simplest QAOA version, the cost Hamiltonian of a combinatorial optimization problem is encoded in a parametric unitary gate along with a "mixer", a second parametric unitary gate that does not commute with the first unitary gate.In this context, the parameters are adjusted to minimize the expectation value of the cost Hamiltonian using a classical optimizer.Multiple approaches for solving combinatorial optimization problems using QAOA or QA can be found in the literature, for example, in logistics [22], finance [5,[23][24][25], energy [26], communications [27], automotive industry [28], traffic signaling [29], among others.
From the different sets of problems that can be solved using quantum hardware, the ones with inequality constraints require an extra number of variables to get their QUBO representation.For example, the BPP and the TSP have many inequality constraints that increase with the number of items and cities, respectively.The usual approach implemented in software development kits (SDKs) such as Qiskit [30] or D-Wave Ocean [31] is to use slack variables to encode the inequality constraints [32].Still, such an approach increments the number of qubits and connections needed to solve these problems.We remark that an alternative approach to handle inequality constraints with penalization would be to develop direct algorithms that respect the constraints by construction, as illustrated in [33] for the asset exchange problem.
In this paper, we propose an alternative method to encode inequality constraints in the QUBO formulation of combinatorial optimization problems.In this new heuristic encoding method, inequality constraints are encoded using an unbalanced penalization formula.This formulation adds larger penalization when the inequality constraint is not fulfilled than when it is.We test our method on the TSP, the BPP, and the KP using OpenQAOA [34], a python-based library developed by Entropica Labs, and the Jülich universal quantum computer simulator (JUQCS) [35][36][37] for problems with up to 43 qubits.In the majority of the cases, the optimal solution of the original combinatorial optimization problem is encoded in the ground state of the cost Hamiltonian of their respective QUBO.However, in some cases, the optimal solution is in the vicinity of the ground state.For such cases, our method still provides a good approximation given the whole set of eigenvalues.
Additionally, we test the performance for finding solutions for the BPP using the D-Wave Hybrid solver and the quantum annealer D-Wave Advantage 5.3 System JUPSI, a quantum computing system developed by D-Wave Systems Inc that has more than 5000 qubits located in Jülich, Germany.Our findings demonstrate that our method outperforms existing approach in terms of both the quantity and quality of solutions obtained for the BPP.Specifically, using the unbalanced penalization approach, we achieve remarkable results on both systems.With the D-Wave Advantage system, we obtain solutions for up to 7 nodes, while with the D-Wave Hybrid solver, we achieve solutions for up to 29 nodes.In contrast, the traditional slack variables approach only handles solutions for up to 7 and 11 nodes for the D-Wave Advantage and D-Wave Hybrid, respectively.
The rest of the paper is organized as follows.Sec.II provides a description of the implementation of the com-binatorial optimization problems using QUBOs and an overview of the TSP, BPP, and KP.Section III presents a description of the unbalanced penalization approach and a new metric called the coefficient of performance (CoP) to study the efficiency of QAOA to solve combinatorial optimization problems.Section IV presents the results and discussion.Finally, Section V provides some conclusions.The source code for the new method setup and implementation on the TSP, BPP, and KP can be found at https://jugit.fz-juelich.de/qip/unbalanced-penalizations-qubo.

II. QUADRATIC UNCONSTRAINED BINARY OPTIMIZATION
The set of combinatorial problems that can be represented by the QUBO formulation are characterized by functions of the form where n is the number of variables, q ij ∈ R are coefficients associated to the specific problem, and x i ∈ {0, 1} are the binary variables of the problem.Note that x i x i ≡ x i and q ij = q ji in this formulation.Therefore, the general form of a combinatorial optimization problem solvable by QPUs is given by the cost function and equality constraints are given by and inequality constraints are given by To transform these problems into the QUBO formulation the constraints are added as penalization terms.In this respect, the equality constraints are included in the cost function using the following penalization term where λ 0 is a penalization coefficient that should be chosen to guarantee that the equality constraint is fulfilled and C is a constant value.In the case of inequality constraint, the common approach is to use a slack variable [32,38].The slack variable, S, is an auxiliary variable that makes a penalization term vanish when the inequality constraint is achieved.
Therefore, when Eq.( 4) is satisfied, Eq.( 6) is already zero.This means the slack variable, S, must be in the range 0 ≤ S ≤ B − min x n i=1 l i x i .To represent the slack variable in binary form, the slack is decomposed in N = ⌊log 2 (max x B − n i=1 l i x i ) + 1⌋ binary variables: where s k are the slack binary variables.Then, the inequality constraints are added as penalization terms by Combining Eq.( 2) and the two kinds of constraints Eq.( 5) and Eq.( 8), the general QUBO representation of a given combinatorial optimization problem is given by min Following the same principle, more constraints can be added and note that after some manipulations, Eq.( 9) can be rewritten in the form of Eq.( 2).The last step to represent the QUBO problem on QPUs is to change the x i variables to spin variables z i ∈ {1, −1} by the transformation x i = (1 − z i )/2.Hence, Eq.( 2) represented in terms of the cost Hamiltonian model reads

A. Traveling salesman problem
In the TSP, a traveler has to stop by a set of cities, finding the shortest route to visit every city once and return to the starting point.This problem is normally solved classically for thousands to millions of variables using heuristic techniques [39,40].For instance, the Concorde TSP solver [41] can solve all 110 problems from TSPLIB [42] to optimality, the largest having 85000 cities.
Different formulations of the TSP exist, but in this work, we focus on the Dantzig-Fulkerson-Johnson (DFJ) formulation [43], which is the linear programming (LP) version of the TSP.Note that while there are other formulations of this problem that do not require inequality constraints, e.g., [1], the DFJ formulation is used here to test the capabilities of the unbalanced penalization method to encode multiple inequality constraints.In this formulation, the problem is given by subject to the set of constraints, n i=1,i̸ =j n j=1,j̸ =i i∈Q j̸ =i,j∈Q where n is the number of cities (nodes), c ij is the distance between the cities i and j, x ij is a binary variable that represents if the path from the city i to the city j is used or not, and Q represents a sub-tour on a set of cities.From the above equations, Eq.( 11) is the cost function for the distance traveled, Eq.( 12) and Eq.( 13) restrict the traveler to take only one path to enter and one to leave a city, and Eq.( 14) are the inequality constraints that prohibit sub-tours in the solution.Of all the inequality constraints analyzed in this work, constraint Eq.( 14) is the most expensive in terms of the number of slack variables.Fig. 1 (a) shows the number of qubits needed to represent the TSP for a different number of cities.
For this work, we randomly place n cities on a 50x50 grid with c ij being the Euclidian distance between cities.We vary from 2 to 7 cities and for problems up to 5 cities, we generate 10 random problems to test the generalization of our method.

B. Knapsack problem
In the KP, a set of items with associated weights and values should be stored in a knapsack.The problem is to maximize the value of the items (nodes) transported in the knapsack.The KP is restricted by the maximum weight the knapsack can carry.Typically, this problem is solved classically for thousands of variables and accepts pseudo-polynomial algorithms, i.e., an algorithm with complexity bounded by a polynomial in the number of items and the maximum weight [44].
KP is the simplest nontrivial integer programming model with binary variables, only one constraint, and positive coefficients.It is formally defined by max n i=1 where n is the number of items, p i and w i are the value and weight of the ith item, respectively, x i is the binary variable that represents whether the ith item is in the knapsack or not, and W is the maximum weight that the knapsack can transport.Fig. 1 (c) shows the number of qubits needed to solve the KP for a different number of items.The set of problems is created with weights and values selected randomly ranging between 1 and 63 for the values, between 1 and 127 for the weights, and the knapsack maximum weight is equal to 70% of the sum of the weights of all the items.

C. Bin packing problem
In the BPP, a collection of items must be stored in the minimum possible number of bins.In this case, the items have a weight associated and the bins are restricted to carry a maximum weight.Commonly, this problem accepts hundreds of bins, but it has been shown to be NP-Hard in the strongest sense, i.e., there is no pseudopolynomial algorithm to solve this problem [44].The problem has many real-world applications such as loading trucks with a weight restriction [45], container scheduling [46], or design of FPGA chips [47].Its formulation is given by min subject to: where n is the number of items (nodes), m is the number of bins, w i is the i-th item weight, B is the maximum weight of each bin, x ij and y j are binary variables that represent if the item i is in the bin j, and whether bin j is used or not, respectively.From the above equations, Eq.( 18) is the cost function to minimize the number of bins, Eq.( 19) is the inequality constraint for the maximum weight of a bin, Eq.( 20) is the equality constraint to restrict that an item is only in one of the bins, and Eqs.( 21) and (22) means that y i and x ij are binary variables.Fig. 1 (b) shows the number of variables needed to solve the BPP when the number of bins equals the number of items.The weights are chosen randomly with values between 4 and 20 and the maximum bin weight is equal to 20.Without losing generality, two further simplifications are made, the first assigns the first item to the first bin x 11 = 1 and therefore x 1j = 0 ∀ j ∈ {2, ..., m}.Second replaces y j = 1 ∀ j ∈ {1, 2, ..., N bin min } with N bin min = ⌈( n i=1 s(i)⌉)/B because the minimum of bins required is known.
An estimate of the number of slack variables needed to represent the QUBO of the TSP, the BPP, and the KP is shown in Fig. 2. In the TSP, the number of slack variables increases exponentially with the number of cities added.The BPP requires many slack binary variables increasing proportionally to the number of bins of the problem.Lastly, in the KP, the number of slack variables needed is constant and depends on the maximum weight allowed in the knapsack.

III. UNBALANCED PENALIZATION
The unbalanced penalization approach we propose is an approximation for including inequality constraints of combinatorial optimization problems in the QUBO representation of the cost function.This method does not require additional slack variables, and the inequality constraint of Eq. ( 4) given by is replaced by a penalization term in the QUBO formulation that makes larger penalization for negative terms and smaller for positive ones.The idea of the unbalanced penalization function comes from the shape of the exponential decay curve, f (x) = e −h(x) .In this function, positive values of h(x), given by Eq. ( 23), make f (x) ≈ 0 while negative values make it grow exponentially, as shown in Fig. 3.However, the exponential function cannot be encoded as a QUBO penalization term.Therefore, we consider an expansion of the exponential function to quadratic order.The penalization term is given by In general, we modify Eq. ( 24) to include free parameters to be adjusted for the different kinds of problems.Therefore, Eq. ( 24) is rewritten by where λ 1,2 are a set of multipliers that can be tuned for the specific problem and the constant term is removed because the position of the cost function's minimum is independent of the constant term.Note that this approach can be extended to g x) .The new cost function based on Eq. ( 9) and the unbalanced penalization approach is given by min To tune the λ 0,1,2 parameters of Eq. ( 27) for the TSP, the BPP, and the KP, we use the Nelder-Mead optimization method to explore the set of λ 0,1,2 values that brings the optimal solution of the original combinatorial optimization problems as close as possible to the ground state of the cost Hamiltonian, Eq. ( 10), of their QUBO representation.For the TSP we use a random problem with 4 cities, for the BPP we use a random problem with 4 items, and for the KP we use a problem with 13 items.The set of values λ 0,1,2 found by the Nelder-Mead optimization method for each problem is presented in Table I.After we find the best set of values for the specific problem, we test if those values generalize to other random cases.In the random cases used, we find that a set of λ 0,1,2 of one problem works well for different and/or larger problems.Therefore, we use a unique set of these Comparison between e −h(x) and the unbalanced function 1 parameters for each problem.However, this is not necessarily the case for other instances of the same problem with a different distribution of parameters.For example, the TSP has some instances that are harder to solve [48].In general, one can repeat the procedure mentioned above with multiple cases of the specific distribution to estimate a good set of λ 0,1,2 parameters on the new distribution.The results on these aspects of the unbalanced penalization method are presented in Sec.IV.Quantum annealing is a computational technique designed to tackle optimization problems by leveraging quantum mechanical effects in the search for low-energy states of Ising Hamiltonians [49,50].By employing this method, the system can explore a broader solution space and potentially find good-quality solutions more efficiently than classical optimization approaches.
In order to validate the effectiveness of our proposed method on real hardware, we conducted experiments using the D-Wave Advantage and the D-Wave Hybrid Quantum-Classical solver to obtain results for the BPP.To ensure a fair comparison, we selected random BPP problems with the same settings as described in Section II C, without any simplifications.
For the experiments using the D-Wave Advantage, we conducted 20 trials on random problem instances ranging from 3 to 8 nodes (items).Each trial consisted of 5000 samples.For the hybrid solver, we performed 20 experiments with a single run, allowing a maximum execution time of 3 seconds.The problem instances for the hybrid solver ranged from 5 to 31 nodes (items).
In both cases, we utilized the λ 0,1,2 parameters as presented in Table I.It is important to note that our objective was not to fine-tune the hyperparameters for either solver but rather to demonstrate that under similar conditions, unbalanced penalization yields significantly better solutions in terms of both quantity and quality.Results are presented in Sec.IV.

Quantum Approximate Optimization Algorithm
To test the difference between the slack variables approach vs. the unbalanced penalization encoding on different random problems of the TSP, the BPP, and the KP, we use the QAOA with one layer.It is important to acknowledge that, while the impact of penalization terms of the constraints remains uncertain in terms of their influence on the performance of the Quantum Approximate Optimization Algorithm (QAOA) [51], we employ QAOA to demonstrate the contrasting probability distributions resulting from the two approaches.In this case, the cost Hamiltonian, H c , obtained from the QUBO formulation, is translated into a parametric unitary gate given by where γ is a parameter to be optimized.A second unitary operator applied is where β is the second parameter that must be optimized and X = n i=1 σ x i with σ x i the Pauli-x quantum gate applied to qubit i.The general QAOA circuit is shown in Fig. 4. Here, R X (θ) = e −i θ 2 σx , p represents the number of repetitions of the unitary gates Eqs.28 and 29 with each repetition having separate values for γ p and β p , and the initial state is a superposition state |+⟩ ⊗n .The case of QAOA with p = 1 is suitable for visualization of the landscape and in Sec.IV, we show a comparison between both encodings and even though they are similar, the addition of extra qubits for the slack variables approach has a great impact on the probability of finding the optimal solution of the original combinatorial optimization problem.

Coefficient of performance
Additionally, we introduce the coefficient of performance (CoP), a quantifier that expresses how good a |+i < l a t e x i t s h a 1 _ b a s e 6 4 = " D 1 i l 5 S o +

e x i t s h a 1 _ b a s e 6 4 = " D 1 i l 5 S + + P U W w 7 F h o I J H 1 + C c i N E = " >
e W g 6 O d I j t e r N x f + 8 X q q j W z + j I k k 1 E X i 5 K E q Z r W N 7 n o A 9 o J J g z a a G I C y p u d X G I y Q R 1 i a n k g n B X X 1 5 n b T r N f e q V n + 4 r j T u 8 j i K c A b n U A U X b q A B 9 9 A C D z A 8 w T O 8 w p s 1 s V 6 s d + t j 2 V q w 8 p l T + A P r 8 w f K X 5 F 2 < / l a t e x i t > R X (2 p ) < l a t e x i t s h a 1 _ b a s e 6 4 = " x s C c 2 i X B 8 D X k P t g 6

S n H P N S d H K m h X P R y 8 T + v l 6 n o y p / Q O M 0 U i f H 8 o y h j U C U w z w T 2 q S B Y s b E m C A u q d 4 V 4 i A T C S i d n 6 h C c x Z O X S b t e c 8 5 r 9 d u L c u O 6 i K M E j s E p q A I H X I I G u A E t 4 A I M H s E z e A V v x p P x Y r w b H
< l a t e x i t s h a 1 _ b a s e 6 4 = " x s C c 2 i X B 8 D X k P t g 6 n a J g R 3 c e V l 0 q y U 3 b N y 5 e a 8 U L 2 a x 5 F D x + g E l Z C L L l A V X a M 6 a i C K H t A T e k G v 1 q P 1 b L 1 Z 7 7 P S F W v e c 4 T + w P r 8 B k N y n p E = < / l a t e x i t > FIG. 4. Schematic representation of QAOA for p layers.The parameters γ and β for each layer are the ones to be optimized.QAOA is used to compare the slack variables method vs. the unbalanced penalization method for the encoding of the inequality constraints.
quantum algorithm is to give a specific solution compared to random guessing over a set of solutions.In this perspective, we use the CoP to determine how likely it is to find the optimal solution for the TSP, the BPP, and the KP on the local minima of the QAOA algorithm for p = 1.The CoP is given by where P ( * x) is the probability of getting a bit string * x using a quantum algorithm, e.g, QAOA, and P R = 1/2 n is the probability of random guessing with n equal to the number of qubits involved in the problem.

Optimal solution distribution
To determine the applicability of our method, we sorted the energy eigenvalues of the cost Hamiltonian for the different combinatorial optimization problems.The parameter of interest is the position of the energy eigenvalue that describes the optimal solution of the original combinatorial optimization problems within the list of all sorted energy eigenvalues of the cost Hamiltonian of their QUBO representation.If the position of the optimal solution is far from the ground energy of the cost Hamiltonian, the unbalanced penalization encoding is not working as expected.Otherwise, if the energy eigenvalue corresponding to the optimal solution is close to the ground state energy given the total number of eigenvalues, the unbalanced encoding is advantageous.
Figure 5 shows the energy eigenvalues of 10 random cases for the TSP with 5 nodes using the unbalanced penalization encoding approach.The eigenvalues are sorted in terms of their energy from minimum to maximum with the x-axis representing the eigenvalue position (number) of eigenvalues, and the y-axis its energy.In the inset, showing the lowest 25 energy eigenvalues, the big circles represent where the optimal result is positioned compared to all the other eigenvalues (small circles) of the cost Hamiltonian.Here, the worst eigenvalue position is 7 out of 2 20 = 1048576 eigenvalues.This means that in the worst scenario, the optimal solution is among the lowest 0.00057% of all eigenvalues.FIG. 5. Eigenvalues distribution for the TSP with 5 cities (20 qubits) for 10 randomly generated problems using the unbalanced penalization method.The inset shows the lowest 25 energy eigenvalues.The big circles are the optimal solutions for the random problem and the small circles are the different eigenvalues of the cost Hamiltonian of its QUBO representation.Note that each eigenvalue is degenerate with multiplicity two, because the problem is symmetric, e.g., one clockwise solution and another anti-clockwise.
From Fig. 5 it is seen that the unbalanced penalization approach does not ensure that the optimal solution is the lowest eigenvalue of the cost Hamiltonian, but in all the cases analyzed, it is very close to it.For example, in Fig. 6, this feature is illustrated for one of the random problems of Fig. 5. Here, there are a couple of eigenvalues that do not fulfill all the restrictions of the problem but have lower energy than the optimal solution.In the end, the unbalanced penalization encoding is a trade-off between adding slack variables and ensuring the ground state of the cost Hamiltonian corresponds to the optimal solution of the combinatorial optimization problem, or reducing the number of variables but encoding the optimal solution in the vicinity of the ground state.In many cases, there is no preference for encoding the optimal solution in the lowest eigenvalue of the cost Hamiltonian.For instance, QAOA brings the expectation value of a cost Hamiltonian to a region of overall minimum energy, so we can expect that the optimal and sub-optimal so-lutions occur with reasonable probability in regions of minimal energy.
FIG. 6.The first 25 sorted eigenvalues of the Hamiltonian Hc corresponding to the worst case shown in Fig. 5.The big circle corresponds to the optimal solution of this problem and the small circles are the other eigenvalues.The solution shown in the red inset is the ground state, which is an invalid solution of the TSP.The solution in the green inset is the optimal solution.Fig. 7 shows the energy eigenvalues sorted for the cost Hamiltonian of the BPP for 10 random problems with 5 items.In the inset, showing the 40 lowest eigenvalues, the optimal solutions of the cost Hamiltonians are depicted as big circles and the small circles are the other eigenvalues of the cost Hamiltonian.In this case, there is an increasing number of degeneracies explained by the symmetries of the problem (exchange items of one bin with the others).To keep the same number of qubits (21) for all the problems, we select 10 random cases that require more than 3 bins to store the items.Here, the worst case is in position 36 out of 2 21 =2097152 eigenvalues which means it is within 0.00124%, of all eigenvalues.
Finally, Fig. 8 shows the eigenvalue distribution for 10 random cases of the KP with 21 items (21 qubits).In this case, there are no degeneracies, but the possible solutions are close to each other.The inset shows the first 50 eigenvalues.In this case, for one of the random cases, the optimal solution is located at position 49 out of 2 21 =2097152 eigenvalues, which means it is within 0.0023% of the first eigenvalues.In this case, the parameters λ 0,1,2 were optimized for a random case with 13 items, which suggests the great generalization capabilities of the current method.

Quantum Annealing
Figure 9 illustrates the results obtained using the D-Wave Advantage.The upper x-axis in this figure, as well as in subsequent plots, represents the logical variables required to represent each problem size, as per the for-FIG.7. Eigenvalue distribution for the BPP with 5 items (21 qubits) for 10 randomly generated problems using the unbalanced inequality-constrained penalization method.The inset shows the 40 lowest energy eigenvalues.The big circles are the optimal solutions for the random problem and the small circles are the different eigenvalues for the cost Hamiltonian of its QUBO representation.Note that each eigenvalue has a different degeneracy because the problem has some symmetries.
FIG. 8. Eigenvalue distribution for the KP with 21 items (21 qubits) for 10 randomly generated problems using the unbalanced penalization method.The inset shows the 50 lowest eigenvalues.The big circles give the position of the optimal solutions for the random problems and the small circles depict the position of the different eigenvalues of the cost Hamiltonian.In this case, there are no degeneracies but the energies are close to each other.mulation provided in Section II C. Across all cases, the unbalanced penalization approach consistently generates more optimal solutions compared to the slack variables approach.Although the optimal solution cannot be guaranteed to be the ground state of the Hamiltonian, more optimal solutions found for the unbalanced penalization method.
For the scenario involving 8 nodes, neither of the two methods were able to find valid solutions.However, it is important to acknowledge that the number of statistical samples used in these experiments might not have been sufficient to draw conclusions about the limit beyond which no further solutions are found on the quantum annealer.
FIG. 9. D-Wave Advantage success rate of finding optimal solutions for different cases of the BPP ranging from 3 to 8 nodes (items).The 20 small circles (squares) for each problem size consist of 5000 samples for the unbalanced (slack) method.
Fig. 10 presents the results of the same experiments depicted in Fig. 9, focusing on the number of valid solutions obtained.Consistent with the observations in Fig. 9, the employment of the unbalanced penalization method enhances the likelihood of discovering valid solutions compared to the slack approach.However, it is important to note that in the case of 7 nodes, where the slack approach appears to exhibit superior performance, more samples of valid solutions (than could be produced by the machine) would be needed to draw a solid conclusion.
In order to investigate the viability of finding valid solutions for larger problem sizes, we employed the D-Wave Hybrid solver.We conducted experiments using problem sizes ranging from 5 to 31 nodes (items), with intervals of 2 and executed 20 random problem instances for each case.Each problem instance was solved using a single run on the D-Wave Hybrid solver, with a maximum time limit of 3 seconds.
Figure 11 depicts the probability of finding optimal solutions across the 20 random problems for each node case.Similar to the observations with the D-Wave Advantage, the unbalanced penalization method consistently outperforms the slack approach.It successfully discovers optimal solutions for nearly all cases, with the exception of 19, 23, 27, and 31 nodes.Conversely, the slack approach fails to find optimal solutions after 7 nodes.
Figure 12 illustrates the probability of finding valid so- lutions using both the unbalanced penalization method and the slack variables approach on the D-Wave Advantage.Consistent with the findings in Figure 11, the unbalanced penalization method demonstrates superior performance compared to the slack variables approach.Up to 17 nodes, the unbalanced penalization method consistently discovers valid solutions, whereas the slack variables approach only manages to find valid solutions up to 11 nodes.Beyond 17 nodes, the probability of finding valid solutions gradually declines for the unbalanced penalization method, with no solutions being found for the 31-node case.This behavior can be attributed to the fact that the maximum time limit remains constant throughout the experiments.
Even though increasing the time will impact positively on improving the performance of both methods, it is important to reiterate that the primary objective of this section is not to provide high-quality solutions but rather to evaluate the performance of both methods under similar conditions.

QAOA landscape
In this section, the results of the cost Hamiltonian energy landscape using the QAOA algorithm with one layer are presented for the TSP, the BPP, and the KP.The slack variables method and the unbalanced penalization method are used for the cost Hamiltonian encoding.For visualization purposes, the energy landscape for all cases is limited to the region of γ ∈ (−π/ max{q ij }, 0) where q ij is given in Eq. ( 10) and β ∈ (−π/2, 0).Fig. 13 shows the energy landscape of the QAOA algorithm for a TSP with 4 cities.This is equivalent to a 17 and 12 qubits problem for the slack variables and the unbalanced penalization encodings, respectively.The probabilities for finding the optimal solutions of the TSP are 0.02% and 0.227% for the slack variables and the unbalanced encodings, respectively.This shows a clear advantage of the unbalanced encoding method that increases the probability more than 10 times compared to the probability obtained with the slack variables encod-ing.This characteristic is mainly attributed to the difference in the number of qubits needed to represent the problem and thus the size of the set of possible solutions.For the slack variables encoding, there are 131072 (2 17 ) possible solutions while for the unbalanced penalization encoding, there are 4096 (2 12 ) possible solutions.Fig. 14 shows the energy landscape of the QAOA algorithm for a BPP with 3 items, which is equivalent to a 6 and 19 qubits problem for the unbalanced and slack encodings, respectively.The probability of finding the optimal solution of the BPP in the local minima of QAOA is 0.001% for the slack variables encoding and 10.66% for the unbalanced penalization method.Here, the optimal solution is 10000 times more likely to be found using the unbalanced penalization approach than using the slack variables approach.Finally, Fig. 15 shows the energy landscape of the QAOA algorithm for a KP with 10 items, which is equivalent to an 18 and 10 qubits problem for the slack variables and the unbalanced penalization encoding, respectively.Here, the probability of finding the optimal solution of the KP is 0.001% and 0.346% for the slack variables and the unbalanced penalization encoding.This means finding the solution with the unbalanced penalization method is 346 times more likely than with the slack variables encoding.

The coefficient of performance
Figure 16 shows the CoP for different problem sizes for the TSP, BPP, and KP using the unbalanced penalization method.All the cases show the probability of finding the optimal solution for each problem using QAOA with 1 layer in the minimum point of the energy landscape region β ∈ (−π/2, 0) and γ ∈ (−π/ max{q ij }, 0).At this point, we do not use a classical optimizer to find the optimal beta and gamma, but instead, we divide the beta and gamma region in a 50x50 grid landscape and take the probability of the optimal solution at the minimum energy.It is expected that using an optimization method for this step improves the CoP while keeping the same tendency.Fig. 16 (a) shows the CoP for the TSP problem with 2, 3, 4, 5, 6, and 7 cities (2, 6, 12, 20, 30, and 42 qubits).For the cases with 30 and 42 qubits, we use JUQCS [35,36] in its GPU-accelerated version [37] to obtain the results.Note that the same TSP cases using the slack variables approach will require 2, 7, 17, 36, 73, 148, and 304 qubits.We thus explore problem sizes (5, 6, and 7 cities) with the unbalanced penalization approach that are unfeasible using the slack approach on quantum computer simulators.Interestingly, the CoP increases exponentially with the problem size, which means that the probability of finding the optimal solution exponentially increases when we compare it with random guessing in the set of possible configurations (cf.[52]).
Fig. 16 (b) shows the CoP for the BPP for 2, 3, 4, 5, 6, and 7 items (2, 6, 13, 21, 32, and 43 qubits).For this case, the CoP is improving even more rapidly compared to TSP and also here for the BPP with 6 and 7 items, the solutions using slack variables (61 and 78 qubits) are beyond the range of what can be simulated with quantum computer simulators.The largest simulations up to 43 qubits were performed on JUWELS Booster [53] as they required more than 1/8 PiB of distributed memory and took more than one million core hours.Finally, Fig. 16 (c) shows the CoP for the KP with 5 to 41 items (5 to 41 qubits) with increments of two (same number of qubits).In this case, the CoP increases linearly with the number of qubits.This is a poor performance if we compare it with the TSP and BPP.We suspect the improved CoP is related to the number of constraints of the problem.

V. CONCLUSIONS
We have presented unbalanced penalization, a new method to encode inequality constraints of combinatorial optimization problems into QUBO penalizations.The method does not require extra slack variables to encode the inequality constraints.This is extremely beneficial because there is no increase in the number of variables or qubits needed to encode the problems, especially for the TSP and the BPP.
The method is suitable for VQA on gate-based quantum computers and QA on quantum annealers, which both require expressing the combinatorial optimization problems in terms of QUBOs.We have tested the method using QA for the BPP and QAOA with 1 layer for the TSP, BPP, and KP.We have shown that in 10 random cases for different problem sizes the optimal solution of the combinatorial optimization problem is located in the vicinity of the ground state energy of the cost Hamiltonian.The method is highly generalizable in the sense that the tuned parameters λ 0,1,2 for small problems generalize well to larger problems.For QA, we have demonstrated that the unbalanced penalization method outperforms the slack variables approach, finding better solutions in terms of quality and quantity.Additionally, the probabilities to find the optimal solutions among the local minima of the unbalanced penalization (Figs. 13 to 15) are much better than using slack variables.Qualitatively, this can be understood because the addition of slack variables increases the search space and therefore reduces the probability of finding optimal and nearlyoptimal solutions.
The proposed method was evaluated on both the D-Wave Advantage and D-Wave Hybrid solvers.Our findings demonstrate that the method utilizing unbalanced penalization consistently outperforms the slack variables approach in solving the BPP.Specifically, the unbalanced penalization method successfully finds solutions for problem instances with up to 31 items using the Hybrid solver, while the slack variables approach achieves solutions for only up to 11 items.We remark that additional evaluations of this method on D-Wave systems for the TSP support the same conclusions [54].
We have tested the unbalanced penalization method close to the limits of what is possible to be simulated on supercomputers using JUQCS.For the case of the TSP, we went up to 42 qubits, for the BPP up to 43 qubits, and for the KP up to 41 qubits.Indeed, we are exploring regions unfeasible using the slack variables method.For example, for the TSP with 6 and 7 cities using the slack variables method it requires 73 and 148 qubits, and for the BPP with 6 and 7 items it requires 61 and 78 qubits.In terms of probability, the unbalanced penalization method shows large advantages for the small problems we made a comparison for.For the TSP with 4 cities, the probability of finding the optimal solution at the local minimum energy of the QAOA with p = 1 using the unbalanced penalization method is more than 10 times larger than the one using the slack variables approach.For the BPP with 3 items, the probability of finding the optimal solution with the unbalanced penalization method is more than 10000 times larger, and for the KP with 10 items, it is almost 346 higher.Finally, we have presented a new metric, the CoP, which compares the probability of obtaining a specific solution using a quantum algorithm against random guessing over the set of possible solutions.The CoP is useful to characterize how an algorithm performs with increasing problem size or to compare how different combinatorial optimization problems perform for a specific algorithm.We have shown the CoP for the TSP, BPP, and KP using QAOA with p = 1 in the region γ ∈ (−π/ max{q ij }, 0) and β ∈ (−π/2, 0).Interestingly, for all of them, the CoP increases with the problem size.For the KP the increase is linear while for the TSP and BPP, the increase is exponential.Further research is worthwhile to see if this provides an implicit advantage of using QAOA to solve combinatorial optimization problems, what the characteristics of the TSP and the BPP are that make the CoP increase exponentially, and to see how increasing the number of QAOA layers improves the CoP.

FIG. 1 .
FIG. 1. Number of qubits needed to solve (a) the TSP for different numbers of cities, (b) the BPP for different numbers of items, and (c) the KP for different numbers of items.The solid line represents the number of variables of the problem and the dashed line represents the variables of the problem plus the slack variables needed to represent the inequality constraints of the problem for a different number of nodes.

FIG. 2 .
FIG.2.The number of slack variables needed to solve the TSP (diamonds), the BPP (circles), and the KP (triangles) based on the number of nodes.The solid lines are guides to the eye.
3 5 p x s 5 h D + w P n 8 A a 7 A k F Q = < / l a t e x i t > |+i < l a t e x i t s h a 1 _ b a s e 6 4 = " D 1 i l 5 S o + + P U W w 7 F h o I J H 1 + C c i N E = " > A A A B 8 H i c b V D L S g N B E O y N r x h f U Y 9 e F o M g C G E 3 C n o M e v E Y w T w k W c L s p D c Z M j O 7 z M w K I e Y r v H h Q x K u f 4 8 2 / c Z L s Q R M L G o q q b r q 7 w o Q z b T z v 2 8 m t r K 6 t b + Q 3 C 1 v b O 7 t 7 x f 2 D h o 5 T R b F O Y x 6 r V k g 0 c i a x b p j h 2 E o U E h F y b I b D m 6 n f f E S l W S z v z S j B Q J C + Z B G j x F j p 4 e m s o 4 j s c + w W S 1 7 Z m 8 F d J n 5 G S p C h 1 e H T x x T 6 z S c 6 N Y 2 Z L G n a m / J 8 Z E a D 0 S o e 0 U x A z 0 o j c V / / P a q Y m u g j G T S W p Q 0 v m i K O W u i d 3 p 9 2 6 P K a S G j y w h V D F 7 q 0 s H R B F q b E Y F G 4 K / + P I y a V T K / n m 5 c n d R q l 5 n c e T h C I 7 h F H y 4 h C r c Q g 3 q Q E H A M 7 z C m 6 O c F + f d + Z i 3 5 p x s 5 h D + w P n 8 A a 7 A k F Q = < / l a t e x i t > |+i < l a t e x i t s h a 1 _ b a s e 6 4 = " D 1 i l 5 S o + + P U W w 7 F h o e H T x x T 6 z S c 6 N Y 2 Z L G n a m / J 8 Z E a D 0 S o e 0 U x A z 0 o j c V / / P a q Y m u g j G T S W p Q 0 v m i K O W u i d 3 p 9 2 6 P K a S G j y w h V D F 7 q 0 s H R B F q b E Y F G 4 K / + P I y a V T K / n m 5 c n d R q l 5 n c e T h C I 7 h F H y 4 h C r c Q g 3 q Q E H A M 7 z C m 6 O c F + f d + Z i 3 5 p x s 5 h D + w P n 8 A a 7 A k F Q = < / l a t e x i t > |+i < l a t e x i t s h a 1 _ b a s e 6 4 = " D 1 i l 5 S o + + P U W w 7 F h o e H T x x T 6 z S c 6 N Y 2 Z L G n a m / J 8 Z E a D 0 S o e 0 U x A z 0 o j c V / / P a q Y m u g j G T S W p Q 0 v m i K O W u i d 3 p 9 2 6 P K a S G j y w h V D F 7 q 0 s H R B F q b E Y F G 4 K / + P I y a V T K / n m 5 c n d R q l 5 n c e T h C I 7 h F H y 4 h C r c Q g 3 q Q E H A M 7 z C m 6 O c F + f d + Z i 3 5 p x s 5 h D + w P n 8 A a 7 A k F Q = < / l a t e x i t > U C ( p ) < l a t e x i t s h a 1 _ b a s e 6 4 = " q U s t b l 3 7 W R K d I 0 D Y L A q c o 3 E J N t A = " > A A A B 9 H i c b V B N T 8 J A E J 3 i F + I X 6 t F L I z H B C 2 n R R I 9 E L h 4 x s U A C T b N d t r B h d 1 t 3 t y S k 4 X d 4 8 a A x X v 0 x 3 v w 3 L t C D g i + Z 5 O W 9 m c z M C x N G l X a c b 6 u w s b m 1 v V P c L e 3 t H x w e l Y 9 P 2 i p O J S Y e j l k s u y F S h F F B P E 0 1 I 9 1 E E s R D R j r h u D n 3 O x M i F Y 3 F o 5 4 m x O d o K G h E M d J G 8 r 2 g W e 0 P E e c o S C 6 D c s r h w 5 p x 7 m X t P m D I q l W 1 / G y u r a + s b m 6 U t c 3 t n d 2 / f O j h s y y Q T m L g 4 Y Y n o h k g S R m P i K q o Y 6 a a C I B 4 y 0 g l H z d zv P B A h a R L f q 3 F K f I 4 G M Y 0 o R k p L g X V S c Y N m 1 R s g z l G Q n p l 3 Q b c K 6 1 5 I V P 4 M r L J d s 2 e A y 8 Q p S B k U a A X W l 9 d P c M Z J r D B D U v Y c O 1 X + B A l F M S N T 0 8 s k S R E e o Q H p a R o j T q Q / m Z 0 x h R W t 9 G G U C F2 x g j P 1 9 8 Q E c S n H P N S d H K m h X P R y 8 T + v l 6 n o y p / Q O M 0 U i f H 8 o y h j U C U w z w T 2 q S B Y s b E m C A u q d 4 V 4 i A T C S i d n 6 h C c x Z O X S b t e c 8 5 r 9 d u L c u O 6 i K M E j s E p q A I H X I I G u A E t 4 A I M H s E z e A V v x p P x Y r w b H / P W F a O Y O Q J / Y H z + A P 2 m l u k = < / l a t e x i t > -R X (2 p ) < l a t e x i t s h a 1 _ b a s e 6 4 = " x s C c 2 i X B 8 D X k P t g 6 n H P N S d H K m h X P R y 8 T + v l 6 n o y p / Q O M 0 U i f H 8 o y h j U C U w z w T 2 q S B Y s b E m C A u q d 4 V 4 i A T C S i d n 6 h C c x Z O X S b t e c 8 5 r 9 d u L c u O 6 i K M E j s E p q A I H X I I G u A E t 4 A I M H s E z e A V v x p P x Y r w b H / P W F a O Y O Q J / Y H z + A P 2 m l u k = < / l a t e x i t > -• • • < l a t e x i t s h a 1 _ b a s e 6 4 = "L v g O d Q I N f W Z M c 6 M 9 6 t J Y p S F U o g Y = " > A A A C F 3 i c b V D L T g I x F O 3 4 R H y h L t 0 0 E h L c k B k 0 0 S W R j U s 0 D p A w Z N I p B R r a m U l 7 x 4 R M + A s 3 / o o b F x r j V n f + j Q U m U cG T t D k 9 5 9 7 c 3 h P E g m u w 7 S 9 r Z X V t f W M z t 5 X f 3 t n d 2 y 8 c H D Z 1 l C j K X B q J S L U D o p n g I X O B g 2 D t W D E i A 8 F a w a g + 9 V v 3 T G k e h X c w j l l X k k H I + 5 w S M J J f q J R c v 1 7 2 B k R K 4 s e n + d K t 3 y 7 j q h c w m L 0 9 2 o v g 5/ I L R b t i z 4 C X i Z O R I s r Q 8 A u f X i + i i W Q h U E G 0 7 j h 2 D N 2 U K O B U s E n e S z S L C R 2 R A e s Y G h L J d D e d 7 T X B J a P 0 c D 9 S 5 o S A Z + r v j p R I r c c y M J W S w F A v e l P x P 6 + T Q P + y m / I w T o C F d D 6 o n w g M E Z 6 G h H t c M Q p i b A i h i p u / Y j o k i l A w U e Z N C M 7 i y s u k W a 0 4 Z 5 X q z X m x d p X F k U P H 6 A S V k Y M u U A 1 d o w Z y E U U P 6 A m 9 o F f r 0 X q 2 3 q z 3 e e m K l f U c o T + w P r 4 B t 8 O d 1 A = = < / l a t e x i t > • • • < l a t e x i t s h a 1 _ b a s e 6 4 = " L v g O d Q I N f W Z M c 6 M 9 6 t J Y p S F U o g Y = " > A A A C F 3 i c b V D L T g I x F O 3 4 R H y h L t 0 0 E h L c k B k 0 0 S W R j U s 0 D p A w Z N I p B R r a m U l 7 x 4 R M + A s 3 / o o b F x r j V n f + j Q U m U cG T t D k 9 5 9 7 c 3 h P E g m u w 7 S 9 r Z X V t f W M z t 5 X f 3 t n d 2 y 8 c H D Z 1 l C j K X B q J S L U D o p n g I X O B g 2 D t W D E i A 8 F a w a g + 9 V v 3 T G k e h X c w j l l X k k H I + 5 w S M J J f q J R c v 1 7 2 B k R K 4 s e n + d K t 3 y 7 j q h c w m L 0 9 2 o v g 5/ I L R b t i z 4 C X i Z O R I s r Q 8 A u f X i + i i W Q h U E G 0 7 j h 2 D N 2 U K O B Us E n e S z S L C R 2 R A e s Y G h L J d D e d 7 T X B J a P 0 c D 9 S 5 o S A Z + r v j p R I r c c y M J W S w F A v e l P x P 6 + T Q P + y m / I w T o C F d D 6 o n w g M E Z 6 G h H t c M Q p i b A i h i p u / Y j o k i l A w U e Z N C M 7 i y s u k W a 0 4 Z 5 X q z X m x d p X F k U P H 6 A S V k Y M u U A 1 d o w Z y E U U P 6A m 9 o F f r 0 X q 2 3 q z 3 e e m K l f U c o T + w P r 4 B t 8 O d 1 A = = < / l a t e x i t >• • • < l a t e x i t s h a 1 _ b a s e 6 4 = " L v g O d Q I N f W Z M c 6 M 9 6 t J Y p S F U o g Y = " > A A A C F 3 i c b V D L T g I x F O 3 4 R H y h L t 0 0 E h L c k B k 0 0 S W R j U s 0 D p A w Z N I p B R r a m U l 7 x 4 R M + A s 3 / o o b F x r j V n f + j Q U m U cG T t D k 9 5 9 7 c 3 h P E g m u w 7 S 9 r Z X V t f W M z t 5 X f 3 t n d 2 y 8 c H D Z 1 l C j K X B q J S L U D o p n g I X O B g 2 D t W D E i A 8 F a w a g + 9 V v 3 T G k e h X c w j l l X k k H I + 5 w S M J J f q J R c v 1 7 2 B k R K 4 s e n + d K t 3 y 7 j q h c w m L 0 9 2 o v g 5/ I L R b t i z 4 C X i Z O R I s r Q 8 A u f X i + i i W Q h U E G 0 7 j h 2 D N 2 U K O B U s E n e S zS L C R 2 R A e s Y G h L J d D e d 7 T X B J a P 0 c D 9 S 5 o S A Z + r v j p R I r c c y M J W S w F A v e l P x P 6 + T Q P + y m / I w T o C F d D 6 o n w g M E Z 6 G h H t c M Q p i b A i h i p u / Y j o k i l A w U e Z N C M 7 i y s u k W a 0 4 Z 5 X q z X m x d p X F k U P H 6 A S V k Y M u U A 1 d o w Z y E U U P 6 A m 9 o F f r 0 X q 2 3 q z 3 e e m K l f U c o T + w P r 4 B t 8 O d 1 A = = < / l a t e x i t > p < l a t e x i t s h a 1 _ b a s e 6 4 = " R 7

FIG. 10
FIG. 10.D-Wave Advantage success rate of finding valid solutions for different cases of the BPP ranging from 3 to 8 nodes (items).

FIG. 12
FIG. 12. D-Wave Hybrid solver success rate of finding valid solutions for different cases of the BPP ranging from 5 to 31 nodes (items).Each circle (square) represents the mean value of 20 random problems with a maximum time for the hybrid solver of 3 seconds for the unbalanced (slack) method.

FIG. 13 .
FIG. 13.QAOA energy for p = 1 for the cost Hamiltonian ⟨Hc⟩ of the TSP with 4 cities with (a) slack variables encoding and (b) unbalanced penalization encoding.The minimum energy is indicated with a white dot.The probability that this minimum corresponds to the optimal solution of the TSP problem is P(*x) = 0.02% and P(*x)=0.227% for encoding (a) and (b), respectively.

FIG. 14 .
FIG. 14.Same as the Fig.13for the BPP with 3 items.The probability that this minimum corresponds to the optimal solution of the BPP problem is P(*x) = 0.001% and P(*x)=10.661%for encoding (a) and (b), respectively.

FIG. 15 .
FIG.15.Same as the Fig.13for the KP with 10 items.The probability that this minimum corresponds to the optimal solution of the KP problem is P(*x) = 0.001% and P(*x)=0.346% for encoding (a) and (b), respectively.

FIG. 16 .
FIG. 16.Coefficient of performance for (a) the TSP, (b) the BPP, and (c) the KP for different problem sizes.The error bars represent the standard deviation over 5 different random cases for problems with less than 22 qubits.For problems with more than 21 qubits the data points represent 1 random case.