Multiobjective variational quantum optimization for constrained problems: an application to cash handling

Combinatorial optimization problems are ubiquitous in industry. In addition to finding a solution with minimum cost, problems of high relevance involve a number of constraints that the solution must satisfy. Variational quantum algorithms (VQAs) have emerged as promising candidates for solving these problems in the noisy intermediate-scale quantum stage. However, the constraints are often complex enough to make their efficient mapping to quantum hardware difficult or even infeasible. An alternative standard approach is to transform the optimization problem to include these constraints as penalty terms, but this method involves additional hyperparameters and does not ensure that the constraints are satisfied, due to the existence of local minima. In this paper, we introduce a new method for solving combinatorial optimization problems with challenging constraints using VQAs. We propose the multiobjective variational constrained optimizer (MOVCO), which classically updates the variational parameters via a multiobjective optimization performed by a genetic algorithm. This optimization allows the algorithm to progressively sample only states within the in-constraint space, while optimizing the energy of these states. We test our proposal on a real-world problem of great relevance in finance: the cash handling problem. We introduce a novel mathematical formulation for this problem, and compare the performance of MOVCO versus a penalty-based optimization. Our empirical results show a significant improvement in terms of the cost of the achieved solutions, and especially in the avoidance of local minima that do not satisfy any of the mandatory constraints.


Introduction
Quantum computing holds the promise of a major impact on science and industry due to its capacity to solve complex problems. Some potential applications are the quantum simulation of many-body problems [1,2], numerical analysis tasks such as solving differential equations [3,4], machine learning [5,6,7], etc. A widely studied area among them is combinatorial optimization (CO), which consists of searching for an optimal solution among a finite set of elements [8]. Finding the exact solution to these problems is in many cases a well-known NP-hard problem [9], i.e., it is not possible in polynomial time using classical computing. Some canonical examples are the traveling salesman problem [10] or MaxCut [11,12]. Although quantum computers may not be able to solve this task exactly in polynomial time either, heuristic quantum algorithms may still be able to achieve better approximate solutions to these problems in terms of speed, quality of the obtained solutions, or even resource savings. One of the leading quantum paradigms to tackle these problems in the near term is that of Variational Quantum Algorithms (VQAs) [13]. Such algorithms use a hybrid quantum-classical approach to approximate the minimum-energy state of a Hamiltonian that encodes our problem. The combination of quantum and classical computing allows the use of shallow quantum circuits suitable for the noisy intermediate-scale quantum (NISQ) era [14,15]. CO arises in a wide range of industrial domains such as finance [16], logistics [17], artificial intelligence [18], supply chain [19] or drug discovery [20]. Therefore, even if they are only approximations of the global optimum, better CO solutions have significant practical value. Real-world combinatorial optimization problems usually involve not only the minimization of a cost function, but also a number of hard equality and inequality constraints that must be satisfied by feasible solutions. In the context of quantum optimization, a desirable technique
would be to implement constrained algorithms, such as the Quantum Alternating Operator Ansatz, with quantum circuits capable of natively preserving the constraints [21,22]. Some work has been done in this direction, including recent experimental demonstrations on text summarization [23]. However, the constraints of realistic formulations of practical problems are often too difficult to be efficiently mapped to a quantum processor. In this scenario, the state-of-the-art strategy to force constraint satisfaction is to transform the original cost function by introducing penalty terms that artificially raise the energy of infeasible solutions [24]. Although easily implementable, this technique does not guarantee the convergence of the algorithm to a solution that satisfies all constraints, and suffers from shortcomings such as the adjustment of instance-dependent penalty hyperparameters. Despite their critical relevance in practical scenarios, few studies have been conducted to explore new possibilities for general constraint encoding in VQAs. Indeed, studies on the performance of VQAs are usually performed on well-studied theoretical problems that, although very valuable in assessing the potential of quantum algorithms, lack direct practical applicability in the industrial sector.
In this work, we propose the multiobjective variational constrained optimizer (MOVCO), a method for improving the convergence of variational quantum algorithms to optimal solutions satisfying a set of restrictions, and test its performance on an industrially relevant problem. MOVCO relies on a genetic algorithm to simultaneously optimize the projection of the variational wavefunction onto the subspace of solutions satisfying all constraints, and the energy of the feasible solutions. This is, to the best of our knowledge, one of the first papers addressing a variational method for the efficient optimization of realistic problems involving a large number of general constraints. Very recently, during the preparation of this manuscript, a study was released that introduces a modification to the objective function of VQAs to deal with hard constraints [25]. Specifically, they propose the minimization of an in-constraint energy instead of the penalized cost function. However, despite also being one of the few papers discussing this question, our multiobjective approach is clearly different.
The manuscript is structured as follows. First, we provide a brief review of key concepts needed to understand the new method, such as Variational Quantum Algorithms (sec. 2.1), constrained Combinatorial Optimization (sec. 2.2), and the Non-dominated Sorting Genetic Algorithm (sec. 2.3). We introduce MOVCO in section 3, and discuss the strengths of the method. We then demonstrate the effectiveness of the method on a real-world financial and logistical problem known as the Cash Management problem. We describe in detail the specifications of this practical problem in section 4.1, and propose a novel mathematical formulation in terms of spins in section 4.2. In section 5 we show the empirical results that support an advantage of MOVCO in avoiding infeasible local minima. We conclude with some remarks on the importance of the work, future lines of research, and potential improvements of MOVCO.

Variational quantum optimization
Variational quantum algorithms (VQAs) are a class of hybrid quantum-classical algorithms widely studied due to their ability to solve complex problems with shallow quantum circuits [13]. Their potential is based on the combination of powerful classical optimization methods with the suitability of quantum circuits to explore the entire Hilbert space efficiently. There is a great variety of VQAs whose features are adapted to the intended application. Nevertheless, they have some common ingredients and operate according to the following scheme.
VQAs rely on a quantum computer to build an ansatz through the application of a sequence of quantum operations U(θ) to the input state:

|Ψ(θ)⟩ = U(θ) |0⟩^⊗N ,    (1)

where N is the number of qubits. These operations depend on a set of tunable parameters θ. The unitary U(θ) can be expressed as a product of L unitaries, where L is the number of layers of the ansatz and determines the depth of the quantum circuit:

U(θ) = U_L(θ_L) · · · U_2(θ_2) U_1(θ_1).

Then, the parameters of the ansatz are tuned according to a cost function that encodes the problem. This tuning is performed by a classical computer that takes measurements of the ansatz as input to find the optimal parameters that bring |Ψ(θ)⟩ as close as possible to the best solution. This kind of algorithm has been extensively studied in the context of combinatorial optimization [8,26,27,28,29,30,31]. Although there is still no proof that these algorithms can provide a quantum advantage in optimization problems, there are indications of their potential [32,33,34,35]. Examples of VQAs that have been studied in this context are the Quantum Approximate Optimization Algorithm (QAOA) [36], the Variational Quantum Eigensolver (VQE) [37], recursive QAOA (RQAOA) [38], Filtering VQE (F-VQE) [39], Layer VQE (L-VQE) [40], ADAPT-QAOA [41], Variational Quantum Imaginary Time Evolution (VarQITE) [42], etc. In this manuscript, we focus our numerical experiments on the Variational Quantum Eigensolver, whose problem-agnostic construction is well adapted to a black-box optimization such as the Cash Management problem introduced in section 4. However, the new multiobjective technique presented in this paper can be easily extended to any of these variational algorithms.

Variational Quantum Eigensolver
The VQE was originally proposed as a variational algorithm for finding the ground state energy of a chemical molecule described by a Hamiltonian Ĥ [37], but it has also been applied in the context of quantum optimization [29].
In VQE, the parameters of an ansatz are trained through the minimization of the expectation value of the Hamiltonian, ⟨Ψ(θ)| Ĥ |Ψ(θ)⟩. Although some problem-dependent or problem-inspired ansätze for VQE have been proposed to solve problems in quantum chemistry [43,44,45], in this algorithm the variational ansatz (1) is free, in the sense that it does not need to be determined by the Hamiltonian of the problem as in QAOA. In particular, a typical hardware-efficient choice is the sequence of quantum gates shown in Fig. 1:

|Ψ(θ)⟩ = ∏_{l=1}^{L} [ Û_ent ∏_{n=1}^{N} R_y(θ_nl) ] ∏_{n=1}^{N} R_y(θ_n0) |0⟩^⊗N ,    (3)

where R_y(θ_nl) ≡ e^{i θ_nl Ŷ_n} are single-qubit rotations around the y axis, Û_ent is an entangling gate made up of two-qubit controlled-Z gates between each qubit and its nearest-neighbor qubit in a linear quantum processor topology, and Ŷ_n and Ẑ_n are the Pauli Y and Z operators acting on the n-th qubit. In the computational experiments shown in section 5, the parameters θ_n0 are initialized around ∼ π/4, so that the initial state is close to the full superposition, while the remaining angles take small random values θ_nl ∼ 10^−2.
The flexibility and first-neighbor connectivity of hardware-efficient ansätze make them suitable for the realization of this algorithm on real quantum processors. However, they are not tailored to the problem and may suffer from trainability issues such as barren plateaus [46].
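As an illustration, the hardware-efficient ansatz described above can be simulated classically for small systems. The following sketch is one plausible reading of the circuit of Fig. 1, assuming an initial R_y layer followed by blocks of nearest-neighbor CZ entanglers plus R_y rotations; it is a didactic NumPy statevector simulation, not the code used in our experiments:

```python
import numpy as np

def ry(theta):
    # R_y(theta) = exp(i * theta * Y), following the convention stated in the text
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]], dtype=complex)

def apply_1q(psi, gate, q, n):
    # apply a single-qubit gate to qubit q of an n-qubit statevector
    psi = psi.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def apply_cz(psi, q1, q2, n):
    # controlled-Z: phase flip on the |...1...1...> components
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[q1] = idx[q2] = 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)

def ansatz(thetas, n):
    """Statevector of the hardware-efficient ansatz: an initial R_y layer
    (row 0 of thetas), then one block of nearest-neighbour CZ gates plus
    R_y rotations per additional row."""
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0                                    # |0...0>
    for l, layer in enumerate(thetas):
        if l > 0:
            for q in range(n - 1):                  # linear-topology entangler
                psi = apply_cz(psi, q, q + 1, n)
        for q in range(n):
            psi = apply_1q(psi, ry(layer[q]), q, n)
    return psi

# a single rotation layer at theta = pi/4 is close to the full superposition
psi = ansatz(np.full((1, 2), np.pi / 4), 2)
print(np.round(np.abs(psi) ** 2, 3))   # [0.25 0.25 0.25 0.25]
```

With all initial angles at π/4, every computational basis state is sampled with equal probability, matching the near-uniform initialization described above.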

Constrained combinatorial optimization
Combinatorial optimization problems with relevance in industry usually involve the satisfaction of a large number of hard constraints. These problems can be formulated as the minimization of a black-box cost function C(z) over configurations z ∈ {−1, +1}^N, such that the solution z must fulfill a number of equality and inequality constraints:

B_i(z) = 0 ,   G_j(z) ≤ 0 ,    (5)

where N is the number of variables of the problem, and i = 1, ..., I, j = 1, ..., J, with I ∈ ℕ and J ∈ ℕ the number of equality and inequality constraints, respectively. This cost function can be translated into a Hamiltonian Ĉ through the change of variables z_n → Ẑ_n, where Ẑ_n is the Pauli Z matrix acting on the n-th qubit. Thereby, the problem becomes the search for the ground state of this Hamiltonian. For variational algorithms, this is equivalent to finding the parameters θ that bring the wavefunction closest to that ground state:

min_θ ⟨Ψ(θ)| Ĉ(Ẑ) |Ψ(θ)⟩ ,

such that

⟨z| B̂_i(Ẑ) |z⟩ = 0 ,   ⟨z| Ĝ_j(Ẑ) |z⟩ ≤ 0 ,    (7)

for the sampled computational-basis states |z⟩, where the constraints (5) are also formulated in terms of operators B̂_i, Ĝ_j through the previous transformation. We denote the subspace composed of the states satisfying (7) as the in-constraint or feasible subspace S. In a realistic scenario, the trial state |Ψ(θ)⟩ is prepared on an N-qubit quantum computer, so that we do not have access to the exact probability amplitudes of the wavefunction. The average energy ⟨Ψ(θ)| Ĉ(Ẑ) |Ψ(θ)⟩ is then estimated by simultaneously measuring all qubits in the Pauli z basis K times. Each measurement produces a classical state z^(k) = (z_1^(k), ..., z_N^(k)). Hence, the sample mean

C̄(θ) = (1/K) Σ_{k=1}^{K} C(z^(k))    (8)

is the estimator that we actually minimize, where k = 1, ..., K denotes each of the K measured samples. Therefore, constrained variational quantum optimization aims to approximate a wavefunction able to sample classical states with low energy within the feasible subspace S.
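For a diagonal cost Hamiltonian, the estimator above reduces to averaging the classical cost over the K measured spin strings. A minimal sketch (the two-spin cost below is a toy example, not the cost of any problem treated in this paper):

```python
import numpy as np

def sample_mean_energy(cost, samples):
    """Monte-Carlo estimate of <Psi|C(Z)|Psi> from K computational-basis
    measurements: the average classical cost over the sampled spin strings."""
    return np.mean([cost(z) for z in samples])

cost = lambda z: z[0] * z[1]                     # toy Ising-type cost
samples = [(1, 1), (1, -1), (-1, -1), (1, 1)]    # K = 4 measured spin strings
print(sample_mean_energy(cost, samples))         # (1 - 1 + 1 + 1)/4 = 0.5
```

In practice the samples would come from measuring the ansatz |Ψ(θ)⟩, and the estimator improves as K grows.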

Non-dominated Sorting Genetic Algorithm
The Non-dominated Sorting Genetic Algorithm (NSGA-II), introduced in [47], is an evolutionary algorithm for multiobjective optimization which has been proven suitable for large-scale optimization problems [48,49]. The ultimate goal of NSGA-II is to find a set of optimal solutions for multiple cost functions. This set is known as the set of non-dominated solutions, or Pareto front. A solution x_0 is said to be non-dominated if and only if there is no different solution x_i ≠ x_0 such that x_i is better than x_0 in at least one objective function, and at the same time is equal to or better than x_0 for all other cost functions being optimized. NSGA-II belongs to the subset of evolutionary algorithms known as genetic algorithms, which are meta-heuristic optimization techniques based on biologically inspired operators such as natural selection, crossover, and mutation [50]. The optimization process is carried out by iteratively evolving a population (a set of solutions) to obtain better individuals (better solutions) using the concept of survival of the fittest and the previous operators. Specifically, NSGA-II involves the following steps (see Appendix D for more details):

1. Population initialization: the algorithm is initialized with a set of randomly sampled solutions (individuals). After the first iteration, the population will be a combination of the parent and offspring populations from the previous iteration. The population size is a hyperparameter of the model.

2. Non-dominated sorting: the individuals of the population are sorted and classified into fronts according to their Pareto dominance. In other words, individuals which are non-dominated by any other element are assigned to front 1 and eliminated from consideration. Then, the non-dominated individuals from the remaining population are classified as front 2, and so on with all the elements of the population.

3. Create parent population: the new population is created from the front classification so that individuals belonging to the first fronts are promoted to the next iteration.

Figure 2: The hybrid algorithm is composed of a variational ansatz U(θ) running on a quantum computer, and a multiobjective optimization performed by the classical genetic method NSGA-II. Initially, the quantum circuit U is sampled with a collection of sets of random angles θ_i to compute the fitness functions (9) and (10), which assess the quality of the solutions in terms of constraint satisfaction and energy optimization. These results are ranked in terms of Pareto fronts so that angles belonging to the first fronts are saved for the next iteration (parent population). A collection of new sets of angles (offspring population) is generated from the application of genetic operators (tournament selection, crossover, and mutation) on the parent population. In the next iteration, the quantum circuit U is sampled with only the offspring parameters, performing the multiobjective optimization with the parent angles from the previous iteration and the new offspring results (Color online).

4. Crowding distance sorting: when a front is partially taken, i.e., not all elements of a front are needed to fill the parent population, the solutions of this front are sorted according to a density-based metric. To encourage a broader exploration of the solution space, individuals in less dense regions are promoted.

5. Create offspring population: a new population is generated from the parent population using genetic operators. First, a binary tournament selection is performed so that only the best individuals (based on front and crowding distance ranking) from a random sampling are allowed to reproduce. Then, the variables (genes) of two of these individuals (parents) are combined following a certain crossover rule, giving rise to a new solution (child). Lastly, a mutation may be applied to the new individuals, changing some of their variables according to a chosen probability.
6. Iterate the process: steps 2 to 5 are repeated with a new population composed of the parent and offspring populations. Because of the genetic simile, each iteration of the algorithm is called a generation.
In our numerical experiments we use the NSGA-II implementation of the open-source Python framework pymoo 0.5.0 [51].
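To make steps 2 and 4 concrete, the sketch below implements non-dominated sorting and the crowding distance in plain Python/NumPy. It is a didactic sketch, not the pymoo implementation used in our experiments:

```python
import numpy as np

def fast_non_dominated_sort(F):
    """Split points (rows of F, all objectives minimized) into Pareto fronts,
    as in step 2: front 1 holds the non-dominated points, front 2 the points
    dominated only by front 1, and so on."""
    def dominates(a, b):
        return np.all(F[a] <= F[b]) and np.any(F[a] < F[b])
    remaining, fronts = set(range(len(F))), []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(j, i) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(F):
    """Density metric of step 4: boundary points of each objective get an
    infinite distance; interior points accumulate normalized gaps to their
    neighbours, so sparse regions are promoted."""
    n, n_obj = F.shape
    d = np.zeros(n)
    for k in range(n_obj):
        order = np.argsort(F[:, k])
        d[order[0]] = d[order[-1]] = np.inf
        span = F[order[-1], k] - F[order[0], k]
        span = span if span > 0 else 1.0
        for i in range(1, n - 1):
            d[order[i]] += (F[order[i + 1], k] - F[order[i - 1], k]) / span
    return d

# five points in a two-objective space: four trade-offs plus one dominated point
F = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0], [3.0, 3.0]])
print(fast_non_dominated_sort(F))   # [[0, 1, 2, 3], [4]]
```

Within NSGA-II, the crowding distance is used to break ties inside the last front that only partially fits into the parent population.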

Variational constraint optimization with multiobjective cost
We now introduce the multiobjective variational constrained optimizer (MOVCO), a new method to solve combinatorial optimization problems with hard constraints which combines the quantum variational framework with a genetic multiobjective optimization such as NSGA-II, where each individual is a set of variational parameters θ (see Fig. 2). In this algorithm, the parameters of a variational wavefunction |Ψ(θ)⟩ are iteratively updated through the simultaneous optimization of two fitness functions: one addresses the quality of the solutions in terms of constraint satisfaction, and the other deals with the energy optimization, while preventing the variational algorithm from spending time optimizing solutions outside the in-constraint regime.

Figure 3: Fitness landscape in terms of the constraint satisfaction P and the restricted energy E; the zero-dimensional Pareto front is (P, E)_PF = (1, −6). We trace the trajectory of optimal solutions found in each generation of MOVCO-VQE. We also show the results for 10^5 sets of variational angles drawn from a random uniform sampling, and highlight the region where the variational wavefunction cannot sample (Color online).
• Fitness function to maximize constraint satisfaction. Ideally, we would be interested in directly maximizing the projection of the wavefunction |Ψ(θ)⟩ onto the feasible space, i.e., the subspace of solutions that satisfy the constraints. However, this calculation is impractical in a realistic scenario since, for large-size systems, we generally do not even know what the feasible space is. All we can compute efficiently is whether or not a given solution satisfies the given constraints, i.e., whether the solution belongs to that subspace. Furthermore, if we only maximized solutions with full overlap with the feasible space, we would easily reach stagnation, since that subspace will presumably be small in comparison with the whole Hilbert space. Instead, we maximize the average percentage of constraints that are satisfied by the samples of the variational ansatz,

P(θ) = (1/K) Σ_{k=1}^{K} P_k ,    (9)

where P_k, with k = 1, ..., K, is the percentage of constraints satisfied by each sampled solution.
• Fitness function to minimize the energy of feasible solutions. Besides sampling states that satisfy the constraints, the optimization problem aims to find the lowest-energy solutions among them. To avoid wasting time sampling and optimizing solutions that lie outside the feasible subspace, instead of the traditional average energy (8) we propose to simultaneously optimize the following restricted energy:

E(θ) = (1/K) Σ_{k : z^(k) ∈ S} ( C_k − max[C] ) ,    (10)

where C_k, with k = 1, ..., K, are the values of the cost calculated for each sampled solution, and S is the subspace of solutions that satisfy all constraints. Therefore, only the subset of the K sampled states belonging to the feasible space S is used to compute the restricted energy. Here max[C] is a non-strict upper bound on the cost function value that can be efficiently calculated in practical scenarios; for instance, given a QUBO problem, the sum of the absolute values of its coefficients provides such a bound. Note that this choice confines the restricted energy, since

min[C] − max[C] ≤ E(θ) ≤ 0 .

The selection of specific suitable sampled states for computing the energy estimator of variational quantum algorithms is connected with other ideas in the literature, such as the Conditional Value-at-Risk VQE (CVaR-VQE) [52], where only the lowest-energy states are used in the energy calculation.
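Both fitness functions can be estimated from the same batch of measured spin strings. The sketch below assumes a restricted energy in which each feasible sample contributes its cost shifted by the bound max[C], divided by the total number of samples K, so that E ≤ 0 and E = 0 when no feasible state is sampled; the toy cost and constraint are illustrative, not those of any problem in this paper:

```python
def movco_fitness(samples, cost, constraints, c_max):
    """Estimate the two MOVCO fitness functions from K measured spin strings.

    samples     -- list of K spin tuples z in {-1, +1}^N
    cost        -- classical cost function C(z)
    constraints -- list of predicates z -> bool (True if satisfied)
    c_max       -- non-strict upper bound max[C] on the cost
    Returns (P, E): mean fraction of satisfied constraints (maximized)
    and the restricted energy over feasible samples (minimized)."""
    K = len(samples)
    p_k = [sum(g(z) for g in constraints) / len(constraints) for z in samples]
    P = sum(p_k) / K
    feasible = [z for z, p in zip(samples, p_k) if p == 1.0]
    E = sum(cost(z) - c_max for z in feasible) / K   # 0 if nothing is feasible
    return P, E

cost = lambda z: z[0] + z[1]                 # toy cost, bounded by c_max = 2
constraints = [lambda z: z[0] == z[1]]       # single toy constraint
samples = [(1, 1), (1, -1), (-1, -1), (-1, 1)]
print(movco_fitness(samples, cost, constraints, c_max=2))   # (0.5, -1.0)
```

Half of the samples satisfy the constraint, and only those two feasible states contribute to the (negative) restricted energy.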
The combination of these fitness functions creates a favorable landscape (see Fig. 3) that enhances the convergence of the algorithm to low-energy feasible solutions, as shown in section 5.
The Pareto front that maximizes the projection onto the feasible space and minimizes the restricted energy is zero-dimensional in the space of the functions: (P, E)_PF = (1, min E). This fact allows the algorithm to quickly increase P in such a way that, after a number of iterations, the ansatz only samples feasible states, avoiding wasting time on infeasible solutions. Furthermore, the optimized functions may be simultaneously measured on the quantum processor.
One of the advantages of the multiobjective optimization implemented by MOVCO is the capability to globally explore the phase space, as opposed to the narrower scope of an optimization defined by a single optimal trajectory. As explained in section 2.3, the genetic algorithm enables sampling states which may be far away in the variational parameter space, leading to a more efficient exploration of the Hilbert space. As an example, we can observe in Fig. 3 how even a random sampling of angles allows reaching a wide area of the fitness function space of the small-size problem. This feature, together with the optimization of the projection (9), provides a higher chance of convergence to the in-constraint subspace.
We believe that this method is well suited for combinatorial optimization problems where a considerable number of solutions fulfill at least some of the constraints.Scenarios where the subspace of states satisfying none of the constraints is too large could result in effective barren plateaus where the restricted energy is trapped at 0, due to the difficulty of sampling a state within the feasible space S. Nevertheless, many problems of industrial interest satisfy this requirement.
Note that this method targets the objective function used to optimize the variational parameters. Therefore, although the results shown in later sections are centered on VQE (MOVCO-VQE), our approach is ansatz-agnostic and can be broadly applied to other variational algorithms such as QAOA or Layer-VQE. Known modifications of objective functions, such as CVaR-VQE [52], can also benefit from this method. In that case, the values of C_k in (10) would not be taken from all states in S, but only from the lowest-energy values among them, which may lead to faster convergence to solutions with a minimal overlap with the ground state.

Figure 4: (a) Money is supplied and withdrawn from a vault to ensure that an ATM network has the correct amount of cash. (b) Example of a CMP with two optimized ATMs over a four-day period. The cash in each ATM every day must be in the range [6k, 9k], and the sum of all money in the network must be less than 13k on the last day. Optimal cash transfers are scheduled to fulfill the above requirements (Color online).
Application to a real-world problem: Cash Management

In this section, we introduce the Cash Management problem (CMP), a non-convex optimization problem of high industrial interest. We also introduce a novel mathematical formulation of this problem that scales linearly with the number of cash points and the length of the optimized temporal period.

Description of the problem
One of the tasks that are part of a bank's routine operations is planning cash deliveries to branches and automated teller machines (ATMs) to provide them with the cash they need on a daily basis.This planning is based on a forecast of the daily cash that will be available at each branch and ATM, and is subject to certain restrictions.For example, when the amount of cash at a given ATM is expected to be less than a certain preset minimum amount, a request is made to send cash to that branch.This money comes from the vaults in which the amounts of cash that may be needed in the next few days are stored (see Fig. 4).
The transport from the vault to the branch or ATM is usually carried out by an external company that has been requested to send or withdraw cash at a certain branch. The external company is then in charge of the transport, defining the route, which usually changes for security reasons. Thus, the Cash Management problem is not a generalization of the Traveling Salesman Problem, as it might seem at first glance, since we have no control over the specific delivery routes. Instead, it is a scheduling problem more similar to the Nurse Scheduling Problem [53], where a series of tasks have to be scheduled over a time interval so that they meet certain constraints. In addition, the Cash Management problem has an associated optimization problem, since the cash delivery to each branch or ATM has an associated cost, which generally depends on its location.
Therefore, the Cash Management optimization problem consists of finding the optimal scheduling of cash deliveries to a network of branches and ATMs in a given geography in a way that the cost of the performed transactions is minimized, while satisfying the following requirements:

• On each day of the time interval considered, the amount of cash in each branch and ATM must be higher than a pre-established minimum that guarantees the operability of the branch or ATM, and less than a maximum for security reasons.
• We can send or withdraw money from the branch or ATM only once a day.
In addition, there are a number of hard constraints that must be taken into account in the optimization process:

• The number of deliveries made on a given day must not exceed a pre-established maximum due to the limited number of available delivery trucks.
• Shipments and withdrawals are made in discrete cash amounts, thus facilitating the management of the entire operation.
• Total cash in the branch and ATM network cannot exceed a pre-set maximum amount on the last three business days of each month for regulatory reasons.
Finally, it should also be noted that making a cash delivery on the first day of the time interval considered costs more money than on any other day, since it is equivalent to making an urgent request only twenty-four business hours in advance. Additional conditions could be added to the description of the problem, such as the existence of non-working days for some ATMs during the period, but in this manuscript we will restrict ourselves to the Cash Management problem as described in this section.

Mathematical formulation
Assume C branches and ATMs (both will also be denoted as cash points) distributed according to a particular geography, and suppose that we want to plan the daily cash delivery at each of the C cash points for a period of D days. We will label each cash point and each day of the period with the integers c ∈ [0, C) and t ∈ [0, D), respectively. The CMP is completely defined by the following sets of variables:

• k^0_c ∈ R^+: price of sending or withdrawing money at the branch or ATM c on the first day of the period under consideration.
• k_c ∈ R^+, k_c < k^0_c: price of sending or withdrawing money at the branch or ATM c on any day other than the first day of the considered temporal period.
• p_ct ∈ Z: initial prediction of the cash that will be available at cash point c on day t. Such a prediction may lie outside the limits imposed on the amount of daily cash that each branch and ATM must hold.
In addition, the following variables define the constraints of the optimization problem:

• v_l, v_h ∈ R^+: minimum and maximum available cash, respectively, that each branch and ATM must hold on each day of the interval to guarantee its operability and for security reasons.
• v f ∈ R + : maximum value of total cash in the network of branches and ATMs that can be held on the last days of the time interval under consideration.For simplicity, and without loss of generality, we will set the last day of the time interval as the day subject to this restriction (t = D − 1).
• l ∈ N : maximum number of transactions (shipments and withdrawals) that can be made throughout the branch and ATM network each day.
The value of these variables cannot be modified in the optimization process, and will be given by the particularities of each geographical area and time interval. From these fixed variables, it is possible to encode the CMP with different cost functions by changing the definition of the variables to be optimized.
In the present formulation, the variables to be optimized are the discretized cash amounts available at each branch and ATM c on each day t, m_ct. For convenience, we define the normalized cash

M_ct = (m_ct − v_l) / Δm ∈ {0, 1, ..., h − 1} ,

where Δm is the spacing between levels and h is the number of discrete cash values allowed. Note that a total of V = log_2(h) · C · D binary variables x ∈ {0, 1}, or spins z = 2x − 1, will be necessary for the encoding. For simplicity, and without loss of generality, we take h = 4, v_h = h − 1, and v_l = 0, so that

M_ct = (z^0_ct + 1)/2 + (z^1_ct + 1) ,

where z^i_ct ∈ {−1, +1} with i = 0, 1. The advantage of this formulation is that we impose that all quantities are within the bounds set by the constraints of the problem, i.e., by definition it is satisfied that v_l ≤ m_ct ≤ v_h. We further note that money transfers and withdrawals are always made with cash packages of discrete amounts, multiples of Δm, as stated in the problem description. This makes the initial prediction discretized by Δm as well, p_ct ∈ Z, although it can take an infinite number of values, unlike M_ct (see Fig. 5c). Note that p_ct is a non-exact estimate, so rounding its value to the nearest allowed integer does not imply any noticeable shortcoming in practical applications.
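For h = 4 the encoding uses two spins per cash point and day. A small helper pair makes the mapping and its built-in box constraint explicit; the choice of z^0 as the least significant bit is our convention for illustration:

```python
def spins_to_level(z0, z1):
    """Decode the two spins of one cash point/day into the discrete level
    M in {0, 1, 2, 3} (h = 4, v_l = 0, v_h = 3). Any spin assignment lands
    inside [v_l, v_h], so the box constraints hold by construction."""
    return (z0 + 1) // 2 + (z1 + 1)

def level_to_spins(m):
    """Inverse mapping: cash level m in {0, 1, 2, 3} to a spin pair."""
    return 2 * (m % 2) - 1, 2 * (m // 2) - 1

# every one of the four spin patterns maps to an allowed level, and back
print([spins_to_level(*level_to_spins(m)) for m in range(4)])  # [0, 1, 2, 3]
```

Because the mapping is a bijection onto the allowed levels, the daily minimum/maximum cash constraints never need to be checked or penalized.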
To construct the cost function of the problem, it is convenient to define the cash w_ct that would be available at cash point c on day t if no shipment or withdrawal were made at that location on that day:

w_ct = m_{c,t−1} + p_ct − p_{c,t−1} ,   w_c0 = p_c0 .

Therefore, the cost function to be minimized in order to optimize the cost of shipments and withdrawals is

C = Σ_c { k^0_c [1 − δ(m_c0 − w_c0)] + Σ_{t=1}^{D−1} k_c [1 − δ(m_ct − w_ct)] } ,

where δ(x) = 1 if x = 0, and δ(x) = 0 otherwise. Note that solving this optimization problem is equivalent to finding the ground state of the Hamiltonian Ĉ obtained through the change of variables z^i_ct → Ẑ^i_ct, where Ẑ^i_ct denotes the Pauli Z operator acting on the corresponding qubit. Notice that this Hamiltonian is defined exclusively in terms of Pauli Z operators, so that its eigenstates are separable computational-basis states.
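The cost of a candidate schedule can then be evaluated classically. The sketch below assumes the no-transaction cash carries the previous day's optimized balance forward with the predicted daily net flow, w_ct = m_{c,t−1} + p_ct − p_{c,t−1} (our reading of the construction above); a transaction is charged whenever the scheduled cash differs from it:

```python
import numpy as np

def delivery_cost(m, p, k0, k):
    """Transaction cost of a schedule m (C x D matrix of discrete cash levels),
    given predictions p and per-cash-point prices k0 (first day) and k (rest).
    A transaction happens at (c, t) whenever m[c, t] != w[c, t], the cash that
    would be there with no shipment or withdrawal that day."""
    C, D = m.shape
    total = 0.0
    for c in range(C):
        for t in range(D):
            # assumed rule: w carries yesterday's balance + predicted net flow
            w = p[c, 0] if t == 0 else m[c, t - 1] + (p[c, t] - p[c, t - 1])
            if m[c, t] != w:
                total += k0[c] if t == 0 else k[c]
    return total

p = np.array([[1, 2], [0, 1]])      # toy predictions, C = 2 cash points, D = 2 days
m = np.array([[2, 3], [0, 1]])      # schedule with one urgent delivery at c = 0
print(delivery_cost(m, p, k0=[4, 8], k=[2, 4]))   # 4.0
```

Only the day-0 shipment at cash point 0 is charged (at the more expensive first-day price), since on day 1 the scheduled cash coincides with the no-transaction amount.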
As discussed in the previous section, the optimization of the Cash Management problem is also subject to constraints, which must be formulated in terms of inequalities:

• On the last day of the time interval, the sum of the cash from all cash points in the network cannot exceed a certain amount:

Σ_c m_{c,D−1} ≤ v_f .

• There is a limit to the daily number of shipments and withdrawals made throughout the network:

Σ_c [1 − δ(m_ct − w_ct)] ≤ l ,   for each t ∈ [0, D) .

Figure 5: (c) Scenario for cash point one, in which the red dashed and solid blue lines show the discrete cash levels inside and outside the constraint interval [v_l, v_h], respectively, while the red dashed and solid blue circles show the initial prediction and the optimal result, respectively (Color online).
Therefore, there are a total of D + 1 additional constraints. One way to include these constraints in the optimization process is to add penalty terms to the cost function that raise its value when a solution fails to satisfy any of the constraints. In this manuscript, we explore the advantages of performing a multiobjective optimization of the cost function that takes these constraints into account without the need for additional terms.
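A feasibility check over the D + 1 constraints can be written in the same spirit; the fraction it returns plays the role of the per-sample quantity P_k entering the fitness function (9). The no-transaction cash follows the same assumed update rule as in the cost discussion, a hypothesis of this sketch:

```python
import numpy as np

def transactions(m, p):
    """Boolean C x D matrix: True where a shipment or withdrawal happens,
    i.e. where m[c, t] differs from the assumed no-transaction cash w."""
    C, D = m.shape
    moved = np.zeros((C, D), dtype=bool)
    for c in range(C):
        for t in range(D):
            w = p[c, 0] if t == 0 else m[c, t - 1] + (p[c, t] - p[c, t - 1])
            moved[c, t] = (m[c, t] != w)
    return moved

def satisfied_fraction(m, p, v_f, l_max):
    """Fraction of the D + 1 CMP constraints a schedule m satisfies:
    one final-day cap on total cash plus D daily transaction limits."""
    moved = transactions(m, p)
    checks = [m[:, -1].sum() <= v_f]                        # final-day cap
    checks += [moved[:, t].sum() <= l_max for t in range(m.shape[1])]
    return sum(checks) / len(checks)

p = np.array([[1, 0], [0, 1]])     # toy predictions already inside the bounds
print(satisfied_fraction(p, p, v_f=1, l_max=1))   # 1.0 (all 3 constraints hold)
```

With m = p no transactions are needed, so the D daily limits hold trivially and only the final-day cap remains to be checked.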

Single instance example
For the sake of clarity, we include an example of a CMP with two cash points where the cash transactions are optimized during a period of four days.The practical scenario is shown in Fig. 4b.
In the discretized space, the initial prediction of the available cash in each branch or ATM is shown in Fig. 5, with delivery costs k^0_0 = 4, k^0_1 = 8, and k_0 = 2, k_1 = 4. The constraints of the problem are defined by v_l = 0, v_h = 3, v_f = 1, and l = 1. As can be seen in Fig. 5a, without any transaction, cash point 1 violates the constraint imposed on the available cash. The constraint on the total cash value, as the sum of the cash from the two points, is also violated on the last day of the interval. The optimal state of this instance, which satisfies all the constraints and minimizes the delivery costs (both in terms of the normalized cash M_ct and the spin variables z^i_ct), corresponds to performing the transactions (shipments and withdrawals) shown in Fig. 4b and Fig. 5. The cost C of the optimal solution is 10.

Numerical results
Next, we test the proposed variational optimization method on a real-world problem of great industrial interest and with challenging constraints: the Cash Management problem formulated in the previous section. Due to the characteristics of this problem, we use the VQE variant of the algorithm (MOVCO-VQE) because of its flexibility in constructing shallow ansätze with high expressibility. In addition, we compare with the results obtained by optimizing the CMP using a Variational Quantum Eigensolver in which the constraints are encoded as penalty terms. The computational experiments have been performed on CMP instances randomly generated as explained in Appendix B.

Figure 6: Empirical results from the resolution of 120 CMP instances (2 cash points within a period of 2, 3, and 4 days; V = 2CD is the number of variables/qubits required to encode the problem) with the classical simulation of MOVCO-VQE using the single-layer ansatz (3). (a) Average percentage of constraints satisfied by the 8192 sampled solutions. (b) Percentage of instances where the variational wavefunction reached an overlap with the global minimum higher than 0.1. (c) Average approximation ratio. (d-g) Success rates, where an instance is successful if it achieves an approximation ratio higher than 0.95, 0.9, 0.85, and 0.8, respectively. Plots (a) and (c) show a 95% percentile interval around the average, ranging from the 2.5th to the 97.5th percentile of the distribution. Plots (b) and (d-g) display the 95% confidence interval in the average estimation (Color online).
Figure 7: Evolution of the solution reached by MOVCO-VQE when solving 10 independent CMP instances with 2 cash points over a period of 4 days. Each line corresponds to the trajectory of an instance in the space of fitness functions as the number of iterations of the algorithm increases. The restricted energy is normalized between 0 and −1 for all samples (Color online).

Performance of the multiobjective approach
We analyze the convergence and the efficiency of the algorithm as we increase the number of generations for different problem sizes (Fig. 6). We use one layer of the VQE ansatz described in section 2.1.1, where the variational parameters are updated following the NSGA-II routine with binary tournament selection, simulated binary crossover [54], and polynomial mutation [55] as genetic operators. The population of the genetic algorithm was set to 10, with an offspring of ten new individuals created in each generation. This means that the variational algorithm evaluates the quantum circuit with a different set of parameters θ ten times per generation. Here and in the subsequent results, the quantum circuits are classically simulated under noise-free conditions, sampling K = 8192 bit strings from the final probability distribution to mimic the finite number of measurements performed on a quantum computer.
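The two genetic operators named above are standard real-coded operators. The sketch below shows one common formulation of simulated binary crossover and polynomial mutation acting on a vector of variational angles; the distribution indices eta and the mutation probability are illustrative defaults, not the paper's settings.

```python
import random

def sbx_crossover(p1, p2, eta=15):
    """Simulated binary crossover (Deb & Agrawal) on real-valued vectors.
    Each gene is crossed with probability 1/2; the spread factor beta is
    drawn so that children stay centered on the parents' mean."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if random.random() < 0.5:
            u = random.random()
            beta = (2 * u) ** (1 / (eta + 1)) if u <= 0.5 \
                else (1 / (2 * (1 - u))) ** (1 / (eta + 1))
            c1[i] = 0.5 * ((1 + beta) * p1[i] + (1 - beta) * p2[i])
            c2[i] = 0.5 * ((1 - beta) * p1[i] + (1 + beta) * p2[i])
    return c1, c2

def polynomial_mutation(x, low, high, eta=20, pm=None):
    """Polynomial mutation: each gene is perturbed with probability pm
    (default 1/len(x)) by a polynomially distributed step, then clamped."""
    pm = pm if pm is not None else 1 / len(x)
    y = list(x)
    for i in range(len(x)):
        if random.random() < pm:
            u = random.random()
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            y[i] = min(max(y[i] + delta * (high - low), low), high)
    return y
```

A useful sanity check of SBX is that, gene by gene, the two children always sum to the two parents, so the population mean is preserved by crossover.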
To conduct the benchmark, we use metrics that quantify the quality of the solutions in terms of the number of constraints they satisfy and their transaction cost, regardless of the VQA applied. For the constraint satisfaction, we may directly use the fitness function P(θ_sol) from Eq. (9), where θ_sol are the final variational parameters. For the transaction cost we use the approximation ratio, defined as (C_max − ⟨C(θ)⟩)/(C_max − C_min), where C_max = Σ_c k⁰_c + (D − 1) Σ_c k_c is the maximum possible cost, and C_min is the minimum achievable cost provided that all constraints are satisfied. Hence, when P(θ_sol) = 1 and the approximation ratio equals 1, the algorithm achieved the best solution, i.e., the global minimum |Ψ_gs⟩ that fulfills the constraints. We also analyze the overlap of the variational wavefunction with the global minimum, ρ = |⟨Ψ(θ)|Ψ_gs⟩|², through the success rate. We consider that the global minimum is easily sampled from the wavefunction, and thus the run is a success, if ρ > 0.1.
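Both metrics can be estimated directly from the K sampled bit strings. In the sketch below, `satisfied` and `cost` are placeholder callables standing in for the problem-specific constraint checker and transaction-cost evaluator (assumptions for illustration, not the paper's code):

```python
def constraint_fitness(samples, satisfied, n_constraints):
    """Estimate of P(theta): mean fraction of the hard constraints
    satisfied by the sampled solutions (cf. Eq. (9))."""
    return sum(satisfied(s) for s in samples) / (n_constraints * len(samples))

def approximation_ratio(samples, cost, c_max, c_min):
    """Linearly maps the mean sampled cost onto [0, 1]; a value of 1
    means the feasible optimum C_min is reached on average."""
    mean_cost = sum(cost(s) for s in samples) / len(samples)
    return (c_max - mean_cost) / (c_max - c_min)

# Toy usage with stand-in evaluators on 2-bit samples.
samples = ["00", "11", "01", "11"]
P = constraint_fitness(samples, satisfied=lambda s: s.count("1"), n_constraints=2)
r = approximation_ratio(samples, cost=lambda s: int(s, 2), c_max=3, c_min=0)
```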
In Fig. 6a, we observe how the method prevents the variational wavefunction from being trapped in local minima outside the feasible space. Indeed, after two hundred generations of the algorithm, the solutions of all instances satisfy all the constraints of the CMP. Moreover, these solutions have low transaction costs, as shown in Figs. 6c-g. Although Fig. 6c reflects some variability in the approximation ratio, the mean is high, indicating that most instances achieve low-energy solutions. This fact can also be seen in Figs. 6d-g, where we plot the percentage of instances that achieved an approximation ratio higher than 0.95, 0.9, 0.85, and 0.8, respectively. Even for the largest problems, most instances reach a solution whose approximation ratio is higher than 80%. As for the exact optimum, we see in Fig. 6b that for about 80% of the instances with 2 cash points over a period of 4 days, and about 90% of the instances with 2 cash points over a period of 3 days, the variational wavefunction allows the global minimum to be efficiently sampled after only a hundred generations.
We may divide the execution of the algorithm into two phases (see Fig. 7). During the first stage, the training raises the projection P(θ) while the energy E(θ) may stay steady at 0. When the projection is high enough, the wavefunction starts sampling solutions in S, and the energy is minimized as the projection keeps increasing, so that the optimization is progressively performed only on the feasible subspace. This behavior not only accelerates convergence in the second stage, but also avoids getting stuck in local minima and greatly reduces the probability of ending up outside the in-constraint subspace, thanks to the training performed in the first stage. For small-size systems or problems with a high density of states satisfying all constraints, such as those implemented in Fig. 6 and Fig. 8, the first stage can be quickly overcome. In this scenario, the energy is also minimized from the first generations.

Comparison with the penalties approach
After confirming the good convergence of the multiobjective approach to optimal solutions, we study the advantages of MOVCO over a standard approach.
A traditional technique to deal with hard constraints is to transform the original Hamiltonian by including penalty terms. These terms increase the expectation value of the Hamiltonian when a solution does not fulfill the constraints. In the Cash Management problem, the penalized Hamiltonian may be expressed as

Ĉ_pen(Ẑ) = Ĉ(Ẑ) + λ_f L(Ĝ_D − v_f) + λ_l Σ_t L(N̂_t − l),   (20)

where Ĉ, Ĝ_D, and N̂_t are defined in section 4.2, L is the Heaviside step function (L(x) = 1 if x ≥ 0, L(x) = 0 otherwise, x ∈ ℝ), and λ_f and λ_l are tunable positive hyperparameters. Thereby, the variational optimization becomes the search for the state that minimizes ⟨Ψ(θ)| Ĉ_pen(Ẑ) |Ψ(θ)⟩. Note that this state will satisfy all constraints if λ_f and λ_l are large enough.
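Evaluating this penalized cost on a single sampled bit string can be sketched as follows. The callables `cost`, `G_D`, and `N_t` are placeholder problem evaluators, and the exact boundary handling inside the step functions is our reading of Eq. (20), so treat this as a sketch under those assumptions:

```python
def heaviside(x):
    """L(x) = 1 if x >= 0, else 0, as defined in the text."""
    return 1.0 if x >= 0 else 0.0

def penalized_cost(sample, cost, G_D, N_t, v_f, l, D, lam_f=25.0, lam_l=25.0):
    """One plausible reading of the penalized objective C_pen: the base
    transaction cost plus step penalties that fire when the last-day total
    cash or a day's transaction count violates its bound."""
    penalty = lam_f * heaviside(G_D(sample) - v_f)                  # last-day cash
    penalty += lam_l * sum(heaviside(N_t(sample, t) - l) for t in range(D))
    return cost(sample) + penalty
```

Averaging this quantity over the K sampled bit strings gives the estimate of ⟨Ψ(θ)| Ĉ_pen(Ẑ) |Ψ(θ)⟩ that the classical optimizer minimizes.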
We compare MOVCO-VQE with the VQE with penalties on CMP instances with 2 cash points optimized over a 4-day period, i.e., 16-qubit problems. In VQE, the variational parameters are optimized by the simultaneous perturbation stochastic approximation (SPSA) [56]. This gradient-free classical optimizer only performs two evaluations of the quantum circuit per iteration, independently of the number of free parameters, making it one of the most efficient methods for VQAs [13]. The values of the penalty hyperparameters were chosen so as to ensure that all solutions that fail to satisfy any constraint of the problem have a higher cost than solutions in the feasible subspace. To improve the significance of the experiments, we tuned these values to maximize the average constraint satisfaction and the average approximation ratio, as explained in Appendix A, setting λ_f = λ_l = 25. For both MOVCO-VQE and VQE we again use the previously described one-layer ansatz (3) and the noise-free quantum computer simulation with K = 8192 shots.
As shown in Fig. 8, MOVCO-VQE outperforms the algorithm with penalties in every considered metric. To make a fair comparison, we take into account that SPSA only needs two queries to the quantum processor per iteration, while MOVCO uses as many queries per generation as the population size of the genetic algorithm (10 in our setup). Therefore, we plot the results in terms of the number of evaluations performed on the quantum computer. The main improvement is the ability of MOVCO-VQE to explore only the feasible space after a number of iterations, while VQE with penalties gets trapped in local minima outside the in-constraint regime. This fact can be seen in Fig. 8a, where we display the percentage of instances in which the overlap between the final wavefunction and the in-constraint subspace was almost total, i.e., P(θ_sol) > 0.99. This behavior is also noticeable in Fig. 8d, where we directly show P(θ).
The percentage of instances achieving the global minimum is also increased, and the cost of the solutions is reduced. Moreover, this improvement in convergence is associated with a lower dispersion of the results, as seen in Figs. 8c and 8e. In Fig. 8e, the instances with an approximation ratio higher than 1 reached low-energy but non-feasible solutions. One point to note is that MOVCO allows parallel processing of the quantum circuit evaluations performed in each iteration, thanks to the genetic algorithm mechanism, which greatly reduces the execution time.
The advantage of MOVCO arises from the combination of the multiobjective approach and the genetic algorithm technique. As a test, we also analyzed the performance of VQE with penalties when optimizing the variational parameters with a single-objective genetic algorithm.
Figure 9: Comparison between the performance of the MOVCO approach and the penalty approach using product states as we increase the number of iterations, for problem sizes (C, D; V) = (10, 7; 140), (14, 7; 196), (18, 7; 252), and (22, 7; 308). The same 80 instances are solved by both algorithms, and we compare the quality of the solutions by (a) the difference in the percentage of constraints that are satisfied by the sampled states, and (b) the percentage difference between the transaction costs of the sampled solutions. In cases above the pink dashed line, MOVCO achieved a higher-quality solution. The boxes show the lower and upper quartiles, the whiskers extend to 1.5 interquartile ranges, and the remaining points are observations outside this range (Color online).
Specifically, we used the same genetic operators and hyperparameters as for MOVCO, but the ranking of the solutions performed in each iteration was based on the average value of the penalized energy ⟨Ψ(θ)| Ĉ_pen(Ẑ) |Ψ(θ)⟩. As shown in Appendix C, the result is similar to that obtained with SPSA (Fig. 8); thus, the advantage of MOVCO is preserved, confirming that it stems from the proposed multiobjective optimization.

Benchmarking for larger systems by product states
The classical simulation of entangled states such as (3) involves an exponential cost in resources, which prevents the analysis of problems with a high number of variables. In order to increase the number of cash points and days in our computational experiments, and to compare how the performance of both algorithms evolves in this scenario, we need to reduce the complexity of the variational ansatz. For this reason, we now analyze the performance of the two methods using the fully separable ansatz

|Ψ(θ)⟩ = ⊗_{n=1}^{N} |ψ(θ_n)⟩,   (21)

with |ψ(θ_n)⟩ = cos θ_n |0⟩ + sin θ_n |1⟩. This variational form is the borderline case of the ansatz (3) with 0 layers. Since we eliminate the entanglement of the quantum circuit, the probability distribution of the variational wavefunction (21) can be efficiently sampled using an array of just N elements {cos² θ_n}. Note that this simple architecture is already able to reproduce any classical state of the Hilbert space. Therefore, there always exists a set of parameters θ_sol for which (21) is the ground state of the CMP Hamiltonian. The use of this family of states may be seen as a quantum-inspired alternative to the variational quantum algorithms with entangling layers tested in the previous results.
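The efficient classical sampling of the product ansatz can be sketched in a few lines: each qubit is an independent Bernoulli variable, so the whole distribution is described by N probabilities (a minimal sketch; the function and parameter names are ours).

```python
import math
import random

def sample_product_state(thetas, shots=8192, rng=random):
    """Sample bit strings from the fully separable ansatz (21):
    qubit n is measured in |1> with probability sin^2(theta_n)."""
    p_one = [math.sin(t) ** 2 for t in thetas]
    return ["".join("1" if rng.random() < p else "0" for p in p_one)
            for _ in range(shots)]
```

For example, `sample_product_state([0.0, math.pi / 2], shots=5)` deterministically returns five copies of the bit string "01", since the first qubit is fixed to |0⟩ and the second to |1⟩.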
We test the algorithms by optimizing ATM networks consisting of 10 to 22 cash points over a whole week, which corresponds to problems of up to V = 308 variables. Due to the increased size of the problems, in these results we apply MOVCO-VQE with a population of 100, and VQE with penalties with λ_f = λ_l = 50. The number of bit strings sampled per evaluation is again K = 8192. The procedure is as follows. We generate one random instance of the CMP as previously explained and solve the optimization problem with both algorithms. The larger size of these instances prevents us from obtaining the exact solution by exhaustive search, so we benchmark the two methods by directly comparing the solutions obtained. We use two metrics: the constraint satisfaction gap, defined as the difference between P(θ_MOVCO) and P(θ_VQEpen), and the cost gap, the percentage difference between the transaction costs of the solutions sampled with the final angles θ_MOVCO and θ_VQEpen obtained by MOVCO-VQE and VQE with penalties, respectively. In Fig. 9 we display the evolution of the solutions as we increase the number of cost function evaluations. The cases above the pink line reflect better solutions for MOVCO-VQE. The improvement is noticeable in both the constraint satisfaction and the transaction cost of the sampled solutions, and becomes more pronounced as we increase the number of iterations of the algorithms. Indeed, MOVCO-VQE achieves better solutions for 100% of the instances once the algorithms have performed 10000 evaluations.

Conclusions
In this manuscript, we introduce the Multi-Objective Variational Constrained Optimizer (MOVCO), a variational quantum method to solve combinatorial optimization (CO) problems with hard constraints. This method improves the performance of Variational Quantum Algorithms (VQAs) through a genetic algorithm that simultaneously optimizes two fitness functions: one dealing with the satisfaction of the constraints, and the other with the energy minimization of the solutions within the feasible subspace.

We provide empirical evidence of the robust performance of MOVCO on a very relevant industrial problem, the Cash Management problem (CMP). We propose a novel formulation of the CMP in terms of binary variables and solve it using the MOVCO version of VQE. We compare these results with a standard approach in which VQE optimizes a cost function with penalty terms that artificially increase the energy of infeasible states. Our study reveals that MOVCO provides benefits in avoiding unfeasible minima while enhancing convergence to lower-energy solutions. A detailed study of the influence of the genetic algorithm's hyperparameters on MOVCO's performance, such as the population size or the genetic operators, may lead to better results.

The MOVCO method is ansatz agnostic, so it can be applied to a wide range of VQAs beyond VQE. New studies may be conducted implementing the method on problem-dependent ansätze such as QAOA, where the CO problem, usually formulated as a QUBO, is mapped to the quantum processor. Further enhancements can be incorporated into MOVCO. For example, a CVaR version of MOVCO could be analyzed, in which only the lowest-energy percentage of the sampled states would be used to compute the fitness functions of the multiobjective optimization. Since CO does not require a complete overlap between the variational wavefunction and the in-constraint subspace, but only a high probability of sampling low-energy feasible solutions, another interesting possibility is to set an upper limit on the percentage of sampled solutions that must satisfy all constraints.
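The CVaR aggregation mentioned above has a simple form: replace the mean sampled energy by the mean of the best α-fraction of samples. A minimal sketch of that aggregation (α and the function name are illustrative):

```python
def cvar(energies, alpha=0.2):
    """CVaR_alpha: mean of the lowest alpha-fraction of the sampled energies.
    Used as an energy fitness, it focuses the optimizer on the best samples
    rather than on the whole measured distribution."""
    k = max(1, int(alpha * len(energies)))   # at least one sample
    return sum(sorted(energies)[:k]) / k
```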
This work provides further insight into the application of variational algorithms to optimization problems of practical interest. Furthermore, to the best of our knowledge, this is the first attempt to solve the CMP with quantum computing. As such, this work may open the way for further studies on this and other problems in the large set of real-world combinatorial optimization problems with hard constraints.

Disclaimer
This paper is purely scientific and informative in nature and is not a product of BBVA SA or any of its subsidiaries. Neither BBVA nor such subsidiaries are aware of or necessarily share the premises, conclusions or contents in general of this document. Consequently, the responsibility for its originality, accuracy, reliability or for any other reason lies exclusively with the authors. This document is not intended as investment research or investment advice, or a recommendation, offer or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction.

A Penalty hyperparameter tuning
As explained in the paper, a typical approach to solving combinatorial optimization problems with hard constraints is to include penalty terms in the cost function, as is done in Eq. (20). The penalty hyperparameters λ_l and λ_f control the gap between feasible and infeasible solutions, and their values affect the performance of the algorithm. A low value of these hyperparameters may speed up the convergence of the algorithm, but it also increases the possibility of getting trapped in a local minimum, and vice versa.
To increase the significance of the comparison conducted in Sec. 5.2 between MOVCO-VQE and penalized VQE, we numerically studied how the values of these hyperparameters affect the average performance of penalized VQE and selected the best ones. We performed 500 iterations of the classically simulated algorithm on 120 randomly generated instances of the Cash Management problem with two cash points within a four-day period, for different values of the hyperparameters. As in Sec. 5.2, we used the ansatz (3), the SPSA optimizer, and 8192 shots. Since λ_l and λ_f play qualitatively similar roles, we set λ_l = λ_f. We show the results in Fig. 10, where we compare the percentage of instances for which almost all sampled solutions lie within the feasible space, together with the approximation ratio obtained. We note that λ_l = λ_f = 25 achieved the best results.

B Cash Management instances generation
In the computational experiments shown in section 5, the Cash Management instances with C cash points were randomly generated following this procedure:
• The price of sending or withdrawing cash at the branch or ATM c, k_c, is an integer drawn from the discrete uniform distribution on the interval [1, 4]. The price on the first day is doubled, k⁰_c = 2k_c.
• The initial prediction of cash available in every cash point c and day d, p cd , is an integer drawn from the discrete uniform distribution in the interval [−2, 5].
• The maximum total cash in the network of branches and ATMs on the last day of the time interval is set to v_f = C.
• The maximum number of transactions (shipments and withdrawals) that can be made each day is l = 1 if C = 2, and l = ⌊3C/4⌉ otherwise, where ⌊·⌉ denotes rounding to the nearest integer.
As explained in section 4, the total number of hard constraints is D + 1, excluding those constraints that are satisfied by any state given the formulation of the problem (lower and upper bounds to the available cash in each ATM each day). However, for the small-size problems generated in sections 5.1 and 5.2, it could happen that no state satisfies all the constraints. In such cases, the maximum number of constraints satisfied by at least one solution is calculated, and the restricted energy and constraint satisfaction results are computed based on this value.
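The recipe above can be sketched as a small generator. The rounding rule for the daily transaction limit when C > 2 is our reading of the garbled formula in the source (an assumption), and the dictionary layout is illustrative.

```python
import random

def random_cmp_instance(C, D, rng=None):
    """Draw a random CMP instance following the Appendix B recipe."""
    rng = rng or random.Random()
    k = [rng.randint(1, 4) for _ in range(C)]      # per-transaction prices k_c
    k0 = [2 * kc for kc in k]                      # doubled first-day prices
    p = [[rng.randint(-2, 5) for _ in range(D)]    # cash predictions p_cd
         for _ in range(C)]
    v_f = C                                        # last-day total-cash cap
    l = 1 if C == 2 else round(3 * C / 4)          # daily transaction limit (assumed rule)
    return {"k": k, "k0": k0, "p": p, "v_f": v_f, "l": l}
```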

C Single objective genetic algorithm optimization
After observing the performance of MOVCO, one may wonder whether its advantage is intrinsically due to the constrained multiobjective approach, or whether it is only a consequence of the optimization mechanism of the genetic algorithm. In this regard, in Fig. 11 we compare its performance with a VQE in which the classical optimization of the variational parameters is conducted by a single-objective genetic algorithm and a cost function with penalty terms. MOVCO-VQE utilizes the NSGA-II algorithm as an optimization subroutine, as explained in Sec. 3. In the numerical results shown, NSGA-II is implemented with simulated binary crossover and polynomial mutation as genetic operators, and a population size equal to 10. In order to fairly compare the two approaches, in Fig. 11 the VQE with penalties uses a genetic algorithm with the same genetic operators and hyperparameters as the NSGA-II of MOVCO-VQE, but instead of promoting solutions according to the multiobjective optimization of the two fitness functions introduced in Sec. 3, the only quality metric is the energy of the penalized Hamiltonian defined in (20), ⟨Ψ(θ)| Ĉ_pen(Ẑ) |Ψ(θ)⟩. Therefore, the MOVCO ranking based on dominance fronts is replaced by a rating based on the average penalized energy value with λ_l = λ_f = 25.
We can observe how the CMP results obtained using the single-objective genetic algorithm are similar to those obtained with the SPSA optimizer (see Fig. 8), showing an advantage for MOVCO in the cost of the achieved solutions, in the dispersion of the results, and especially in the ability to converge to solutions that satisfy all the hard constraints. These empirical results indicate that the advantage of MOVCO lies in the combination of the multiobjective approach, which simultaneously addresses both the energy of the solutions and the satisfaction of complex constraints during the optimization, and the effective exploration of the Hilbert space afforded by the genetic algorithm.

D NSGA-II algorithm
In this section, we provide the details of the Non-dominated Sorting Genetic Algorithm (NSGA-II) [47], including the general steps of the algorithm (Algorithm 1), the classification of the solutions into fronts (Algorithm 3), and the calculation of the crowding distance (Algorithm 2).

Algorithm 1 NSGA-II Algorithm
Input: population size N, number of generations T, two fitness functions, genetic operators (selection, crossover, mutation)
Output: a set of solutions that compose the Pareto front
1: P(0) ← RandomPopulation(N)  generate the initial population
2: Evaluate(P(0))  assign a rank using non-dominated sorting (see Algorithm 3) and then the crowding distance operator (see Algorithm 2)
3: Q ← ∅  initialize the offspring population with an empty set
4: for t = 0 to T − 1 do
5:  W(t) ← TournamentSelection(P(t))  compare individuals two by two and pick the winners
6:  M(t) ← CrossoverSBX(W(t))  exploration: create new solutions in the search space [54]
7:  Q(t) ← PolynomialMutation(M(t))  exploitation [55]
8:  Evaluate(Q(t))  evaluate the new population members
9:  P_new(t) ← P(t) ∪ Q(t)  merge populations, calculating fitness
10:  P(t + 1) ← Survivor(P_new(t))  keep the N best solutions for the next generation
11: end for

Figure 1: One layer of the hardware-efficient variational quantum circuit (3) for five qubits. We apply parameterized single-qubit R_y rotations and controlled-Z entangling gates between nearest neighbors.

Figure 2: Schematic diagram of the constrained multi-objective variational optimizer (MOVCO). The hybrid algorithm is composed of a variational ansatz U(θ) running on a quantum computer, and a multiobjective optimization performed by the classical genetic method NSGA-II. Initially, the quantum circuit U is sampled with a collection of sets of random angles θ_i to compute the fitness functions (9) and (10), which assess the quality of the solutions in terms of constraint satisfaction and energy optimization. These results are ranked in terms of Pareto fronts, so that angles belonging to the first fronts are saved for the next iteration (parent population). A collection of new sets of angles (offspring population) is generated by applying genetic operators (tournament selection, crossover, and mutation) to the parent population. In the next iteration, the quantum circuit U is sampled with only the offspring parameters, performing the multiobjective optimization with the parent angles from the previous iteration and the new offspring results (Color online).

Figure 3: Optimization landscape in the space of fitness functions for an 8-qubit CMP instance with 2 cash points optimized over a 2-day period, as explained in section 4. The maximum cost of transactions is

Figure 4: (a) Scheme of the Cash Management problem. Money is supplied to and withdrawn from a vault to ensure that an ATM network has the correct amount of cash. (b) Example of a CMP with two ATMs optimized over a four-day period. The cash in each ATM every day must be in the range [6k, 9k], and the sum of all money in the network must be less than 13k on the last day. Optimal cash transfers are scheduled to fulfill the above requirements (Color online).

Figure 5: Example of a single instance of a CMP with two cash points optimized over a four-day period. (a) Situation given by the initial prediction p_cd when no transactions are performed. (b) Optimal solution m_cd. (c) Scenario for cash point one, in which the red dashed and solid blue lines show the discrete cash levels inside and outside the constraint interval [v_l, v_h], respectively, while the red dashed and solid blue circles show the initial prediction and the optimal result, respectively (Color online).

Figure 8: Comparison between the performance of MOVCO-VQE and VQE with penalties. We show the results from 120 CMP instances (2 cash points within a period of 4 days). We conducted a classical simulation of both algorithms using the single-layer ansatz (3) and 8192 shots. (a) Percentage of instances for which almost all sampled solutions are within the feasible space. (b) Percentage of instances where the variational wavefunction reached an overlap with the global minimum higher than 0.1. (c,e) Average and distribution, respectively, of the approximation ratios. (d) Average percentage of constraints satisfied by the sampled solutions. Plots (c) and (d) show a 95% percentile interval around the average, ranging from the 2.5th to the 97.5th percentile of the distribution. Plots (a) and (b) display the 95% confidence interval in the average estimation. The boxes in plot (e) show the lower and upper quartiles, and the whiskers extend to 1.5 interquartile ranges (Color online).

Figure 10: Comparison of VQE performance optimizing the CMP penalized cost function (20) for different values of the hyperparameters λ_l = λ_f. The results show the average over 120 random instances with two cash points optimized within a four-day period, after 500 iterations of the algorithm with SPSA. (a) Percentage of instances for which almost all sampled solutions are within the feasible space (P(θ) > 0.99). (b) Average of the approximation ratios (taking into account only instances with P(θ) > 0.99). Both plots display the 95% confidence interval in the average estimation (Color online).


Figure 11: Comparison between the performance of MOVCO-VQE and VQE with penalties. We show the results from 120 CMP instances (2 cash points within a period of 4 days). In the VQE with penalties, a single-objective genetic algorithm with the same genetic operators and hyperparameters as MOVCO is used as the classical optimizer. Plots (c) and (d) show a 95% percentile interval around the average, ranging from the 2.5th to the 97.5th percentile of the distribution. Plots (a) and (b) display the 95% confidence interval in the average estimation. The boxes in plot (e) show the lower and upper quartiles, and the whiskers extend to 1.5 interquartile ranges (Color online).

Algorithm 2 Crowding Distance Calculation
Input: a set of solutions composing the front F, objective functions f_m
Output: an ordering of the solutions in terms of a density-based distance d_i
1: r ← |F|  cardinality of F, i.e., number of solutions in front F
2: M ← number of objective functions
3: for i = 1 to r do
4:  d_i ← 0  for each element of front F, set its distance d_i to zero
5: end for
6: for m = 1 to M do
7:  F ← sort(F, f_m)  sort the elements i of front F according to their objective function values f_m(i)
8:  d_1 ← d_r ← ∞  set the distance d of the most extreme solutions to large values
9:  f_m,max ← max_i f_m(i)
10:  f_m,min ← min_i f_m(i)
11:  for i = 2 to r − 1 do
12:   d_i ← d_i + |f_m(i + 1) − f_m(i − 1)| / (f_m,max − f_m,min)  accumulate the normalized difference between the previous and the next element in the sorted front F
13:  end for
14: end for
Solutions in less dense regions, i.e., solutions with higher d, are promoted.

Algorithm 3 Front classification
Input: a set of solutions or population P, fitness functions
Output: an ordering of the solutions in terms of Pareto dominance
1: F_1 ← ∅  initialize the first front with an empty set
2: for each p in P do
3:  S_p ← ∅, n_p ← 0
4:  for each q ≠ p in P do
5:   if p ≺ q then  p dominates q according to the fitness functions
6:    S_p ← S_p ∪ {q}
7:   else if q ≺ p then
8:    n_p ← n_p + 1
9:   end if
10:  end for
11:  if n_p = 0 then  p is non-dominated, so it belongs to the first front
12:   p_rank ← 1, F_1 ← F_1 ∪ {p}
13:  end if
14: end for
15: i ← 1
16: while F_i ≠ ∅ do
17:  Q ← ∅
18:  for each p in F_i do
19:   for each q in S_p do  each element dominated by p
20:    n_q ← n_q − 1  the count of solutions dominating q is decremented by one
21:    if n_q = 0 then  q is now a non-dominated solution, therefore it belongs to the next front
22:     q_rank ← i + 1, Q ← Q ∪ {q}
23:    end if
24:   end for
25:  end for
26:  i ← i + 1, F_i ← Q
27: end while
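The front classification and crowding distance steps admit a compact Python rendering. The sketch below assumes both objectives are to be minimized (MOVCO's maximized fitness functions can simply be negated); function names and the tuple representation of objective vectors are ours.

```python
def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective, better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Fast non-dominated sort: split solution indices into Pareto fronts."""
    S = [[] for _ in points]          # S[p]: indices dominated by p
    n = [0] * len(points)             # n[p]: number of solutions dominating p
    fronts = [[]]
    for p in range(len(points)):
        for q in range(len(points)):
            if dominates(points[p], points[q]):
                S[p].append(q)
            elif dominates(points[q], points[p]):
                n[p] += 1
        if n[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in S[p]:
                n[q] -= 1
                if n[q] == 0:         # q is now non-dominated: next front
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]

def crowding_distance(points):
    """Density estimate used to break ties inside a front (extremes get inf)."""
    r, M = len(points), len(points[0])
    d = [0.0] * r
    for m in range(M):
        order = sorted(range(r), key=lambda i: points[i][m])
        d[order[0]] = d[order[-1]] = float("inf")
        lo, hi = points[order[0]][m], points[order[-1]][m]
        if hi == lo:
            continue
        for j in range(1, r - 1):
            d[order[j]] += (points[order[j + 1]][m]
                            - points[order[j - 1]][m]) / (hi - lo)
    return d
```

For instance, for the objective vectors [(0, 0), (1, 1), (0, 2), (2, 0)], the first front contains only index 0 and the remaining three points are mutually non-dominated in the second front.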