Optimal control theory for a unitary operation under dissipative evolution

Michael H Goerz; Daniel M Reich; Christiane P Koch

doi:10.1088/1367-2630/16/5/055012

1. Introduction

Quantum effects such as entanglement and matter interference are predicted cornerstones of future technologies. Their exploitation requires the ability to reliably and accurately control complex quantum systems. A major obstacle is that a quantum system can never completely be isolated from its environment, and the interaction with the environment causes decoherence [1]. This is particularly true for condensed phase settings as encountered in, e.g., solid-state quantum devices. A number of concepts, such as decoherence-free subspaces [2], and noiseless subsystems [3], dynamical decoupling [4], and spectral engineering [5], have been developed to cope with decoherence. The applicability of these strategies is tied to specific conditions on the interaction between system and environment and, in practice, is often limited to systems that can be described by simple models. For complex quantum systems, numerical optimal control offers an alternative approach. It calculates the external controls that implement a desired target operation by performing an iterative search in the parameter space of the controls [6].

For quantum systems that are subject to decoherence, numerical optimal control was first employed to realize laser cooling of internal degrees of freedom in molecules [7]. Further applications, also utilizing a Markovian master equation to describe the open system dynamics, include controlling coherences [8], automatic protection against noise [9], selective photoexcitation of charge transfer [10], electric current in a molecular junction [11], quantum gates [12], and quantum memories [13]. Due to the formal equivalence between Markovian dissipation and quantum measurements, optimized observations can be determined using the same set of tools [14]. Numerical optimal control can also be applied to non-Markovian quantum systems [15–18] provided the dynamics can be calculated with sufficient efficiency.

The question of numerical effort becomes particularly important in the optimization of high-fidelity quantum gates. High fidelities, or small errors, are best achieved with monotonically convergent optimization algorithms that utilize gradient information and thus require repeated forward and backward propagation [19, 20]. Gate optimization under coherent dynamics implies propagation of a set of states that span the Hilbert (sub)space on which the target is defined [21, 22]. For open system dynamics, this was generalized to a set of states that span the corresponding Liouville (sub)space [9, 12, 18, 23]. It requires not only propagation of density matrices instead of wavefunctions but also a significantly larger number of states since Liouville space dimension is the square of Hilbert space dimension. Realistically, this limits quantum gate optimization to but the simplest examples, i.e., one-qubit and two-qubit operations.

The direct extension from Hilbert to Liouville space [9, 12, 23] overlooks the fact that in quantum gate optimization, the target is a unitary operation and not a general dynamical map. The latter would indeed require a basis that spans the full Liouville space. However, much less information is required to assess how well a desired unitary is implemented. This observation is not only relevant for optimal control but also provides the basis for all current attempts at reducing the resources for estimating the average gate error [24–28]. In fact, only two states are necessary to distinguish any two unitaries, irrespective of Hilbert space dimension [29]. We show here that these two states, together with a third state enforcing the dynamical map on the optimization subspace to be contracting and population conserving, can be utilized to construct an optimization functional that attains its optimal value only if the desired gate is implemented with unit fidelity.

The two states that are required for unitary identification are constructed such that the first one consists of non-degenerate contributions from each Hilbert (sub)space direction. This corresponds to choosing a basis, and probing the gate error within this basis. In order to determine the error of gates that are diagonal in the chosen basis, i.e., phase errors, the second state is needed. For Hamiltonians, which due to their inherent structure allow for nothing but diagonal gates, only the second state together with the third one is required, enforcing the dynamical map on the optimization subspace to be contracting and norm conserving. In our application, we thus distinguish between gates which are diagonal and those that are nondiagonal in the logical basis.

Our optimization functional is closely related to the gate error. While two states represent the minimal set of states required to distinguish any two unitaries, it is impossible to deduce bounds on the gate error from the two states [29]. This is due to the state corresponding to the choice of basis being a totally mixed thermal state. Meaningful bounds on the gate error can be derived numerically when replacing the totally mixed state by a set of d pure states where d is the dimension of Hilbert space, i.e., by choosing a separate basis state for each Hilbert space direction [29, 30]. The resulting set consists of $d+1$ states. Analytical bounds are obtained when also the second state of the minimal set is expanded [31]. The corresponding set is built out of the $2d$ states of two mutually unbiased bases [29]. This observation from process verification motivates the choice of optimization functionals, which utilize these extended sets of states. Although the number of states then depends on Hilbert space dimension, this choice still comes with very significant savings in the computational resources. For example, already for a two-qubit gate, both $2d$ and $d+1$ represent a significant reduction in the number of states that need to be propagated, namely a reduction from 16 for the full Liouville space basis to 8 and 5, respectively.

We demonstrate below that two states are sufficient to optimize diagonal gates and three states to optimize nondiagonal two-qubit gates. We also show that, depending on the desired gate error, $d+1$ , respectively, $2d$ states in the optimization functional correspond to the numerically most efficient choice. We consider a controlled phasegate with neutral trapped atoms that are excited into a Rydberg state and a $\sqrt{i{\rm{SWAP}}}$ gate with superconducting qubits. In both examples, our optimization identifies gate implementations for which the error is limited by decoherence. This proves that all reduced sets of states are sufficient for determining the fundamental limit to the gate error and thus for quantum gate optimization.

The paper is organized as follows. Section 2 defines the optimization functional and presents the optimization algorithm. Optimization of a controlled phasegate for neutral atoms is discussed in section 3, whereas optimization of a nondiagonal gate for superconducting qubits is studied in section 4. Section 5 concludes. The algebraic framework and the proofs required for the construction of the three states employed in the optimization functional are presented in appendix A.

2. Optimal control theory for a unitary operation under dissipative evolution

2.1. Optimization functional

In order to employ optimal control theory to determine a high-fidelity implementation of quantum gates, one needs to define a distance measure ${{J}_{T}}$ between the desired unitary $\hat{O}$ and the actual evolution. We show here that

$\begin{eqnarray}&&{{J}_{T}}=1-\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{n}\frac{{{w}_{i}}}{{\mathsf Tr} \left[ \hat{\rho }_{i}^{2}(0) \right]}\;\Re e\left\{ {\rm{Tr}}\left[ \hat{O}{{\hat{\rho }}_{i}}(0){{\hat{O}}^{\dagger }}{{\hat{\rho }}_{i}}\left( T \right) \right] \right\}\end{eqnarray} \tag{ 1 }$

with n = 3 and specific initial states ${{\hat{\rho }}_{i}}(0)$ represents a suitable choice for ${{J}_{T}}$ . This is in contrast to [9, 12, 23], where n was taken to be the Liouville space dimension corresponding to $\hat{O}$ , i.e., $n={{2}^{2N}}$ for N qubits, and ${{\hat{\rho }}_{i}}$ an orthonormal basis (under the Hilbert–Schmidt product) of Liouville space. In equation (1), ${{w}_{i}}$ are weights, normalized as $\mathop{\sum }\nolimits_{i=1}^{n}{{w}_{i}}=1$ . In order to evaluate ${{J}_{T}}$ , the time evolved states ${{\hat{\rho }}_{i}}\left( T \right)$ need to be obtained by solving the equation of motion describing the open systemʼs evolution for ${{\hat{\rho }}_{i}}$ . While in general the dynamics can be non-Markovian, we will restrict ourselves to a Markovian master equation in the examples below. We assume the coherent part to include coupling to an external control, i.e., the Hamiltonian, is of the form $\hat{H}(t)={{\hat{H}}_{0}}+\epsilon (t){{\hat{H}}_{1}}$ , and generalization to several controls ${{\epsilon }_{i}}(t)$ is straightforward.

The functional ${{J}_{T}}$ needs to be minimized with respect to $\epsilon (t)$ . Further constraints can be added, for example,

$\begin{eqnarray}&&J={{J}_{T}}-{{\lambda }_{a}}\mathop{\int }\nolimits_{0}^{T}{{\left[ \epsilon (t)-{{\epsilon }_{{\rm{ref}}}}(t) \right]}^{2}}{\rm{/}}S(t)\;{\rm{d}}t,\end{eqnarray} \tag{ 2 }$

where ${{\epsilon }_{{\rm{ref}}}}(t)$ denotes a reference field, S(t) enforces the field to be smoothly switched on and off, and the second term in equation (2) ensures a finite pulse fluence [22]. More complex additional constraints, for example, restricting the spectral width of the pulse or confining the accessible state space [32, 33], are also conceivable.

Mathematically, our claim that only three states are sufficient to determine proper implementation of the desired unitary $\hat{O}$ is equivalent to the conjecture that the optimization functional attains its global minimum if and only if

$\begin{eqnarray}&&{{\hat{\rho }}_{i}}\left( T \right)=\hat{O}{{\hat{\rho }}_{i}}(0){{\hat{O}}^{\dagger }}\end{eqnarray} \tag{ 3 }$

for the three states ${{\hat{\rho }}_{i}}$ . The three states are constructed such that the first one fixes a basis, and the corresponding Hilbert–Schmidt product in equation (1) checks whether the gate is correctly implemented in this basis. It misses errors for gates that are diagonal in the basis, i.e., phase errors [29]. The second state is therefore chosen to detect phase errors with its Hilbert–Schmidt product in equation (1) [29]. The Hilbert–Schmidt product of the third state determines whether the dynamical map attained at time T conserves the population within the optimization subspace. This is necessary since the time evolution can be nonunitary due to decoherence or due to leakage into states other than the logical basis¹ .

In more technical terms, ${{\hat{\rho }}_{1}}(0)$ is a density matrix with N nondegenerate, nonzero eigenvalues. Spanning the d-dimensional Hilbert space ( $d={{2}^{N}}$ for N qubits) by an arbitrary complete orthonormal basis, $\{|{{\varphi }_{i}}\rangle \}$ , ${{\hat{\rho }}_{1}}(0)$ is expressed in terms of a complete set of d one-dimensional orthogonal projectors ${{\hat{P}}_{i}}=|{{\varphi }_{i}}\rangle \langle {{\varphi }_{i}}|$ , i.e., ${{\hat{\rho }}_{1}}(0)=\mathop{\sum }\nolimits_{i=1}^{d}{{\lambda }_{i}}{{\hat{P}}_{i}}$ with ${{\lambda }_{i}}\ne {{\lambda }_{j}}\forall i\ne j$ and ${{\lambda }_{i}}\geqslant 0$ [29]. The second state, ${{\hat{\rho }}_{2}}(0)$ , is constructed to be totally rotated with respect to ${{\hat{\rho }}_{1}}(0)$ , i.e., ${{\hat{\rho }}_{2}}(0)={{\hat{P}}_{TR}}$ where ${{\hat{P}}_{TR}}$ is a one-dimensional projector obeying ${{\hat{P}}_{TR}}{{\hat{P}}_{i}}\ne 0$ for i = 1,..., d [29]. ${{\hat{\rho }}_{3}}(0)$ is the identity in the optimization subspace. A possible choice for the initial states reads

$\begin{eqnarray}&&{{\left( {{\hat{\rho }}_{1}}(0) \right)}_{ij}}=\frac{2\left( d-i+1 \right)}{d\left( d+1 \right)}{{\delta }_{ij}},\end{eqnarray} \tag{ 4a }$

$\begin{eqnarray}&&{{\left( {{\hat{\rho }}_{2}}(0) \right)}_{ij}}=\frac{1}{d},\end{eqnarray} \tag{ 4b }$

$\begin{eqnarray}&&{{\left( {{\hat{\rho }}_{3}}(0) \right)}_{ij}}=\frac{1}{d}{{\delta }_{ij}},\end{eqnarray} \tag{ 4c }$

where the matrix elements are given in the optimization subspace, all other elements are zero. We show in appendix A that the optimization reaches its target if and only if condition (3) is fulfilled. Specifically, we prove that propagation of three states is sufficient, irrespective of the dimension of the optimization subspace. Already for a small number of qubits, this represents a significant computational saving compared to the propagation of ${{2}^{2N}}$ initial states deemed necessary in the literature [9, 12, 23].

The states ${{\hat{\rho }}_{1}}$ and ${{\hat{\rho }}_{2}}$ of equation (4), while sufficient in principle to distinguish any two unitaries, do not allow for stating bounds on the gate error [29]. Meangingful bounds on the gate error can be obtained numerically by replacing ${{\hat{\rho }}_{1}}$ , ${{\hat{\rho }}_{2}}$ by a set of $d+1$ states, whereas analytical bounds can be deduced from $2d$ states [29–31]. Motivated by this fact, we define two additional sets of states that can be employed in equation (1). When n in equation (1) is taken to be equal to $d+1$ , the totally mixed state of equation (4a ) is replaced by d pure states,

$\begin{eqnarray}&&{{\hat{\rho }}_{j}}(0)=|{{\varphi }_{j}}\rangle \langle {{\varphi }_{j}}|,\end{eqnarray} \tag{ 5 }$

with j = 1,..., d and $\{|{{\varphi }_{j}}\rangle \}$ the logical basis. ${{\hat{\rho }}_{d+1}}(0)$ is simply equal to ${{\hat{\rho }}_{2}}(0)$ of equation (4b ). In this case, equation (4c ) is not required since the $d+1$ pure states are sufficient to enforce the dynamical map on the optimization subspace to be contracting and norm conserving. Similarly, the functional (1) employing $n=2d$ states is constructed by replacing ${{\hat{\rho }}_{1}}(0)$ of equation (4a ) by ${{\hat{\rho }}_{j}}$ , j = 1,..., d of equation (5) and ${{\hat{\rho }}_{2}}(0)$ of equation (4b ) by

$\begin{eqnarray}&&{{\hat{\rho }}_{d+j}}(0)=|{{\tilde{\varphi }}_{j}}\rangle \langle {{\tilde{\varphi }}_{j}}|,\end{eqnarray} \tag{ 6 }$

with $j=1,...,d$ , where the states $|{{\varphi }_{j}}\rangle$ form a mutually unbiased basis with respect to the canonical basis $\{|{{\varphi }_{j}}\rangle \}$ . For two qubits (d = 4), an example for such a basis is given by

$\begin{eqnarray}&&|{{\tilde{\varphi }}_{1}}\rangle =\frac{1}{2}\left( |00\rangle +|01\rangle +|10\rangle +|11\rangle \right),\end{eqnarray} \tag{ 7a }$

$\begin{eqnarray}&&|{{\tilde{\varphi }}_{2}}\rangle =\frac{1}{2}\left( |00\rangle -|01\rangle +|10\rangle -|11\rangle \right),\end{eqnarray} \tag{ 7b }$

$\begin{eqnarray}&&|{{\tilde{\varphi }}_{3}}\rangle =\frac{1}{2}\left( |00\rangle +|01\rangle -|10\rangle -|11\rangle \right),\end{eqnarray} \tag{ 7c }$

$\begin{eqnarray}&&|{{\tilde{\varphi }}_{4}}\rangle =\frac{1}{2}\left( |00\rangle -|01\rangle -|10\rangle +|11\rangle \right).\end{eqnarray} \tag{ 7d }$

2.2. Optimization algorithm

We assume in the following a coupling to the external field that is linear in the field and equations of motion that are linear in the states² . Moreover, the full optimization functional, equation (2), is linear in the states ${{\hat{\rho }}_{i}}\left( T \right)$ and does not depend on the states at intermediate times t. In this case, the linear version of Krotovʼs method is sufficient to yield a monotonically convergent optimization algorithm [34]. It is given in terms of coupled control equations that need to be solved simultaneously. Here, we model the dissipative time evolution by a Markovian master equation,

$\begin{eqnarray}&&\frac{d\hat{\rho }}{dt}=\mathcal{L}\left( \hat{\rho } \right)=-i\left[ \hat{H}(t),\hat{\rho } \right]+{{\mathcal{L}}_{D}}\left( \hat{\rho } \right).\end{eqnarray} \tag{ 8 }$

The control equations then read

$\begin{eqnarray}&&\frac{d{{\hat{\rho }}_{i}}}{dt}=-i\left[ \hat{H},{{\hat{\rho }}_{i}} \right]+{{\mathcal{L}}_{D}}\left( {{\hat{\rho }}_{i}} \right),\end{eqnarray} \tag{ 9a }$

$\begin{eqnarray}&&\frac{d{{\hat{\sigma }}_{i}}}{dt}=-i\left[ \hat{H},{{\hat{\sigma }}_{i}} \right]-{{\mathcal{L}}_{D}}\left( {{\hat{\sigma }}_{i}} \right)\quad {\rm{and}}\quad {{\hat{\sigma }}_{i}}\left( t=T \right)=\frac{{{w}_{i}}}{{\mathsf Tr} \left[ \hat{\rho }_{i}^{2}(0) \right]}\hat{O}{{\hat{\rho }}_{i}}(0){{\hat{O}}^{\dagger }},\end{eqnarray} \tag{ 9b }$

$\begin{eqnarray}&&\Delta \epsilon (t)=\frac{S(t)}{{{\lambda }_{a}}}\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{n}\Im m\{{\mathsf Tr} [\frac{\partial \mathcal{L}\left( {{\hat{\rho }}_{i}} \right)}{\partial \epsilon }\hat{\sigma }_{i}^{{\rm{old}}}(t)\frac{\partial \mathcal{L}\left( {{\hat{\rho }}_{i}} \right)}{\partial \epsilon }{{|}_{\hat{\rho }_{i}^{{\rm{new}}},{{\epsilon }^{{\rm{new}}}}}}\frac{\partial \mathcal{L}\left( {{\hat{\rho }}_{i}} \right)}{\partial \epsilon }]\}\end{eqnarray} \tag{ 9c }$

with $i=1,2,3$ when the initial conditions ${{\hat{\rho }}_{i}}(0)$ of equation (4) are employed or $i=1,\ldots ,\;{{d}^{2}}$ with d the dimension of Hilbert space when a full basis of Liouville space is propagated. In equation (9c ), the states $\hat{\sigma }_{i}^{{\rm{old}}}$ are backward-propagated with the pulse of the previous iteration ('old'), whereas the states $\hat{\rho }_{i}^{{\rm{new}}}$ are forward-propagated with the updated pulse ('new'). The derivative with respect to the field is given by the commutator

$\begin{eqnarray}&&\frac{\partial \mathcal{L}\left( \hat{\rho } \right)}{\partial \epsilon }=-i\left[ \frac{\partial \hat{H}}{\partial \epsilon },\hat{\rho } \right]\end{eqnarray} \tag{ 10 }$

and has to be evaluated for the 'new' field and the states $\hat{\rho }$ propagated under the 'new' field. For a complex control, which occurs for example when using the rotating wave approximation (RWA), equation (9c ) holds for both the real and the imaginary part of $\epsilon (t)$ .

The value of the optimization functional in equation (1) depends on the number and the specific choice of initial states as well as the choice of weights. It is therefore not suitable to compare the convergence behavior between different sets of states. Instead, we employ the average gate fidelity,

$\begin{eqnarray}&&{{F}_{{\rm{avg}}}}=\mathop{\int }\nolimits\left\langle \Psi \right|{{\hat{O}}^{\dagger }}\mathcal{D}\left( \left| \Psi \right\rangle \left\langle \Psi \right| \right)\hat{O}\left| \Psi \right\rangle \;{\rm{d}}\Psi ,\end{eqnarray} \tag{ 11 }$

for the comparison. In equation (11), $\mathcal{D}$ denotes the dynamical map describing the time evolution of the open quantum system, i.e., $\hat{\rho }\left( T \right)=\mathcal{D}\left( \hat{\rho }(0) \right)$ . The gate fidelity, respectively the gate error, $1-{{F}_{{\rm{avg}}}}$ , is easily evaluated as [35]

$\begin{eqnarray}\begin{array}{ccccccccccccccc} {{F}_{{\rm{avg}}}} & = & \frac{1}{d\left( d+1 \right)}\mathop{\mathop{\sum }\nolimits}\limits_{i,j=1}^{d}\left( \langle {{\varphi }_{i}}|{{\hat{O}}^{\dagger }}\mathcal{D}\left( |{{\varphi }_{i}}\rangle \langle {{\varphi }_{j}}| \right)\hat{O}|{{\varphi }_{j}}\rangle \right. \\ {} & {} & +\left. {\mathsf Tr} \left[ \hat{O}|{{\varphi }_{i}}\rangle \langle {{\varphi }_{i}}|{{\hat{O}}^{\dagger }}\mathcal{D}\left( |{{\varphi }_{j}}\rangle \langle {{\varphi }_{j}}| \right) \right] \right). \\ \end{array}\end{eqnarray} \tag{ 12 }$

3. Example I: Diagonal gates

It is quite common that a two-qubit Hamiltonian allows only for diagonal gates, such as a controlled phasegate. A prominent example are noninteracting qubit carriers that interact only when excited into an auxiliary state where they accumulate a nonlocal phase [36]. Neutral trapped atoms with long-range interaction in a Rydberg state, present a physical implementation of this setting [36, 37]. Optimal control theory has been employed before to determine the minimum time in which a controlled phasegate can be implemented [38] and the optimum distribution of the single-qubit phases [39]. These optimizations were carried out, however, without explicitly accounting for decoherence. It is thus not clear whether the best solutions to avoid decoherence have indeed been identified. While the logical basis states and the Rydberg state are typically very long-lived, the main source of decoherence is spontaneous decay from an intermediate state, which is necessary to access the Rydberg state. Due to experimental feasibility, the excitation to the Rydberg state proceeds by a near-resonant two-photon process. The corresponding single atom Hamiltonian in the basis $\left\{ \left| 0 \right\rangle ,\left| 1 \right\rangle ,\left| i \right\rangle ,\left| r \right\rangle \right\}$ , cf. figure 1, and employing a two-color rotating wave approximation is given by

$\begin{eqnarray}{{\hat{H}}_{1}}=\left( \begin{array}{ccccccccccccccc} 0 & 0\ & \frac{1}{2}{{\Omega }_{R}}(t)\ & 0 \\ 0 & {{E}_{1}}\ & 0 & 0 \\ \frac{1}{2}{{\Omega }_{R}}(t)\ & 0\ & {{\Delta }_{1}} & \frac{1}{2}{{\Omega }_{B}}(t) \\ 0 & 0\ & \frac{1}{2}{{\Omega }_{B}}(t) & {{\Delta }_{2}} \\ \end{array} \right).\end{eqnarray} \tag{ 13a }$

The total Hamiltonian for two atoms includes an interaction when both atoms are in the Rydberg state,

$\begin{eqnarray}&&\hat{H}={{\hat{H}}_{1}}\otimes \mathbb{1}+\mathbb{1} \otimes {{\hat{H}}_{1}}-U\left| rr \right\rangle \left\langle rr \right|.\end{eqnarray} \tag{ 13b }$

Spontaneous emission from the intermediate level is accounted for by the dissipator

$\begin{eqnarray}&&{{\mathcal{L}}_{D}}\left( \hat{\rho } \right)=\gamma \left( \hat{A}\hat{\rho }{{\hat{A}}^{\dagger }}-\frac{1}{2}\left\{ {{\hat{A}}^{\dagger }}\hat{A},\hat{\rho } \right\} \right)\quad {\rm{with}}\quad \hat{A}=\left| 0 \right\rangle \left\langle i \right|,\end{eqnarray} \tag{ 14 }$

and γ the decay rate, $\gamma =1/\tau$ . The parameters correspond to optically trapped rubidium atoms and are summarized in table 1. Since qubit level $\left| 1 \right\rangle$ remains decoupled throughout the time evolution, cf. equation (13a ) and figure 1, the Hamiltonian (13) admits only diagonal gates. The update equations for real and imaginary part of the red and blue pulses are obtained by evaluating equation (9c ) for the Hamiltonian given in equation (13),

$\begin{eqnarray}&&\Re e\left\{ \Delta {{\Omega }_{R,B}}(t) \right\}=\frac{S(t)}{{{\lambda }_{a}}}\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{n}\Im m\left\{ {\rm{Tr}}\left[ \hat{\sigma }_{i}^{{\rm{old}}}(t)\left[ {{\hat{H}}_{R,B}}(t),\hat{\rho }_{i}^{{\rm{new}}}(t) \right] \right] \right\}\end{eqnarray} \tag{ 15a }$

$\begin{eqnarray}&&\Im m\left\{ \Delta {{\Omega }_{R,B}}(t) \right\}=\frac{S(t)}{{{\lambda }_{a}}}\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{n}\Im m\left\{ {\rm{Tr}}\left[ i\hat{\sigma }_{i}^{{\rm{old}}}(t)\left[ {{\hat{H}}_{R,B}}(t),\hat{\rho }_{i}^{{\rm{new}}}(t) \right] \right] \right\},\end{eqnarray} \tag{ 15b }$

where ${{\hat{H}}_{R,B}}$ represents the control Hamiltonians coupling to the red and blue laser, respectively, obtained by rewriting equation (13) as the sum of a diagonal drift Hamiltonian and the two control Hamiltonians,

$\begin{eqnarray}&&\hat{H}={{\hat{H}}_{{\rm{drift}}}}+{{\Omega }_{R}}(t){{\hat{H}}_{R}}+{{\Omega }_{B}}(t){{\hat{H}}_{B}}.\end{eqnarray} \tag{ 16 }$

**Figure 1.** Atomic levels for two-photon near-resonant excitation to a Rydberg state.
Download figure:
Standard image High-resolution image

Table 1. Parameters of the Hamiltonian, equation (13), for implementing a controlled phasegate with two rubidium atoms.

single-photon detuning ${{\Delta }_{1}}$	600 MHz
two-photon detuning ${{\Delta }_{2}}$	0
excitation energy ${{E}_{1}}$	6.8 GHz
Rabi frequencies ${{\Omega }_{R}}$ , ${{\Omega }_{B}}$	300 MHz
interaction energy U	50 MHz
lifetime $\tau =1/\gamma$	25 ns

Figure 2 shows the gate error of the controlled phasegate versus iteration of the optimization algorithm when using a full basis, i.e., 16 states, or using three, respectively, two, states in equation (15). The minimum number of states in this example is two since the Hamiltonian admits only diagonal gates, i.e., only phase errors and norm conservation within the logical subspace have to be checked. Therefore, ${{\hat{\rho }}_{1}}$ in equation (4a ) can be omitted, and the two remaining states are ${{\hat{\rho }}_{2}}$ (phase errors) and ${{\hat{\rho }}_{3}}$ (norm conservation) of equations (4b , 4c ). The relative weights ${{w}_{2}}$ and ${{w}_{3}}$ in equation (1) can be modified to emphasize one of the two aspects. Figure 2 therefore also compares two states with equal and unequal weights in equation (1), cf. green dotted and orange solid lines. The fastest convergence was obtained for ${{w}_{2}}{\rm{/}}{{w}_{3}}=10$ . The panels from top to bottom show the optimization without any dissipation, starting from a well-chosen guess pulse; an optimization starting with a bad guess pulse of insufficient fluence; and an optimization taking into account spontaneous decay from the intermediate level. As the main observation, figure 2 clearly demonstrates that only two states are sufficient to optimize a quantum gate for a Hamiltonian of this kind. The optimization for coherent time evolution (top panel) shows that while the use of three states converges to gate errors as small as those obtained with the full basis, the convergence rate is only about half that of the full basis. This is due to two factors: (i) For the optimization with three states, there is no bound on the distance between the value ${{J}_{T}}$ and the gate error, such that the path in the optimization landscape may be less direct until an asymptotic value is reached. Since without dissipation, there is no limit to the gate error, the convergence of ${{J}_{T}}$ and that of $1-{{F}_{{\rm{avg}}}}$ stay on different trajectories. (ii) The reduced sets of states are constructed specifically to take into account decoherence. In particular, the third state contributes significantly less information that is relevant for reaching the optimization target than the second state. The convergence can be improved dramatically by weighting the three states according to the relevance of the information they carry. In this respect, the use of only two initial states can be seen as choosing ${{w}_{1}}=0$ . Taking ${{w}_{2}}>{{w}_{3}}$ addresses the issue of ${{\hat{\rho }}_{3}}$ contributing less to the optimization. Choosing proper weights allows for ensuring the convergence of optimization with a reduced set of states to be as fast as the optimization using the full basis.

**Figure 2.** Optimizing a controlled phasegate for two trapped neutral atoms that are excited to a Rydberg state. The convergence is shown as the gate error, $1-{{F}_{{\rm{avg}}}}$ , over OCT iterations, using the full basis of 16 states (solid black lines), as well as a reduced set of three states (red dashed lines) and a reduced set of two states (green dotted and orange solid line). The calculations employ equal weights of all states, except for those shown in orange where ${{w}_{2}}{\rm{/}}{{w}_{3}}=10$ . The top and middle panels show optimizations without any dissipation; the middle panel shows a calculation with the same parameters as the top panel except for the guess pulse, which is badly chosen. The optimization shown in the bottom panel takes into account spontaneous emission from the intermediate state, with a lifetime of $\tau =25\;{\rm{ns}}$ . The gate duration is $T=50\;{\rm{ns}}$ for the top and middle panels, and $T=75\;{\rm{ns}}$ for the bottom panel. The number of iterations and the reached gate error differ significantly in all three situations, cf. the different x- and y-axes scales.
Download figure:
Standard image High-resolution image

The importance of choosing weights appropriate to the optimization problem becomes even more evident when the optimization starts from a bad guess pulse of insufficient fluence, as shown in the center panel of figure 2. The features observed in figure 2 are typical: The plateau near the beginning corresponds to the optimization increasing the intensity of the pulse without any significant improvement in the gate error, before converging quickly once the pulse is sufficiently intense. The end of the plateau can be significantly influenced by the choice of weights, cf. solid orange and dotted green curves in the middle panel of figure 2. Remarkably, the optimal choice of using two properly weighted initial states outperforms the use of the full basis. This might be explained by the fact that each of the three states in the reduced set has a specific physical role to play in the optimization, and this role can be emphasized by choice of the weight. In contrast, all states in the full basis fulfill the same role in the optimization, and thus there is no way in which different weights on individual states would improve the convergence.

One should point out that even in the cases where the use of two or three initial states shows a slower convergence than that of the full basis, they still outperform the full basis in terms of numerical resources. Since both CPU time and the required memory scale linearly with the number of initial states in the optimization, using only two states compared to 16 has a 1:8 advantage, which more than offsets the factor of two in the convergence rate in the middle panel of figure 2.

Naturally, without the presence of decoherence, there is no reason to perform the optimization in Liouville space. Therefore, the results shown here only serve to illustrate the general convergence behavior of a reduced set of initial states. The more relevant case of noncoherent dynamics is shown in the bottom panel of figure 2. The presence of decoherence implies the existence of an asymptotic bound on the gate error. This constraint on the optimization landscape (together with the further constraint that only diagonal gates are reachable) ensures that all sets of reduced states converge at a similar rate, once the asymptotic region is approached. We expect that all choices reach the same asymptotic value; which choice yields the best fidelity after a specific number of iterations cannot be predicted in general. Factoring in all necessary resources, optimization using two states with unequal weights dramatically outperforms optimization using the full basis in this example.

The optimized pulse and spectrum in the case of coherent dynamics is presented in figure 3. The result shown here is obtained from the optimization using two initial states with unequal weights. However, the pulse is indistinguishable from the one obtained using the full basis, consistent with the identical convergence behavior for the two sets in the upper panel of figure 2. The optimized pulses only show relatively small amplitude modulations compared to the guess pulse (dotted line). These modulations appear as small side-peaks in the spectrum. In the time interval in which there is a significant pulse amplitude, the complex phase only deviates by about $\frac{\pi }{10}$ from zero. This phase evolution is reflected in the asymmetry of the spectrum for the red and the blue pulse (bottom panel). The spectrum nicely illustrates the mechanism of control: while each spectrum by itself is asymmetric, the red pulse showing negative frequencies, the blue pulse showing positive frequencies, the sum of both pulses is again symmetric, i.e., positive and negative frequencies cancel out. This means that the combination of both pulses is two-photon resonant with the transition $\left| 0 \right\rangle \to \left| r \right\rangle$ , providing multiple pathways for the same transition whose interference might be exploited by the optimization.

**Figure 3.** The optimized pulses ${{\Omega }_{B,R}}(t)$ , cf. figure 1, resulting from optimization using two states with unequal weights without spontaneous decay (corresponding to the orange solid line in the top panel of figure 2). The pulse amplitudes are shown in the top panel, the complex phase in the center panel, and the pulse spectrum in the bottom panel. The guess pulse, indicated by the black dotted line in the top panel, is identical for both the red and the blue laser. In the spectrum, frequency 0 corresponds to the carrier frequencies of the laser pulses.
Download figure:
Standard image High-resolution image

The population dynamics induced by the optimized pulses are shown in figure 4. The two-photon resonance of the pulse expresses itself in a direct Rabi cycling between $\left| 0 \right\rangle$ and $\left| r \right\rangle$ on the left qubit in the propagation of $\left| 01 \right\rangle$ (top panel). The population shows roughly a $4\pi$ Rabi flip due to the relatively high pulse intensity. The nearly 25% of the population in the intermediate states in the propagation of $\left| 00 \right\rangle$ (bottom panel) is due to the fact that the decay from these levels was not included in the optimization, and thus the optimization algorithm makes no attempt at suppressing population in these states.

$\left| 01 \right\rangle $ — **Figure 4.** Population dynamics under the pulse shown in (3), for the logical basis states $\left| 01 \right\rangle$ (top) and $\left| 00 \right\rangle$ (bottom). The intermediate population ('int') is integrated over all levels with decay, i.e., $\left| 0i \right\rangle$ , $\left| i0 \right\rangle$ , $\left| ii \right\rangle$ , $\left| ir \right\rangle$ , and $\left| ri \right\rangle$ .
Download figure:
Standard image High-resolution image

For the optimization with dissipation, the optimized pulse and pulse spectrum is shown in figure 5. The characteristics of the pulses are quite different compared to the coherent case. The red pulse remains close to the single Gaussian peak of the guess pulse, except for being slightly narrower. The blue pulse has a more complex structure. It is overall broader than the red pulse and consists of three distinctive features: an initial peak that overlaps but precedes the red pulse, followed by some amplitude oscillations in the center of the pulse, and lastly another peak symmetric to the first, thus following the red laser pulse, with some overlap. For both pulses, the complex phase, shown in the center panel, is close to zero when there is significant pulse amplitude. In the spectrum (bottom panel), the overall narrowing and broadening of the red and blue pulse, respectively, is reflected in a broadening and narrowing of the central peak in the spectrum. The amplitude modulations on the blue pulse appear as side-lobes in the spectrum.

The initial and final peak of the blue pulse, together with the red pulse are reminiscent of the counter intuitive pulse scheme of STIRAP, with the blue laser acting as the 'Stokes' pulse and the red laser as 'pump'. The STIRAP-like behavior appears also in the population dynamics, shown in figure 6, as a population inversion between level $\left| 0 \right\rangle$ and $\left| r \right\rangle$ , without any population in the intermediate decaying state. The amplitude modulations in the central region of both pulses then induce some additional dynamics, generating the entanglement needed for the gate. Note that the pulse duration for the dissipative process ( $T=75\;{\rm{ns}}$ ) is longer than that of the coherent process ( $T=50\;{\rm{ns}}$ ). This is necessary to allow for an adiabatic time evolution that is essential to the STIRAP-like behavior. Overall, the decaying intermediate state population (red lines in figure 6) is almost completely suppressed, which is in contrast to the optimization not taking into account the dissipation, cf. the red lines in figure 4. Both figures 4 and 6 show a significant population of the $\left| rr \right\rangle$ state. This is not surprising, since the parameters of table 1 are not in the regime of the Rydberg blockade [36, 37].

**Figure 6.** Dissipative population dynamics under the pulse shown in figure 5, for the initial states $\hat{\rho }(0)=\left| 01 \right\rangle \left\langle 01 \right|$ (top) and $\hat{\rho }(0)=\left| 00 \right\rangle \left\langle 00 \right|$ (bottom). The intermediate population ('int') is integrated over all levels with decay, i.e., $\left| 0i \right\rangle$ , $\left| i0 \right\rangle$ , $\left| ii \right\rangle$ , $\left| ir \right\rangle$ , and $\left| ri \right\rangle$ .
Download figure:
Standard image High-resolution image

**Figure 6.** Dissipative population dynamics under the pulse shown in figure 5, for the initial states $\hat{\rho }(0)=\left| 01 \right\rangle \left\langle 01 \right|$ (top) and $\hat{\rho }(0)=\left| 00 \right\rangle \left\langle 00 \right|$ (bottom). The intermediate population ('int') is integrated over all levels with decay, i.e., $\left| 0i \right\rangle$ , $\left| i0 \right\rangle$ , $\left| ii \right\rangle$ , $\left| ir \right\rangle$ , and $\left| ri \right\rangle$ .
Download figure:
Standard image High-resolution image

4. Example II: Nondiagonal gates

Superconducting qubits represent a physical realization of a quantum processor where the Hamiltonian admits both diagonal and nondiagonal entangling gates. In fact, there exist superconducting architectures that admit several two-qubit gates simultaneously [40, 41]. We consider here the example of two transmon qubits coupled via a shared transmission line resonator. In the dispersive limit, the interaction of each qubit with the resonator leads to an effective coupling J between the two qubits, and the cavity can be integrated out [40]. The resulting Hamiltonian reads

$\begin{eqnarray}\begin{array}{ccccccccccccccc} \hat{H} & = & \left( {{\omega }_{1}}-\frac{{{\delta }_{1}}}{2} \right)\hat{b}_{1}^{\dagger }{{\hat{b}}_{1}}+\frac{{{\delta }_{1}}}{2}{{\left( \hat{b}_{1}^{\dagger }{{\hat{b}}_{1}} \right)}^{2}}+\left( {{\omega }_{2}}-\frac{{{\delta }_{2}}}{2} \right)\hat{b}_{2}^{\dagger }{{\hat{b}}_{2}}+\frac{{{\delta }_{2}}}{2}{{\left( \hat{b}_{2}^{\dagger }{{\hat{b}}_{2}} \right)}^{2}}+J\left( \hat{b}_{1}^{\dagger }{{\hat{b}}_{2}}+{{\hat{b}}_{1}}\hat{b}_{2}^{\dagger } \right) \\ {} & {} & +\Omega (t){\rm cos} \left( {{\omega }_{d}}t \right)\left( {{\hat{b}}_{1}}+\hat{b}_{1}^{\dagger }+{{\hat{b}}_{2}}+\hat{b}_{2}^{\dagger } \right), \\ \end{array}\end{eqnarray} \tag{ 17 }$

where ${{\hat{b}}_{1,2}}$ , $\hat{b}_{1,2}^{\dagger }$ are the ladder operators for the first and second qubit, ${{\omega }_{1,2}}$ and ${{\delta }_{1,2}}$ represent the frequency and anharmonicity, J is the effective qubit-qubit-interaction, and $\Omega (t)$ and ${{\omega }_{d}}$ are amplitude and frequency of the drive, respectively. The two most relevant dissipation channels are energy relaxation and pure dephasing of the qubits, described by the decay rate $\gamma =1{\rm{/}}{{T}_{1}}$ and dephasing rate ${{\gamma }_{\phi }}=1{\rm{/}}T_{2}^{*}$ for each qubit. The corresponding dissipator reads

$\begin{eqnarray}&&{{\mathcal{L}}_{D}}\left( \hat{\rho } \right)=\mathop{\mathop{\sum }\nolimits}\limits_{q=1,2}\left( {{\gamma }_{q}}\mathop{\mathop{\sum }\nolimits}\limits_{j=1}^{N-1}iD\left[ \left| j-1 \right\rangle {{\left\langle j \right|}_{q}} \right]\hat{\rho }+{{\gamma }_{\phi ,q}}\mathop{\mathop{\sum }\nolimits}\limits_{j=0}^{N-1}\sqrt{i}D\left[ \left| j \right\rangle {{\left\langle j \right|}_{q}} \right]\hat{\rho } \right),\end{eqnarray} \tag{ 18 }$

with

$\begin{eqnarray}&&D\left[ \hat{A} \right]\hat{\rho }=\hat{A}\hat{\rho }{{\hat{A}}^{\dagger }}-\frac{1}{2}\left( {{\hat{A}}^{\dagger }}\hat{A}\hat{\rho }+\hat{\rho }{{\hat{A}}^{\dagger }}\hat{A} \right)\end{eqnarray} \tag{ 19 }$

and each qubit, $q=1,2$ , truncated at level N. The parameters of the coupled transmon qubits are summarized in table 2. We employ a RWA, centered at the drive frequency ${{\omega }_{d}}$ .

Table 2. Parameters of the transmon Hamiltonian, equation (17), and Liouvillian, equation (18), taken from [40].

qubit frequency ${{\omega }_{1}}$	4.3796 GHz
qubit frequency ${{\omega }_{2}}$	4.6137 GHz
drive frequency ${{\omega }_{d}}$	4.4985 GHz
anharmonicity ${{\delta }_{1}}$	−239.3 MHz
anharmonicity ${{\delta }_{2}}$	−242.8 MHz
effective qubit-qubit coupling J	−2.3 MHz
qubit 1 decay time ${{T}_{1}}$	38.0 $\mu {\rm{s}}$
qubit 2 decay time ${{T}_{1}}$	32.0 $\mu {\rm{s}}$
qubit 1 dephasing time $T_{2}^{*}$	29.5 $\mu {\rm{s}}$
qubit 2 dephasing time $T_{2}^{*}$	16.0 $\mu {\rm{s}}$

The Hamiltonian in equation (17) can generate a large number of entangling two-qubit gates; we find $\sqrt{i{\rm{SWAP}}}$ to be a fast converging nondiagonal perfect entangler, and thus choose

$\begin{eqnarray}\hat{O}=\left( \begin{array}{ccccccccccccccc} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & \frac{i}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 & 1 \\ \end{array} \right)\end{eqnarray} \tag{ 20 }$

as the optimization target. Figure 7 shows the convergence behavior for several choices of initial states: the 16 canonical states of the full basis of Liouville space; the 3 states given in equation (4) with equal weight and with ${{w}_{1}}{\rm{/}}{{w}_{2}}={{w}_{1}}{\rm{/}}{{w}_{3}}=20$ ; a set of 5 states consisting of ${{\hat{\rho }}_{1}}$ expanded into 4 pure states, cf. equation (5) plus ${{\hat{\rho }}_{2}}$ of equation (4b ); and lastly a set of 8 states, cf. equations (5) and (6), consisting of the expansion of ${{\hat{\rho }}_{1}}$ and the 4 pure states of a mutually unbiased basis, as explained in section 2. As seen in the top panel, all choices show good convergence. A plateau corresponding to a slowing of convergence is observed only for the 3 states with equal weights. But even in this case, the same asymptotic value for the gate error is obtained as for the other choices; see also figure 7(d). The advantage of employing the reduced sets of states in the optimization functional, equation (1), becomes most apparent in figure 7(b) which shows the gate error over the number of state propagations. Since optimization requires two propagations per iteration and state, i.e., the backward and forward propagation in equation (9), the number of state propagations corresponds directly to the CPU time that is required to obtain a given fidelity. Figures 7(c) and (d) shows a zoom on the same data, once for the initial phase of the optimization and once for the asymptotic behavior. All reduced sets except for the three states with equal weights perform better than the full set during the initial phase. Also, for this specific optimization problem, all reduced sets reach a slightly better asymptotic value than the full set, although we expect that ultimately all curves will converge to the same value. Figure 7 suggests that the reduced sets have a significant advantage in reaching a good fidelity with a given amount of resources, especially since in practice, an optimization is usually stopped near the beginning of the asymptotic regime. Indeed, the full set shows an advantage only in the intermediate regime between gate errors of 10 and 1 percent, and only over the sets of three states. The choice of 5 or 8 states outperforms the full set in all cases. One should note that the savings in computational resources due to the use of a reduced set of states also extend to the amount of memory required, which is proportional to the number of states. Since in the optimization algorithm, propagated states over the entire time grid need to be stored, these savings can be very substantial.

**Figure 7.** Optimizing a $\sqrt{i{\rm{SWAP}}}$ gate for two transmons in the presence of energy relaxation and pure dephasing (with the rates given in table 2): Convergence for five choices of sets of initial states, as described in the text. The gate duration is $T=400\;{\rm{ns}}$ . The panels from top to bottom show the gate error over the number of iterations; the gate error over the number of state propagations, indicative of the required CPU time; a zoom on the initial phase of the optimization; and a zoom on the asymptotic convergence (panels (c) and (d) both using a linear scale). The number of propagations (x-axis in panels (b)–(d) is a linear rescaling of the number of OCT iterations (x-axis in panel (a)), with 2 propagations per iteration and state, i.e., the lines of panel (a) are rescaled differently depending on the respective number of states. Since all panels only show different views on the same data, the line colors and styles are the same in all of them.
Download figure:
Standard image High-resolution image

For the three states with equal weights the gate error shows a non-monotonic behavior in the upper-left corner of figure 7(c). This is due to the optimization functional, equation (1), not being equivalent to the gate error ${{F}_{{\rm{avg}}}}$ , equation (11). Specifically, for a set of three states, no bound on the distance between ${{J}_{T}}$ and $1-{{F}_{{\rm{avg}}}}$ can be derived [29]. Thus, the gate error might increase even though ${{J}_{T}}$ decreases. In fact, the behavior of ${{J}_{T}}$ is fully monotonic as expected (data not shown). With an increasing number of states in the chosen set, the value of the optimization functional is more closely connected to the gate fidelity; and for 5 and 8 states numerical, respectively analytical, bounds can be found [29, 31]. For this reason, we expect the sets of 5 and 8 states to show a faster convergence than the 3 states, when measured in OCT iterations, although not necessarily in CPU time. This expectation is confirmed by figure 7. The weak correspondence between the optimization functional and the gate error for three states is most likely also the reason for the plateau observed for the red dashed line in figures 7(a) and (b). However, the use of three states can still be a good choice since weighting the states properly improves the convergence significantly. The weights have to be chosen empirically, but the choice can be guided by physical intuition. The three states are responsible for ensuring that the realized gate is diagonal in the correct basis, that the relative phases match the target once the correct basis has been found, and that the gate is unitary on the logical subspace, respectively. The weights should reflect which of these requirements is most difficult to realize. In the present example this is finding the correct basis in which the gate is diagonal. Therefore the choice of ${{w}_{1}}{\rm{/}}{{w}_{2}}={{w}_{1}}{\rm{/}}{{w}_{3}}=20$ gave the best convergence rate. This is in contrast to the optimization of the Rydberg gate in section 3, in which the gate was already known to be diagonal, and the first state could be left out of the optimization entirely. Generally, using the set of three states with equal weights is not recommended.

Comparing figure 7 with the bottom panel of figure 2 for the Rydberg gate shows that the different choices of basis sets show a slightly wider range of the convergence rate. This can be attributed to the fact that for the Rydberg gate, the optimization landscape is severely constrained since only diagonal gates can be reached. In contrast, the transmon Hamiltonian can generate both diagonal and nondiagonal gates, resulting in a more complex optimization landscape. Different choices of initial states can thus take more strongly varying pathways.

Figure 8 shows the optimization of a $\sqrt{i{\rm{SWAP}}}$ gate for two transmons in the case of weak dissipation, where the decay and dephasing times from table 2 have been increased by a factor of 10. A comparison of figure 8(a) with figure 7(a) shows that the convergence behavior is essentially the same except for the value of the asymptote. We find an asymptotic gate error of approximately $7\times {{10}^{-3}}$ with full dissipation, $7\times {{10}^{-4}}$ with weak dissipation, and no asymptote without dissipation (data not shown). The value of the asymptote is logarithmically proportional to the decay and dephasing rates. This is as expected since the pulse duration is kept constant at 400 ns and the gate fidelity is solely limited by dissipation. Our claim that the dissipation only affects the asymptotic convergence is supported by a comparison of the initial convergence in figures 7(c) and 8(c), which remarkably are completely identical. Furthermore, the crossing between the black solid and red dot-dashed lines for the full basis and the three states with unequal weights near 1000 propagations and that between the blue dotted and orange dash-dash-dotted lines for the sets of 5, respectively 8, states near 1300 propagations in figure 8(d) can also be seen in figure 7(d). There are however some slight differences in the asymptotically reached values, in that the choice of 3 states (with both equal and unequal weights) reaches a slightly smaller gate error than in the case of full dissipation. Again, we expect that ultimately, all curves will converge to the same value. Which set of states reaches the best gate error at a specific point near the beginning of the asymptotic region seems to depend on the slope of the convergence curve as the limit is approached. This can depend on any number of factors including, e.g., the choice of ${{\lambda }_{a}}$ in equation (2). Again, empirically, the reduced sets of states show a significant numerical advantage over the full basis also for weak dissipation.

As an example, the optimized pulse obtained using a set of three states with unequal weights, taking into account the full dissipation, is presented in figure 9, along with the pulse spectrum. The population dynamics that this pulse induces when propagating the logical basis states $\hat{\rho }\left( t=0 \right)=\left| 01 \right\rangle \left\langle 01 \right|$ and $\hat{\rho }\left( t=0 \right)=\left| 11 \right\rangle \left\langle 11 \right|$ is shown in figure 10. As can be seen in the top panel of figure 9, the optimized pulse shows small oscillations around the guess peak amplitude of 35 MHz. The complex phase, shown in the middle panel, stays relatively close to zero, indicating that the optimization employs mainly amplitude modulation. The pulse amplitude is roughly time-symmetric. The pulse spectrum shown in the bottom panel of figure 9 relates easily to the pulse shape. The strongest frequency component remains the driving frequency of the guess pulse (zero in the spectrum). The small oscillations in the pulse shape are approximately 8 $\;{\rm{ns}}$ apart, corresponding to a frequency of ±125 MHz, which is present in the spectrum. There are peaks with exponentially decaying amplitude in the spectrum at multiples of these values. The width of the central peak is due to the 20 $\;{\rm{ns}}$ switch-on and switch-off time of the pulse, and is unchanged from the guess pulse. The fact that there is not a single, but a double peak around ±125 MHz corresponds the slow beats in the pulse shape. The slight asymmetry of the spectrum is caused by the complex phase of the optimized pulse.

**Figure 9.** Shape and spectrum of an optimized pulse, from optimization with 3 weighted states, with strong dissipation. The panels from top to bottom show the amplitude, complex phase, and spectrum of the optimized pulse $\Omega (t)$ . The spectrum is shown in the rotating frame, with zero corresponding to the driving frequency ${{w}_{d}}$ of the field. The transition frequencies from the logical subspace are indicated by vertical dashed lines. These are ${{\Delta }_{1}}={{w}_{1}}-{{w}_{d}}=-118.88\;{\rm{MHz}}$ and ${{\Delta }_{1}}-{{\delta }_{1}}=-358.18\;{\rm{MHz}}$ in red for the left qubit, and ${{\Delta }_{2}}={{w}_{2}}-{{w}_{d}}=115.20\;{\rm{MHz}}$ and ${{\Delta }_{2}}-{{\delta }_{2}}=-127.58\;{\rm{MHz}}$ in blue for the right qubit. The central peak in the spectrum has been cut off to show the relevant side-peaks, and would extend to a value of approximately 10.0. For all quantities, the values for the guess pulse are shown as a dotted line.
Download figure:
Standard image High-resolution image

**Figure 9.** Shape and spectrum of an optimized pulse, from optimization with 3 weighted states, with strong dissipation. The panels from top to bottom show the amplitude, complex phase, and spectrum of the optimized pulse $\Omega (t)$ . The spectrum is shown in the rotating frame, with zero corresponding to the driving frequency ${{w}_{d}}$ of the field. The transition frequencies from the logical subspace are indicated by vertical dashed lines. These are ${{\Delta }_{1}}={{w}_{1}}-{{w}_{d}}=-118.88\;{\rm{MHz}}$ and ${{\Delta }_{1}}-{{\delta }_{1}}=-358.18\;{\rm{MHz}}$ in red for the left qubit, and ${{\Delta }_{2}}={{w}_{2}}-{{w}_{d}}=115.20\;{\rm{MHz}}$ and ${{\Delta }_{2}}-{{\delta }_{2}}=-127.58\;{\rm{MHz}}$ in blue for the right qubit. The central peak in the spectrum has been cut off to show the relevant side-peaks, and would extend to a value of approximately 10.0. For all quantities, the values for the guess pulse are shown as a dotted line.
Download figure:
Standard image High-resolution image

**Figure 10.** Population dynamics for $\hat{\rho }\left( t=0 \right)=\left| 01 \right\rangle \left\langle 01 \right|$ (a) and $\hat{\rho }\left( t=0 \right)=\left| 11 \right\rangle \left\langle 11 \right|$ (b) under the pulse shown in figure 9. For each of the two propagated states, the expectation value of the right qubit excitation quantum number j is shown in the top panel, with the standard deviation in gray, the expectation value for the corresponding quantum number i for the left qubit is shown in the center panel, and the population dynamics for all the logical subspace states is shown in the bottom panel (colored lines), along with the total population in the logical subspace (black line).
Download figure:
Standard image High-resolution image

The spectrum of the optimized pulse is very instructive in understanding the population dynamics in figure 10. The most relevant transition frequencies from the logical subspace are indicated by vertical lines in the spectrum in the lower panel of figure 9. Clearly, the peaks around ±125 MHz are nearly resonant with the excitation of the left and right qubit, and the excitation to level $\left| 2 \right\rangle$ of the right qubit. There is no significant component in the spectrum that could excite to the level $\left| 2 \right\rangle$ of the left qubit. Consequently, in the population dynamics of both the $\left| 01 \right\rangle \left\langle 01 \right|$ and $\left| 11 \right\rangle \left\langle 11 \right|$ state, the right qubit (top panel) leaves the logical subspace (expectation value $\left\langle j \right\rangle >1.0$ ) to a much more significant extent than the left qubit (middle panel). This behavior is slightly more pronounced for $\left| 11 \right\rangle \left\langle 11 \right|$ , which is the only state for which the total subspace population (gray curve in bottom panel) drops below 80% for a significant amount of time. The fact that for all logical basis states, most of the dynamics occurs within the logical subspace is due to the presence of decoherence, where higher levels have faster decay and faster dephasing due to a stronger coupling to the cavity. In an optimization without dissipation (data not shown), the optimized dynamics would generally veer farther outside the logical subspace. Lastly, the population dynamics show the expected behavior for the $\sqrt{i{\rm{SWAP}}}$ gate: the $\left| 01 \right\rangle$ state ends up in a coherent superposition between $\left| 01 \right\rangle$ and $\left| 10 \right\rangle$ , whereas $\left| 11 \right\rangle$ returns to its original state at the end of the gate.

5. Conclusions

We have utilized the fact that the average error of a quantum gate can be estimated from the time evolution of a reduced set of states [28, 29] to construct a dedicated functional for quantum gate optimization in open quantum systems. Our optimization functional consists of Hilbert–Schmidt products that compare the actual and ideal time-evolved states from the reduced set. The minimal number of states that need to be forward and backward propagated during optimization is two for Hamiltonians that admit only diagonal gates and three for Hamiltonians that allow for both diagonal and nondiagonal gates. Remarkably, the size of the minimal set of states is independent of Hilbert space dimension.

While the minimal number of states allows for determining whether a quantum gate has been implemented, it is insufficient to deduce bounds on the gate error [29]. Numerical bounds require $d+1$ states in the reduced set, where d is the dimension of the Hilbert space on which the optimization target is defined. In order to obtain meaningful analytical bounds on the gate error, $2d$ states are necessary. Employing the sets of $d+1$ , respectively $2d$ , states in quantum gate optimization is still significantly more efficient, both with respect to CPU time and memory requirements, than utilizing a full basis of Liouville space, with ${{d}^{2}}$ elements [9, 12, 23].

We have demonstrated the power of our approach in the optimization of a diagonal and a nondiagonal two-qubit gate. Specifically, we have optimized a controlled phasegate for trapped neutral atoms that are excited into a Rydberg state and subject to fast spontaneous emission from an intermediate state. The best performance was achieved by two states in the reduced set and a large weight of the Hilbert–Schmidt product for the state responsible for detecting phase errors. In the optimization of a $\sqrt{i{\rm{SWAP}}}$ gate for two transmons coupled to the same transmission line cavity and subject to both energy relaxation and pure dephasing, we have found the best, and roughly identical, performance for the reduced sets consisting of $d+1$ , respectively $2d$ , states. In all cases, the final gate error was limited by the decoherence rates. This confirms that employing a reduced set of states in quantum gate optimization is sufficient to determine the physical limit for the gate error.

The significant reduction in computational resources that we report here opens the door for a large-scale, systematic investigation of the fundamental limits of high-fidelity quantum gates in the presence of decoherence. Our approach is not tied to a specific decoherence model. It therefore allows to explore, using optimal control theory, settings for extended Hilbert spaces and beyond Markovian master equations, where a quantum systemʼs complexity may possibly be exploited for control.

Acknowledgments

We would like to thank Giulia Gualdi, Matthias M Müller, Felix Motzoi, Alireza Shabani, and Birgitta Whaley for fruitful discussions and the Kavli Institute of Theoretical Physics at the University of California at Santa Barbara for hospitality. This research was supported in part by the Deutscher Akademischer Austauschdienst and by the National Science Foundation (Grant No. NSF PHY11-25915).

Appendix A.: Three states are sufficient to assess whether a desired target unitary is implemented

In the following we discuss the functional ${{J}_{dist}}$ ,

$\begin{eqnarray}&&{{J}_{dist}}=\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{3}{\mathsf Tr} \left[ {{\left( \hat{O}{{\hat{\rho }}_{i}}(0){{\hat{O}}^{\dagger }}-{{\hat{\rho }}_{i}}\left( T \right) \right)}^{2}} \right],\end{eqnarray} \tag{ A1 }$

which is built on the distance between the ideal and actual states at time T. It attains its global minimum, ${{J}_{dist}}=0$ , if and only if the initial states defined in section 2, ${{\hat{\rho }}_{i}}(0)$ for $i=1,2,3$ , are mapped to their correct target states, i.e., fulfill condition (3). This functional motivates the use of the optimization functional ${{J}_{T}}$ , equation (1), which is also built on only three states, as discussed in section A1. ${{J}_{T}}$ and ${{J}_{dist}}$ differ in that ${{J}_{T}}$ evaluates the Hilbert–Schmidt products, i.e., the projections of the actual onto the ideal states instead of the trace distance. The construction of ${{J}_{dist}}$ , and subsequently ${{J}_{T}}$ , is rationalized by a theorem for unital, i.e., identity preserving, dynamical maps. Specifically, the theorem states that a complete and totally rotating set of density matrices is sufficient to determine whether a given time evolution is unitary. The functional (A1) exploits the further property of a complete and totally rotating set of density matrices to differentiate any two unitaries [29]. The theorem for unital dynamical maps is proven in section A2.

It should be stressed that we use ${{J}_{T}}$ , equation (1), instead of ${{J}_{dist}}$ , equation (A1), as optimization functional. This is motivated by the convexity of ${{J}_{T}}$ which implies a much more favorable convergence behavior than would be obtained with a nonconvex functional³ . Mathematically, however, the two functionals are not equivalent. This is illustrated by rewriting a single summand of ${{J}_{dist}}$ , equation (A1), and comparing it to the corresponding term in ${{J}_{T}}$ , equation (1),

$\begin{eqnarray}\begin{array}{ccccccccccccccc} {\mathsf Tr} \left[ {{\left( \hat{O}{{\hat{\rho }}_{i}}(0){{\hat{O}}^{\dagger }}-{{\hat{\rho }}_{i}}\left( T \right) \right)}^{2}} \right] & = & {\mathsf Tr} \left[ {{\left( \hat{O}{{\hat{\rho }}_{i}}(0){{\hat{O}}^{\dagger }} \right)}^{2}} \right]-2{\mathsf Tr} \left[ \hat{O}{{\hat{\rho }}_{i}}(0){{\hat{O}}^{\dagger }}{{\hat{\rho }}_{i}}\left( T \right) \right] \\ {} & {} & +{\mathsf Tr} \left[ {{\left( {{\hat{\rho }}_{i}}\left( T \right) \right)}^{2}} \right]. \\ \end{array}\end{eqnarray} \tag{ A2 }$

The first term on the rhs of equation (A2) is constant and thus irrelevant. The second term corresponds to the Hilbert–Schmidt overlap as used in ${{J}_{T}}$ , equation (1), up to a prefactor. The main difference between ${{J}_{T}}$ and ${{J}_{dist}}$ is due to the third term, the purity of the propagated density matrix. ${{J}_{T}}$ neglects this term. This could potentially disturb convergence, because the functional value of ${{J}_{T}}$ can be decreased by (artificial) purification of the totally mixed states ${{\hat{\rho }}_{1}}$ and ${{\hat{\rho }}_{3}}$ , cf. equation (4), instead of being decreased due to the desired approach to the target. Note that this problem can only arise for mixed states, i.e., when using the minimal set of states. For the reduced sets consisting of $d+1$ , respectively $2d$ , states, propagation starts from pure states, and the global minimum of ${{J}_{T}}$ is identical to the global minimum of ${{J}_{dist}}$ . Note that the problem of artificial purification is purely hypothetical and was never encountered in our optimizations –'artificial purification traps' in the optimization landscape of the functional ${{J}_{T}}$ with mixed states are apparantly avoided.

A.1. Construction of the functional

We first define the concept of complete and total rotation, which we then use to formulate the required theorem. Let $\mathcal{H}$ be a Hilbert space with dimension N. Let $\mathcal{A}$ be a set of N one-dimensional orthogonal projectors. A one-dimensional projector is a projector with rank one, which means that its spectrum consists of a single eigenvalue equal to one with all remaining eigenvalues being zero.

Definition: A one-dimensional projector ${{\hat{P}}_{TR}}$ is called totally rotated with respect to the set $\mathcal{A}$ if $\forall \hat{P}\in \mathcal{A}:\ {{\hat{P}}_{TR}}\hat{P}\ne 0$ .

Definition: A set of density operators, $\left\{ {{\hat{\rho }}_{i}} \right\}$ with ${{\hat{\rho }}_{i}}\in \mathcal{H}\otimes \mathcal{H}$ , is called complete if the set $\mathcal{P}$ of projectors onto the eigenspaces of $\left\{ {{\hat{\rho }}_{i}} \right\}$ contains exactly N one-dimensional orthogonal projectors.

Definition: A set of density operators, $\left\{ {{\hat{\rho }}_{i}} \right\}$ with ${{\hat{\rho }}_{i}}\in \mathcal{H}\otimes \mathcal{H}$ , is called complete and totally rotating if it is complete and there exists a one-dimensional projector in $\mathcal{P}$ that is totally rotated with respect to the orthogonal set of one-dimensional orthogonal projectors necessary for completeness.

Theorem 1. Let ${\rm{DM}}\left( {\rm{N}} \right)$ be the space of N × N density matrices and $\mathcal{D}:{\rm{DM}}\left( {\rm{N}} \right)\mapsto {\rm{DM}}\left( {\rm{N}} \right)$ a dynamical map. The following three statements are equivalent:

1.
$\mathcal{D}$ is unitary, i.e., $\mathcal{D}\left( \rho \right)=U\rho {{U}^{\dagger }}$ $\forall \rho \in {\rm{DM}}\left( {\rm{N}} \right)$ and U some element of the projective unitary group, $U\in {\rm{PU}}\left( {\rm{N}} \right)$ .
2.
$\mathcal{D}$ maps a set $\mathcal{A}$ of N one-dimensional orthogonal projectors onto a set of N one-dimensional orthogonal projectors as well as a totally rotated projector ${{\hat{P}}_{TR}}$ (with respect to $\mathcal{A}$ ) onto a one-dimensional projector.
3.
$\mathcal{D}$ is unital and leaves the spectrum of a complete and totally rotating set of density matrices invariant.

We now explain how Theorem 1 can be used to prove the claim that ${{J}_{dist}}$ , equation (A1), attains its global minimum if and only if condition (3) is fulfilled for the three states defined in section 2. We first discuss the role of ${{\hat{\rho }}_{3}}=\frac{1}{N}\mathbb{1}$ . It is used to check whether the evolution corresponds to a dynamical map in the optimization subspace and whether it is unital. This dynamical map is obtained by projecting the action of the dynamical map, defined on the total Hilbert space, onto the optimization subspace. The term in the functional (A1) involving ${{\hat{\rho }}_{3}}=\frac{1}{N}\mathbb{1}$ becomes minimal, and so does the total functional, only if the identity in the optimization subspace is mapped onto itself. Minimization of ${{J}_{dist}}$ thus ensures a unital dynamical map on the subsystem such that Theorem 1 is applicable.

We now discuss the role of ${{\hat{\rho }}_{1}}$ and ${{\hat{\rho }}_{2}}$ which by construction form a complete and totally rotating set of density matrices. The functional (A1) becomes zero only if $\mathcal{D}\left( {{\hat{\rho }}_{1}} \right)=\hat{O}{{\hat{\rho }}_{1}}{{\hat{O}}^{\dagger }}$ and $\mathcal{D}\left( {{\hat{\rho }}_{2}} \right)=\hat{O}{{\hat{\rho }}_{2}}{{\hat{O}}^{\dagger }}$ . This requires the actual evolution to be unitary. Unitary evolution leaves the spectrum of a density matrix invariant. Due to the equivalence relation $\left( 1 \right)\Longleftrightarrow \left( 3 \right)$ in Theorem 1, preservation of the spectrum of a complete and totally rotating set of density matrices, i.e., the two states ${{\hat{\rho }}_{1}}$ and ${{\hat{\rho }}_{2}}$ , is sufficient to ensure unitarity. Furthermore, it was proven in [29] that the density matrices ${{\hat{\rho }}_{1}}$ and ${{\hat{\rho }}_{2}}$ are unitary differentiating, i.e., it is possible to distinguish any two unitary evolutions by inspection of ${{\hat{\rho }}_{1}}\left( T \right)$ and ${{\hat{\rho }}_{2}}\left( T \right)$ only. In particular there is only one unitary dynamical map, $\mathcal{D}\left( \hat{\rho } \right)=\hat{U}\hat{\rho }{{\hat{U}}^{\dagger }}$ , which leads to $\mathcal{D}\left( {{\hat{\rho }}_{i}}(0) \right)=\hat{O}{{\hat{\rho }}_{i}}{{\hat{O}}^{\dagger }}$ for both i = 1,2, namely the one induced by the target unitary $\hat{O}$ . Therefore the functional (A1) becomes minimal if and only if the target gate $\hat{O}$ is implemented.

To summarize, ${{J}_{dist}}$ is additively composed of three terms, each corresponding to a distance measure between the desired result, $\hat{O}{{\hat{\rho }}_{i}}\hat{O}$ , and the actually implemented evolution, $\mathcal{D}\left( {{\hat{\rho }}_{i}} \right)$ . For the total functional to be minimal, the evolutions of all three states have to match. This is the case only if a unital dynamical map on the optimization subspace is implemented and if this is the unitary evolution according to $\hat{O}$ . More explicitly, the distance measure formed by the density matrices i = 1,2 is only meaningful provided the evolution within the optimization subspace corresponds to a unital dynamical map. However, this is ensured by the third density matrix. Consequently, the global minimum of the functional (A1) will only be attained if this condition is fulfilled, too.

Note that the functional (A1) weights all three states equally. This is not a unique choice. In fact, all crucial properties of the functional remain unchanged when scaling the three terms with different positive factors, which has been done in the main text for example when discussing the optimisation using three states with weighting which significantly improved the performance of the optimization.

A.2. Proof

We utilize in the following the representation of operators by N × N matrices and therefore omit the operator notation. In order to prove Theorem 1, it is useful to first show the validity of the following lemma.

Lemma 1. Let $\mathcal{D}$ be a unital dynamical map, i.e., $\mathcal{D}$ is completely positive and maps identity onto itself, acting on $N\times N$ density matrices. If and only if there exists a set of N one-dimensional, orthogonal projectors that is mapped by $\mathcal{D}$ onto another set of N one-dimensional orthogonal projectors, there exists a complete set of density matrices whose spectrum is invariant under $\mathcal{D}$ .

Proof of Lemma 1: ( $\Longleftarrow$ direction) We denote the set of N one-dimensional projectors ${{P}_{i}}$ by $\mathcal{P}$ . By assumption, we know that

$\begin{eqnarray*}&&\forall i:\ \mathcal{D}\left( {{P}_{i}} \right)={{\tilde{P}}_{i}},\end{eqnarray*}$

where the ${{\tilde{P}}_{i}}$ also form a set of N one-dimensional, orthogonal projectors. Clearly, ${\rm{spec}}\left( {{P}_{i}} \right)={\rm{spec}}\left( {{\tilde{P}}_{i}} \right)$ , hence $\forall {{P}_{i}}\in \mathcal{P}$

$\begin{eqnarray*}&&{\rm{spec}}\left( \mathcal{D}\left( {{P}_{i}} \right) \right)={\rm{spec}}\left( {{P}_{i}} \right)=(1,0,...,\;0).\end{eqnarray*}$

Obviously, $\mathcal{P}$ itself corresponds to a specific complete set of density matrices, ${{\rho }_{i}}={{P}_{i}}$ .

( $\Longleftarrow$ direction) This part of the proof proceeds as follows: First we show that the assumption, a dynamical map leaving the spectrum of a given density matrix invariant, implies that $\mathcal{D}$ maps projectors onto the eigenspaces of the initial density matrices into projectors onto the eigenspaces of the resulting density matrix with the same eigenvalue. As a consequence, a one-dimensional projector onto a corresponding one-dimensional eigenspace is mapped into a one-dimensional projector. We then repeat this argument for all density matrices in the complete set. In this set, by definition, there exist density matrices with N one-dimensional, orthogonal projectors onto one-dimensional eigenspaces which, according to the first step of the $\Longleftarrow$ proof, is mapped onto another set of one-dimensional projectors. We show in a second step that the set of the mapped one-dimensional projectors is also orthogonal.

We start by assuming that $\mathcal{D}$ leaves the spectrum of a given density matrix, ρ, invariant,

$\begin{eqnarray*}&&{\rm{spec}}\left( \mathcal{D}\left( \rho \right) \right)={\rm{spec}}\mathop{\mathop{\sum }\nolimits}\limits_{k}\left( {{E}_{k}}\rho E_{k}^{\dagger } \right)={\rm{spec}}\left( \rho \right),\end{eqnarray*}$

where we have expressed $\mathcal{D}$ in terms of Kraus operators ${{E}_{k}}$ . We can write $\rho ={{\mathop{\sum }\nolimits}_{i}}{{\lambda }_{i}}P_{i}^{\prime }$ where $\mathcal{P}\prime =\left\{ P_{i}^{\prime } \right\}$ is a set of M orthogonal projectors onto the eigenspaces of ρ with M the number of distinct eigenvalues of ρ. We assume the ${{\lambda }_{i}}$ to be ordered by magnitude with ${{\lambda }_{1}}$ corresponding to the largest eigenvalue. Since we know that the spectrum of $\mathcal{D}\left( \rho \right)$ to be identical to that of ρ, we can decompose $\mathcal{D}\left( \rho \right)$ ,

$\begin{eqnarray*}&&\mathcal{D}\left( \rho \right)=\mathop{\mathop{\sum }\nolimits}\limits_{i}{{\lambda }_{i}}\tilde{P}_{i}^{\prime }\end{eqnarray*}$

with $\left\{ \tilde{P}_{j}^{\prime } \right\}$ another set of M orthogonal projectors. Note that neither the $P_{i}^{\prime }$ nor the $\tilde{P}_{i}^{\prime }$ have to be one-dimensional but for a given i, $\tilde{P}_{i}^{\prime }$ has the same dimensionality as the corresponding $P_{i}^{\prime }$ . Specifically,

$\begin{eqnarray*}&&\mathcal{D}\left( \rho \right)=\mathcal{D}\left( \mathop{\mathop{\sum }\nolimits}\limits_{i}{{\lambda }_{i}}P_{i}^{\prime } \right)=\mathop{\mathop{\sum }\nolimits}\limits_{i}{{\lambda }_{i}}\mathcal{D}\left( P_{i}^{\prime } \right)=\mathop{\mathop{\sum }\nolimits}\limits_{j}{{\lambda }_{j}}\tilde{P}_{j}^{\prime }\;.\end{eqnarray*}$

Multiplying by another projector $\tilde{P}_{p}^{\prime }$ from the set, where p can take integer values between 1 and M, we obtain

$\begin{eqnarray}&&\mathop{\mathop{\sum }\nolimits}\limits_{i}{{\lambda }_{i}}\mathcal{D}\left( P_{i}^{\prime } \right)\tilde{P}_{p}^{\prime }=\mathop{\mathop{\sum }\nolimits}\limits_{j}{{\lambda }_{j}}\tilde{P}_{j}^{\prime }\tilde{P}_{p}^{\prime }={{\lambda }_{p}}\tilde{P}_{p}^{\prime },\end{eqnarray} \tag{ A3 }$

since $\tilde{P}_{j}^{\prime }$ , $\tilde{P}_{p}^{\prime }$ are orthogonal. Using proof by (transfinite) induction we now show that

$\begin{eqnarray*}&&\mathcal{D}\left( P_{k}^{\prime } \right)=\tilde{P}_{k}^{\prime }\quad \forall i=k,...,M\;.\end{eqnarray*}$

The idea of the induction is the following: To show that indeed the projectors onto the eigenspaces of ρ, $P_{i}^{\prime }$ , are mapped into projectors onto the eigenspaces of $\mathcal{D}\left( \rho \right)$ with the same eigenvalue, we start with the projector onto the eigenspace with the largest eigenvalue and then inductively proceed to increasingly smaller eigenvalues. Furthermore, to prevent having to deal with a possible smallest eigenvalue of 0, we treat the lowest eigenvalue case separately. Calling the induction variable k, we have to show that $\mathcal{D}\left( P_{k}^{\prime } \right)=\tilde{P}_{k}^{\prime }$ follows from the assumption $\mathcal{D}\left( P_{i}^{\prime } \right)=\tilde{P}_{i}^{\prime }\forall i<k$ . Note that if k = M, i.e., for the smallest eigenvalue,

$\begin{eqnarray}&&\mathop{\mathop{\sum }\nolimits}\limits_{i}\mathcal{D}\left( P_{i}^{\prime } \right)=\mathcal{D}\left( \mathop{\mathop{\sum }\nolimits}\limits_{i}P_{i}^{\prime } \right)=\mathcal{D}\left(\mathbb{1} \right)=\mathbb{1},\end{eqnarray} \tag{ A4 }$

since, by definition, a unital dynamical map maps identity onto itself. So assume $k\ne M$ . Then ${{\lambda }_{k}}>0$ since it is not yet the smallest eigenvalue because each $\lambda k$ corresponds by construction to a different eigenspace, hence they are different, and we assumed them to be ordered. For k = p, we can rewrite equation (A3), multiplying by an arbitrary normalized eigenvector ${{\vec{x}}_{k}}\in {{\mathbb{C}}^{N}}$ of $\tilde{P}_{k}^{\prime }$ from the left and right,

$\begin{eqnarray}&&\mathop{\mathop{\sum }\nolimits}\limits_{i}{{\lambda }_{i}}{{\vec{x}}_{k}}\cdot \mathcal{D}\left( P_{i}^{\prime } \right)\cdot {{\vec{x}}_{k}}={{\lambda }_{k}}.\end{eqnarray} \tag{ A5 }$

By assumption of the induction, $\mathcal{D}\left( P_{i}^{\prime } \right)=\tilde{P}_{i}^{\prime }\forall i<k$ , therefore

$\begin{eqnarray*}&&{{\vec{x}}_{k}}\cdot \mathcal{D}\left( P_{i}^{\prime } \right)\cdot {{\vec{x}}_{k}}=0\quad \forall i<k.\end{eqnarray*}$

Introducing $d_{kk}^{\left( i \right)}\equiv {{\vec{x}}_{k}}\cdot \mathcal{D}\left( P_{i}^{\prime } \right)\cdot {{\vec{x}}_{k}}$ , equation (A5) can be written as

$\begin{eqnarray}&&\mathop{\mathop{\sum }\nolimits}\limits_{i\geqslant k}{{\lambda }_{i}}d_{kk}^{\left( i \right)}={{\lambda }_{k}}.\end{eqnarray} \tag{ A6 }$

Due to equation (A4) and the assumption of the induction,

$\begin{eqnarray*}&&\mathop{\mathop{\sum }\nolimits}\limits_{i}d_{kk}^{\left( i \right)}=\mathop{\mathop{\sum }\nolimits}\limits_{i\geqslant k}d_{kk}^{\left( i \right)}=\mathop{\mathop{\sum }\nolimits}\limits_{i}{{\vec{x}}_{k}}\cdot \mathcal{D}\left( P_{i}^{\prime } \right)\cdot {{\vec{x}}_{k}}=1,\end{eqnarray*}$

and, since $\mathcal{D}\left( P_{i}^{\prime } \right)$ is the image of a positive semidefinite matrix which has to be positive semidefinite itself,

$\begin{eqnarray*}&&d_{kk}^{\left( i \right)}={{\vec{x}}_{k}}\cdot \mathcal{D}\left( P_{i}^{\prime } \right)\cdot {{\vec{x}}_{k}}\geqslant 0\ \forall i.\end{eqnarray*}$

Now remember that ${{\lambda }_{k}}\ne 0$ is strictly larger than all the other ${{\lambda }_{i}}$ with $i>k$ since the eigenvalues are assumed to be ordered. In addition, $d_{kk}^{\left( i \right)}\geqslant 0\ \forall i$ and at least one $d_{kk}^{\left( i \right)}$ with $i\geqslant k$ must be nonzero, otherwise the $d_{kk}^{\left( i \right)}$ would not sum up to 1. Then

$\begin{eqnarray*}&&\mathop{\mathop{\sum }\nolimits}\limits_{i\geqslant k}{{\lambda }_{i}}d_{kk}^{\left( i \right)}\leqslant {{\lambda }_{k}}\mathop{\mathop{\sum }\nolimits}\limits_{i\geqslant k}d_{kk}^{\left( i \right)}={{\lambda }_{k}},\end{eqnarray*}$

with equality if and only if $d_{kk}^{\left( i \right)}=0$ for $i\ne k$ . In fact, equality has to hold since otherwise we would contradict equation (A6). We conclude that

$\begin{eqnarray*}&&d_{kk}^{\left( i \right)}={{\vec{x}}_{k}}\cdot \mathcal{D}\left( P_{i}^{\prime } \right)\cdot {{\vec{x}}_{k}}={{\delta }_{ik}}.\end{eqnarray*}$

Since ${{\vec{x}}_{k}}$ is normalized and arbitrary as long as it lies in the eigenspace ${{\tilde{\mathcal{E}}}_{k}}$ of $\tilde{P}_{k}^{\prime }$ , ${{\vec{x}}_{k}}$ must be an eigenvector of $\mathcal{D}\left( P_{k}^{\prime } \right)$ with eigenvalue 1. Consequently, the operator $\mathcal{D}\left( P_{k}^{\prime } \right)$ maps the eigenspace of $\tilde{P}_{k}^{\prime }$ onto itself. Now we are almost done with showing that $\mathcal{D}\left( P_{k}^{\prime } \right)$ and $\tilde{P}_{k}^{\prime }$ are indeed identical. Since ${{\tilde{\mathcal{E}}}_{k}}$ is mapped by $\mathcal{D}$ into itself, $\mathcal{D}\left( P_{k}^{\prime } \right)$ has at least ${\rm dim}\left( {{\tilde{\mathcal{E}}}_{k}} \right)$ eigenvalues equal to 1. The fact that $\mathcal{D}\left( P_{k}^{\prime } \right)$ has exactly ${\rm dim}\left( {{\tilde{\mathcal{E}}}_{k}} \right)$ eigenvalues equal to 1 follows from $\mathcal{D}$ being trace-preserving: ${\rm{Tr}}\left[ \mathcal{D}\left( P_{k}^{\prime } \right) \right]={\rm{Tr}}\left[ P_{k}^{\prime } \right]={\rm dim}\left( {{\mathcal{E}}_{k}} \right)$ and ${\rm dim}\left( {{\mathcal{E}}_{k}} \right)={\rm dim}\left( {{\tilde{\mathcal{E}}}_{k}} \right)$ , where ${\rm{Tr}}\left[ \mathcal{D}\left( P_{k}^{\prime } \right) \right]$ is the sum over the eigenvalues of $\mathcal{D}\left( P_{k}^{\prime } \right)$ . Since all eigenvalues of $\mathcal{D}\left( P_{k}^{\prime } \right)$ are non-negative, all other eigenvalues must vanish. Hence $\mathcal{D}\left( P_{k}^{\prime } \right)=\tilde{P}_{k}^{\prime }$ . This completes the induction and concludes the first step of the $\Longleftarrow$ proof, i.e., we have shown that a unital dynamical map that leaves the spectrum of a given arbitrary density matrix invariant, maps projectors onto the eigenspaces of this density matrix onto projectors of the same rank. This is specifically true for one-dimensional projectors. Iterating the argument for all density matrices in the complete set and selecting a set $\mathcal{P}$ of N orthogonal, one-dimensional projectors, it follows that these projectors will be mapped by $\mathcal{D}$ onto another set of one-dimensional projectors.

In the second step of the $\Longleftarrow$ proof we still need to show that the mapped set is also orthogonal. We denote the complete set of projectors by $\left\{ {{P}_{i}} \right\}$ . From the first step of the $\Longleftarrow$ proof we know that the ${{\tilde{P}}_{i}}$ ,

$\begin{eqnarray*}&&\mathcal{D}\left( {{P}_{i}} \right)={{\tilde{P}}_{i}},\end{eqnarray*}$

need to be one-dimensional projectors. Using the unitality of $\mathcal{D}$ , we see that

$\begin{eqnarray*}&&\mathbb{1}=\mathcal{D}\left(\mathbb{1} \right)=\mathcal{D}\left( \mathop{\mathop{\sum }\nolimits}\limits_{i}{{P}_{i}} \right)=\mathop{\mathop{\sum }\nolimits}\limits_{i}\mathcal{D}\left( {{P}_{i}} \right)=\mathop{\mathop{\sum }\nolimits}\limits_{i}{{\tilde{P}}_{i}}.\end{eqnarray*}$

The unit matrix can only be summed by N one-dimensional projectors if these are orthogonal. Hence we have accomplished the second step, and the lemma follows.

Proof of Theorem 1: The equivalence relation of statement $\left( 1 \right)\Longleftrightarrow \left( 2 \right)$ has already been proven in [29]. To complete the proof of the more general Theorem 1, we are left with proving $\left( 2 \right)\Longleftrightarrow \left( 3 \right)$ .

$\left( 2 \right)\left( 3 \right)$ : If $\mathcal{D}$ maps a set of N one-dimensional orthogonal projectors onto another set of N one-dimensional orthogonal projectors, it leaves the spectrum of the projectors invariant. This can be seen as follows. Projectors are idempotent and positive semidefinite, hence their spectrum can only consist of zeros and ones. Since the projector is one-dimensional, its image under $\mathcal{D}$ has to be one-dimensional, too, and there can only be one eigenvalue equal to one. Thus any one-dimensional projector has the spectrum $\left\{ 1,0,0,\ldots \right\}$ which must be invariant under a mapping between one-dimensional orthogonal projectors. We now use the linearity of dynamical maps to show that $\mathcal{D}$ must be unital. Specifically, let $\left\{ {{P}_{i}} \right\}$ be the initial set of orthogonal projectors that is mapped to another set of orthogonal projectors, $\left\{ {{{\bar{P}}}_{i}} \right\}$ . We find for the image of the totally mixed state, ${{\rho }_{M}}=\frac{1}{N}\mathbb{1}$ ,

$\begin{eqnarray*}\begin{array}{ccccccccccccccc} \mathcal{D}\left( {{\rho }_{M}} \right) & = & \mathcal{D}\left( \frac{1}{N}\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{N}{{P}_{i}} \right)=\frac{1}{N}\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{N}\mathcal{D}\left( {{P}_{i}} \right) \\ {} & = & \frac{1}{N}\mathop{\mathop{\sum }\nolimits}\limits_{i=1}^{N}{{{\bar{P}}}_{i}}={{\rho }_{M}}, \\ \end{array}\end{eqnarray*}$

i.e., $\mathcal{D}$ maps identity onto itself, making it unital. We can thus use Lemma 1 to obtain that the spectrum of a complete set of density matrices is invariant under $\mathcal{D}$ . We now just have to add ${{\rho }_{TR}}={{P}_{TR}}$ to realize a complete and totally rotating set. The spectrum of ${{\rho }_{TR}}={{P}_{TR}}$ is also invariant under $\mathcal{D}$ since it is a one-dimensional projector that is mapped onto another one-dimensional projector.

$\left( 3 \right)\left( 2 \right)$ : From Lemma 1, we obtain that $\mathcal{D}$ maps a set of N one-dimensional, orthogonal projectors onto another set of N one-dimensional orthogonal projectors. We are thus only left with showing that $\mathcal{D}$ maps a totally rotated projector onto a one-dimensional projector: There always exists a density matrix with a one-dimensional eigenspace corresponding to a totally rotated projector ${{P}_{TR}}$ whose spectrum is invariant under the action of $\mathcal{D}$ . In the proof of Lemma 1, we have shown that a dynamical map that leaves the spectrum of projectors invariant maps these projectors onto projectors of the same rank. Repeating the steps of the proof of Lemma 1, we see that the image of ${{P}_{TR}}$ has to be a one-dimensional projector. This completes the proof of Theorem 1.

Optimal control theory for a unitary operation under dissipative evolution

Article metrics

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction

2. Optimal control theory for a unitary operation under dissipative evolution

2.1. Optimization functional

2.2. Optimization algorithm

3. Example I: Diagonal gates

4. Example II: Nondiagonal gates

5. Conclusions

Acknowledgments

Appendix A.: Three states are sufficient to assess whether a desired target unitary is implemented

A.1. Construction of the functional

A.2. Proof

Footnotes

Optimal control theory for a unitary operation under dissipative evolution

Article metrics

Share this article

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction

2. Optimal control theory for a unitary operation under dissipative evolution

2.1. Optimization functional

2.2. Optimization algorithm

3. Example I: Diagonal gates

4. Example II: Nondiagonal gates

5. Conclusions

Acknowledgments

Appendix A.: Three states are sufficient to assess whether a desired target unitary is implemented

A.1. Construction of the functional

A.2. Proof

Footnotes