Semi-definite programming and quantum information

Piotr Mironowicz

doi:10.1088/1751-8121/ad2b85

List of abbreviations

DPS	Doherty–Parillo–Spedalieri
IPM	interior point method
LP	linear programming
LMI	linear matrix inequalities
MLP	Mironowicz–Li–Pawłowski
NPA	Navascués–Pironio–Acín
NV	Navascués–Vertesi
PD	positive definite
POVM	positive operator-valued measure
PPT	positive partial transpose
PSD	positive semi-definite
QSD	quantum state discrimination
SDP	semi-definite programming
SoS	sum of squares

1. Introduction

Optimization, in its various forms, has been a cornerstone of scientific and technological advancements across numerous disciplines. From engineering and economics to machine learning and operations research, optimization techniques have played a crucial role in solving complex problems and driving innovation. Over the years, different variants of optimization methods have emerged, each tailored to address specific problem structures and objectives [15, 24, 36, 40, 103, 232]. In recent decades, SDP has emerged as a powerful variant of convex optimization, offering a versatile framework for solving optimization problems involving PSD matrices. SDP has found applications in diverse fields, including control theory, signal processing, combinatorial optimization, and quantum information theory [41, 116, 233, 303]. Particularly, in the field of quantum information, SDP has proven to be an indispensable tool for characterizing and manipulating quantum correlations and probabilities [28, 326]. Quantum information theory deals with the fundamental principles governing the representation, transmission, and processing of information in quantum systems. It explores the unique properties of quantum mechanics to develop new paradigms for computation, communication, and cryptography. Quantum correlations, such as entanglement, and the manipulation of probabilities in quantum systems are essential components in designing quantum algorithms and protocols.

This paper aims to provide a comprehensive study of SDP in the context of quantum information. The outline of the paper is as follows. We present first a mathematical framework for convex optimization, covering the necessary preliminaries and notation, and provide a software overview useful for implementing techniques discussed in this work. To aid researchers in the practical implementation of SDP, the paper provides an overview of software tools, solvers, and modeling techniques in section 1.3. It discusses the different solvers available for solving SDP problems, as well as the modeling tools used to formulate and represent optimization problems. Section 1.4 contains a brief overview of the topic of probability distributions occurring in quantum mechanics and other important problems of the theory which can be effectively treated with SDP. The terms introduced there will be used in section 4.

The further discussion in section 2 encompasses sets, spaces, cones, and functions, including important concepts like Fenchel conjugate and subgradient. Duality, a fundamental aspect of optimization, is explored extensively, shedding light on its role in problem formulations and solution methods. The theory of SDP is a focal point of this paper, as it enables the optimization of PSD matrices, which are fundamental objects in quantum information theory. Next, in section 3, the work delves into the definition and characterization of positive semi-definiteness, presenting various formulations of SDP problems, such as the canonical form, the Vandenberghe and Boyd form, the so-called SDP algorithm (SDPA) form, and the Watrous symmetric form. It also discusses the duality of SDP, the treatment of complex variables in semidefinite problems imposing equality and inequality constraints, and the topic of the Schur complement and submatrices. Next, we concentrate on implementing SDPs, to provide a general understanding of the involved numerical methods. The paper also explores how solvers employ IPMs, highlighting their internal mechanisms, such as predictor–corrector methods, warm start strategies, and exploitation of the problem structure.

In the following section 4, the paper introduces basic tools and techniques in SDP that are specifically relevant to quantum information. These tools include semidefinite representations, separability criteria, Choi–Jamiołkowski isomorphism (state-channel duality), the sum of squares decomposition, and Lovász theta. Understanding and utilizing these tools are crucial for solving optimization problems involving quantum states, correlations, and probabilities. A significant portion of the paper focuses on the application of moment matrices in quantum information. Moment matrices play a vital role in capturing the correlations present in quantum systems. The paper explores correlation matrices and moment matrices, their mathematical properties, and their significance in optimization problems involving non-commuting variables. Hierarchical methods, such as the NPA hierarchy, the so-called MLP hierarchy, and the NV hierarchy, are discussed in detail for optimizing probability distributions without or with dimension constraints. Additionally, the SWAP method for self-testing in quantum information is presented.

1.1. Historical notes on optimization

The historical development of LP, can be traced back to the times of a critical need for optimal resource management during World War II. Soon after, in 1947, George Dantzig introduced the simplex method [80], marking a significant milestone in this field. The simplex method operates by starting at a vertex of a convex polytope representing feasible solutions and gradually moving toward its extreme point. It is important to note that although the simplex method algorithm exhibits exponential worst-case complexity, it has demonstrated remarkable efficiency in practical problem-solving scenarios. To understand the underlying reason for the high complexity of the simplex method, it is necessary to examine the specific instances where the algorithm visits every vertex of the feasible region, leading to exponential worst-case complexity. The Klee–Minty problem, formulated in 1970 [170], serves as an illustrative example of such instances. During the 1960s and 1970s, there was an increasing recognition of the significance of computational complexity, fueling the pursuit of efficient algorithms with polynomial time complexity.

In the 1960s, in the realm of nonlinear programming, it became a common practice to transform constrained problems into unconstrained ones through the utilization of the so-called barrier methods [103]. By introducing a specialized barrier function, it became possible to delineate a trajectory within the space of optimization variables known as the central path, which could be traversed using the well-established Newton's method. However, the prevalence of barrier methods experienced a temporary decline in the 1970s. Meanwhile, in 1979, Khachian introduced the ellipsoid method, the first algorithm for LP with polynomial worst-case complexity [168]. Surprisingly, despite its favorable theoretical complexity, the ellipsoid method proved to be exceedingly slow when applied to most practical problems. Consequently, before 1984, two primary methods for LP existed:

The simplex method, which possessed exponential worst-case complexity but demonstrated practical efficiency.
The ellipsoid method, exhibited polynomial complexity but was notably inefficient in practice.

The field of optimization has undergone a significant transformation with the introduction of IPMs. Before 1984, IPMs did not hold a prominent position until Karmarkar's groundbreaking paper, a new polynomial-time algorithm for LP [165], was published. Notably, it was demonstrated that IPMs were no less efficient than the simplex method for solving practical LP problems. This revelation of IPM's potential sparked what is often referred to as a revolution in optimization [334], leading IPMs to be recognized as one of the most significant algorithms of all time. Before 1984, there existed only minimal connections between LP and nonlinear programming. However, it was soon discovered [118] that the IPM was equivalent to a logarithmic barrier method applied to LP. This equivalence enabled the development of a unified framework, based on barrier function methods, for analyzing both linear and nonlinear problems [234].

The next significant advancement in the field of IPM came with the independent work of Alizadeh [8], Nesterov, and Nemirovskii [233, 234] in the late 1980s. They expanded the applicability of IPM to various convex optimization problems. Nesterov and Nemirovskii discovered that the key to utilizing IPM for convex problems lies in knowing a specific barrier function known as a self-concordant barrier [230]. For practical implementation, it is essential that the first and second derivatives of the barrier function can be computed easily. Vandenberghe and Boyd [314] utilized the theory developed by Nesterov and Nemirovskii to apply the LP method given by Gonzaga and Todd [126] to SDPs. A self-concordant barrier refers to a smooth convex function defined within the interior of a given set. It diverges toward infinity as it approaches the boundary and, along with its derivatives, satisfies certain Lipschitz continuity conditions. Nesterov and Nemirovskii demonstrated that IPM can be applied to any set where such a barrier function can be formulated. Fortunately, a relatively computationally tractable self-concordant barrier is known for SDPs, viz. $F(X) \equiv -\ln \det X$ . For a comprehensive historical overview of the development and significance of SDP, detailed information can be found in several notable references such as [105, 125, 303, 334]. These works provide an in-depth exploration of SDPs, shedding light on their emergence as a powerful tool in modern optimization. The key property of SDP problems is the fact that they may be efficiently solved numerically using IPM, as sketched further in section 3.8, and at the same time, they can express or approximate a tremendous range of scientific and engineering problems.

1.2. Preliminaries and notation

We now briefly specify the notation used in this work. In some places, the notation used is overloaded, with the same symbols having different meanings. The reason is that the paper covers a variety of different fairy-specialized topics. We decided to keep the established notation characteristic for each of the specializations. We made an effort to ensure that this does not lead to any ambiguity.

1.2.1. Spaces and sets notation.

In this work, we denote by $\mathbb{N}$ the set of natural numbers (including 0), $\mathbb{N}_{+}$ is the set of natural numbers (excluding 0), $\mathbb{R}$ is the set of real number, and $\mathbb{C}$ is the set of complex numbers. The sets of vectors composed of real numbers, non-negative real numbers, and complex numbers, with k elements, are denoted by $\mathbb{R}^k$ , $\mathbb{R}_{+}^k$ , and $\mathbb{C}^k$ , respectively. The set of real k × l matrices is denoted by $\mathbb{R}^{k \times l}$ and the set of real n × n symmetric matrices by $\mathbb{S}^n$ . The set of real n × n symmetric matrices that are PSD or PD (see section 3.1 for the definitions) we denote with $\mathbb{S}^n_+$ or $\mathbb{S}^n_{++}$ , $\mathbb{S}^n_{++} \subset \mathbb{S}^n_+ \subset \mathbb{S}^n$ . Similarly, the set of complex k × l matrices is denoted by $\mathbb{C}^{k \times l}$ , the set of Hermitian n × n matrices by $\mathbb{H}^n$ , and its subset of PSD or PD matrices by $\mathbb{H}^n_+$ or $\mathbb{H}^n_{++}$ , $\mathbb{H}^{n}_{++} \subset \mathbb{H}^{n}_+ \subset \mathbb{H}^{n}$ . We refer to the pair of values k and l (for matrices of arbitrary size) or n (for square matrices) as the size of the matrix. For $n \in \mathbb{N}_{+}$ we denote $[n] \equiv \{1, \ldots, n\}$ . The relation $\succeq$ denotes the so-called Löwner's partial order of PSD matrices [181, 197]. For two symmetric or Hermitian matrices A and B, we have $A \succeq B$ when A − B is PSD.

Banach spaces, i.e. vector spaces which are complete with respect to a given norm $\left| \left| \cdot \, \right| \right|$ , are denoted with letters $X, Y, \ldots$ . Their continuous dual spaces, as defined below, are denoted with starred letters, $X^{*}, Y^{*}, \ldots$ . The real and complex Hilbert spaces, i.e. vector spaces with an inner (scalar) product $\langle \cdot | \cdot \rangle$ that are Banach spaces with the norm induced by the inner product, are denoted with calligraphic letters, $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \ldots$ . Their continuous dual spaces are denoted by $\mathcal{X}^{*}, \mathcal{Y}^{*}, \mathcal{Z}^{*}, \ldots$ . Usually, we consider vector spaces of real or complex matrices with the Frobenius product, defined below in (3), as the inner product; thus the spaces $\mathbb{R}^{k \times 1}$ and $\mathbb{C}^{k \times 1}$ are the ordinary k dimensional real or complex Euclidean spaces, i.e. finite-dimensional Hilbert spaces.

In some cases we use finite sets of symbols as indices of vectors or matrices; this will be particularly useful in the context of moment matrices, see section 4.6. For a set of symbols Σ we use the standard convention of set theory to denote $\mathbb{C}^\Sigma$ ( $\mathbb{R}^\Sigma$ ) the set of all functions from Σ to $\mathbb{C}$ ( $\mathbb{R}$ ). Since there exists a natural isomorphism between $\mathbb{C}^\Sigma$ and $\mathbb{C}^{\left| \Sigma \, \right|}$ ( $\mathbb{R}^\Sigma$ and $\mathbb{R}^{\left| \Sigma \, \right|}$ ), all operations defined for the latter can be easily mapped to relevant operations on the former, with an arbitrary ordering of the symbols in Σ, that will be treated here as implicit. For a metric space (X, d) we denote by $\textrm{B}_X(x,r)$ the closed ball centered at $x \in X$ with radius r; $\textrm{B}_X \equiv \textrm{B}_X(0,1)$ is the unit closed ball.

Consider a Banach space X over the field F. For a linear functional $x^{*}: X \rightarrow F$ we define its norm as $\left| \left| x^{*} \, \right| \right| \equiv \sup_{x \in X: \left| \left| X \, \right| \right| \unicode{x2A7D} 1} \left| x^{*}(x) \, \right|$ . We define $X^{*}$ as the continuous dual space, or the topological dual space, or simply the dual space, i.e. the space of all linear continuous functionals on X with this norm. The weak topology of X is the weakest topology in X in that all elements of $X^{*}$ are continuous. For $X^{*}$ we consider also the weak^* topology, defined as the weakest topology on $X^{*}$ for that every element $x \in X$ corresponds to a continuous functional on $X^{*}$ . We denote the bidual spaces $X^{**} \equiv (X^{*})^{*}$ . Spaces for that $X = X^{**}$ are called reflexive. It can be seen that every finite-dimensional normed space is reflexive.

The action of a conjugate element $x^{*}$ on an element x, i.e. $x^{*}(x)$ , is further denoted as $\langle x^{*} , x \rangle$ to make the linearity explicit. This is to be contrasted with the inner (scalar) product $\langle \cdot | \cdot \rangle$ for Hilbert spaces. The Riesz–Fréchet representation theorem [288, p 182][138, p 31] states that every linear continuous functional $x^{*}$ on a Hilbert space $\mathcal{X}$ can be represented by the inner product with a certain unique element x of $\mathcal{X}$ , in the sense that ${\Large{\forall}}_{x^{\prime} \in \mathcal{H}} \langle x^{*} , x^{\prime} \rangle = \langle x | x^{\prime} \rangle$ .

To provide the most explicit formulations, we usually denote the elements of a conjugate space with the star symbol ${}^*$ , e.g. $x^{*} \in \mathcal{X}^{*}$ . The symbol is barely a notation suggesting an element of a conjugate space and has no algebraic meaning. Similarly, we optionally (with no special mathematical meaning) denote with · (dot) a matrix multiplication in places where it allows us to avoid ambiguity, especially to stress the presence of a scalar product of two vectors. Due to the specificity of our topic closely mixing the explicit numerical representation of operators as arrays of (numerical) real values with the abstract complex Hilbert formalism of quantum mechanics, we decided to use both this 'dot' notation of the scalar product, and the bra-ket notation, with the latter used in cases not directly related to the computer implementations.

We denote by $\mathsf{L}\left[ \mathcal{X}, \mathcal{Y} \right]$ the set of all linear operators from the Hilbert space $\mathcal{X}$ to the Hilbert space $\mathcal{Y}$ . For $\mathcal{X} = \mathbb{C}^n$ and $\mathcal{Y} = \mathbb{C}^m$ this set is isomorphic to the set of complex m × n matrices, $\mathbb{C}^{m \times n}$ . We use the latter whenever our considerations are directly related to computer implementations. For $A \in \mathsf{L}\left[ \mathcal{X}, \mathcal{Y} \right]$ we define the adjoint operator, $A^{\dagger} \in \mathsf{L}\left[ \mathcal{Y}, \mathcal{X} \right]$ as the unique operator satisfying the scalar product relation

$\begin{align} y^{\dagger} \cdot A x = \left( A^{\dagger} y \right)^{\dagger} \cdot x, \end{align} \tag{ 1 }$

or, in the bra-ket notation, $\langle y | A x \rangle = \langle A^{\dagger} y | x \rangle$ , for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ . We use the abbreviation $\mathsf{L}\left[ \mathcal{X} \right] \equiv \mathsf{L}\left[ \mathcal{X}, \mathcal{X} \right]$ . For real matrix space, the adjoint is the transposition denoted by $^\mathrm{T}$ ; and for complex matrices, it is the Hermitian conjugate denoted by $^{\dagger}$ . Thus we denote by $x^{\dagger} \in \mathcal{X}^{*}$ the unique linear operator $x^{\dagger}: \mathcal{X} \rightarrow F : x^{\prime} \mapsto \langle x | x^{\prime} \rangle$ , where $F = \mathbb{C}$ for complex Hilbert spaces; or by $^\mathrm{T}$ with $F = \mathbb{R}$ for real Hilbert spaces. The symbol of $^{\dagger}$ used for real spaces is equivalent to $^\mathrm{T}$ . We define the set of all Hermitian operators acting on a complex Euclidean vector space $\mathcal{X}$ as $\textrm{Herm}\left[\mathcal{X}\right] \equiv \left\{X \in \mathsf{L}\left[ \mathcal{X} \right] : X^{\dagger} = X \right\}$ .

1.2.2. Matrix conventions and the Frobenius product.

In this work, we use the MATLAB notation for elements of matrices. We recall a few examples. In this notation the expression $M_{k,l:m}$ means a submatrix consisting of the elements of the matrix M within the row k and with column indices in $\{l, l+1, \ldots, m\}$ . The expression $M_{:,k}$ means the vector consisting of the kth column of the matrix M. $M_{k,l}$ refers to the element in kth row and lth column. For a vector v its kth element is v_k . Vectors are represented by one-column matrices. Matrix elements are numbered from 1. The identity matrix of size d by d is denoted by ${\unicode{x1D7D9}}_{d}$ ; in some cases, instead of the size we specify a space $\mathcal{X}$ and then ${\unicode{x1D7D9}}_{\mathcal{X}}$ denotes the identity operator on $\mathcal{X}$ . The zero operator and zero matrix for all spaces are denoted with 0. The Kronecker delta is denoted by $\delta_{i,j}$ and is equal 1 for i = j and 0 otherwise. Trace operation is denoted as ${\textrm{Tr}}[\cdot]$ , and partial trace by ${\textrm{Tr}}_{\{s_i\}_i}[\cdot]$ , where $\{s_i\}_i$ enumerates the subsystems which are traced out. Similarly, the partial transposition of subsystems $\{s_i\}_i$ is denoted as $^{\mathrm{T}_{\{s_i\}_i}}$ .

The function $\textrm{vec}(\cdot)$ defines a vector containing the elements of the given matrix in column-wise order. $\textrm{Mat}(\cdot)$ is the inverse of this function. For example, we have

$\begin{align} \textrm{vec} \left( \begin{bmatrix} a & c \\ b & d \end{bmatrix} \right) = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}. \end{align} \tag{ 2 }$

We also use the following standard convention in which upper-case letters denote matrices, and lower-case letters denote vectors of elements of the matrices, e.g. $x = \textrm{vec}(X) \in \mathbb{R}^{n^2}$ and $X = \textrm{Mat}(x) \in \mathbb{R}^{n \times n}$ . For two matrices $A, B \in \mathbb{R}^{m \times n}$ we define the relation $A \unicode{x2A7D} B$ to hold if and only if $\forall_{i = 1, \ldots, m} \forall_{j = 1, \ldots, n} A_{i,j} \unicode{x2A7D} B_{i,j}$ . We define relations A < B, $A \unicode{x2A7E} B$ and A > B in an analogous way. $\textrm{Diag}[(d_i)_{i \in [n]}]$ is an n by n diagonal matrix with diagonal entries d_i .

The Frobenius product of two complex (or real) matrices, $A, B \in \mathbb{C}^{k \times l}$ (or $A, B \in \mathbb{R}^{k \times l}$ ) is defined as ${\textrm{Tr}}(A^{\dagger} B)$ (or ${\textrm{Tr}}(A^\mathrm{T} B)$ ). We follow the convention common in the literature close to implementation issues and usually denote the Frobenius product as $A \bullet B$ [10, 111, 169, 216, 293, 305, 307, 319]. It can be easily shown that

$\begin{align} A \bullet B \equiv {\textrm{Tr}}\left(A^{\dagger} B\right) = {\textrm{Tr}}\left(A B^\mathrm{T}\right) = \sum_{i = 1, \ldots, k} \sum_{j = 1, \ldots, l} A_{i,j}^{*} B_{i,j} = \textrm{vec}\left(A\right)^{\dagger} \cdot \textrm{vec }{B}, \end{align} \tag{ 3 }$

and similarly for real matrices. Thus, for real matrices, the Frobenius product is the sum of the elements of the element-wise product of entries of two matrices. One can also show that for real symmetric A and real antisymmetric B we have ${\textrm{Tr}}(A^\mathrm{T} B) = {\textrm{Tr}}(A B) = 0$ . For real symmetric A we have $A \bullet B = {\textrm{Tr}}(A^{\dagger} B) = {\textrm{Tr}}(A^\mathrm{T} B) = {\textrm{Tr}}(AB)$ . The Frobenius product induces a Frobenius norm of a matrix, $\left| \left| \cdot \, \right| \right|_\mathrm{F}$ defined as $\left| \left| A \, \right| \right|_\mathrm{F} = \sqrt{{\textrm{Tr}}(A^{\dagger} A)}$ . This norm is called also a Hilbert–Schmidt norm. The Frobenius product is a direct generalization of the vector Euclidean product, as it can be seen from (3), and the $\mathbb{C}^{k \times l}$ ( $\mathbb{R}^{k \times l}$ ) with Frobenius inner product is isomorphic with the Euclidean space $\mathbb{C}^{kl}$ ( $\mathbb{R}^{kl}$ ). For $A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$ and $B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$ , where $A_{11}, B_{11} \in \mathbb{C}^{n_1 \times n_1}$ , $A_{12}, B_{12} \in \mathbb{C}^{n_1 \times n_2}$ , $A_{21}, B_{21} \in \mathbb{C}^{n_2 \times n_1}$ , and $A_{22}, B_{22} \in \mathbb{C}^{n_2 \times n_2}$ , for some n₁ and n₂, we have

$\begin{align} {\textrm{Tr}}\left(A^{\dagger} B\right) = {\textrm{Tr}}\left(A_{11}^{\dagger} B_{11}\right) + {\textrm{Tr}}\left(A_{12}^{\dagger} B_{12}\right) + {\textrm{Tr}}\left(A_{21}^{\dagger} B_{21}\right) + {\textrm{Tr}}\left(A_{22}^{\dagger} B_{22}\right). \end{align} \tag{ 4 }$

This equality easily generalizes to matrices $A = (A_{r,c})_{r,c}$ and $B = (B_{r,c})_{r,c}$ divided into arbitrary number of blocks, viz. $A \bullet B = \sum_{r,c} \left( A_{r,c} \bullet B_{r,c} \right)$ .

1.3. Software overview, usage, and implementation

The basic tool used to find solutions to SDP is SDP solvers. Modeling languages are useful supporting software aiding in formulating SDPs to be passed to a solver. We refer readers to [214] for a comprehensive overview. From the experience of the author, most of the analysis involving SDPs in quantum information is conducted using either Python or Matlab language. The latter language has two major implementations, viz. the software MATLAB from MathWorks [2] and its open-source alternative OCTAVE [90].

A standard choice for an SDP solver among the NPA community (see section 4.6) using Matlab seems to be the SeDuMi solver [291, 292] created by J F Sturm, currently developed and maintained by Imre Pólik and Oleksandr Romanko under the direction of Tamás Terlaky [290]. This solver implements self-dual embedding IPM [66] and was used for instance in [226–228] and other works implementing NPA. Another SDP solver of particular interest in Matlab is SDPT3 solver [310, 312] implemented by Toh, Todd, and Tütüncü. It uses infeasible primal-dual IPM with so-called Nesterov-Todd (NT) and Helmberg-Kojima-Monteiro (HKM) search directions (see section 3.8 for the definition of the search directions). Other examples of SDP solvers include C library for semidefinite programming (CSDP) [35] by Borchers, DSDP [29], and SDPA [112]. The mentioned solvers are freely available, in most cases in open-source form. A very efficient commercial SDP solver is Mosek [1], possible to be used in Python and MATLAB, with a free license for academia. The solver particularly relevant for large problems is semidefinite programming Newton-CG augmented Lagrangian method (SDPNAL) [294, 338] that is implementing a Newton-conjugate gradient (CG) augmented Lagrangian method for SDP [346].

A popular family of solvers is the mentioned SDPA solvers by Fujisawa et al [112, 113, 222, 336, 337]. The SDPA solvers are using the so-called Vandenberghe and Boyd form, or the SDPA form, of SDPs, see section 3.2.2. The variants cover Matlab interface (SDPA-M), parallel implementation for large SDPs Semidefinite programming algorithm parallel version (SDPARA), higher precision arithmetics GNU multiple precision arithmetic library (SDPA-GMP), quad-double library (SDPA-QD) and double-double (SDPA-DD) structural sparsity Semidefinite programming algorithm with the positive definite matrix Completion (SDPA-C), see [335] of an overview. Semidefinite programming algorithm (SDPA) solver implements primal-dual IPM with Mehrotra type predictor–corrector, see section 3.9. When deciding solver to use, a performance benchmark should be consulted [210].

Plenty of papers [6, 19, 31, 46, 50, 63–65, 104, 191, 257, 286] uses the Python package NCPOL2SPDA [332] by Peter Wittek, currently under maintenance by Peter J Brown [333]. NCPOL2SDPA implements a framework for global polynomial optimization problems with SDP relaxations. The functionality of particular interest covers the NPA [226, 227, 254] hierarchy for non-commuting operators; Lasserre's hierarchy for commutative polynomials [184]; the more randomness from the same data technique [22, 239]; a hierarchy for bilevel polynomial optimization problem [159]; the Moroder's hierarchy [221]; and a hierarchy of sufficient conditions for the steerability of bipartite quantum states [177]. Note that NCPOL2SDPA is not an SDP solver but a modeling toolbox, used to reduce the human effort when formulating SDPs, and it requires a solver to be included separately. Other popular modeling tools for Python are python interface to conic optimization solvers (PICOS) [275] and CVX Python (CVXPY) [5, 84].

Popular modeling toolboxes to be used with Matlab language are yet another linear matrix inequalities processor (YALMIP) [193] and CVX [127, 128, 132]. They allow the use of various solvers, including SeDuMi, SDPT3, and Mosek. YALMIP can be supported with a package QDimSum (symmetric SDP relaxations for qudits systems) [272] that implements the hierarchy [228] using the symmetrization methods [7, 302] to enhance the performance. In appendix B.1 a sample simple execution is given with YALMIP.

An alternative to the mentioned modeling tools for Matlab and Python is the JuMP package ('Julia for Mathematical Programming') [89], which is an open-source modeling language integrated within the Julia programming environment [32]. It allows users to express a wide range of optimization problems, including linear, mixed-integer, quadratic, conic quadratic, semidefinite, and nonlinear, in a clear and intuitive code format. JuMP allows these formulated problems to be solved using open-source and commercial solvers including CSDP [35], Mosek [1], SCS [241], and SDPA [112].

At this stage, we mention that models in YALMIP and many other modeling languages are interpreted as so-called dual problems [194], discussed in section 3.2. The dual form of SDP is given in (81), where the SDP variable Z is in a disaggregated form, i.e. it is a matrix composed of linear combinations of scalar variables. This is to be contrasted with the primal form of SDP (80), where the SDP variable X is treated as a single matrix variable. There are two reasons, why modeling languages prefer the dual form over the primal form. The major reason is that symbolic manipulations are much easier when the variables are disaggregated. The other reason is that it was observed that in many different fields, the dual form is more natural to formulate the problems occurring in them, see e.g. table 1 in section 4.6.

Table 1. Comparison of sizes of SDP formulations of the levels of the NPA in a scenario with two parties, each with two binary measurements, when the constraints are expressed in terms of primal or dual canonical SDP forms.

Hierarchy level	n	m-dual	Average density (dual)	m-primal
$\mathfrak{Q}_2$	13	31	0.183	137
$\mathfrak{Q}_3$	25	61	0.098	563
$\mathfrak{Q}_4$	41	101	0.060	1579
$\mathfrak{Q}_5$	61	151	0.041	3569
$\mathfrak{Q}_6$	85	211	0.029	7013
$\mathfrak{Q}_7$	113	281	0.022	12 487
$\mathfrak{Q}_8$	145	361	0.017	20 663
$\mathfrak{Q}_9$	181	451	0.014	32 309
$\mathfrak{Q}_{10}$	221	551	0.011	48 289
$\mathfrak{Q}_{11}$	265	661	0.009	69 563
$\mathfrak{Q}_{12}$	313	781	0.008	97 187
$\mathfrak{Q}_{13}$	365	911	0.007	132 313
$\mathfrak{Q}_{14}$	421	1051	0.006	176 189
$\mathfrak{Q}_{15}$	481	1201	0.005	230 159

1.4. Basic problems of quantum information

Many useful functions that occur in quantum information belong to the family of the so-called semi-algebraic functions. These functions can be represented using SDP constraints, and thus are particularly relevant for this review. On the other hand, many other functions are not semi-algebraic, like the logarithm function used e.g. in the definitions of such quantities as entropies, including Shannon, quantum, or their relative or conditional variants. It would be beneficial to be able to express them, or at least their approximations as SDPs. It revealed that it is possible when the non-semi-algebraic function is approximated with a polynomial function, for instance with the support of one of the Gauss quadratures, e.g. the Radau quadrature. Recent results allowed the use of SDP to approximate the matrix logarithm function [100], and as a result, the use of SDP to efficiently optimize expressions on various entropies [98]. One recent article [47] used these methods to determine the lower bounds of the conditional von Neumann entropy certified in a device-independent approach using an extended NPA method [254] using the NCPOL2SDPA tool [332]. We provide a brief overview of the theory of semidefinite representations of semi-algebraic functions in section 4.1.

An important problem in the investigation of the properties of quantum states is to determine whether they are separable or not. An N-partite state ρ is called separable when it is written as a convex combination of product states, viz. [327]:

$\begin{align} \rho = \sum_i p_i \rho_1^i \otimes \cdots \otimes \rho_N^i. \end{align} \tag{ 5 }$

Determining whether a given state is separable or entangled based solely on the definition is a challenging task in practice. Thus, the so-called separability problem emerges as one of the fundamental issues in the study of entanglement. The famous Peres–Horodecki PPT criterion [151, 250] provided a necessary condition for separability of states and says that if a bipartite state $\rho_{\mathcal{A}\mathcal{B}}$ is separable, then $\rho_{\mathcal{A}\mathcal{B}}^{\mathrm{T}_B} \succeq 0$ . Another attempt at this issue was [188], where a constructive algorithm that enables the identification of the optimal separable approximation for any density matrix associated with a finite-dimensional composite quantum system was presented. The method established a condition for separability and provided a measure of entanglement. An important connection of the separability problem with SDP was the so-called DPS method given in [86, 87], which may be considered as a direct development of the PPT criterion, providing a hierarchy of SDP approximations discussed in section 4.2.

Another notion is that of quantum channels [329]. A quantum channel is a communication channel that transmits quantum information. Quantum channels describe any form of state evolution either in time or space governed by quantum mechanics. There are multiple different issues to be studied regarding quantum channels. One of the main research problems related to them is the characterization and classification of different types of channels, like depolarizing or amplitude-damping channels. Another research problem related to quantum channels is the development of methods for channel estimation and tomography, or the study of noisy and imperfect channels. In practice, all communication channels are subject to noise and imperfections that can degrade the quality of transmitted quantum states. Therefore, it is essential to develop methods for mitigating the effects of noise and imperfections on quantum communication. A basic tool used in modeling quantum channels, or more general maps linear $\mathsf{L}[\mathcal{H}_1, \mathcal{H}_2]$ , for Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ , is the Choi–Jamiołkowski isomorphism discussed in section 4.3.

The Tsirelson bound also referred to as the Cirel'son bound, is a concept in quantum mechanics that holds significance in the investigation of quantum non-locality. In essence, the Tsirelson bound establishes a maximum level of correlation achievable between two or more distant quantum systems. The first examples of such bounds were derived by Boris Tsirelson in 1980 [72]. Tsirelson bounds carry profound implications for our comprehension of quantum mechanics and its practical applications. In particular, if the correlations violate the Tsirelson bound, it implies that quantum physics cannot reproduce them. This observation significantly contributes to our understanding of entanglement. The Tsirelson bound finds practical utility in various applications of quantum information theory, including quantum cryptography and quantum teleportation. For instance, it enables the quantification of the requisite and attainable level of entanglement for secure communication through quantum cryptography. In section 4.4 we briefly describe the SoS technique and then show an example of how can it be applied to the derivation of the Tsirelson bound.

A fundamental concept, closely related to the Tsirelson bound is quantum contextuality. It refers to the property of quantum systems where the outcome of a measurement depends on the context in which it is measured. In other words, the value of a quantum property is not determined by the property itself, but by the other properties with which it is measured. This means that the same quantum system can exhibit different properties depending on how it is measured, and this property has been shown to be essential for many quantum information processing tasks. One of the approaches to the analysis of contextuality was given in [57, 58] where a relationship with the so-called Lovász theta has been established. The method revealed to be very profound [56, 133, 152, 186, 268, 304]. Here, in section 4.5 we describe the SDP methods for Lovász theta and show how to relate it to contextuality.

We will now briefly review the topic of Bell inequalities and Bell functionals [25], as this will be needed in many places in this work, especially for section 4.6. Bell inequalities are mathematical expressions that set a limit on certain probabilities, which cannot be violated in a classical physics framework but can be exceeded in quantum mechanics. The violation of Bell inequalities can be observed through experimental measurements, providing conclusive evidence that the behavior of the world cannot be explained solely by classical physics. Such groundbreaking experiments were conducted in the 1980s by Aspect et al [16–18]. A Bell experiment involves two or more separate parties who share a quantum state and perform measurements with different settings, without any form of communication between them. By conducting a series of such measurements and analyzing the collected data, it becomes possible to estimate the set of joint conditional probability distributions, or behaviors, $\{P(a,b|x,y) \}$ of the outcomes conditioned on the settings. A bipartite Bell functional is a linear functional that operates on behaviors over two parties or subsystems of the form $\sum_{a,b,x,y} \alpha_{a,b,x,y} P(a,b|x,y)$ , where $\alpha_{a,b,x,y} \in \mathbb{R}$ .

Classical devices can be described using the following elements. The class $\mathfrak{L}$ represents local behaviors that are in accordance with classical physics. The statistical description, i.e. the behavior, of the pair of devices is of the form

$\begin{align} P\left(a,b|x,y\right) = \sum_{\lambda} P\left(\lambda\right) \cdot P_{A|X,\Lambda}\left(a|x,\lambda\right) \cdot P_{B|Y,\Lambda}\left(b|y,\lambda\right). \end{align} \tag{ 6 }$

Here, $P(\lambda)$ represents the probability of observing the hidden state λ, and $P_{A|X,\Lambda}(a|x,\lambda)$ and $P_{B|Y,\Lambda}(b|y,\lambda)$ correspond to the conditional probabilities of obtaining results a for Alice and b for Bob, respectively, given their respective settings x and y, and the hidden state λ. $P_{\Lambda}(\lambda)$ refers to the probability distribution of hidden internal states, where λ represents a specific state and $\sum_{\lambda \in \Lambda} P_{\Lambda}(\lambda) = 1$ ensures normalization.

Next, the non-signaling devices can be characterized as follows. The class $\mathfrak{N}$ denotes non-signaling behaviors, which align with the principles of relativistic physics. $P_{A|X}(a|x)$ and $P_{B|Y}(b|y)$ represent the marginal conditional probability distributions of Alice and Bob, respectively. These distributions are derived from the behavior $\{P(a,b|x,y)\}$ , and they satisfy the conditions

$\begin{align} \sum_{b \in B} P\left(a,b|x,y\right) = P_{A|X,Y}\left(a|x,y\right) = P_{A|X}\left(a|x\right), \text{ and} \end{align} \tag{ 7a }$

$\begin{align} \sum_{a \in A} P\left(a,b|x,y\right) = P_{B|X,Y}\left(b|x,y\right) = P_{B|Y}\left(b|y\right). \end{align} \tag{ 7b }$

These conditions ensure the consistency of the marginal distributions regardless of the settings of the other party. Non-signaling property implies that the settings chosen by one party do not have any influence on the marginal distribution observed by the other party. By considering these elements and properties, we can analyze the behavior of non-signaling devices in the context of bipartite systems. Optimization over these sets can be performed using LP.

The class $\mathfrak{Q}$ contains all behaviors that adhere to the fundamental principles of quantum physics. It is noteworthy that the class of local behaviors $\mathfrak{L}$ , forms a subset of quantum behaviors, i.e. $\mathfrak{L} \subset \mathfrak{Q}$ , and $\mathfrak{Q} \subset \mathfrak{N}$ . A Bell functional $\mathcal{I}$ can exhibit a characteristic where its maximum value allowed on the set $\mathfrak{Q}$ , denoted as $I_\mathrm{Q}$ , is strictly greater than its maximum value on the set $\mathfrak{L}$ , denoted as $I_\mathrm{L}$ . The existence of such functionals is a consequence of Bell's theorem [26]. A Bell inequality is a statement $\mathcal{I} \unicode{x2A7D} I_\mathrm{L}$ that sets a limit on the value of this operator within the framework of local theories. We say that a Bell inequality is violated if, for a given behavior $\{P(a,b|x,y) \}$ , we have $\mathcal{I} \gt I_\mathrm{L}$ . The task of optimization over $\mathfrak{Q}$ , in particular Bell functionals, is NP-hard, as shown by Kempe et al in 2008 at FOCS [166].

The statement $\{P(a,b|x,y)\} \in \mathfrak{Q}$ is true if and only if the following conditions are satisfied, involving the existence of a (finite or infinite dimensional) Hilbert space $\mathcal{H}$ , a state (vector) $| \, \psi \rangle$ on $\mathcal{H}$ , and a set of operators (measurements) $\{E^a_x, F^b_y\}_{a,b,x,y}$ on $\mathcal{H}$ such that:

(i)
The operators $E^a_x$ and $F^b_y$ are projectors. From this property, it follows that the operators correspond to observable quantities with non-negative probabilities.
(ii)
Different results with the same setting, represented by $E^a_x$ and $E^{a^{\prime}}_x$ , are orthogonal to each other, given by $E^a_x E^{a^{\prime}}_x = 0$ ; similarly for Bob's measurements $F^b_y$ and $F^{b^{\prime}}_y$ . This orthogonality condition signifies that different measurement outcomes are mutually exclusive.
(iii)
The sum of all operators $E^a_x$ for a fixed x equals the identity operator ${\unicode{x1D7D9}}$ , denoted as $\sum_a E^a_x = {\unicode{x1D7D9}}$ . Similarly, the sum of all operators $F^b_y$ for a fixed y is equal to ${\unicode{x1D7D9}}$ , expressed as $\sum_b F^b_y = {\unicode{x1D7D9}}$ . These normalization conditions ensure that the probabilities of all possible outcomes sum up to 1.
(iv)
The operators representing measurements for Alice, $E^a_x$ , and those for Bob, $F^b_y$ , commute with each other, denoted as $[E^a_x, F^b_y] = 0$ . This commutation property indicates that the order of measurements performed by Alice and Bob does not affect the results.
(v)
The joint probability distribution $P(a,b|x,y)$ can be expressed as the expectation value of the operators $E^a_x$ and $F^b_y$ acting on the state $| \, \psi \rangle$ , given by
$\begin{align} P\left(a,b|x,y\right) = \langle \psi \, | E^a_x F^b_y | \, \psi \rangle. \end{align} \tag{ 8 }$
This equation illustrates that the probabilities arise from performing measurements on the quantum state $| \, \psi \rangle$ .

We note that the quantum measurements can be represented not necessarily by projectors as in the above condition (i), but by a more general set of operators called POVMs. A POVM $\{M^a\}_a$ satisfies the PSD condition ${\Large{\forall}}_a M^a \succeq 0$ and the normalization condition $\sum_a M^a = {\unicode{x1D7D9}}$ . Since the vector $| \, \psi \rangle$ can be of arbitrarily high, possibly infinite, dimension, and by Stinespring's dilation theorem [289] any POVM on a specific Hilbert space can be represented by a projective measurement within a sufficiently highly dimensional Hilbert space and any mixed state can be represented as a subsystem originating from a system in a pure state in that Hilbert space, the restriction to projectors in (i) and pure states do not lead to a loss in generality.

From the normalization conditions (iii) we see that for any fixed x (y) any of the operators $\{E^a_x \}_{a,x}$ ( $\{F^b_y \}_{b,y}$ ), can be expressed using the rest of them and the identity operator. Thus instead of the full set $\{E^a_x, F^b_y\}_{a,b,x,y}$ we can equivalently require existence of the so-called reduced set of operators $\{{\unicode{x1D7D9}}, E^{\tilde{a}}_x, F^{\tilde{b}}_y\}_{\tilde{a},\tilde{b},x,y}$ , where the index $\tilde{a}$ ( $\tilde{b}$ ) covers all values excluding the last one. For instance, consider a scenario involving two parties, each with two settings from the set $\{1,2\}$ , and obtaining two outcomes from the set $\{0,1\}$ . The reduced set of operators in this case is

$\begin{align} \left\{{\unicode{x1D7D9}}, E^0_1, E^0_2, F_1^0, F_2^0\right\}. \end{align} \tag{ 9 }$

The schoolbook formulation of quantum mechanics involves the notion of Hilbert spaces and the properties of operators acting over them. In contrast, an effort was made to derive, at least partially, equivalent physical consequences from a direct axiomatics [51, 69, 74, 136, 137, 148, 156, 201], or principles explicitly basing on information theory. The prominent examples of such information-theoretic principles include the non-signaling [255], the non-trivial communication complexity [43], the no advantage for nonlocal computation [192], the information causality [249], the local orthogonality [106], the information content of systems [79]. Of particular interest in this work are the macroscopic locality [229] and the almost quantum set of behaviors discussed in section 4.6.1.

Now, we provide a concise overview of the key aspects pertaining to dimension-bounded scenarios [53, 114, 190, 248]. We will discuss them in detail in sections 4.7 and 4.8. Alice and Bob are assigned random inputs, x and y. Subsequently, Alice sends a message to Bob based on her input, where the message takes the form of a quantum state ρ_x of a specific dimension d. Bob receives the quantum state and performs a measurement $\{M^b_y \}_b$ on it, yielding a result b. This leads to a behavior $\{P_d(b|x,y)\}$ within a prepare-and-measure scheme, $P_d(b|x,y) = {\textrm{Tr}} \left(\rho_{x} M^b_y\right)$ . We assume the absence of entanglement between Alice and Bob in this context. Let $\mathcal{P}_d \equiv \{\{P_d(b|x,y) \}_{\{\rho_x\}_x, \{\{M^b_y \}_b\}_y},\}$ be the set of all probabilities of the discussed for in the given dimension d. We note that we have $\mathcal{P}_d \subseteq \mathcal{P}_{d+1}$ , since increasing the dimension of the communicated state we can send at most the same amount of data. A dimension witness W is a linear function of behaviors, i.e. it has the following form:

$\begin{align} \sum_{b,x,y} \beta_{b,x,y} P\left(b|x,y\right). \end{align} \tag{ 10 }$

The key property of dimension witnesses is that they allow us to distinguish the dimensions for which inclusion is strict. Using the definition of probability distributions in the prepare-and-measure scheme, we may introduce a notion of dimension witnesses which is analogous to the concept of Bell functionals.

2. Mathematical framework of optimization

A general static optimization problem [200], or optimization in finite-dimensional spaces, is a task of determining the values of a certain variable $x \in \mathcal{F} \subseteq \mathbb{R}^n$ , called the decision variable, for which a given function $f_0 : \mathcal{F} \rightarrow \mathbb{R}$ , called target, attains its minimum; these points are called minimizers, and their set is called the optimal set [24]. The whole set $\mathcal{F}$ is called the feasible set (and other names like feasible region, or solution space, or search space, are often used) [23]. A point $x_0 \in \mathcal{F}$ with the property that for all $x_1 \in \mathcal{F}$ it holds $f(x_1) \unicode{x2A7E} f(x_0)$ is called a global minimum. A point $x_0 \in \mathcal{F}$ for which there exists a neighborhood (in metric space sense) $\mathcal{N}$ such that $x_1 \in \mathcal{N} \cap \mathcal{F} \implies f_0(x_1) \unicode{x2A7E} f_0(x_0)$ is called a local minimum. Every global minimum is also a local minimum.

The further part of this section covers the crucial topic of duality in optimization. In section 2.4 we discuss general optimization in Banach spaces, covering essential techniques and concepts. Next, in section 2.5 we explore the Fenchel–Rockafellar scheme, delving into strong duality and constraint qualification, which are fundamental principles in optimization. Then, in section 2.6 we discuss an alternative, but less general, way of construction of dual problems, viz. the Lagrangian scheme. Lastly, in section 2.7 we delve into the more specific case of convex cone optimization and show how both dualization schemes apply to it.

2.1. Convex and linear programming

Now, let us specify what we mean by convex optimization problems. In simple words, these are tasks of minimization of a convex function over a convex set [24, 41]. To be more specific, the so-called functional form of convex problems is defined as follows. Let $m, n \in \mathbb{N}$ . The commonly used general form of convex problems is the following:

$\begin{align} \begin{split} \text{minimize } & f_0\left(x\right) \\ \text{subject to } & f_i\left(x\right) \unicode{x2A7D} b_i, i = 1, \ldots, m, \end{split} \end{align} \tag{ 11 }$

where $f_0, \ldots, f_m: \mathbb{R}^n \rightarrow \mathbb{R}$ are convex function, i.e. for any $x_0,x_1 \in \mathbb{R}^n$ and $\lambda \in [0,1]$ they satisfy the so-called Jensen's inequality:

$\begin{align} f_i \left(\lambda x_0 + \left(1 - \lambda\right) x_1\right) \unicode{x2A7D} \lambda f_i\left(x_0\right) + \left(1-\lambda\right) f_i\left(x_1\right). \end{align} \tag{ 12 }$

If we replace $\unicode{x2A7D}$ with $\lt$ in (12), we say that f_i is strictly convex. The set of points satisfying the constraints of (11), i.e. the feasible set $\mathcal{F} = \bigcap_{i = 1}^m \{x \in \mathbb{R}^n : f_i(x) \unicode{x2A7D} b_i\} \subseteq \mathbb{R}^n$ , is a convex set. A crucial property of convex programs is that all their local minimum points are also global minimum points. What is more, the optimal set for a convex problem is also a convex set. If f₀ is strictly convex then there exists at most one global minimum. The optimal value or, when there is no ambiguity, simply the value, of an optimization problem corresponds to the optimal objective function value, representing the optimal outcome attainable for the objective function while adhering to all constraints. The solution to an optimization problem refers to the collection of decision variables that yield the optimal objective function value. This solution must satisfy all constraints imposed by the problem. It is worth noting that in certain scenarios, multiple optimal solutions can exist, indicating that various sets of decision variables yield the same optimal objective function value.

The terms optimization and optimization problem are often used interchangeably in the literature to refer to the task of finding the optimal (usually minimal, as above) value of an objective function. However, the terms have a subtle difference in their meaning. Optimization typically refers to the act of determining the optimal value itself. The optimization problem encompasses not only finding the optimal value but also the associated decision variables or parameters that achieve that optimal value, i.e. the solution. Thus, we distinguish the minimization problems where the task is to find both the value $\min_{x \in \mathcal{F}} [f_0(x)]$ and the solution $\textrm{argmin}_{x \in \mathcal{F}} [f_0(x)]$ , from the minimization, i.e. the task of finding the infimum $\inf_{x \in \mathcal{F}} [f_0(x)]$ . Note that the infimum may not even be attained, whereas for the minimum there always exists a solution attaining it.

Particular examples of convex optimization problems are LPs and SDPs, being the main topic of this Review. We start with the formulation of LP. Let $m, n \in \mathbb{N}$ , and $m \unicode{x2A7D} n$ . The so-called primal canonical form of LP is the following optimization task in variable x:

$\begin{align} \begin{split} \text{minimize } & c^\mathrm{T} \cdot x \\ \text{subject to } & A x = b, x \unicode{x2A7E} 0, \end{split} \end{align} \tag{ 13 }$

where $A \in \mathbb{R}^{n \times m}$ , $b \in \mathbb{R}^m$ , $x, c \in \mathbb{R}^n$ . The dual problem of LP is

$\begin{align} \begin{split} \text{maximize } & b^\mathrm{T} \cdot y \\ \text{subject to } & c - A^\mathrm{T} y = z, \\ & z \unicode{x2A7E} 0. \end{split} \end{align} \tag{ 14 }$

In the above problems, the variable x is called the primal variable, y the dual variable, z the dual slack variable, A is the linear constraint matrix, b is the right-hand side (RHS) of the linear constraint, and c is the linear coefficient.

2.2. Sets and cones definitions

Let X be a Banach space over an ordered field F.

Consider a set $C \subseteq X$ . The core of C is the set of all points in C such that for any direction d in X there exist $T_d \gt 0$ such that for all $t \in [0, T_d]$ we have $x + t d \in C$ , viz.:

$\begin{align} \textrm{core }{C} \equiv \left\{x \in C : {\Large{\forall}}_{\substack{d \in X, \\ \left| \left| d \, \right| \right| = 1}} {\Large{\exists}}_{T_d > 0} {\Large{\forall}}_{t \in \left[0, T_d\right]} \left(x + t d\right) \in C \right\}. \end{align} \tag{ 15 }$

Contrast it with the interior of C defined as:

$\begin{align} \textrm{int }{C} \equiv \left\{x \in C : {\Large{\exists}}_{T > 0} {\Large{\forall}}_{\substack{d \in X, \\ \left| \left| d \, \right| \right| = 1}} {\Large{\forall}}_{t \in \left[0, T\right]} \left(x + t d\right) \in C \right\}. \end{align} \tag{ 16 }$

It is easy to see that $\textrm{int}{C} \subseteq \textrm{core}{C}$ .

C is convex if

$\begin{align} {\Large{\forall}}_{x_1, x_2 \in C} {\Large{\forall}}{\lambda \in \left[0,1\right]} \lambda x_1 + \left(1 - \lambda\right) x_2 \in C. \end{align} \tag{ 17 }$

In particular, $\emptyset$ is a convex set. C is absorbing [36, p 244] if it is convex and $X = \bigcup_{t \unicode{x2A7E} 0} t C$ . Obviously, $0 \in \textrm{core}{C}$ for absorbing C.

A subset $K \subseteq X$ is called a cone, or nonnegative homogeneous, if and only if [342]:

$\begin{align} x \in K, \lambda \in F_+ \implies \lambda x \in K, \end{align} \tag{ 18 }$

where $F_+$ is the set of all non-negative scalars of F. Thus, the cone is simply the set invariant under multiplication by non-negative scalars. The cone K is convex if and only if [41]

$\begin{align} x_1, x_2 \in K \implies x_1 + x_2 \in K, \end{align} \tag{ 19 }$

that can be intuitively understood as $K + K \subseteq K$ .

For an arbitrary, not necessarily being a cone, subset $K \subseteq X$ , the topological dual cone, or simply the dual cone, is defined as [39, p 16]:

$\begin{align} K^{*} \equiv \left\{x^{*} \in X^{*} : x^{\prime} \in K \implies \langle x^{*} , x^{\prime} \rangle \unicode{x2A7E} 0 \right\}. \end{align} \tag{ 20 }$

The set $K^{*}$ defined by (20) is always a convex cone. If X is also a Hilbert space, then by the Riesz–Fréchet representation theorem, the definition (20) is equivalent to:

$\begin{align} K^{*} \equiv \left\{x \in X : x^{\prime} \in K \implies \langle x | x^{\prime} \rangle \unicode{x2A7E} 0 \right\}. \end{align} \tag{ 21 }$

If $K = K^{*}$ , then K is called a self-dual cone. Examples of a self-dual cone for $m \in \mathbb{N}_{+}$ are: the positive orthant cone of $\mathcal{X} = \mathbb{R}^m$ :

$\begin{align} K_{+}^m \equiv \left\{x \in \mathbb{R}^m : x_1 \unicode{x2A7E} 0, \ldots ,x_m \unicode{x2A7E} 0 \right\} = \mathbb{R}_{+}^m, \end{align} \tag{ 22 }$

the Lorentz (or second order, or quadratic) cone:

$\begin{align} K_q^m \equiv \left\{\left(x,t\right) \in \mathbb{R}^{m+1} : \left| \left| x \, \right| \right|_2 \unicode{x2A7D} t \right\}, \end{align} \tag{ 23 }$

and the PSD cone $\mathbb{S}^n_+$ discussed further in section 3.1.

A set $\mathcal{S} \subseteq \mathbb{R}^n$ is called a basic closed semialgebraic set if and only if there exist a set of polynomials $\{f_i\}_{i \in [m]}$ , $f_i: \mathbb{R}^n \rightarrow \mathbb{R}$ , such that $\mathcal{S} = \left\{x \in \mathbb{R}^n : {\Large{\forall}}_{i \in [m]} f_i(x) \unicode{x2A7E} 0 \right\}$ . Similarly, it is called a basic open semialgebraic set if and only if $\mathcal{S} = \left\{x \in \mathbb{R}^n : {\Large{\forall}}_{i \in [m]} f_i(x) \gt 0 \right\}$ . Compare this with algebraic sets which have the form $\mathcal{S} = \left\{x \in \mathbb{R}^n : {\Large{\forall}}_{i \in [m]} f_i(x) = 0 \right\}$ . A set $\mathcal{S}$ is called semi-algebraic if there exists a set $\{\mathcal{S}_{i,j}\}_{i \in [k], j \in [r_i]}$ for some $k, r_i \in \mathbb{N}_{+}$ such that each $\mathcal{S}_i$ is a basic closed semialgebraic set or a basic open semialgebraic set or an algebraic set and $\mathcal{S} = \bigcup_{i \in [k]} \bigcap_{j \in [r_i]} \mathcal{S}_{i,j}$ . Any algebraic set is obviously semialgebraic. A consequence of the famous Tarski–Seidenberg principle [282, 297] is the fact that the set of semialgebraic sets is closed under projections, i.e. if $S \in \mathbb{R}^{n_1 + n_2}$ is a semialgebraic set, then also its projection onto the first n₁ coordinates is a semialgebraic set. We refer to chapter 2 of [34] for a detailed discussion of semialgebraic sets. The positive orthant, Lorentz, and PSD cone are basic closed semialgebraic sets; the same holds for polyhedra and spectrahedra discussed in section 4.1.

2.3. Convex analysis: functions, convex conjugate, and Fenchel–Rockafellar theorem

Consider an arbitrary function $f: X \rightarrow \mathbb{R} \cup \{-\infty,+\infty\}$ . The function is defined to be proper if it never takes the value $-\infty$ , and is not identically equal to $+\infty$ . The epigraph of f is defined as

$\begin{align} \textrm{epi }{f} \equiv \left\{\left(x,r\right) \in X \times \mathbb{R}: r \unicode{x2A7E} f\left(x\right) \right\}, \end{align} \tag{ 24 }$

so it is the set of all points above the graph of the function. The hypograph of f is $\textrm{hyp}{f} \equiv \{(x,r) \in X \times \mathbb{R}: f(x) \unicode{x2A7E} r \}$ . One usually formulates the definition of a convex function in terms of Jensen's inequality (12); this is the convexity in analytical sense. Alternatively, epigraph allows to provide a geometric sense of convexity, viz. f is defined to be convex if $\textrm{epi}{f}$ is convex, and concave if −f is convex; see [198, p 12] for a discussion. The effective domain of f is defined as

$\begin{align} \textrm{dom }{f} \equiv \left\{x \in X: f\left(x\right) < +\infty \right\}. \end{align} \tag{ 25 }$

f is defined to be lower semi-continuous ( $\textrm{lsc}$ ) if $\textrm{epi}~{f}$ is a closed subset of $X \times \mathbb{R}$ . The set $\textrm{cont}~{f}$ is the set of all points where f is finite and continuous.

Let $F: X \times Y \rightarrow \mathbb{R} \cup \{-\infty,+\infty\}$ , with Y a linear spaces, be convex in both parameters. Let $C \subseteq X$ be a non-empty convex set. Then [41, p 88], the function

$\begin{align} \theta\left(y\right) = \inf_{x \in C} F\left(x, y\right) \end{align} \tag{ 26 }$

is convex in y, as long as

$\begin{align} {\Large{\forall}}_{y \in Y} \theta\left(y\right) \neq -\infty. \end{align} \tag{ 27 }$

The epigraph of θ is $\textrm{epi}~{\theta} = \left\{(x,t) : {\Large{\exists}}_{y \in Y} (x,y,t) \in \textrm{epi}~{F} \right\}$ , and is convex, as a projection of a convex set $\textrm{epi}~{F}$ . Indeed, let $y_0, y_1 \in \textrm{dom}~{\theta}$ . Then

$\begin{align} {\Large{\forall}}_{\epsilon > 0} {\Large{\exists}}_{x_0, x_1 \in C} F\left(x_0, y_0\right) \unicode{x2A7D} \theta\left(y_0\right) + \epsilon~ \text{and }~ F\left(x_1, y_1\right) \unicode{x2A7D} \theta\left(y_1\right) + \epsilon, \end{align} \tag{ 28 }$

and for any $\lambda \in [0,1]$ we have

$\begin{align} \begin{aligned} & \theta \left(\lambda y_0 + \left(1-\lambda\right) y_1\right) = \inf_{x \in C} F\left(x, \lambda y_0 + \left(1-\lambda\right) y_1\right) \\ & \quad \unicode{x2A7D} F\left(\lambda x_0 + \left(1-\lambda\right) x_1, \lambda y_0 + \left(1-\lambda\right) y_1\right) \unicode{x2A7D} \lambda F\left(x_0,y_0\right) + \left(1-\lambda\right) F\left(x_1,y_1\right) \\ & \quad \unicode{x2A7D} \lambda \theta\left(y_0\right) + \left(1-\lambda\right) \theta\left(y_1\right) + \epsilon. \end{aligned} \end{align} \tag{ 29 }$

Since ε can be arbitrarily small, Jensen's inequality (12) for θ follows.

A well-known operation of the holomorphic functional calculus is the extension of a function defined on real values to Hermitian matrices. This extension allows for the evaluation of functions that are not originally defined on matrices but can be extended to them through the use of complex analysis techniques. Any Hermitian matrix $H \in \mathbb{H}^n$ can be diagonalized by a unitary matrix U, so that $H = U \cdot \textrm{Diag}[(d_i)_{i \in [n]}] \cdot U^{\dagger}$ . A function $f : \mathbb{R} \rightarrow \mathbb{R}$ can be applied to the eigenvalues, and thus the function can then be extended to the entire matrix as $f(H) \equiv U \cdot \textrm{Diag}[(f(d_i))_{i \in [n]}] \cdot U^{\dagger}$ . We say that f is an operator monotone (or matrix monotone) when for any $M_1, M_2 \in \mathbb{H}^n$ , from $M_1 \succeq M_2$ it follows that $f(M_1) \succeq f(M_2)$ . Next, we say that f is operator convex (or matrix convex) if it satisfies Jensen's inequality (12) in Löwner's partial order, i.e. $\lambda f(M_1) + (1-\lambda) f(M_2) \succeq f (\lambda M_1 + (1 - \lambda) M_2)$ for all $\lambda \in [0,1]$ [134]. Finally, we say that f is operator concave (or matrix concave) when −f is operator convex [4, 60]. The matrix epigraph of f is $\mathbf{epi}~{f} \equiv \left\{(X, R) \in \mathbb{H}^n_{++} \times \mathbb{H}^n : R \succeq f(X) \right\}$ , and the matrix hypograph of f is $\mathbf{hyp}~{f} \equiv \left\{(X, R) \in \mathbb{H}^n_{++} \times \mathbb{H}^n : f(X) \succeq R \right\}$ .

The (commutative) perspective of a function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is the function ${P}_f : \mathbb{R}^{n+1} \rightarrow \mathbb{R}$ defined as ${P}_f(x,t) \equiv t f(x \cdot t^{-1})$ with $\textrm{dom}~{{P}_f} = \left\{(x,t) : x/t \in \textrm{dom}~{f}, t \gt 0 \right\}$ . The operation of perspective preserves the convexity, i.e. if f is convex then ${P}_f$ is convex [41, p 89]. If $M_1, M_2 \in \mathbb{H}^n$ commute, then ${P}_f(M_1,M_2)$ is well-defined by extending ${P}_f$ to matrices [93]. To cover also the non-commutative case, the non-commutative perspective of an operator convex function is defined as the unique extension of the corresponding (commutative) perspective that preserves homogeneity and convexity [92]. The formula for non-commutative perspective is $\mathbf{P}_f[M_1, M_2] \equiv M_2^{1/2} \cdot f \left( M_2^{-1/2} M_1 M_2^{-1/2} \right) \cdot M_2^{1/2}$ , with $\textrm{dom}~{\mathbf{P}_f} = \mathbb{H}^n \times \mathbb{H}^n_+$ [91]. For instance, the non-commutative perspective of the negative logarithm function is the operator relative entropy [107–110], viz.

$\begin{align} \begin{aligned} S\left(M_1|M_2\right) & = M_2^{1/2} \cdot \eta\left[M_2^{-1/2} M_1 M_2^{-1/2}\right] \cdot M_2^{1/2} = \mathbf{P}_{\left(- x \log{x}\right)}\left[M_1,M_2\right] \\ & = -M_1^{1/2} \cdot \log \left( M_1^{-1/2} M_2 M_1^{-1/2} \right) \cdot M_1^{1/2} = \mathbf{P}_{\left(-\log\right)}\left[M_2,M_1\right] \end{aligned} \end{align} \tag{ 30 }$

for $\eta(x) \equiv - x \log{x}$ and invertible M₁ and M₂ [109].

In [14, 261] the notion of the matrix geometric mean $M_1 \# M_2 \equiv M_1^{1/2} \cdot [ M_1^{-1/2} M_2 M_1^{-1/2} ]^{1/2} \cdot M_1^{1/2}$ , which satisfy certain general properties [182], was introduced for PD M₁ and M₂. Its direct generalization is the so-called t-weighted matrix geometric mean

$\begin{align} M_1 \#_t M_2 \equiv \mathbf{P}_{\left(x \mapsto x^t\right)}\left[M_1,M_2\right] = M_1^{1/2} \cdot \left[ M_1^{-1/2} M_2 M_1^{-1/2} \right]^{t} \cdot M_1^{1/2}, \end{align} \tag{ 31 }$

and thus $M_1 \# M_2 = M_1 \#_{1/2} M_2$ [33, 274]. It can be shown that t-weighted matrix geometric mean is operator concave for $t \in [0,1]$ , and operator convex for $t \in [-1,0] \cup [1,2]$ [33].

The indicator function of C is defined as

$\begin{align} I_C\left[x\right] \equiv \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{otherwise}. \end{cases} \end{align} \tag{ 32 }$

The indicator function $I_C[x]$ is convex if and only if, the set C is convex. It can also be shown that the indicator function is lower (upper) semi-continuous if and only if, C is closed (open).

The mean value of a random variable x we denote as $\langle x \rangle$ , and the standard deviation as ${\boldsymbol{\sigma}} \left[ x \right]$ . The covariance between random variables x_i and x_j is defined as $\textrm{cov}[x_i, x_j] \equiv \langle (x_i - \langle x_i \rangle) \cdot (x_i - \langle x_i \rangle) \rangle$ , and their correlation is defined as $\text{corr}[x_i, x_j] \equiv \textrm{cov}[x_i, x_j] / ( {\boldsymbol{\sigma}} \left[ x_i \right] \cdot {\boldsymbol{\sigma}} \left[ x_j \right])$ . The variance of x is the covariance of the variable with itself, $\textrm{var}[x] \equiv \textrm{cov}[x,x] \unicode{x2A7E} 0$ .

2.3.1. Fenchel conjugate.

For an arbitrary function $f: X \rightarrow \mathbb{R} \cup \{+\infty\}$ , where X is a Banach space, the Fenchel conjugate [102], or convex conjugate, or Legendre transform, or Legendre–Fenchel transform, or simply the conjugate, being the basic operation in convex analysis, is defined as

$\begin{align} f^{{\,}*}: X^{*} \rightarrow \mathbb{R} \cup \left\{-\infty,+\infty\right\} : x^{*} \mapsto \sup_{x \in X} \left\{ \langle x^{*} , x \rangle - f\left(x\right) \right\}. \end{align} \tag{ 33 }$

The conjugate operation can be applied multiple times. For instance the function $(f^{{\,}*})^{*}$ is defined on $X^{**}$ . With an abuse of notation by $f^{**}$ we denote the restriction of $(f^{{\,}*})^{*}$ to X, so that [198, p 83]:

$\begin{align} f^{**}: X \rightarrow \mathbb{R} \cup \left\{-\infty,+\infty\right\} : x \mapsto \sup_{x^{*} \in X^{*}} \left\{ \langle x^{*} , x \rangle - f^{{\,}*}\left(x^{*}\right) \right\}. \end{align} \tag{ 34 }$

It holds

$\begin{align} \textrm{epi }{f^{{\,}*}} = \bigcap_{x \in X} \textrm{epi}{\left\{ \langle \cdot , x \rangle - f\left(x\right) \right\} }. \end{align} \tag{ 35 }$

Since for any $x \in X$ the set $\textrm{epi}{\{ \langle \cdot , x \rangle - f(x) \} }$ is convex and closed in weak^* topology on $X^{*} \times \mathbb{R}$ (and the natural topology on $\mathbb{R}$ ), we have that $f^{{\,}*}$ is always a convex and $\textrm{lsc}$ function in that topology, no matter on the form of f. It is trivial to see that $f_1 \unicode{x2A7D} f_2 \implies f_1^{*} \unicode{x2A7E} f_2^{*}$ and $f^{{\,}*}(0) = -\inf_{x \in X} f(x)$ [198, p 82].

A direct consequence of the definition (33) is for any $f: X \rightarrow \mathbb{R} \cup \{-\infty,+\infty\}$ it holds [138, p 184]:

$\begin{align} f \equiv +\infty \iff f^{{\,}*} \equiv -\infty \iff -\infty \in f^{{\,}*}\left(X\right), \end{align} \tag{ 36 }$

and, in particular if $f^{{\,}*}$ is proper, then also f is proper. From ${\Large{\forall}}_{x \in X} {\Large{\forall}}_{x^{*} \in X^{*}} f(x) \unicode{x2A7E} \langle x^{*} , x \rangle - f^{{\,}*}(x^{*})$ taking supremum over $x^{*}$ we get ${\Large{\forall}}_{x \in X} f^{**}(x) \unicode{x2A7D} f(x)$ , i.e.

$\begin{align} f^{**} \unicode{x2A7D} f. \end{align} \tag{ 37 }$

The inequality (37), together with the convexity and $\textrm{lsc}$ properties of the Fenchel conjugate, motivate to call $f^{**}$ a regularization, or a convex lsc relaxation. If f itself is convex and $\textrm{lsc}$ , and there exist $x^{*} \in X^{*}$ and $\alpha \in \mathbb{R}$ such that $f \unicode{x2A7E} \langle x^{*} , \cdot \rangle + \alpha$ then the Fenchel–Moreau theorem states the equality,

$\begin{align} f = f^{**}, \end{align} \tag{ 38 }$

see [198, p 84] for the proof.

2.3.2. Subgradient and Fenchel–Rockafellar theorem.

Consider an arbitrary function $f: X \rightarrow \mathbb{R} \cup \{+\infty \}$ . An element $x^{*} \in X^{*}$ is called a subgradient of f at $x \in \textrm{dom }{f}$ when

$\begin{align} {\Large{\forall}}_{x^{\prime} \in X} f\left(x^{\prime}\right) - f\left(x\right) \unicode{x2A7E} \langle x^{*} , x^{\prime} - x \rangle. \end{align} \tag{ 39 }$

The (possibly empty) set of all subgradients of f at $x \in \textrm{dom}{f}$ is called subdifferential and denoted by $\partial f(x)$ . Directly from the definitions of subgradient and convex conjugate, it follows that

$\begin{align} {\Large{\forall}}_{x \in \textrm{dom} f \subseteq X} {\Large{\forall}}_{x^{*} \in X^{*}} x^{*} \in \partial f\left(x\right) \iff f\left(x\right) + f^{{\,}*}\left(x^{*}\right) = \langle x^{*} , x \rangle \implies x \in \partial f^{{\,}*}\left(x^{*}\right). \end{align} \tag{ 40 }$

If $f(x) = f^{**}(x)$ , e.g. for f proper convex $\textrm{lsc}$ by the Fenchel–Moreau theorem, then the implication in (40) becomes equivalence. Note that a part of (40), the so-called Fenchel–Young inequality,

$\begin{align} {\Large{\forall}}_{x \in X} {\Large{\forall}}_{x^{*} \in X^{*}} f\left(x\right) + f^{{\,}*}\left(x^{*}\right) \unicode{x2A7E} \langle x^{*} , x \rangle, \end{align} \tag{ 41 }$

always holds [36, p 51]. The notion of the subgradient does not require the function f to be convex. Nonetheless, it uses global properties of f and is most useful in the context of convex functions.

The Fenchel–Rockafellar theorem provides sufficient conditions for the subdifferential of a convex function $f : X \rightarrow \mathbb{R} \cup \{+\infty \}$ to be non-empty at x, i.e. $\partial f (x) \neq \emptyset$ [37, p 121], viz.:

(i)
f is $\textrm{lsc}$ and $x \in \textrm{core} (\textrm{dom}~{f})$ , or
(ii)
$x \in \textrm{cont}~{f}$ .

2.4. Optimization in Banach spaces

For the rest of this section, let X and Y be Banach spaces. The duality between product space X × Y and $X^{*} \times Y^{*}$ is given by $\langle (x^{*},y^{*}) , (x,y) \rangle \equiv \langle x^{*} , x \rangle + \langle y^{*} , y \rangle$ .

Now, we provide a discussion of the basic concept of duality in optimization. The treatment we provide is relatively extensive, yet we find this topic to be of particular importance, and, additionally, often quite confusing. We discuss the two major duality schemes of Fenchel–Rockafellar and Lagrangian. For SDP the two schemes lead to the same results, but since most of the works refer to either of those two, it is useful to recognize and understand both. We consider the general approach, not limited to convex optimization unless explicitly stated. As we will show, for an optimization problem formulated as a minimization task, we can formulate a related maximization task with the property of the so-called duality meaning that the value of any feasible solution of the former is at least as large as the value of any feasible solution of the latter. The former optimization task is called a primal problem, and the latter a dual problem.

For both Fenchel–Rockafellar and Lagrangian schemes, we consider a single, possibly non-convex, function $F: X \times Y \rightarrow \mathbb{R} \cup \{+\infty \}$ . The primal problem is defined as

$\begin{align} \begin{split} \text{minimize } & F\left(x, 0\right) \\ \text{subject to } & x \in X, \end{split} \end{align} \tag{ 42 }$

and the primal optimization is $\inf_{x \in X} \{F(x,0) \}$ ; its value is called the primal value and denoted by p. A value of $x \in X$ for that the primal value p is attained, if exists, is called a primal solution. The primal function, or the target function is

$\begin{align} f\left(x\right) \equiv F\left(x, 0\right). \end{align} \tag{ 43 }$

Actually, in most of the cases, one is interested in some, domain-specific, target function f, and in this sense, F is secondary to f. For a given f there exist many different possible functions F that satisfy (43). For this reason, F is called a perturbation function of f.

The dual problem is defined as

$\begin{align} \begin{split} \text{maximize } & -F^{*}\left(0, y^{*}\right) \\ \text{subject to } & y^{*} \in Y^{*}, \end{split} \end{align} \tag{ 44 }$

and the dual optimization is $\sup_{y^{*} \in Y^{*}} \{-F^{*}(0, y^{*}) \}$ , where $F^{*} : X^{*} \times Y^{*} \rightarrow \mathbb{R} \cup \{-\infty,+\infty\}$ is the Fenchel conjugate with respect to both variables, i.e.

$\begin{align} F^{*}\left(x^{*}, y^{*}\right) = \sup_{\left(x,y\right) \in X \times Y} \left\{ \langle \left(x^{*},y^{*}\right) , \left(x,y\right) \rangle - F\left(x, y\right) \right\}. \end{align} \tag{ 45 }$

Its value is called the dual value and denoted by d; a value of $y \in Y^{*}$ for that the value d is attained, if exists, is called the dual solution. The dual function is [41, p 216]:

$\begin{align} g\left(y^{*}\right) \equiv -F^{*}\left(0, y^{*}\right). \end{align} \tag{ 46 }$

The value function is defined as

$\begin{align} \theta: Y \rightarrow \mathbb{R} \cup \left\{-\infty,+\infty\right\}: y \mapsto \inf_{x \in X} F\left(x,y\right). \end{align} \tag{ 47 }$

The duality property is called weak when the value p of the optimal solution of the primal problem is greater or equal to the value d of the optimal solution of the dual problem, i.e. $p \unicode{x2A7E} d$ . This property can be obtained directly from the definition of the Fenchel conjugate giving:

$\begin{align} F\left(x, 0\right) + F^{*}\left(0, y^{*}\right) \unicode{x2A7E} \langle \left(0,y^{*}\right) , \left(x,0\right) \rangle = 0. \end{align} \tag{ 48 }$

The duality is called strong, if the equality of the optimal primal and dual solution holds, i.e. p = d. The difference $p - d \unicode{x2A7E} 0$ is called the duality gap. The duality gap Δ is defined as

$\begin{align} \Delta \equiv \begin{cases} 0 & \text{if } p = d \in \left\{-\infty, +\infty\right\} \\ p - d & \text{otherwise}. \end{cases} \end{align} \tag{ 49 }$

For the strong duality to hold, we need equality in (48). Since $\langle (0,y^{*}) , (x,0) \rangle = 0$ , from (40) it follows that the strong duality for $(\bar{x},\bar{y}^{*}) \in X \times Y^{*}$ is equivalent to [198, pp 101–2]:

$\begin{align} \left(0,\bar{y}^{*}\right) \in \partial F\left(\bar{x},0\right), \end{align} \tag{ 50 }$

and if $F(\bar{x},0) = F^{**}(\bar{x},0)$ , then also to [198, p 103]:

$\begin{align} \left(\bar{x},0\right) \in \partial F^{*}\left(0,\bar{y}^{*}\right). \end{align} \tag{ 51 }$

If the perturbation functions F is proper and is both convex and $\textrm{lsc}$ in the second parameter, then it is called a dualizing parametrization [271] of the minimization $\inf_{x \in X} \{F(x, 0) \}$ , resp. of the primal problem (42). Thus, F provides a family of optimizations $\inf_{x \in X} \{F(x, y) \}$ , resp. problems, parameterized by the so-called parameter variable y [198, pp 100–1]. We stress that the same primal problem (42) is obtained with any other function $F^{\prime}: X \times Y \rightarrow \mathbb{R} \cup \{+\infty \}$ satisfying $F(\cdot,0) \equiv F^{\prime}(\cdot,0)$ , but a different function $F^{\prime} \neq F$ may lead to a different dual problem (44).

2.5. Fenchel–Rockafellar dualization scheme

First, for the Fenchel–Rockafellar duality [36, 37, 41, 138], consider a bounded linear map $A: X \rightarrow Y$ , and two, possibly non-convex, functions, $f: X \rightarrow \mathbb{R} \cup \{+\infty\}$ and $g: Y \rightarrow \mathbb{R} \cup \{+\infty\}$ . For the triple $(A, f, g)$ define

$\begin{align} F\left(x, y\right) \equiv f\left(x\right) + g\left(Ax + y\right). \end{align} \tag{ 52 }$

The primal problem (42) is thus

$\begin{align} \begin{split} \text{minimize } & f\left(x\right) + g\left(Ax\right) \\ \text{subject to } & x \in X. \end{split} \end{align} \tag{ 53 }$

The value function (47) is equal

$\begin{align} \theta: Y \rightarrow \mathbb{R} \cup \left\{-\infty,+\infty\right\}: y \mapsto \inf_{x \in X} \left( f\left(x\right) + g\left(Ax + y\right) \right) \end{align} \tag{ 54 }$

with $\theta(0) = p$ . We have [198, p 106]:

$\begin{align} \begin{aligned} F^{*}\left(x^{*}, y^{*}\right) & = \sup_{\left(x,y\right) \in X \times Y} \left\{ \langle \left(x^{*},y^{*}\right) , \left(x,y\right) \rangle - f\left(x\right) - g\left(Ax + y\right) \right\} \\ & = \sup_{\left(x,y\right) \in X \times Y} \left\{ \langle x^{*} + A^{*} y^{*} , x \rangle + \langle -A^{*} y^{*} , x \rangle + \langle y^{*} , y \rangle - f\left(x\right) - g\left(Ax + y\right) \right\} \\ & = \sup_{\left(x,y\right) \in X \times Y} \left\{ \langle x^{*} + A^{*} y^{*} , x \rangle + \langle -y^{*} , Ax + y \rangle - f\left(x\right) - g\left(Ax + y\right) \right\} \\ & = \sup_{x \in X} \left\{ \langle x^{*} + A^{*} y^{*} , x \rangle - f\left(x\right) + \sup_{y \in Y} \left( \langle -y^{*} , Ax + y \rangle - g\left(Ax + y\right) \right) \right\} \\ & = \sup_{x \in X} \left\{ \langle x^{*} + A^{*} y^{*} , x \rangle - f\left(x\right) + g^{*}\left(-y^{*}\right) \right\} \\ & = f^{{\,}*}\left(x^{*} + A^{*} y^{*}\right) + g^{*}\left(-y^{*}\right). \end{aligned} \end{align} \tag{ 55 }$

The dual problem (44) is thus equal

$\begin{align} \begin{split} \text{maximize } & -f^{{\,}*}\left(A^{*} y^{*}\right) - g^{*}\left(-y^{*}\right) \\ \text{subject to } & y^{*} \in Y^{*}. \end{split} \end{align} \tag{ 56 }$

The weak duality, viz. $p \unicode{x2A7E} d$ , can be derived directly from the Fenchel–Young inequality (41). Indeed from the adjoint operator definition (1) we get

$\begin{align} \begin{aligned} & {\Large{\forall}}_{x \in X} {\Large{\forall}}_{y^{*} \in Y^{*}} \left( f\left(x\right) + f^{{\,}*}\left(A^{*} y^{*}\right) \right) + \left( g\left(Ax\right) + g^{*}\left(-y^{*}\right)\right) \\ & \quad \unicode{x2A7E} \langle A^{*} y^{*} , x \rangle + \langle -y^{*} , Ax \rangle = \langle y^{*} , Ax \rangle - \langle y^{*} , Ax \rangle = 0. \end{aligned} \end{align} \tag{ 57 }$

2.5.1. Strong duality.

Knowing that the weak duality $p \unicode{x2A7E} d$ holds, to show the strong duality, we need to establish when $p \unicode{x2A7D} d$ . Suppose that the subdifferential $\partial \theta$ is non-empty at 0. We will show that this suffices for the strong duality, and in section 2.5.2 provide the so-called constraint qualification conditions that ensure this. Let $y^{*} \in \partial \theta(0) \subseteq Y^{*}$ . From the definition of subgradient for any $x \in X$ we have ${\Large{\forall}}_{y \in Y} \theta(y - Ax) - \theta(0) \unicode{x2A7E} \langle y^{*} , y - Ax \rangle$ , and thus, from the definition of the value function (54) we get

$\begin{align} \begin{aligned} & {\Large{\exists}}_{y^{*} \in Y^{*}} {\Large{\forall}}_{x \in X} {\Large{\forall}}_{y \in Y} \theta\left(0\right) \unicode{x2A7D} \theta\left(y - Ax\right) - \langle y^{*} , y - Ax \rangle \\ & \quad \unicode{x2A7D} f\left(x\right) + g\left(y\right) - \langle y^{*} , y - Ax \rangle = \left( f\left(x\right) + \langle A^{*} y^{*} , x \rangle \right) + \left( g\left(y\right) - \langle y^{*} , y \rangle \right). \end{aligned} \end{align} \tag{ 58 }$

Since the inequality holds for all $x \in X$ and $y \in Y$ , taking the infimum of these variables by the definition of the convex conjugate (33) we get ${\Large{\exists}}_{y^{*} \in Y^{*}} \theta(0) \unicode{x2A7D} -f^{{\,}*}(-A^{*} y^{*}) - g^{*}(y^{*})$ or, equivalently by negating the sign of $y^{*}$ (as also $-y^{*} \in Y^{*}$ ), we get

$\begin{align} {\Large{\exists}}_{y^{*} \in Y^{*}} \theta\left(0\right) \unicode{x2A7D} -f^{{\,}*}\left(A^{*} y^{*}\right) - g^{*}\left(-y^{*}\right). \end{align} \tag{ 59 }$

This, by (56), shows that $p = \theta(0) \unicode{x2A7D} d$ , and thus

$\begin{align} \partial \theta\left(0\right) \neq \emptyset \implies p \unicode{x2A7D} d, \end{align} \tag{ 60 }$

and the strong duality holds.

2.5.2. The decoupling lemma.

Now, we will discuss the so-called decoupling lemma [37, 38] providing a sufficient criterion, viz. certain constraint qualification condition, for the strong duality of the Fenchel–Rockafellar scheme. Since this topic is crucial for duality in convex optimization, for the sake of completeness, we provide in this work in appendix A a summary of the proof of the theorem, following [37, p 127nn].

Let us assume that $f: X \rightarrow \mathbb{R}$ and $g: Y \rightarrow \mathbb{R}$ are both convex and that the map $A: X \rightarrow Y$ is linear and bounded. We provide sufficient conditions for θ as defined in (54) to have non-empty subdifferential at 0, i.e. $\partial \theta(0) \neq \emptyset$ . We use the Fenchel–Rockafellar theorem, see section 2.3.2. One can directly check that θ when f and g are convex, then θ given by (54) is a convex function with $\textrm{dom}{\theta} = \textrm{dom}{g} - A \textrm{dom}{f}$ , where the set difference is the Minkowski difference. Indeed, $F(x,y)$ defined as (52) is convex as a sum of convex f and g is of the form (26), and satisfies the condition (27). The decoupling lemma states that under the condition that both f and g are $\textrm{lsc}$ and

$\begin{align} 0 \in \textrm{core} \left( \textrm{dom }{\theta} \right), \end{align} \tag{ 61 }$

the function θ defined as in (54) is continuous at 0. Then, from the Fenchel–Rockafellar theorem stated in section 2.3.2, it follows that $\partial \theta$ is non-empty at 0, as required for the proof of strong duality as given in section 2.5.1.

2.6. Lagrangian dualization scheme

Next, for the Lagrangian duality [138, 198], consider a single, again possibly non-convex, function F. Nonetheless, again, if F is convex in both parameters, then θ is of the form (26) and satisfies (27), thus θ is convex in this case. The Lagrangian of F is defined as [138, 198, 271]

$\begin{align} \mathcal{L}: X \times Y^{*} \rightarrow \mathbb{R} \cup \left\{-\infty,+\infty\right\}: \left(x,y^{*}\right) \mapsto -\sup_{y \in Y} \left\{ \langle y^{*} , y \rangle - F\left(x,y\right) \right\}, \end{align} \tag{ 62 }$

where one can easily recognize the Fenchel conjugate with respect to the parameter (i.e. the second) variable.

The Lagrangian allows reformulating the primal problem (42). From the definition (62) it follows that

$\begin{align} {\Large{\forall}}_{x \in X} {\Large{\forall}}_{y^{*} \in Y^{*}} {\Large{\forall}}_{y \in Y} F\left(x,y\right) \unicode{x2A7E} \mathcal{L}\left(x,y^{*}\right) + \langle y^{*} , y \rangle. \end{align} \tag{ 63 }$

When F is a dualizing parametrization, then by the Fenchel–Moreau theorem, see (38), the equality in (63) holds. From (63) we have for the primal optimization, see (42):

$\begin{align} \inf_{x \in X} \left\{F\left(x,0\right) \right\} \unicode{x2A7E} \inf_{x \in X} \sup_{y^{*} \in Y^{*}} \mathcal{L}\left(x,y^{*}\right). \end{align} \tag{ 64 }$

Also the dual optimization, see (44), can be easily expressed with the Lagrangian, viz. [198, p 109]:

$\begin{align} \begin{aligned} \sup_{y^{*} \in Y^{*}} \left\{-F^{*}\left(0, y^{*}\right) \right\} & = \sup_{y^{*} \in Y^{*}} \left\{-\sup_{x \in X, y \in Y} \left\{ \langle \left(0,y^{*}\right) , \left(x,y\right) \rangle - F\left(x,y\right)\right\} \right\} \\ & = \sup_{y^{*} \in Y^{*}} \left\{-\sup_{x \in X} \sup_{y \in Y} \left\{ \langle y^{*} , y \rangle - F\left(x,y\right)\right\} \right\} = \sup_{y^{*} \in Y^{*}} \inf_{x \in X} \mathcal{L}\left(x,y^{*}\right). \end{aligned} \end{align} \tag{ 65 }$

The inner infimum of the last expression is sometimes used as an alternative definition [41, p 216] of the dual function (46):

$\begin{align} g\left(y^{*}\right) \equiv \inf_{x \in X} \mathcal{L}\left(x,y^{*}\right). \end{align} \tag{ 66 }$

A direct consequence of (64) and (65) is another proof of the weak duality:

$\begin{align} \inf_{x \in X} \left\{F\left(x,0\right) \right\} \unicode{x2A7E} \inf_{x \in X} \sup_{y^{*} \in Y^{*}} \mathcal{L}\left(x,y^{*}\right) \unicode{x2A7E} \sup_{y^{*} \in Y^{*}} \inf_{x \in X} \mathcal{L}\left(x,y^{*}\right) = \sup_{y^{*} \in Y^{*}} \left\{-F^{*}\left(0, y^{*}\right) \right\}. \end{align} \tag{ 67 }$

The second inequality follows from the well-known max–min inequality [313]. A value $(\bar{x},\bar{y}^{*}) \in X \times Y^{*}$ is defined to be a saddle point of $\mathcal{L}$ when

$\begin{align} {\Large{\forall}}_{x \in X} {\Large{\forall}}_{y^{*} \in Y^{*}} \mathcal{L}\left(\bar{x},y^{*}\right) \unicode{x2A7D} \mathcal{L}\left(\bar{x},\bar{y}^{*}\right) \unicode{x2A7D} \mathcal{L}\left(x,\bar{y}^{*}\right), \end{align} \tag{ 68 }$

i.e. in other words, when

$\begin{align} \sup_{y^{*} \in Y^{*}} \mathcal{L}\left(\bar{x},y^{*}\right) = \mathcal{L}\left(\bar{x},\bar{y}^{*}\right) = \inf_{x \in X} \mathcal{L}\left(x,\bar{y}^{*}\right). \end{align} \tag{ 69 }$

From (67) we also directly get that for F being a dualizing parametrization the strong duality for $(\bar{x},\bar{y}^{*})$ is equivalent to saying that $(\bar{x},\bar{y}^{*})$ is a saddle point of $\mathcal{L}$ , see e.g. [198, p 110] for a proof.

2.7. Convex cone optimization and duality

We have introduced the general framework for optimization and discussed its duality. Now, we concentrate on a particular problem of convex cone programming, i.e. the optimization over variables belonging to a convex cone [41, 81, 214, 235]. We write convex cone optimization problems as:

$\begin{align} \begin{split} \text{minimize } & c^{\dagger} x \\ \text{subject to } & \mathcal{A} x = b, \\ & x \in K \subseteq \mathcal{X}, \end{split} \end{align} \tag{ 70 }$

in the primal form, see (42), and in the dual form, see (44), as:

$\begin{align} \begin{split} \text{maximize } & \langle y^{*} , b \rangle \\ \text{subject to } & c^{\dagger} - \mathcal{A}^{\dagger} y^{*} = z^{*}, \\ & z^{*} \in K^{*} \subseteq \mathcal{X}^{*}, \end{split} \end{align} \tag{ 71 }$

where K is a nonempty, closed convex cone in an Euclidean space $\mathcal{X}$ , see (19), $\mathcal{A}: \mathcal{X} \rightarrow \mathbb{R}^m$ is a linear operator, the operator $\mathcal{A}^{\dagger} : \left( \mathbb{R}^m \right)^{*} \rightarrow \mathcal{X}^{*}$ is its adjoint, $b \in \mathbb{R}^m$ , $y^{*} \in \left( \mathbb{R}^m \right)^{*}$ and $c \in \mathcal{X}$ . Note, that the spaces $\mathbb{R}^m$ and $\left( \mathbb{R}^m \right)^{*}$ are isomorphic with the transposition operation as the isomorphism.

To derive (71) from (70) we need a parametrization of the family of problems [198, pp 111–2]. One of the possibilities is to introduce a variable y used as the parameter for the linear constraints and take

$\begin{align} F\left(x, y\right) = \begin{cases} c^{\dagger} x + I_{\left\{x : \mathcal{A} x - b = y \right\} }\left[x\right] & \text{if } x \in K, \\ +\infty & \text{otherwise} \end{cases} \end{align} \tag{ 72 }$

where we used the indicator function (32). We stress that this is only one of the multiple examples of a dualizing parametrization (note that the indicator is over a convex closed set, and thus is convex and $\textrm{lsc}$ ): the most direct and simple, but yet arbitrary. To get the dual problem (44) we calculate

$\begin{align} \begin{aligned} -F^{*}\left(0, y^{*}\right) & = -\sup_{x \in X, y \in Y} \left\{ \langle y^{*} , y \rangle - F\left(x, y\right) \right\} = \inf_{x \in K, y \in Y} \left\{c^{\dagger} x + I_{\left\{x : \mathcal{A} x - b = y \right\} }\left[x\right] - \langle y^{*} , y \rangle \right\} \\ & = \inf_{x \in K} \left\{c^{\dagger} x - \langle y^{*} , \mathcal{A} x - b \rangle \right\} = \inf_{x \in K} \left\{ \langle c^{\dagger} - \mathcal{A}^{\dagger} y^{*} , x \rangle + \langle y^{*} , b \rangle \right\}. \end{aligned} \end{align} \tag{ 73 }$

The term $\langle c^{\dagger} - \mathcal{A}^{\dagger} y^{*} , x \rangle$ is non-negative for all $x \in K$ if and only if $c^{\dagger} - \mathcal{A}^{\dagger} y^{*}$ belongs to the dual cone $K^{*}$ ; and if a negative value can be attained for some $x \in K$ , then the infimum is $-\infty$ . Thus

$\begin{align} -F^{*}\left(0, y^{*}\right) = \begin{cases} \langle y^{*} , b \rangle & \text{if } c^{\dagger} - \mathcal{A}^{\dagger} y^{*} \in K^{*}, \\ -\infty & \text{otherwise}. \end{cases} \end{align} \tag{ 74 }$

The problem (71) is derived as $\sup_{y^{*} \in Y^{*}} \{-F^{*}(0, y^{*}) \}$ , see (44), by introducing $z^{*} = c^{\dagger} - \mathcal{A}^{\dagger} y^{*}$ .

The same result can be equivalently achieved, but more step by step, with the approach using the Lagrangian (62), which if $x \in K$ for F given by (72) is [41, p 266]:

$\begin{align} \begin{aligned} \mathcal{L}\left(x,y^{*}\right) & = -\sup_{y \in Y} \left\{ \langle y^{*} , y \rangle - F\left(x,y\right) \right\} = \inf_{y \in Y} \left\{F\left(x,y\right) - \langle y^{*} , y \rangle \right\} \\ & = \inf_{y : \mathcal{A} x - b = y, y \in Y} \left\{c^{\dagger} x - \langle y^{*} , y \rangle \right\} = \langle c^{\dagger} - \mathcal{A}^{\dagger} y^{*} , x \rangle + \langle y^{*} , b \rangle \end{aligned} \end{align} \tag{ 75 }$

and $\mathcal{L}(x,y^{*}) = +\infty$ if $x \notin K$ . The dual is derived as, see (65) and (73):

$\begin{align} \sup_{y^{*} \in Y^{*}} \inf_{x \in X} \mathcal{L}\left(x,y^{*}\right) = \sup_{y^{*} \in Y^{*}} \inf_{x \in K} \mathcal{L}\left(x,y^{*}\right). \end{align} \tag{ 76 }$

In particular, we see that the dual function (66) is the same as in (74):

$\begin{align} g\left(y^{*}\right) = \inf_{x \in X} \mathcal{L}\left(x,y^{*}\right) = \inf_{x \in K} \mathcal{L}\left(x,y^{*}\right) = -F^{*}\left(0, y^{*}\right) = \begin{cases} \langle y^{*} , b \rangle & \text{if } c^{\dagger} - \mathcal{A}^{\dagger} y^{*} \in K^{*}, \\ -\infty & \text{otherwise}. \end{cases} \end{align} \tag{ 77 }$

3. Theory of SDP

In this section, we delve into the foundational concepts and principles underlying the field of SDP. The section 3.1 elucidates the fundamental properties and criteria for positive semi-definiteness of matrices, which form the basis for semidefinite optimization problems. In section 3.2 we investigate various primal and dual formulations present in the literature. Next, section 3.3 explores the duality theory associated with SDP, highlighting the relationships between primal and dual problems, and section 3.4 discusses how a solution of a dual problem can provide a useful linear (affine) bound on a range of parameterized primal problems. Finally, sections 3.5 and 3.6 cover specialized topics, shedding light on the utilization of complex variables, the incorporation of slack and surplus variables, and the treatment of mixed problems and equalities in the context of SDP. In section 3.7 we discuss simple tricks related to the Schur complement. Then, in section 3.8 we briefly discuss implementations of SDP solvers, and in section 3.9 we outline selected internal solver mechanisms that may impact the performance.

The following overview can be supplemented with numerous other applications and constructions. These include the famous MAX-CUT and MAX-k-SAT relaxations by Goemans and Williamson [123], finding maximum eigenvalues, matrix norms optimizations, and combinatorial optimization problems [8, 9, 30, 121, 130, 140, 215, 240, 323, 325].

3.1. Definition and characterization of positive semidefiniteness

Discussion of SDP requires us to introduce the concept of PD or simply positive matrices, as well as PSD or simply semi-definite matrices. We denote a positive (or semi-positive) matrix M by $M \succ 0$ (or $M \succeq 0$ ). PSD matrices are also referred to as non-negative definite or simply non-negative. Several equivalent definitions or characterizations of such matrices can be found in the literature, and here we present three of them. Thus, a symmetric matrix $M \in \mathbb{R}^{n \times n}$ is considered positive (or non-negative) if all its eigenvalues are positive (or non-negative). Alternatively, M is positive (or non-negative) if and only if for all $x \in \mathbb{R}^n$ with x ≠ 0, it holds that $x^\mathrm{T} M x \gt 0$ (or $x^\mathrm{T} M x \unicode{x2A7E} 0$ ). Similarly, for a Hermitian matrix $M \in \mathbb{C}^{n \times n}$ , it is PD (or PSD) if and only if for all $x \in \mathbb{C}^n$ with x ≠ 0, we have $x^{\dagger} M x \gt 0$ (or $x^{\dagger} M x \unicode{x2A7E} 0$ ). The former definition based on eigenvalues seems to offer a greater intuitive understanding, while the latter is more prevalent in the existing literature on the subject. A more comprehensive exploration of the properties of PD and PSD matrices can be found in [150, 209]. It should be noted that a real PD (PSD) matrix satisfies the conditions for Hermitian matrices, making it a complex PD (PSD) matrix as well. Conversely, for a complex PD (PSD) matrix $M = M^\mathrm{R} + i M^\mathrm{I}$ (where $M^\mathrm{R}$ and $M^\mathrm{I}$ are real symmetric and antisymmetric matrices), we observe that $\forall_{\substack{x \in \mathbb{C}^n \ x \neq 0}} x^{\dagger} \frac{1}{2} (M + M^{\dagger}) x = x^{\dagger} M^\mathrm{R} x \unicode{x2A7E} 0$ , implying that the matrix $M^\mathrm{R}$ is a real PD (PSD) matrix. It is evident that PSD matrices of size n form a convex cone $\mathbb{S}^n_+$ , as indicated in equation (19), and this cone is self-dual.

Now we state two very important properties characterizing PSD matrices by their possible decompositions [209, 316]. It can be shown that for a Hermitian (symmetric) matrix $M \in \mathbb{C}^{n \times n}$ ( $M \in \mathbb{R}^{n \times n}$ ) we have that $M \succeq 0$ is equivalent to each of the following statements:

(i)
There exists $L \in \mathbb{C}^{n \times n}$ ( $L \in \mathbb{R}^{n \times n}$ ) such, that $M = L^{\dagger} L$ ( $M = L^\mathrm{T} L$ ), and L is a lower triangular matrix.
(ii)
There exists a set of vectors $\{v_i\}_{i \in [n]}$ , $v_{i} \in \mathbb{C}^n$ ( $v_{i} \in \mathbb{R}^n$ ), such that $M_{i, j} = v_{i}^{\dagger} \cdot v_{j}$ ( $M_{i, j} = v_{i}^\mathrm{T} \cdot v_{j}$ ).

In the first of these characterizations, a non-unique matrix L is called the Cholesky decomposition of M. The second characterization is equivalent to the existence of a matrix $B \in \mathbb{C}^{n \times n}$ ( $B \in \mathbb{R}^{n \times n}$ ), such that $M = B^{\dagger} B$ ( $M = B^\mathrm{T} B$ ), so, in other words, M is a multiplication table of vectors $\{v_{i}\}_i$ being columns of B. Some authors [324, 326] existence of such matrix B use as the definition of positive semi-definiteness. We say that M is a Gram matrix, or a Gramian. The relation between the existence of a set of vectors and PSD property can be generalized to infinite-dimensional spaces [206]. Trivially, the set of vectors is linearly independent if and only if the determinant of its corresponding Gram matrix is non-zero.

The notion of PD and PSD is also characterized by the Sylvester criteria, formulated as follows. Let M be an m × n matrix, and consider sets $I \subseteq [m]$ and $J \subseteq [n]$ of equal sizes. Let $(M)_{I,J}$ be a submatrix with elements contained in rows from I and columns from J. The determinant of $(M)_{I,J}$ or, in other words, the determinant of a square submatrix obtained by removing some rows and columns of a larger matrix, is called a minor of the matrix. If I = J, then the minor is called principal. If $I = J = [k]$ , for $k \unicode{x2A7D} n,m$ , then the principal minor is called leading. Sylvester's criteria provide necessary and sufficient conditions for positive definiteness and semi-definiteness of a Hermitian matrix [117]. Sylvester's criterion for positive definiteness states that a Hermitian matrix is PD if and only if, all its leading principal minors are positive. Sylvester's criterion for positive semi-definiteness states that a Hermitian matrix is PSD if and only if, all principal minors are non-negative. For example an SDP constraint $\begin{bmatrix} 1 & x \\ x & 1 \end{bmatrix}$ $\succeq 0$ implies by the second Sylvester's criterion that $\left| x \, \right| \unicode{x2A7D} 1$ .

For the Löwner's partial order $\succeq$ it can be easily shown that if $A, B \succeq 0$ , then $A + B \succeq 0$ . If we multiply a PSD matrix by a non-negative constant, we get another PSD matrix. Thus the set of PSD matrices forms a pointed convex cone. It also follows for $A, B \succeq 0$ that ${\textrm{Tr}}(A B) \unicode{x2A7E} 0$ and $A^{\frac{1}{2}}$ exists and is PSD.

3.2. Formulations of semidefinite optimization problems

In the literature there exist a couple of equivalent formulations of SDPs, each has both primal and dual forms. The author prefers the so-called standard or canonical form of SDP given below in (80) and (81) in section 3.2.1, and used in many of the classical textbooks [15, 36, 116], reviews [231, 306], SDP fundamental papers [10, 216, 293, 305, 307] and implementations [214, 292, 309, 310, 312], sometimes with slight changes in labeling [27], different notations for the Frobenius product (3), and more general form of conic formulations [66, 236]. Another important formulation is the one used by Vandenberghe and Boyd [41, 315], which we provide in section 3.2.2. This form seems to be preferred in many quantum information papers [42, 95, 160, 227] with direct influence of [315], which is apparently the default reference to the SDPs. The third important formulation was given by Watrous in his lecture notes [324] and textbook [326], see section 3.2.3 below. It has an elegant symmetric form and also is used in many quantum information books and papers [61, 311], especially involving quantum channels [54, 183, 199, 207, 251].

The paradigmatic part of all the formulations are LMIs, i.e. expressions of the form [40]:

$\begin{align} F\left(x\right) \succeq 0, \text{where} \end{align} \tag{ 78a }$

$\begin{align} F\left(x\right) \equiv F_0 + \sum_{i \in \left[m\right]} x_i F_i, \end{align} \tag{ 78b }$

$x \in \mathbb{R}^m$ is a variable, and F_i , for $i = 0, \ldots, m$ , are symmetric constant matrices $\mathbb{R}^{n \times n}$ . The origin of LMIs is in control theory including solving Lyapunov stability problems, and their interconnection with convex optimization has been noted e.g. by Pyatnitskii and Skorodinskii [40, 262]. In fact, SDPs can be intuitively viewed as optimization problems with linear target functions and LMIs as constraints. Any SDP can be formulated as either a primal or dual problem of the formulations given below. From the form of constraints (78a ) it directly follows that they are convex:

$\begin{align} F \left( \lambda x + \left(1-\lambda\right) x^{\prime} \right) = \lambda F\left(x\right) + \left(1-\lambda\right) F\left(x^{\prime}\right) \succ 0 \end{align} \tag{ 79 }$

for $\lambda \in [0,1]$ . The linear target function is obviously also convex. Thus the SDP problems are convex, so we can use the methods of section 2. We also note that any number of LMIs can be reformulated as a single LMI involving block-diagonal matrices, with each block referring to a relevant LMI. We refer to [68] for an overview of applications of LMIs.

3.2.1. The canonical or standard form.

Again, let $m, n \in \mathbb{N}$ , $m \unicode{x2A7D} \frac{n(n+1)}{2}$ . An SDP problem in a canonical, or standard, primal form is the following optimization task in a variable $X \in \mathbb{S}^n$ :

$\begin{align} \begin{split} \text{minimize } & C \bullet X \\ \text{subject to } & A_i \bullet X = b_i, \text{for } i \in \left[m\right] \\ & X \succeq 0, \end{split} \end{align} \tag{ 80 }$

where $C \in \mathbb{S}^n$ and $A_1, \ldots ,A_m \in \mathbb{S}^n$ are symmetric matrices. The matrices A_i , C, and vector $b \in \mathbb{R}^m$ define the SDP problem. Note that the fact that these matrices are symmetric is not restrictive. For a symmetric matrix X and a matrix C we have $C \bullet X = {\textrm{Tr}} \left( \frac{1}{2}(C + C^\mathrm{T}) X \right)$ , and thus we may always take a symmetric matrix $\frac{1}{2}(C + C^\mathrm{T})$ instead of C. We assume that $A_1, \ldots ,A_m$ are linearly independent (otherwise we can reduce this set). Obviously, LP problem (13) may be written in the form of SDP (80), if X is constrained to be a diagonal matrix, with the diagonal entries used as the x variable. Thus, LP can be considered as a particular case of SDP. The goal expression $C \bullet X = {\textrm{Tr}}(C X)$ , but the former notation is more often used; similarly $A_i \bullet X = {\textrm{Tr}}(A_i X)$ .

A canonical dual SDP problem for (80) is the optimization task in variables $y \in \mathbb{R}^m$ and $Z \in \mathbb{S}^n$ of the following form

$\begin{align} \begin{split} \text{maximize } & b^\mathrm{T} \cdot y \\ \text{subject to } & C - \sum_{i \in \left[m\right]} y_i A_i = Z \\ & Z \succeq 0. \end{split} \end{align} \tag{ 81 }$

Some authors [45, pp 39–40] rewrite the canonical form (80) and (81) with substitutions F_i instead of A_i , −C instead of C, and −λ_i instead of y_i , turning the primal problem to maximization, and the dual problem to minimization.

Similarly as in LP, X is called the primal variable, y the dual variable, Z the dual slack variable, $\{A_i\}$ are linear constraint matrices, b is the RHS of the linear constraint, and C is the linear coefficient. If $X, Z \in \mathbb{R}^{n \times n}$ and $y \in \mathbb{R}^m$ satisfies conditions specified by (80) and (81), then they are called a feasible solution. A feasible variable X is called a primal solution, and feasible variables Z and y constitute the dual solution. An optimal solution is required to be feasible. The values of $C \bullet X$ and $b^\mathrm{T} \cdot y$ are called the values of the primal and dual solutions, or values of the problem, respectively. We have $C \bullet X \unicode{x2A7E} b^\mathrm{T} \cdot y$ . Usually, an SDP solver is expected to find both primal and dual solutions. If either $C = 0 \in \mathbb{R}^{n \times n}$ or b = 0, then such a problem is called feasibility problem and refers to finding whether any solution of given, the primal or dual, problem exists. As the dual form is often delivered from the Lagrange duality 2.6, and y, in that case, plays the role of Lagrange multipliers, this name is also usually attributed to the dual variable y [41].

The fact that primal formulation refers to minimization and dual to maximization problems, is not restrictive. We can always change the sign of the matrix C or the vector b to get the desired optimization problem fitting into the standard form in (80) and (81). What is more, a problem formulated in one of the forms given by (80) and (81) may be reformulated in the other one. The issue of choosing the proper formulation is not always obvious and can have a very significant impact on the difficulty of the problem to a solver [194]. This can be illustrated by the example in table 1 showing the sizes of some SDP problems in dual and primal formulations. Generally, one should choose the formulation which leads to a smaller number of constraints, given by the number m unless some special properties of the structure of the formulation can be used to further simplify the process of solving the problem, see e.g. section 3.9.

3.2.2. The Vandenberghe and Boyd and the SDPA forms.

In the formulation popularized by [41, 315] the primal optimization task is:

$\begin{align} \begin{split} \text{minimize } & c^\mathrm{T} \cdot x \\ \text{subject to } & F\left(x\right) \succeq 0, \\ \end{split} \end{align} \tag{ 82 }$

where F(x) is given by (78b ). As stated in [315] the aim of this formulation is to make the primal formulation 'as explicit as possible'. The dual of (82) is

$\begin{align} \begin{split} \text{maximize } & - {\textrm{Tr}} \left[ F_0 Z \right] \\ \text{subject to } & {\textrm{Tr}} \left[ F_i Z \right] = c_i, \text{for } i \in \left[m\right],\\ & Z \succeq 0, \end{split} \end{align} \tag{ 83 }$

where the variable is a symmetric matrix $Z \in \mathbb{R}^{n \times n}$ .

The form where F₀ takes an opposite sign is often referred to as the SDPA, see section 1.3 for a discussion. Stated explicitly, keeping the original notation and naming of the variables (note using the label Y instead of Z in the dual), the SDPA primal form is [113]:

$\begin{align} \begin{split} \text{minimize } & \sum_{i \in \left[m\right]} c_i x_i, \\ \text{subject to } & X \equiv -F_0 + \sum_{i \in \left[m\right]} F_i x_i \succeq 0, \\ \end{split} \end{align} \tag{ 84 }$

and the SDPA dual form is:

$\begin{align} \begin{split} \text{maximize } & {\textrm{Tr}} \left[ F_0 Y \right] \\ \text{subject to } & {\textrm{Tr}} \left[ F_i Y \right] = c_i, \text{for } i \in \left[m\right],\\ & Y \succeq 0. \end{split} \end{align} \tag{ 85 }$

3.2.3. The Watrous symmetric form.

The third common form of SDPs is given by Watrous [324, 326]. This form is designed to show the symmetry between the primal and dual problems and is particularly convenient for quantum channel analysis. For two complex Euclidean spaces $\mathcal{X}$ and $\mathcal{Y}$ , a semidefinite program in the Watrous form is defined as a triple $(\Phi, A, B)$ , where $\Phi: \mathsf{L}\left[ \mathcal{X}, \mathcal{X} \right] \rightarrow \mathsf{L}\left[ \mathcal{Y}, \mathcal{Y} \right]$ is a Hermitian and trace-preserving map, $A \in \textrm{Herm}{\mathcal{X}}$ , and $B \in \textrm{Herm}(\mathcal{Y})$ . The primal problem in the Watrous form is:

$\begin{align} \begin{split} \text{maximize } & \langle A , X \rangle \\ \text{subject to } & \Phi\left(X\right) = B,\\ & X \succeq 0, \end{split} \end{align} \tag{ 86 }$

and the dual is:

$\begin{align} \begin{split} \text{minimize } & \langle B , Y \rangle \\ \text{subject to } & \Phi^{\dagger}\left(Y\right) \succeq A,\\ & Y \in \textrm{Herm}\left(\mathcal{Y}\right), \end{split} \end{align} \tag{ 87 }$

with a remark that in the original notation of [324, 326] Watrous uses $^*$ instead of $^{\dagger}$ to denote the Hermitian conjugate, $\unicode{x2A7E}$ instead of $\succeq$ to denote the Löwner's partial order, and $X \in \text{Pos}(\mathcal{X})$ instead of $X \succeq 0$ .

3.2.4. The Kronecker-canonical form for convex cones.

Here we briefly show the methodology that offers an alternative way of expressing the canonical formulation of (80) and (81), resembling the LP formulations in (13) and (14). This formalism, in fact, is more encompassing and applicable to a wide range of conic optimization problems, as discussed in section 2.7. The general formulation (70) and (71) is highly convenient and valuable, as it facilitates a seamless transition between primal and dual formulations for any convex cone by establishing the corresponding dual cone.

It can be easily verified that for any real matrices A, B, and C, the relationship $( A \otimes B ) \textrm{vec}(C) = \textrm{vec}{( B C A^\mathrm{T} )}$ holds. We define a matrix $\mathcal{A} \equiv [a_1; \cdots; a_m] \in \mathbb{R}^{n^2 \times m}$ , where $a_i = \textrm{vec}(A_i)$ , referring to the matrices in (80). Hence, a_i represents the ith column of $\mathcal{A}$ . Consequently, we have $\textrm{vec} ( \sum_{i \in [m]} y_i A_i ) = \mathcal{A} y$ and $A_i \bullet X = (\mathcal{A}^\mathrm{T} x)_i$ , where $(\mathcal{A}^\mathrm{T} x)_i$ denotes the ith element of the vector $\mathcal{A}^\mathrm{T} x$ , $c = \textrm{vec}(C)$ , and $x = \textrm{vec}(X)$ . When K represents the self-dual cone of real or convex PSD n by n matrices, substituting these expressions into (70) and (71) yields an alternative and equivalent formulation for the canonical SDP, commonly used, for instance, in [214, 236, 291, 309], as illustrated in (111a ) and (111b ).

3.3. Duality of SDP

Recall that an important property of primal and dual formulations is that any feasible solution to a primal problem provides an upper bound on all feasible solutions to the dual problem. The weak duality property of SDP, viz. $C \bullet X \unicode{x2A7E} b^\mathrm{T} \cdot y$ is derived as follows:

$\begin{align} \begin{aligned} C \bullet X - b^\mathrm{T} \cdot y & = \left( Z + \sum_{i \in \left[m\right]} y_i A_i \right) \bullet X - b^\mathrm{T} \cdot y \\ & = {\textrm{Tr}} \left(ZX\right) + \sum_{i \in \left[m\right]} y_i \cdot {\textrm{Tr}} \left(A_i X\right) - b^\mathrm{T} \cdot y = {\textrm{Tr}}\left(X Z\right) \unicode{x2A7E} 0. \end{aligned} \end{align} \tag{ 88 }$

In LP, the value of the primal and dual problems are always equal meaning the strong duality. Now, we provide a sufficient condition for strong duality to occur in SDP, as it is observed in many cases. Let $p^{*}$ be the optimal value of the primal SDP problem, and $d^{*}$ be the optimal value of the dual SDP problem. One can show that it holds $p^{*} = d^{*}$ if at least one of the conditions is satisfied [9, 140]:

(i)
There exist $y \in \mathbb{R}^m$ , such that $C - \sum_{i \in [m]} y_i A_i \succ 0$ , i.e. the dual problem is strictly feasible (then also the value $d^{*}$ is attained).
(ii)
There exists $X \succ 0$ , such that $A_i \bullet X = b_i$ , i.e. the primal problem is strictly feasible (then also the value $p^{*}$ is attained).

These statements are called the Slater conditions [285].

One of the most confusing and intriguing questions in the theory of SDP is asking whether the dual of the dual form is the primal form. The affirmative answer can be derived in the following way:

$\begin{align} \begin{aligned} \sup_y \left\{b^\mathrm{T} \cdot y : C - \sum_i \left(A_i y_i\right) \succeq 0 \right\} & = \sup_y \inf_{X \succeq 0} \left\{b^\mathrm{T} \cdot y + \left( C - \sum_i \left(A_i y_i\right) \right) \bullet X \right\} \\ & \unicode{x2A7D} \inf_{X \succeq 0} \sup_y \left\{ C \bullet X + \sum_i y_i \cdot \left(b_i - A_i \bullet X \right) \right\} = \inf_{\substack{X \succeq 0, \\ A_i \bullet X = b_i}} \left\{ C \bullet X \right\}. \end{aligned} \end{align} \tag{ 89 }$

Indeed:

$\begin{align} \inf_{X \succeq 0} \left\{b^\mathrm{T} \cdot y + X \bullet Z \right\} & = \begin{cases} b^\mathrm{T} \cdot y & \text{if } Z \succeq 0 \\ -\infty & \text{otherwise} \end{cases}, \text{ and} \end{align} \tag{ 90a }$

$\begin{align} \sup_{y} \left\{ C \bullet X + \sum_i y_i \cdot \left(b_i - A_i \bullet X \right) \right\} & = \begin{cases} C \bullet X & \text{if } {\Large{\forall}}_{i} A_i \bullet X = b_i \\ +\infty & \text{otherwise} \end{cases}. \end{align} \tag{ 90b }$

3.4. Affine bounds from dual problems

We will now show how by solving a dual problem one can get a linear bound on the solution of a parameterized family of primal problems. For the sake of illustration, we will use the SDPA form (84) and (85) with additional linear variables (see section 3.6 for further discussion), but similar reasoning can be applied to pairs of primal and dual problems of any other form.

Suppose that for given $k, m, n \in \mathbb{N}_{+}$ , and $c \in \mathbb{R}^m$ , $\{F_i\}_{i = 0}^m \subset \mathbb{S}^n$ and $\{q_j\}_{j \in [k]} \subset \mathbb{R}^m$ we want to find a lower bound on a function $S: \mathbb{R}^k \rightarrow \mathbb{R}$ defined as:

$\begin{align} S\left(v\right) \equiv \inf_{x \in \mathbb{R}^m} \left\{\sum_{i \in \left[m\right]} c_i x_i : {\Large{\forall}}_{j \in \left[k\right]} q_j \cdot x - v_j = 0; -F_0 + \sum_{i \in \left[m\right]} F_i x_i \succeq 0 \right\}. \end{align} \tag{ 91 }$

For fixed value of $v \in \mathbb{R}^k$ the value of S(v) is given by the solution of the following SDP, see (84):

$\begin{align} \begin{split} \text{minimize } & \sum_{i \in \left[m\right]} c_i x_i, \\ \text{subject to } & X \equiv -F_0 + \sum_{i \in \left[m\right]} F_i x_i \succeq 0, \\ & q_j \cdot x - v_j = 0 \text{ for } j \in \left[k\right]. \end{split} \end{align} \tag{ 92 }$

Following the Lagrangian dualization scheme from section 2.6, we get

$\begin{align} \begin{aligned} S\left(v\right) & = \inf_{x \in \mathbb{R}^m} \sup_{\substack{\beta \in \mathbb{R}^k \\ Y \in \mathbb{S}^n_+}} \left\{\sum_{i \in \left[m\right]} c_i x_i - \left( \sum_{j \in \left[k\right]} \beta_k \cdot \left(q_j \cdot x - v_j\right) \right) - \left( -F_0 + \sum_{i \in \left[m\right]} F_i x_i \right) \bullet Y \right\} \\ & \unicode{x2A7E} \sup_{\substack{\beta \in \mathbb{R}^k \\ Y \in \mathbb{S}^n_+}} \inf_{x \in \mathbb{R}^m} \left\{ F_0 \bullet Y + \sum_{j \in \left[k\right]} \beta_j v_j + \sum_{i \in \left[m\right]} x_i \cdot \left( c_i - \sum_{j \in \left[k\right]} \beta_j q_{j i} - F_i \bullet Y \right) \right\} \\ & = \sup_{\substack{\beta \in \mathbb{R}^k \\ Y \in \mathbb{S}^n_+}} \left\{ F_0 \bullet Y + \sum_{j \in \left[k\right]} \beta_j v_j : {\Large{\forall}}_{i \in \left[m\right]} c_i - \sum_{j \in \left[k\right]} \beta_j q_{j i} - F_i \bullet Y = 0\right\} = \sup_{\left(\beta,Y\right) \in \mathcal{F}} \tilde{S}_{\beta,Y}\left(v\right), \end{aligned} \end{align} \tag{ 93 }$

where $\mathcal{F} \equiv \left\{(\beta, Y) \in \mathbb{R}^k \otimes \mathbb{S}^n_+ : {\Large{\forall}}_{i \in [m]} \sum_{j \in [k]} \beta_j q_{j i} + F_i \bullet Y = c_i \right\}$ is the feasible set of the problem, and $\tilde{S}_{\beta,Y}(v) \equiv F_0 \bullet Y + \sum_{j \in [k]} \beta_j v_j$ . Thus, the dual of (92) is:

$\begin{align} \begin{split} \text{maximize } & {\textrm{Tr}} \left[ F_0 Y \right] + \sum_{j \in \left[k\right]} \beta_j v_j \\ \text{subject to } & {\textrm{Tr}} \left[ F_i Y \right] + \sum_{j \in \left[k\right]} \beta_j q_{j i} = c_i, \text{for } i \in \left[m\right],\\ & Y \succeq 0. \end{split} \end{align} \tag{ 94 }$

In consequence, by taking any feasible pair $(\beta, Y)$ one obtains an affine lower bound $S(v) \unicode{x2A7E} \tilde{S}_{\beta, Y}(v)$ . From the strong duality, it follows that the bound is tight for $(\beta, Y)$ being the optimal solution of (94).

3.5. Complex variables in semidefinite problems

In section 3.2.1 we considered SDPs where the problems were defined by real matrices and vectors, and the optimization was carried over real-valued variables. One of the first works showing how to reduce a problem where some of the elements of the problem involve complex numbers was [122].

Let $B \in \mathbb{C}^{n \times n}$ be a Hermitian matrix, $B^\mathrm{R}$ and $B^\mathrm{I}$ its real and imaginary parts, respectively, i.e. $B = B^\mathrm{R} + i B^\mathrm{I}$ and $B^\mathrm{I} = -B^\mathrm{I}$ . Then $B \succeq 0$ if and only if

$\begin{align} \begin{bmatrix} B^\mathrm{R} & -B^\mathrm{I} \\ B^\mathrm{I} & B^\mathrm{R} \end{bmatrix} \succeq 0. \end{align} \tag{ 95 }$

Indeed, for any complex vector $w = u + i v \in \mathbb{C}^n$ we have

$\begin{align} \begin{aligned} w^{\dagger} B w & = \left( u^\mathrm{T} - i v^\mathrm{T} \right) \left( B^\mathrm{R} + i B^\mathrm{I} \right) \left(u + i v\right) = \left( u^\mathrm{T} B^\mathrm{R} u + v^\mathrm{T} B^\mathrm{R} v - u^\mathrm{T} B^\mathrm{I} v + v^\mathrm{T} B^\mathrm{I} u \right) \\ & \quad + i \left(u^\mathrm{T} B^\mathrm{R} v - v^\mathrm{T} B^\mathrm{R} u + u^\mathrm{T} B^\mathrm{I} u + v^\mathrm{T} B^\mathrm{I} v\right) \unicode{x2A7E} 0 \end{aligned} \end{align} \tag{ 96 }$

if and only if $\begin{bmatrix} u^\mathrm{T} & v^\mathrm{T} \end{bmatrix} \begin{bmatrix} B^\mathrm{R} & -B^\mathrm{I} \\ B^\mathrm{I} & B^\mathrm{R} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}$ $\succeq 0$ . This is because $u^\mathrm{T} B^\mathrm{I} u = v^\mathrm{T} B^\mathrm{I} v = 0$ and $u^\mathrm{T} B^\mathrm{R} v = v^\mathrm{T} B^\mathrm{R} u$ . Thus any SDP problem defined in terms of complex vectors and Hermitian matrices can be stated as a problem involving only real vectors with symmetric matrices. For instance, let us consider the case when both the linear coefficient C and the primal variable X are complex matrices, $C = C^\mathrm{R} + i C^\mathrm{I}$ , with $C^\mathrm{R}, C^\mathrm{I} \in \mathbb{R}^{n \times n}$ and $X \in \mathbb{C}^{n \times n}$ . When we reframe this as a real-valued SDP, then the target function takes the form of $\begin{bmatrix} C^\mathrm{R} & -C^\mathrm{I} \\ C^\mathrm{I} & C^\mathrm{R} \end{bmatrix}$ , see (95), and the primal real-valued variable $X^\mathrm{(R)} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}$ does not require explicit constraints. Instead, the resulting complex variable X is retrieved as $X \equiv 2 X_{11} + i \left( X_{12} - X_{12}^\mathrm{T} \right)$ .

We now briefly discuss the formulation of complex SDP problems for which the target is given by real linear coefficient C. Consider $C \in \mathbb{S}^n$ , and $X \in \mathbb{H}^n$ . Let $X = X^\mathrm{R} + i X^\mathrm{I}$ , where $X^\mathrm{R} \in \mathbb{S}^n$ and $X^\mathrm{I} \in \mathbb{R}^{n \times n}$ . We have:

$\begin{align} {\textrm{Tr}} \left(C X\right) = {\textrm{Tr}} \left(C X^\mathrm{R}\right) + i {\textrm{Tr}}\left(C X^\mathrm{I}\right) = {\textrm{Tr}} \left(C X^\mathrm{R}\right). \end{align} \tag{ 97 }$

Since $X^\mathrm{I}$ is antisymmetric, the Frobenius product of symmetric and antisymmetric matrix is always equal to 0. Thus if C is real and we are interested only in finding the value of the solution, then we can ignore the imaginary part occurring in the problem.

3.6. Slack and surplus variables, mixed problems and equalities

Slackness and complementary slackness are both concepts used in optimization theory, particularly in convex optimization. Slackness refers to the idea that in an optimal solution, some of the inequality constraints are satisfied with equality, i.e. there is no slack or excess capacity in the system. The extent to which they diverge from the equality can be expressed as a new PSD variable, which then can be introduced to convert inequality constraints to equality constraints, as elucidated below. The concept of slack variables is commonly used in LP and SDP. Recall that in LP and SDP formulations, the objective function is optimized subject to a set of constraints, where the constraints can be in the form of equalities or inequalities. In the case of linear constraints of the form $A x \unicode{x2A7D} b$ , where A is an m × n matrix and b is a column vector of length m, introducing a slack variable $\tilde{x}_i$ for each constraint i allows us to convert the inequality constraint into an equality constraint. The idea is to add a non-negative variable $\tilde{x}_i$ to the left-hand side of the ith constraint so that the resulting expression becomes equality. Specifically, if the ith constraint is:

$\begin{align} a_{i1} x_1 + \cdots a_{in} x_n \unicode{x2A7D} b_i \end{align} \tag{ 98 }$

then we can add a slack variable $\tilde{x}_i \unicode{x2A7E} 0$ to obtain an expression that is equivalent to the previous constraint:

$\begin{align} a_{i1} x_1 + \cdots a_{in} x_n + \tilde{x}_i = b_i. \end{align} \tag{ 99 }$

The new variable $\tilde{x}_i$ is called a slack variable because it measures the amount by which the left-hand side of the ith constraint falls short of the RHS b_i . If the left-hand side is already equal to b_i , then $\tilde{x}_i$ is zero.

On the other hand, if we have inequality constraints of the form $A x \unicode{x2A7E} b$ , then we introduce a surplus variable $\tilde{x}_i$ that measures the amount by which the left-hand side of the ith constraint exceeds the RHS b_i . Specifically, we add a non-negative variable $\tilde{x}_i$ to the left-hand side of the ith constraint, so that the resulting expression becomes an equality:

$\begin{align} a_{i1} x_1 + \cdots a_{in} x_n - \tilde{x}_i = b_i. \end{align} \tag{ 100 }$

Therefore, we can see that the use of slack and surplus variables allows us to convert any inequality constraint into an equality constraint, which makes it easier to fit into the canonical form.

On the contrary, complementary slackness, refers to the idea that for an optimal solution, the primal variables and the corresponding dual variables are complementary, in the sense that their product is zero. In other words, slackness is a condition that holds between the primal variables (e.g. the decision variables in an LP) and the primal constraints, while complementary slackness is a condition that holds between the primal solution and the dual solution in a convex optimization problem. For instance, when the strong duality of SDP holds, we have $p^{*} = d^{*}$ , i.e. $C \bullet X = b^\mathrm{T} \cdot y = A_i \bullet X \cdot y$ . Thus $0 = ( C - \sum_{i \in [m]} y_i A_i) \bullet X = {\textrm{Tr}} \left[ Z X \right]$ . This means that if the solutions of primal and dual SDP problems exist, then strong duality is equivalent to complementary slackness. Summing up, whereas slackness conditions tell us which constraints are active in the optimal solution (i.e. satisfied with equality), the complementary slackness conditions tell us which constraints are binding (i.e. have nonzero dual variables) and which are nonbinding (i.e. have zero dual variables). Both conditions are important for understanding and characterizing optimal solutions in optimization problems.

One often considers the so-called mixed LP-SDP cone. The primal mixed problem in variables $(x_\mathrm{L},X_\mathrm{S}) \in \mathbb{R}^{n_\mathrm{L}} \times \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ is of the following form:

$\begin{align} \begin{split} \text{minimize } & c_\mathrm{L}^\mathrm{T} \cdot x_\mathrm{L} + C_\mathrm{S} \bullet X_\mathrm{S} \\ \text{subject to } & \left(A_\mathrm{L}^\mathrm{T}\right)_{i,:} \cdot x_\mathrm{L} + {A_\mathrm{S}}_i \bullet X_\mathrm{S} = b_i, \text{for } i \in \left[m\right] \\ & x_\mathrm{L} \unicode{x2A7E} 0, X_\mathrm{S} \succeq 0 \end{split} \end{align} \tag{ 101 }$

where $A_\mathrm{L} \in \mathbb{R}^{n_\mathrm{L} \times m}$ , $b \in \mathbb{R}^m$ , $x_\mathrm{L}, c_\mathrm{L} \in \mathbb{R}^{n_\mathrm{L}}$ , $C_\mathrm{S} \in \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ and ${A_\mathrm{S}}_1, \ldots ,{A_\mathrm{S}}_m \in \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ . The dual mixed problem in variables $(y,z_\mathrm{L},Z_\mathrm{S}) \in \mathbb{R}^m \times \mathbb{R}^{n_\mathrm{L}} \times \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ is given by the formula:

$\begin{align} \begin{split} \text{maximize } & b^\mathrm{T} \cdot y \\ \text{subject to } & c_\mathrm{L} - A_\mathrm{L} y = z_\mathrm{L}, C_\mathrm{S} - \sum_{i \in \left[m\right]} y_i {A_\mathrm{S}}_i = Z_\mathrm{S}, \\ & z_\mathrm{L} \unicode{x2A7E} 0, Z_\mathrm{S} \succeq 0. \end{split} \end{align} \tag{ 102 }$

We note that as any LP can be reformulated as SDP, the mixed problems are not more general than the SDP problems. One can see that to embed an LP in SDP it is sufficient to place the $n_\mathrm{L}$ linear variables on the diagonal of a new SDP variable of size $n_\mathrm{L} + n_\mathrm{S}$ , where $n_\mathrm{S}$ is the size of the SDP original variable. On the other hand, mixed problems are useful for efficient solver implementations, as the numerical methods needed to solve SDP are more expensive in terms of computational effort than LP. Thus, when stating the problem in the mixed form, one may obtain a reduction of the computational cost of the solver.

We now discuss, how equality constraints are expressed in the canonical form of SDP. Recall that in the primal canonical SDP (80), equalities have the form $A_i \bullet X = b_i$ for $i \in [m]$ , where m is one of the two parameters describing the size of the problem. Thus, to add a linear constraint on variables within X we add a new matrix A_i increasing size of the problem to m + 1. On the other hand, if we add a linear constraint in the dual form (81), we can do one of the following. The first possibility is to eliminate one of the variables y_i , e.g. with lower–upper decomposition (LU) or QR decomposition, and thus reduce size of the problem to m − 1. This simplifies the SDP but requires an additional effort of variable elimination, which itself requires some computational resources and is not always implemented. For instance, the elimination is performed in YALMIP when the user passes an option 'removeequalities' to the model, as discussed in appendix B.1. The second possibility is to reframe the optimization problem over $(x_\mathrm{F},X_\mathrm{S}) \in \mathbb{R}^{n_\mathrm{F}} \times \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ for some $n_\mathrm{F}, n_\mathrm{S} \in \mathbb{N}$ in a form different than the canonical form, see (101), viz.

$\begin{align} \begin{split} \textrm{minimize } & c_\mathrm{F}^\mathrm{T} \cdot x_\mathrm{F} + C_\mathrm{S} \bullet X_\mathrm{S} \\ \text{subject to } & \left(A_\mathrm{F}^\mathrm{T}\right)_{i,:} \cdot x_\mathrm{F} + {A_\mathrm{S}}_i \bullet X_\mathrm{S} = b_i, \text{for } i \in \left[m\right] \\ & X_\mathrm{S} \succeq 0. \end{split} \end{align} \tag{ 103 }$

where $A_\mathrm{F} \in \mathbb{R}^{n_\mathrm{F} \times m}$ , $b \in \mathbb{R}^m$ , $x_\mathrm{F}, c_\mathrm{F} \in \mathbb{R}^{n_\mathrm{F}}$ , $C_\mathrm{S} \in \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ and ${A_\mathrm{S}}_1, \ldots ,{A_\mathrm{S}}_m \in \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ , and there is no constraint on $x_\mathrm{F}$ . The dual problem of (103) in variables $(y,z_\mathrm{F},Z_\mathrm{S}) \in \mathbb{R}^m \times \{0\}^{n_\mathrm{F}} \times \mathbb{S}^{n_\mathrm{S} \times n_\mathrm{S}}$ is of the following form:

$\begin{align} \begin{split} \text{maximize } & b^\mathrm{T} \cdot y \\ \text{subject to } & c_\mathrm{F} - A_\mathrm{F} y = z_\mathrm{F}, \\ & C_\mathrm{S} - \sum_{i \in \left[m\right]} y_i {A_\mathrm{S}}_i = Z_\mathrm{S}, Z_\mathrm{S} \succeq 0. \\ \end{split} \end{align} \tag{ 104 }$

The forms (103) and (104) are sometimes called the standard form with free variables [135]. Note that the trivial cone $\{0\}^{n_\mathrm{F}}$ appearing in (104) is the dual cone of $\mathbb{R}^{n_\mathrm{F}}$ appearing in (103). Since this implies $z_\mathrm{F} = 0$ , the first condition in (104) is equivalent to $A_\mathrm{F} y = c_\mathrm{F}$ , providing a way to express equality constraints on the dual variable y. This possibility of treating the equality constraint requires the solver to be able to deal with free variables in the primal problem. This is beyond the capabilities of the usual IPMs as discussed in section 3.8, see also [208]. The standard way of dealing with free variables, implemented in SeDuMi and SDPT3, is to represent a vector $x_\mathrm{F} \in \mathbb{R}^{n_\mathrm{F}}$ as a difference of two vectors $x_{(+)}, x_{(-)} \in \mathbb{R}^{n_\mathrm{F}}_+$ as $x_\mathrm{F} = x_{(+)} - x_{(-)}$ . This guides us to the third possibility of representing equality constrain $(A_\mathrm{F})_{i,:} \cdot y = {c_\mathrm{F}}_i$ , as two inequalities:

$\begin{align} \begin{aligned} & \left(A_\mathrm{F}\right)_{i,:} \cdot y \unicode{x2A7E} {c_\mathrm{F}}_i - \epsilon, \\ & \left(A_\mathrm{F}\right)_{i,:} \cdot y \unicode{x2A7D} {c_\mathrm{F}}_i + \epsilon, \end{aligned} \end{align} \tag{ 105 }$

for some small ε. In consequence, equalities even in the dual form are increasing the complexity. For instance, YALMIP allows the user to specify, how to treat the explicitly stated equality constraints with the mentioned option 'removeequalities', or, in case the chosen solver requires it, YALMIP does it automatically with the function solveequalities. A detailed discussion of conversion the problems (103) and (104) to the canonical form (80) and (81) is provided in [175].

3.7. Schur complement and submatrices

Consider a partition of a square matrix $M \in \mathbb{C}^{(n_1 + n_2) \times (n_1 + n_2)}$ to submatrices, viz. $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$ , where $A \in \mathbb{C}^{n_1 \times n_1}$ and $D \in \mathbb{C}^{n_2 \times n_2}$ are square matrices. Recall that the principal submatrix of a square matrix is the special case of a submatrix where the same rows and columns are removed. Thus A and D are principal submatrices. It is easy to see that a principal submatrix of a PSD matrix is also PSD. For instance consider an arbitrary vector $v \in \mathbb{C}^{n_1 + n_2}$ with non-zero entries only in the first n₁ positions, and a vector $v^{\prime} \in \mathbb{C}^{n_1}$ with entries equal to the first n₁ entries of v. Obviously, since $v^{\dagger} M v \unicode{x2A7E} 0$ it also holds that $v^{^{\prime}\dagger} A v^{\prime} \unicode{x2A7E} 0$ .

Assume the submatrix D to be invertible. For numerical implementations, it would also be desirable for D to be well-conditioned to give accurate results under finite-precision arithmetic upon inverting it. The Schur complement of block D is given by $A - B D^{-1} C$ and denoted $M / D$ . Similarly, $M / A \equiv D - C A^{-1} B$ [41, 115, 158, 281]. We have $M \succeq 0 \implies M / D \succeq 0$ ; and for symmetric M (i.e. $B = C^\mathrm{T}$ ):

$\begin{align} M \succ 0 \iff A, M / A \succ 0, \text{ and} \end{align} \tag{ 106a }$

$\begin{align} A \succ 0 \implies \left[ M \succeq 0 \iff M / A \succeq 0 \right]. \end{align} \tag{ 106b }$

Obviously, the same holds for D. The Schur complement method [343] allows reducing a large system of equations to a smaller one, involving only a subset of variables, e.g. $M \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ , by solving two simpler equations, namely $(A - B D^{-1} C) x_1 = y_1 - B D^{-1} y_2$ , and then $C x_1 + D x_2 = y_2$ . The reason why this method is useful is that one usually needs $O(n^3)$ operations to solve linear equations with n variables, and thus it is profitable to decompose the initial equation into two smaller equations.

Schur complement is also a tool for introducing the following useful trick. Consider an LMI:

$\begin{align} \begin{bmatrix} t & c\left(x\right) \\ c^\mathrm{T}\left(x\right) & F\left(x\right) \end{bmatrix} \succeq 0, \end{align} \tag{ 107 }$

where $x \in \mathbb{R}^k$ for some k, $F(x) \in \mathbb{R}^{m \times m}$ is a linear matrix expression of the form (78b ) for some m, $c: \mathbb{R}^k \rightarrow \mathbb{R}^m$ is a linear function, and t is a positive number. From (106b ) it follows

$\begin{align} t \unicode{x2A7E} c\left(x\right) \cdot F^{-1}\left(x\right) \cdot c^\mathrm{T}\left(x\right). \end{align} \tag{ 108 }$

The expression (108) allows for expression fairy generic non-linear constraints as LMIs, e.g. taking k = 2, m = 1 with $c(x) = x_1$ and $F(x) = x_2$ we get the constraint $t \unicode{x2A7E} \frac{x_1^2}{x_2}$ . Similarly, from Schur lemma it follows that for $A, B, R \in \mathbb{H}^n$ :

$\begin{align} \begin{bmatrix} A & R \\ R & B \end{bmatrix} \succeq 0 \Leftrightarrow B \succeq R A^{-1} R \Leftrightarrow \left[A^{-1/2} B A^{-1/2}\right]^{1/2} \succeq A^{-1/2} R A^{-1/2} \Leftrightarrow A \# B - R \succeq 0, \end{align} \tag{ 109 }$

where the $1/2$ -weighted matrix geometric mean is given by (31).

3.8. How does a solver use IPMs?

Even though the topic of implementation of IPM is not specific to quantum information, from our experience, it is useful to have at least a general understanding of how actually a solver is deriving its results. This helps to identify potential problems, estimate the difficulties, and interpret the outputs of the solver. An important concept in path-following IPMs is the central path, which consists of a set of feasible solutions $(X,y,Z)$ parameterized by a non-negative variable ν [124, 178, 179, 218, 219, 256, 345]. The central path is defined by the following conditions:

$\begin{align} & X, Z \succeq 0 \quad \text{(PSD matrices)}, \end{align} \tag{ 110a }$

$\begin{align} & A_i \bullet X = b_i \text{ for } i \in \left[m\right] \quad \text{(linear constraints)}, \end{align} \tag{ 110b }$

$\begin{align} & C - \sum_{i \in \left[m\right]} y_i A_i = Z \quad \text{(dual feasibility)}, \end{align} \tag{ 110c }$

$\begin{align} & X Z = \nu {\unicode{x1D7D9}} \quad \text{(approx. complementary slackness)}, \end{align} \tag{ 110d }$

where $\{A_i\}$ , b_i , and C are problem-specific matrices for the canonical formulation described in section 3.2.1. The central path captures a family of solutions that gradually approaches the optimal solution as ν increases. By following this path, usually employing the concept of the Newton steps, IPMs efficiently navigate the feasible region of the SDP problem toward the optimal solution. Equivalently (110) can be written in Kronecker-canonical form, see section 3.2.4:

$\begin{align} \begin{aligned} \mathcal{A}^\mathrm{T} x & = b \\ \end{aligned} \end{align} \tag{ 111a }$

$\begin{align} \begin{aligned} c - \mathcal{A} y & = z, \\ \end{aligned} \end{align} \tag{ 111b }$

$\begin{align} \begin{aligned} X Z & = \nu {\unicode{x1D7D9}}, \end{aligned} \end{align} \tag{ 111c }$

together with $X, Z \succeq 0$ . Further in the text, the expression ${\textrm{Tr}}(Z X)$ is referred to as the gap. Note that the strong duality of an SDP problem implies the complementary slackness, see section 3.6, stating that the optimal primal and dual variables are orthogonal, i.e.

$\begin{align} {\textrm{Tr}}\left(Z X\right) = 0, \end{align} \tag{ 112 }$

meaning that the gap is equal to 0. On the other hand, it should be stressed that the gap (112) and the duality gap are closely related but different terms. The former is defined in terms of the primal and dual variables X and Z, even if they do not provide a feasible solution, i.e. even if they do not satisfy the conditions in (80) and (81). The latter is defined as (49) and expresses the difference between the optimal primal and dual solutions of the problem. We note here that practical implementations of SDP solvers usually find solutions that are not feasible in a strict sense. Instead, the solutions satisfy the condition from (80) and (81) only with some accuracy. Here we discuss the expressions we use further in this work to evaluate primal and dual infeasibility. See [214] for more details on the issue of infeasibility norms. Let $c \equiv \textrm{vec}(C)$ , $x \equiv \textrm{vec}(X)$ and $z \equiv \textrm{vec}(Z)$ , as in the Kronecker-canonical form, see section 3.2.4. Let us define the following terms, viz. the residuals for feasibility conditions in (80) and (81), see (111a ) and (111b ):

$\begin{align} r_p \equiv b - \mathcal{A}^\mathrm{T} x \in \mathbb{R}^m, \end{align} \tag{ 113a }$

$\begin{align} r_d \equiv c - \mathcal{A} y - z \in \mathbb{R}^{n^2}. \end{align} \tag{ 113b }$

Using the above conditions (111) we get that the Newton step $(\Delta X, \Delta y, \Delta Z)$ is supposed to satisfy the following formulae:

$\begin{align} \begin{aligned} \mathcal{A}^\mathrm{T} \Delta x = r_p, \\ \mathcal{A} \Delta y + \Delta z = r_d. \end{aligned} \end{align} \tag{ 114 }$

If the method assumes that both r_p and r_d are zero, $r_p = r_d = 0$ , it is referred to as a feasible IPM. Otherwise, it is considered an infeasible IPM. The conditions (114) are supposed to iteratively bring the variables $(X,y,Z)$ closer to the feasibility constraints, (111a ) and (111b ). On the other hand, note that those two equations do not determine fully the solution; yet we still need to consider (111c ) somehow. At the same time, from the strong duality, we get that the optimal solution, $(X^{*}, y^{*}, Z^{*})$ has the property that it is on the central path at the point with ν = 0, i.e. the gap is 0, see section 3.3. Thus, the Newton step should not only ensure feasibility but also reduce the gap between primal and dual solutions. Actually, the condition imposed on the Newton step from the condition (111c ) is most problematic for another reason. The problem arises from the fact that the matrices $X^{(i)}$ and $Z^{(i)}$ , where i stands for the current iteration, possibly do not commute. For this reason, the following form of conditions on the target of the Newton step is imposed:

$\begin{align} \Theta_{\nu}\left(X, Z\right) = \nu {\unicode{x1D7D9}} \in \mathbb{R}^{n \times n}, \end{align} \tag{ 115 }$

where $\Theta_{\nu}(X, Z)$ is a symmetrization of $X Z$ ; we discuss a couple of symmetrizations below. The algorithm stops when residual norms and the gap $\left| {\textrm{Tr}}(X Z) \, \right|$ , are all less than the specified threshold. Examples of the primal and dual residual norms used e.g. in [211] are $\frac{1}{1 + |b|_\mathrm{F}} \left| b - \mathcal{A}^\mathrm{T} x \, \right|_\mathrm{F}$ , and $\frac{1}{1 + |c|_\mathrm{F}} \left| c - \mathcal{A} y - z \, \right|_\mathrm{F}$ , respectively, where we recognize normalized norms of the expressions (113a ) and (113b ).

The problem of efficiently solving the Newton system (111) numerically is discussed in detail in chapter 5 of [211]. It is worth noting that by employing the Schur complement method, see section 3.7, we can reduce the system of equations to a smaller size. Solving a linear system of k equations typically requires $O(k^3)$ floating point operations (FlOps). In the initial system, we have $k = 2n^2 + m$ equations, where n > m. Consequently, obtaining the solution requires $O(n^6)$ FlOps. However, the Schur complement equation has a size of m and only requires $O(m^3)$ FlOps.

Now let us examine the search directions discussed earlier. Consider the condition stated in (111c ), namely $X Z = \nu {\unicode{x1D7D9}}$ . When we take a step, we have the equation $(X + \Delta X)(Z + \Delta Z) = \nu {\unicode{x1D7D9}}$ , or, neglecting the second-order term $\Delta X \Delta Z$ , we have $\Delta X Z + X \Delta Z = \nu {\unicode{x1D7D9}} - X Z$ . It is required that the steps ΔX and ΔZ be symmetric. The second equation in (114) reveals that ΔZ is always symmetric, given the numerical precision. However, the same cannot be said for ΔX, which may not be symmetric. Consequently, there is a necessity to symmetrize (111c ). For instance, in 1998 the following natural symmetrization of (111c ), called AHO, was introduced by Alizadeh et al [10]:

$\begin{align} \Theta_{\nu}^\mathrm{AHO}\left(X, Z\right) \equiv \left( \frac{1}{2} \left(X Z + Z X\right) = \nu {\unicode{x1D7D9}} \right). \end{align} \tag{ 116 }$

While the search direction defined by this symmetrization holds historical significance, it is no longer widely utilized by the majority of SDP solvers. For instance, recent implementations of SDPT3 [310] have omitted this particular search direction. Another search direction, called HKM, has been introduced independently by Helmberg et al [141], Kojima et al [180] and Monteiro [216], and is used by many modern solvers. The HKM has the following primal form:

$\begin{align} \Theta_{\nu}^\mathrm{HKM,primal}\left(X, Z\right) \equiv \left( Z^{\frac{1}{2}} X Z^{\frac{1}{2}} = \nu {\unicode{x1D7D9}} \right), \end{align} \tag{ 117 }$

and the dual form $\Theta_{\nu}^\mathrm{HKM,dual}(X, Z) \equiv ( X^{\frac{1}{2}} Z X^{\frac{1}{2}} = \nu {\unicode{x1D7D9}} )$ . A third important search direction is the Nesterov and Todd [235, 236], or NT, direction. Its definition is more involved than in the cases of AHO and HKM. To define it, we consider a matrix W satisfying $W^{-1} X W^{-1} = Z$ . With the aid of the matrix W, the symmetrization of (111c ) can be formulated as $\frac{1}{2} ( W^{-\frac{1}{2}} X Z + Z X W^{-\frac{1}{2}} ) = \nu W^{-1}$ .

The work of Monteiro and Zhang (MZ) from 1998 [217, 220, 344], introduced the following family of search directions, which includes the three mentioned directions, viz. AHO, HKM, and NT. The linear transformation of MZ is given by the following formula

$\begin{align} H_\mathrm{P}\left(M\right) \equiv \frac{1}{2} \left( P M P^{-1} + P^{-T} M^\mathrm{T} P^\mathrm{T} \right), \end{align} \tag{ 118 }$

with P being an invertible matrix. Next, we define $\Theta_{\nu}(X,Z) \equiv \left( H_\mathrm{P}(X Z) = \nu {\unicode{x1D7D9}} \right)$ . By selecting different invertible matrices P, we can observe that each of the search directions discussed can be obtained. Specifically, when $P = {\unicode{x1D7D9}}$ , we obtain the AHO search direction. On the other hand, choosing $P = Z^{\frac{1}{2}}$ and $P = X^{-\frac{1}{2}}$ leads to the primal and dual HKM search directions, respectively. Furthermore, if we consider an invertible matrix P satisfying the condition $P^\mathrm{T} P = W^{-1}$ , we obtain the NT search direction.

Once the search direction has been determined, the next step in an IPM is to select the appropriate step length in that direction. This step length is crucial for ensuring the feasibility of the iterates. Specifically, the IPM chooses a step-length, represented by a pair of constants α and β in the range $(0,1]$ , such that the following conditions are satisfied:

$\begin{align} X + \alpha \Delta X \succeq 0, \end{align} \tag{ 119a }$

$\begin{align} Z + \beta \Delta Z \succeq 0. \end{align} \tag{ 119b }$

It is important to note that while Newton's method is used to determine the direction, the iterative solver does not necessarily take the full Newton step. To provide further clarity, it is worth mentioning the following points. According to [308], the value of t that solves the optimization problem defined by M and ΔM can be obtained in the following way. The objective is to maximize t subject to the constraint $M + t \Delta M \succeq 0$ . The formula to compute this step length is given by $\max \left( \text{eig}(C^\mathrm{-T} \Delta M C^{-1}) \right)$ , where C represents the Cholesky decomposition of M. By using this formula, we can determine the appropriate step lengths α and β for the IPM. It is important to note that convergence proofs of IPM algorithms often necessitate that at each step, the current solution is in close proximity (in some defined sense) to the central path. This requirement imposes additional constraints on the step length. However, for the sake of efficiency, these constraints may be relaxed, at the cost of losing the guarantee of convergence. Furthermore, it is worth highlighting that a common cause of failure in IPM algorithms arises during the Schur complement matrix decomposition. This issue tends to occur when the iterates approach the optimal solution and the primal variable and dual slack variables become nearly orthogonal, resulting in ${\textrm{Tr}}(XZ) \approx 0$ . In such cases, the Schur complement matrix may become ill-conditioned, leading to numerical instability or inaccurate results.

3.9. Solver internal mechanisms: predictor–corrector, warm start, problem structure

Now, we will discuss several internal mechanisms employed by solvers to provide a deeper understanding of their functionality and potential challenges or performance gains. We begin by introducing the concept of predictor–corrector, shedding light on its significance and usage within solvers. Additionally, we explore the application of perturbations to iterates, which can prove beneficial when tackling numerical issues while solving complex problems with stringent constraints. Furthermore, we emphasize the importance of leveraging the structure of specific problems to unlock potential solver optimizations and fine-tune performance. By highlighting these aspects, we aim to provide insights into the diverse approaches and strategies that can be employed in solver implementations.

As we mentioned above, when discussing the constraint (111c ), we expect an SDP solver to approach the value ν = 0. But here a question arises: should this be done immediately or gradually? And in the latter case: how gradually? One of the most popular answers has been given by Mehrotra [205], who introduced the so-called predictor–corrector mechanism. The predictor–corrector method is a powerful numerical technique widely used in solvers for solving complex mathematical problems efficiently. This iterative algorithm combines two essential steps: prediction and correction. In the prediction step, an approximate solution is computed based on the available information. This predicted solution is then refined in the correction step by incorporating additional calculations or adjustments to improve its accuracy and convergence toward the proper solution.

The first step of the predictor–corrector mechanism is the calculation of the predictor direction, sometimes referred to as an affine scaling direction [292], which employs an aggressive strategy to advance along the central path. In this approach, the target for the Newton step is set to ν = 0 in (115), indicating that the predictor step aims to approach the optimal solution. Let $(\delta X, \delta y, \delta Z)$ represent the predictor step, with the step-length determined by $\alpha_\mathrm{P}$ and $\beta_\mathrm{P}$ . These quantities are subsequently used to compute the value of ν that will serve as the objective for the Newton step direction in the subsequent iteration of the SDP solver. The predictor step itself is not taken but rather employed to derive a second-order correction for the corrector, or centering, direction. The actual procedure of calculation of the new value of ν is quite complicated. We refer reader to section 6.5 of [211] for more details. Just to provide some taste of the method we mention that e.g. in SDPT3 solver the value $\alpha_\mathrm{P}$ is upper bounded by a certain user-specified parameter gam, and the following formula for the corrector step [307, 310]:

$\begin{align} \frac{1}{n} {\textrm{Tr}}\left(X Z\right) \cdot \left( \frac{{\textrm{Tr}} \left( \left(X + \alpha_\mathrm{P} \delta X\right) \left(Z + \beta_\mathrm{P} \delta Z\right) \right)}{{\textrm{Tr}}\left(X Z\right)} \right)^{\text{expon}\_{\text{used}}}, \end{align} \tag{ 120 }$

where ${\text{expon}}\_{\text{used}}$ is a variable whose value is either a constant or is determined with some other algorithm based on another user-specified parameter expon. Next, in the corrector part of the iteration, one sets the new value of ν and calculates the Newton step $(\Delta X, \Delta y, \Delta Z)$ for the second time, with a different RHS. Often, to the goal $\nu {\unicode{x1D7D9}}$ in (111c ), an additional term $F(X, Z, \delta X, \delta Z)$ with a second order correction is added. Finally, the step-lengths $\alpha_\mathrm{C}$ and $\beta_\mathrm{C}$ for the corrector step are computed to ensure the preservation of positivity. In the subsequent iteration, the IPM algorithm sets the following:

$\begin{align} X \equiv X + \alpha_\mathrm{C} \Delta X, \end{align} \tag{ 121a }$

$\begin{align} y \equiv y + \beta_\mathrm{C} \Delta y, \end{align} \tag{ 121b }$

$\begin{align} Z \equiv Z + \beta_\mathrm{C} \Delta Z. \end{align} \tag{ 121c }$

The concept of warm start in solvers plays a crucial role in optimizing computational efficiency and reducing solution times for various optimization problems. Warm start refers to the technique of providing an initial feasible solution to a solver, obtained as a guess, or some generic scheme, or has been calculated from an already solved similar or related problem. Rather than starting the solver from scratch, the warm start approach utilizes the information from a previously solved problem to accelerate the convergence of subsequent iterations. By leveraging this initial solution, warm start techniques can significantly improve the overall performance of solvers. We will only briefly overview the warm start in solvers, and refer to section 6.3 of [211] for a detailed discussion. It has been observed that it is desirable for the initial iterate to have the magnitude of at least the same order as the optimal solution. The following method of cold-start is used in SDPT3 [310]: $X^{(0)} \equiv \xi {\unicode{x1D7D9}}$ , $Z^{(0)} \equiv \eta {\unicode{x1D7D9}}$ , and $y^{(0)}$ is the zero vector of the relevant dimension, where $\xi \equiv \max ( 10, \sqrt{n}, n \max_{i \in [m]} \frac{1+|b_i|_\mathrm{F}}{1+|A_i|_\mathrm{F}} )$ and $\eta \equiv \max ( 10, \sqrt{n}, |C|_\mathrm{F}, \max_{i \in [m]} |A_i|_\mathrm{F} )$ . In [211] we proposed and analyzed warm start strategies for NPA problems.

Another mechanism used in SDP solvers, which proved to be advantageous to certain scenarios, is the utilization of perturbations during iterations. Perturbations involve introducing slight modifications to the current iterates under specific conditions. The primary purpose of employing perturbation strategies is to prevent solvers from becoming stuck near the boundary of e.g. the PSD cone. While the primary objective of perturbations is not to reduce the number of iterations or CPU time, but rather to circumvent failures, it has been observed that in instances where the solver does not encounter failures, the most efficient solution tends to be one without perturbations [211]. By incorporating perturbations into the iterative process, the solver can navigate away from critical regions and explore a wider solution space, potentially avoiding numerical instabilities or convergence issues. While we will not delve into the details of perturbation strategies, we present a simple example in algorithm 1 to illustrate their form. The purpose of this strategy is to strike a balance between improving the duality gap and ensuring feasibility. In our observations of SDPs occurring in NPA [211], we have found that the reduction of the size of the gap often presents the greatest challenge. Specifically, the iterates tend to reach the feasibility threshold relatively quickly, and the majority of iterations are dedicated to reducing the gap. It is important to note that the efficiency and effectiveness of perturbation strategies may vary depending on the problem and solver being employed. In cases where the solver encounters failures, perturbations can provide a crucial mechanism to overcome such issues and continue the iterative process. However, it is worth noting that in scenarios where the solver operates smoothly without failures, perturbations may introduce additional computational overhead without providing substantial benefits in terms of solution quality or convergence speed. Overall, the use of perturbations in iterations offers a valuable approach to enhance the robustness and reliability of solvers when tackling problems involving the PSD cone. Nevertheless, the decision to incorporate perturbations should be made considering the specific problem characteristics, solver behavior, and desired trade-offs between reliability and computational efficiency.

Algorithm 1. Example of a perturbation of the iteratively improved solution in an SDP solver.
if $\text{gap} \gt 100 \cdot \epsilon_\mathrm{P}$ then
$X \gets X + 0.01 \cdot t_p \cdot {\unicode{x1D7D9}}$
end if
if $\epsilon_\mathrm{P} \gt 100 \cdot \epsilon_\mathrm{D}$ then
$Z \gets Z + 0.1 \cdot t_p \cdot {\unicode{x1D7D9}}$
end if

The last mechanism we mention is exploiting the specific structure of the problem by a solver to enhance its performance. It allows for tailored solver optimizations that can significantly enhance performance. By exploiting the problem structure, solvers can take advantage of inherent characteristics such as symmetry, sparsity, or specific constraints to reduce computational complexity. This approach enables the solver to focus computational resources on the most relevant parts of the problem, leading to faster convergence and more efficient solutions. Additionally, by understanding the problem structure, solvers can employ specialized algorithms and techniques that are specifically designed to leverage the problem's unique properties, further improving solution quality and computational efficiency. For instance, in [211] we have proposed a special data format to improve the performance of operations taken by a solver dedicated to the problems occurring in the dual formulation of NPA problems, where such properties as the sparsity and entries pattern was taken into account.

One of the first generic approaches exploiting sparsity patterns was given by Fujisawa et al [111], where the methods to leverage the sparsity of the problem matrices to improve efficiency were explored. The computation of the Schur complement matrix, as discussed in section 3.7, is often recognized as the most computationally intensive step in solving an SDP problem. One strategy they employ to support the Schur complement calculations is reordering the matrices $\{A_i\}_i$ . They demonstrate that the most effective reordering is one that arranges the matrices in a non-increasing order based on the number of nonzero elements f_i . Additionally, due to the symmetry of the Schur complement B, only entries $B_{i,j}$ with $j \unicode{x2A7E} i$ need to be directly evaluated. Fujisawa et al presented three different methods for computing the element $B_{i,j}$ , with the choice depending on the sparsity patterns of matrices A_i and A_j . For highly sparse matrices, they utilize the formula $B_{i,j} = \sum_{a,b,c,d \in [n]} (A_i)_{c,d} W_{c,a} W_{d,b} (A_j)_{a,b}$ to compute the Schur complement. The calculation of this quantity requires $(2 f_i + 1) f_j$ multiplications. The work [318] introduces the so-called correlative sparsity pattern graph, which relates to a certain sparse structure in the objective and constraint polynomials of unconstrained and inequality-constrained sparse polynomial optimization problems. The graph is used to obtain sets of the supports for SoS polynomials and get the improved performance of relevant SDPs. Sparsity patterns that are more specific to non-commutative polynomial optimization, particularly relevant to quantum information problems, were investigated in [172]. The method suggested partitioning the input variables into cliques based on the so-called correlative sparsity pattern exhibited by the polynomials present in the objective function and constraints. In [320] another particular type of sparsity occurring in the input data for large-scale sparse noncommutative polynomial optimization problems, called term sparsity is introduced.

4. Constructions of SDP useful for quantum information

In this section, we provide an overview of popular constructions which are used as building blocks for more complicated optimization problems used in quantum information. In section 4.1 we discuss semidefinite representations of semialgebraic functions which allow e.g. to express approximations of various types of quantum entropies as SDPs. In section 4.2 we provide an overview of the separability criteria of quantum states originating from the famous PPT criterion. Next, in section 4.3 we describe the Choi–Jamiołkowski isomorphism and highlight its relation to PSD constraints. Then, in section 4.4 the SoSs decomposition important for polynomial optimization is discussed, and in section 4.5 we describe the famous Lovász θ function. Afterward, we will discuss the application of moment matrices in the realm of quantum information and explore different aspects of their use. These include section 4.6 which discusses the relationship between correlation matrices and the so-called moment matrices, together with the NPA hierarchy which is a method for optimization over non-commuting variables; sections 4.7 and 4.8, which investigates three distinct methods for optimizing probability distributions or behaviors subject to dimension constraints.

4.1. Semidefinite representations of semialgebraic sets

Spectrahedron is defined as a geometric object that can be characterized as a solution set of an LMI (78a ) or, in other words, it is an intersection of the PSD cone with a linear affine subspace. This representation allows spectrahedra to capture the feasible regions of SDPs. When spectrahedra are subjected to linear or affine transformations, the resulting shapes are referred to as projected spectrahedra, or spectrahedral shadows, or SDP representable sets [145]. These projected spectrahedra retain the convexity property and also belong to the class of semialgebraic sets. It is worth noting that while every spectrahedral shadow is a convex semialgebraic set, but the converse statement, which was posed as a question by Nemirovski in [231], previously conjectured [145] to be true until 2017 [280], does not hold in general. This means that not all convex semialgebraic sets can be represented as spectrahedra. The notion of spectrahedra was introduced in [267]; see [317] for an overview, and [28] for a detailed discussion of their relation to the geometry of quantum states. An example of a spectrahedron of particular interest to quantum information is the spectraplex, which is the set of all PSD matrices in a given dimension with trace 1, i.e. the normalized quantum states.

To be more specific, we say that a set $\mathcal{S} \subseteq \mathbb{R}^n$ is LMI representable or has an LMI representation [147] if there exists a set of n + 1 symmetric n × n matrices $\{A_i\}_{i \in \{0\} \cup [n]} \subset \mathbb{S}^n$ such that

$\begin{align} \mathcal{S} = \left\{x \in \mathbb{R}^n : A_0 + \sum_{i \in \left[n\right]} A_i x_i \succeq 0 \right\}. \end{align} \tag{ 122 }$

The question of which closed convex sets can be SDP represented for n = 2 was first posed by Parrilo and Sturmfels in [247]. Suppose that for some $N \in \mathbb{N}_{+}$ there exist the following two sets of symmetric n × n matrices: $\{A_i\}_{i \in \{0\} \cup [n]} \subset \mathbb{S}^n$ and $\{B_j\}_{j \in [N]} \subset \mathbb{S}^n$ , such that $\mathcal{S}$ is a projection to $\mathbb{R}^n$ of the set

$\begin{align} \hat{\mathcal{S}} = \left\{\left(x,u\right) \in \mathbb{R}^{n+N} : A_0 + \sum_{i \in \left[n\right]} A_i x_i + \sum_{j \in \left[N\right]} B_j u_j \succeq 0 \right\}, \end{align} \tag{ 123 }$

so that $\mathcal{S} = \left\{x \in \mathbb{R}^n : {\Large{\exists}}_{u \in \mathbb{R}^N} (x,u) \in \hat{\mathcal{S}} \right\}$ . Then, we say that $\mathcal{S}$ is SDP representable, or has an SDP representation or a lifted LMI representation, or is a spectrahedral shadow [280]. For instance, from (109) it follows that the hypograph of the matrix geometric mean, viz. $\mathbf{hyp}{[\#]} = \left\{(X, Y, R) \in \mathbb{H}^n_{++} \times \mathbb{H}^n_{++} \times \mathbb{H}^n : X \# Y - R \succeq 0 \right\}$ , has an SDP representation. As shown in [99, prop. 1], for a odd $p \in \mathbb{N}_{+}$ , and $l \in \mathbb{N}_{+}$ , with $p \lt 2^l$ , it holds that

$\begin{align} \mathbf{hyp}{\left[\#_{p / 2^l}\right]} = \left\{\left(X, Y, R_l\right) : {\Large{\exists}}_{\left(R_i\right)_{i \in \left[l-1\right]} \subset \mathbb{H}^n}, {\Large{\forall}}_{i \in \left[l\right]} \begin{bmatrix} X \#_{m_i} Y & R_i \\ R_i & R_{i-1} \end{bmatrix} \succeq 0, R_0 = Y \right\}, \end{align} \tag{ 124 }$

where $(m_i)_{i \in [l]}$ is the binary expansion of $p / 2^l$ , i.e. $p / 2^l = \sum_{i \in [l]} m_i / 2^{l - i + 1}$ and m₀ = 0. Thus, there exist an SDP representation of $\mathbf{hyp}{[\#_{p / 2^l}]}$ consisting of l LMI, each of size 2n by 2n.

A necessary condition for a set to be LMI representable is to be convex and basic closed semialgebraic. If a set is SDP representable, then it might not be basic closed semialgebraic, but it must be convex semialgebraic. In [145, 146] sufficient conditions for SDP representability were given. It was also shown that the set of all SDP representable sets is closed under taking linear images or preimages, finite intersections, or convex hulls of finite unions [145, 238]. In [237] it was shown that the interior of an SDP representable set is again an SDP representable set. The result of [279] was that closed convex hulls of one-dimensional semialgebraic sets are also SDP representable. The seminal work [245] showed how to construct a complete family of SDPs of polynomial size, which can be used to prove the infeasibility of a finite set of polynomial constraints.

Since the feasible set of SDP is a semialgebraic set, SDPs cannot be directly used to model non-semialgebraic sets and functions, even if they are convex. The work [100] provided a method to approximate certain useful non-semialgebraic sets with SDP representations of relatively small size. Consider a non-semialgebraic concave function $g : \mathbb{R} \rightarrow \mathbb{R}$ . Suppose it has an integral representation $g(x) = \int_0^1 f_t(x) \mathrm{d}t$ , and that the integral can be approximated, e.g. using one of the Gauss quadratures [263], and then $g(x) \approx r_m(x) \equiv \sum_{j \in [m]} w_j f_{t_j}(x)$ , where the weights w_j and nodes t_j depend on the quadrature. The quantity m that defines how many terms occur in the quadrature is called the order of the quadrature. The order of the quadrature determines the accuracy of the approximation, with higher orders resulting in more accurate approximations. For a function of particular interest, the logarithm, we have $\log(x) = \int_0^1 \frac{x - 1}{t \cdot (x - 1) + 1} \mathrm{d}t$ . For any fixed t, the hypograph of the concave function $f_t(x) \equiv \frac{x - 1}{t \cdot (x - 1) + 1}$ has an SDP representation, viz. $f_t(x) \unicode{x2A7E} r$ if and only if $\begin{bmatrix} x - 1 - r & -\sqrt{t} r \\ -\sqrt{t} r & 1 - r t \end{bmatrix}$ $\succeq 0$ . One can show that the matrix hypograph of f_t for any $t \in [0,1]$ has the following SDP representation [100, proposition 2]:

$\begin{align} \begin{aligned} \mathbf{hyp}~{f_t} & = \left\{\left(X, R\right) \in \mathbb{H}^n_{++} \times \mathbb{H}^n : \left(X - {\unicode{x1D7D9}}\right) \cdot \left[t \cdot \left(X - {\unicode{x1D7D9}}\right) + {\unicode{x1D7D9}}\right]^{-1} \succeq R \right\} \\ & = \left\{\left(X, R\right) \in \mathbb{H}^n_{++} \times \mathbb{H}^n : \begin{bmatrix} X - {\unicode{x1D7D9}} - R & -\sqrt{t} R \\ -\sqrt{t} R & {\unicode{x1D7D9}} - t R \end{bmatrix} \succeq 0 \right\}. \end{aligned} \end{align} \tag{ 125 }$

The approximation $\log{x} \approx r_m(x)$ is most accurate for small values of x. Since we have $\log(x) = \frac{1}{h} \log(x^h)$ , it is often beneficial to use another approximation, viz. $r_{m,k}(x) \equiv 2^k r_m(x^{1/{2^k}})$ . It can be shown that $r_{m,k}$ is operator concave [100, proposition 3] and $\mathbf{P}_{r_{m,k}}[Y,X] = 2^k \mathbf{P}_{r_m} [X \#_{2^{-k}} Y, X]$ [100, equation (18)]. What is more, consider $V, X \in \mathbb{H}^n_{++}$ and $R \in \mathbb{H}^n$ . It holds [100, p 272] $R^{\prime} \succeq \mathbf{P}_{-r_m}[Y,X]$ if and only if there exist $(R^{\prime}_i)_{i \in [m]} \subset \mathbb{H}^n$ such that $R^{\prime} = \sum_{i \in [m]} R^{\prime}_i$ and ${\Large{\forall}}_{i \in [m]} \begin{bmatrix} Y - X + R^{\prime}_i & \sqrt{t_i} R^{\prime}_i \\ \sqrt{t_i} R^{\prime}_i & X + t_i R^{\prime}_i \end{bmatrix}$ $\succeq 0$ .

From (30) and $\log{x} \approx r_{m,k}(x)$ we can expect $S(X|Y) \approx \mathbf{P}_{(-r_{m,k})}[Y,X]$ and thus is some sense:

$\begin{align} \begin{aligned} \mathbf{epi }{S\left(X|Y\right)} \approx K^n_{m,k} &\equiv \left\{\left(X,Y,R^{\prime}\right) \in \mathbb{H}^n_+ \times \mathbb{H}^n_+ \times \mathbb{H}^n : R^{\prime} \succeq \mathbf{P}_{\left(-r_{m,k}\right)}\left[Y,X\right] \right\} \\ & = \left\{\left(X,Y,R^{\prime}\right) \in \mathbb{H}^n_+ \times \mathbb{H}^n_+ \times \mathbb{H}^n : R^{\prime} \succeq -2^k \mathbf{P}_{r_m} \left[X \#_{2^{-k}} Y, X\right] \right\}. \end{aligned} \end{align} \tag{ 126 }$

This, together with the SDP representation (124), shows that the set $K^n_{m,k}$ has an SDP representation [100, theorem 3]. This, in consequence, allows optimizations over quantum entropies [46, 98] using SDP.

4.2. DPS conditions of separability

The DPS [86, 87] method is a powerful technique used to determine the separability of multipartite quantum states, by providing a hierarchy of SDP relaxations that approximate the separability conditions. To be more specific, DPS introduces a hierarchy of conditions involving partial transpositions allowing for a test of separability, with strength increasing with the hierarchy level. This approach is relevant also for studying the separability of multipartite quantum systems with more than two parties. The DPS method utilizes a series of SDP relaxations, where each relaxation, pertaining to higher levels, introduces additional variables and constraints. Thus, at each level of the hierarchy, the DPS method formulates an SDP. The PSD conditions play a crucial role in these relaxations, as they express the constraint that the obtained solutions are physically valid quantum states. Roughly speaking, in the DPS method for a given state $\rho_{\mathcal{A}\mathcal{B}}$ one asks whether there exist a hierarchy of symmetric extensions, i.e. a family of states $\rho_{\mathcal{A} \mathcal{B}_1 \cdots \mathcal{B}_N}$ defined for any N, such that ${\Large{\forall}}_{i \in [N]} \rho_{\mathcal{A}\mathcal{B}} = {\textrm{Tr}}_{\mathcal{B}_j : j \neq i} [\rho_{\mathcal{A} \mathcal{B}_1 \cdots \mathcal{B}_N}]$ . It happens that the state $\rho_{\mathcal{A}\mathcal{B}}$ is separable if and only if such a hierarchy exists for each natural N. The state $\rho_{\mathcal{A}\mathcal{B}}$ is separable if and only if a hierarchy of symmetric extensions exists for every natural number N. For a given fixed value of N, the task of verifying the existence of a symmetric extension is equivalent to an SDP. Consequently, an algorithm can be devised by incrementally examining the extendability condition for increasing values of N. This algorithm is guaranteed to terminate if the initial state $\rho_{\mathcal{A}\mathcal{B}}$ is entangled, thus it detects the non-separability. However, if the state is separable, the algorithm will continue indefinitely without termination.

Let us provide a more detailed overview of the concept of DPS. Consider a state $\rho_{\mathcal{A}\mathcal{B}}$ residing in the composite Hilbert space $\mathcal{A} \otimes \mathcal{B}$ , which exhibits separability, meaning that it can be expressed as a convex combination of pure product states:

$\begin{align} \rho_{\mathcal{A}\mathcal{B}} = \sum_{i} \lambda_i | \, \phi_i \rangle \langle \phi_i \, |_\mathcal{A} \otimes | \, \varphi_i \rangle \langle \varphi_i \, |_\mathcal{B}, \end{align} \tag{ 127 }$

where the coefficients λ_i satisfy the conditions $\sum_{i} \lambda_i = 1$ and $\lambda_i \unicode{x2A7E} 0$ . Consider a Hilbert space ${\mathcal{A}^k} \otimes {\mathcal{B}^l}$ , and let $\tilde{\rho}$ be a state on that space. If $\tilde{\rho}$ satisfies the condition:

$\begin{align} {\textrm{Tr}}_{\mathcal{A}^{k-1} \mathcal{B}^{l-1}} \left[ \tilde{\rho} \right] = \rho_{\mathcal{A}\mathcal{B}}, \end{align} \tag{ 128 }$

where the partial trace is taken over all but the first copy of each space, then $\tilde{\rho}$ is called an extension [97, 264] of ρ. Let $\mathcal{S}_\mathcal{A}$ represent the set of all permutation operators among copies of the space $\mathcal{A}$ , and the same applies to B. The state extension $\tilde{\rho}$ is considered symmetric if, for every $P \in \mathcal{S}_\mathcal{A} \otimes \mathcal{S}_B$ , the following condition holds:

$\begin{align} \tilde{\rho} = P \tilde{\rho} P. \end{align} \tag{ 129 }$

On the other hand, the state extension $\tilde{\rho}$ is classified as PPT if $\tilde{\rho}$ remains positive after applying any partial transposition on the subsystems. When $\rho_{\mathcal{A}\mathcal{B}}$ is separable, it is guaranteed that for any values of k and l, there exists an extension $\tilde{\rho}$ that is a PPT symmetric extension of $\rho_{\mathcal{A}\mathcal{B}}$ . The fundamental principle behind the DPS hierarchy is to examine whether a PPT symmetric extension of $\rho_{\mathcal{A}\mathcal{B}}$ exists for fixed values of k and l, and if not, this implies that $\rho_{\mathcal{A}\mathcal{B}}$ is not separable. As the constraints of PPT symmetric extension can be expressed as SDPs, the DPS method enables optimization over a relaxation of the set of separable states on the given spaces. As the values of k and l increase, the relaxation approaches the actual set of all separable states more closely. Therefore, starting from a PPT symmetric extension state $\tilde{\rho}$ , it is possible to construct a state ϱ on $\mathcal{A} \otimes \mathcal{B}$ that is, in a certain sense, close to being separable.

The method can be applied analogously to more than two parties, as in the following example involving three subsystems. We will now demonstrate an application of DPS, which involves a method for representing quantum states and measurements using a single SDP variable. Consider three unit vectors: $| \, \Phi^{\lambda} \rangle = \sum_{i \in [d_\mathcal{A}]} \sum_{j \in [d_\mathcal{B}]} \phi_{i j} | \, ij \rangle_{\mathcal{A}\mathcal{B}}$ , $| \, u^{\lambda} \rangle = \sum_{i \in [d_\mathcal{A}]} \mu_i^{\lambda} | \, i \rangle_{\mathcal{A}}$ , and $| \, v^{\lambda} \rangle = \sum_{j \in [d_\mathcal{B}]} \nu_j^{\lambda} | \, j \rangle_{\mathcal{B}}$ . Here, λ represents a global hidden variable with an arbitrary probability distribution ${p^{\lambda}}$ . Next, we define the following operator

$\begin{align} W_{\mathcal{A} \mathcal{B} \mathcal{A}^{\prime} \mathcal{B}^{\prime}} \equiv \sum_{\lambda} p^{\lambda} \cdot \left[ | \, \Phi^{\lambda} \rangle \langle \Phi^{\lambda} \, |_{\mathcal{A}B} \otimes | \, u^{\lambda} \rangle \langle u^{\lambda} \, |_{A^{\prime}} \otimes | \, v^{\lambda} \rangle \langle v^{\lambda} \, |_{\mathcal{B}^{\prime}} \right]. \end{align} \tag{ 130 }$

For two subsystems S₁, S₂, both of dimension d_S , the SWAP operator is defined in the following way:

$\begin{align} \mathrm{SWAP}\left(S_1, S_2\right) \equiv \sum_{i,j \in \left[d_S\right]} | \, i j \rangle \langle j i \, |_{S_1 S_2}. \end{align} \tag{ 131 }$

The idea that a tensor product of the density matrix ρ and measurements $\{M\}$ contain all elements necessary to express the probabilities of the form ${\textrm{Tr}} (\rho M)$ was formulated in [327]. In [223], where the so-called Navascués–de la Torre–Vértesi (NTV) SDP hierarchy was introduced, it was recognized that this idea in combination with DPS hierarchy allows approximating the probabilities ${\textrm{Tr}} (\rho M)$ as entries in SDP variables. NTV provided one of the methods to approximate the set of quantum correlations with dimension constraints; other such methods are discussed in section 4.7. A similar approach was used in [341, equation (4)] to express a constraint of purity of the state, and in consequence to develop a method of providing rank constraints on the considered operators. Direct calculations show that ${\textrm{Tr}} \left[ W_{\mathcal{A} \mathcal{B} \mathcal{A}^{\prime} \mathcal{B}^{\prime}} \left( \mathrm{SWAP}(\mathcal{A}, \mathcal{A}^{\prime}) \otimes \mathrm{SWAP}(\mathcal{B}, \mathcal{B}^{\prime}) \right) \right]$ is equal to:

$\begin{align} \begin{aligned} \sum_{\lambda} p^{\lambda} \cdot {\textrm{Tr}} \left[ \tilde{W}_{\mathcal{A} \mathcal{B} \mathcal{A}^{\prime} \mathcal{B}^{\prime}}^\lambda \right] & = \sum_{\lambda} p^{\lambda} \cdot \left[ \sum_{i_1, i_2 \in \left[d_\mathcal{A}\right]} \sum_{j_1, j_2 \in \left[d_\mathcal{B}\right]} \phi_{i_1 j_1}^{\lambda} \phi_{i_2 j_2}^{\lambda*} \mu_{i_2}^{\lambda} \mu_{i_1}^{\lambda*} \nu_{j_2}^{\lambda} \nu_{j_1}^{\lambda*} \right] \\ & = \sum_{\lambda} p^{\lambda} \cdot {\textrm{Tr}} \left[ | \, \Phi^{\lambda} \rangle \langle \Phi^{\lambda} \, |_{\mathcal{A}\mathcal{B}} \left( | \, u^{\lambda} \rangle \langle u^{\lambda} \, |_\mathcal{A} \otimes | \, v^{\lambda} \rangle \langle v^{\lambda} \, |_\mathcal{B} \right) \right], \end{aligned} \end{align} \tag{ 132 }$

where $\tilde{W}_{\mathcal{A} \mathcal{B} \mathcal{A}^{\prime} \mathcal{B}^{\prime}}^\lambda$ is defined as:

$\begin{align} \sum_{\substack{i_1, i_2, i_3, \\ i_4, i_5, i_6 \in \left[d_\mathcal{A}\right]}} \sum_{\substack{j_1, j_2, j_3, \\ j_4, j_5, j_6 \in \left[d_\mathcal{B}\right]}} \phi_{i_1 j_1}^{\lambda} \phi_{i_2 j_2}^{\lambda*} \mu_{i_3}^{\lambda} \mu_{i_4}^{\lambda*} \nu_{j_3}^{\lambda} \nu_{j_4}^{\lambda*} | \, i_1 j_1 i_3 j_3 \rangle \langle i_2 j_2 i_4 j_4 | i_5 j_5 i_6 j_6 \rangle \langle i_6 j_6 i_5 j_5 \, |_{\mathcal{A} \mathcal{B} \mathcal{A}^{\prime} \mathcal{B}^{\prime}}. \end{align} \tag{ 133 }$

The resulting expression is the probability of projection of the state on some projective measurements. With similar calculations, we also obtain:

$\begin{align} {\textrm{Tr}} \left[ W_{\mathcal{A} \mathcal{B} \mathcal{A}^{\prime} \mathcal{B}^{\prime}} \left( \mathrm{SWAP}\left(\mathcal{A}, \mathcal{A}^{\prime}\right) \otimes {\unicode{x1D7D9}}_{\mathcal{B} \mathcal{B}^{\prime}} \right) \right] = \sum_{\lambda} p^{\lambda} \cdot {\textrm{Tr}} \left[ | \, \Phi^{\lambda} \rangle \langle \Phi^{\lambda} \, |_{\mathcal{A}\mathcal{B}} \left( | \, u^{\lambda} \rangle \langle u^{\lambda} \, |_\mathcal{A} \otimes {\unicode{x1D7D9}}_\mathcal{B} \right) \right]. \end{align} \tag{ 134 }$

We see that operators created in a similar way like (130) contain entries expressing Frobenius products (3) of a state and measurements, and SWAP operators provide a tool to extract them to obtain quantum probabilities. It is easy to generalize the above formulae to cover cases involving e.g. more projective measurements and more parties. Obviously, the operator (130) is separable, and this constraint is imposed with the discussed DPS method.

The DPS method implementation is given in appendix B.4, together with an example of a Tsirelson bound calculation using NTV method.

4.3. Choi–Jamiołkowski isomorphism and quantum channels

The Choi–Jamiołkowski isomorphism introduced in 1972 by Jamiołkowski in [155] and, independently in 1975 by Choi in [70] is a fundamental concept in quantum information theory that establishes a correspondence between quantum states and quantum channels, see [161] for a detailed discussion and historical remarks. The isomorphism also called a state-channel duality, provides a mathematical framework to represent quantum channels as density matrices, enabling the study and manipulation of quantum processes using tools from quantum state theory. We say that a map is PSD when it is transforming PSD matrices to PSD matrices; it is completely positive if $\mathcal{E} \otimes {\unicode{x1D7D9}}$ is PSD for ${\unicode{x1D7D9}}$ acting over arbitrary space; a linear map is a trace preserving when the trace of the input matrix is equal to the trace of the output matrix. The Choi–Jamiołkowski isomorphism is defined as follows. Given a linear PSD map $\mathcal{E}: \mathbb{C}^{d \times d} \rightarrow \mathbb{C}^{d^{\prime} \times d^{\prime}}$ , i.e. $\mathcal{E} \in \mathsf{L}\left[ \mathbb{C}^{d \times d} , \mathbb{C}^{d^{\prime} \times d^{\prime}} \right]$ , that transforms input states on $\mathbb{C}^{d \times d}$ to output states on $\mathbb{C}^{d^{\prime} \times d^{\prime}}$ , its corresponding Choi matrix $J(\mathcal{E}) \in \mathbb{C}^{d^{\prime} \times d^{\prime}} \otimes \mathbb{C}^{d \times d}$ is the PSD matrix defined in the following way:

$\begin{align} J\left(\mathcal{E}\right) \equiv \sum_{i,j \in \left[d\right]} \mathcal{E} \left[ | \, i \rangle \langle j \, | \right] \otimes | \, i \rangle \langle j \, | = d \cdot \left( \mathcal{E} \otimes {\unicode{x1D7D9}}_{d} \right) \left[ | \, \Phi^{+} \rangle \langle \Phi^{+} \, | \right], \end{align} \tag{ 135 }$

where $| \, \Phi^{+} \rangle \equiv \frac{1}{\sqrt{d}} \sum_{i \in [d]} | \, i \rangle \otimes | \, i \rangle$ is the maximally entangled state.

Given a PSD matrix, it is possible to reconstruct the corresponding quantum channel. This reconstruction process allows us to extract useful information about the properties and behavior of the quantum channel. Let us consider a state $\rho = \sum_{k,l \in [d]} \rho_{k,l} | \, k \rangle \langle l \, | \in \mathbb{C}^{d \times d}$ . Then

$\begin{align} {\unicode{x1D7D9}}_{d^{\prime}} \otimes \rho^\mathrm{T} = \left( \sum_{m \in \left[d^{\prime}\right]} | \, m \rangle \langle m \, | \right) \otimes \left( \sum_{k,l \in \left[d\right]} \rho_{k,l} | \, l \rangle \langle k \, | \right) = \sum_{m \in \left[d^{\prime}\right]} \sum_{k,l \in \left[d\right]} \rho_{k,l} | \, m \rangle \langle m \, | \otimes | \, l \rangle \langle k |, \end{align} \tag{ 136 }$

and we have

$\begin{align} \begin{aligned} J\left(\mathcal{E}\right) \cdot \left( {\unicode{x1D7D9}}_{d^{\prime}} \otimes \rho^\mathrm{T} \right) & = \sum_{i,j \in \left[d\right]} \sum_{k,l \in \left[d\right]} \sum_{m \in \left[d^{\prime}\right]} \rho_{k,l} \cdot \left( \mathcal{E} \left[ | \, i \rangle \langle j \, | \right] \otimes | \, i \rangle \langle j \, | \right) \cdot \left( | \, m \rangle \langle m \, | \otimes | \, l \rangle \langle k \, | \right) \\ & = \sum_{i,j,k,l \in \left[d\right]} \sum_{m \in \left[d^{\prime}\right]} \rho_{k,l} \cdot \left( \mathcal{E} \left[ | \, i \rangle \langle j \, | \right] \cdot | \, m \rangle \langle m \, | \right) \otimes \left( | \, i \rangle \langle j \, | \cdot | \, l \rangle \langle k \, | \right) \\ & = \sum_{i,j,k \in \left[d\right]} \sum_{m \in \left[d^{^{\prime}}\right]} \rho_{k,l} \cdot \left( \mathcal{E} \left[ | \, i \rangle \langle j \, | \right] \cdot | \, m \rangle \langle m \, | \right) \otimes | \, i \rangle \langle k \, | \\ & = \sum_{i,j,k \in \left[d\right]} \rho_{k,l} \cdot \mathcal{E} \left[ | \, i \rangle \langle j \, | \right] \otimes | \, i \rangle \langle k | \equiv Y. \end{aligned} \end{align} \tag{ 137 }$

Then, we take the partial trace of Y over the second subspace, removing the input space:

$\begin{align} \begin{aligned} {\textrm{Tr}}_2 Y & = \sum_{l \in \left[d\right]} \langle l \, | Y | \, l \rangle = \sum_{i,j,k,l \in \left[d\right]} \rho_{k,l} \cdot \mathcal{E} \left[ | \, i \rangle \langle j \, | \right] \langle l | i \rangle \langle k | l \rangle \\ & = \sum_{i,j \in \left[d\right]} \rho_{l,j} \mathcal{E} \left[ | \, l \rangle \langle j \, | \right] = \mathcal{E} \left[ \sum_{i,j \in \left[d\right]} \rho_{l,j} | \, l \rangle \langle j \, | \right] = \mathcal{E} \left[ \rho \right]. \end{aligned} \end{align} \tag{ 138 }$

Thus we have the following crucial property:

$\begin{align} {\textrm{Tr}}_2 \left[ J\left(\mathcal{E}\right) \cdot \left( {\unicode{x1D7D9}}_{d^{\prime}} \otimes \rho^\mathrm{T} \right) \right] = \mathcal{E} \left[ \rho \right]. \end{align} \tag{ 139 }$

A direct consequence of (139) is that for a POVM $\{M^b\}_b$ on $\mathbb{C}^{d^{\prime} \times d^{\prime}}$ , we have ${\textrm{Tr}} \left[ J(\mathcal{E}) \cdot \left( M^b \otimes \rho^\mathrm{T} \right) \right] = {\textrm{Tr}} \left[ \mathcal{E} \left[ \rho \right] M^b \right]$ , which is the probability of the outcome b of the POVM applied to the output state of the channel.

It can be shown, that any linear map $\mathcal{E}$ is completely PSD if and only if its Choi matrix (135) $J(\mathcal{E})$ is PSD. Similarly, the Choi–Jamiołkowski isomorphism captures the property of trace preserving with the constraint that the Choi matrix after tracing out the output subsystem is equal to the identity operator on the input subsystem, i.e.:

$\begin{align} {\textrm{Tr}}_1 \left[ J\left(\mathcal{E}\right) \right] = {\unicode{x1D7D9}}_{d}. \end{align} \tag{ 140 }$

Trivially, for $J(\mathcal{E})$ defined in (135) and trace preserving $\mathcal{E}$ we have ${\textrm{Tr}} \left[ \mathcal{E} \left[ | i \rangle \langle j | \right] \right] = {\textrm{Tr}} [ | i \rangle \langle j |] = \delta_{i,j}$ and thus ${\textrm{Tr}}_1 [J(\mathcal{E})] = {\unicode{x1D7D9}}_{d}$ . Conversely, consider a matrix $X^{\prime} \in \mathbb{C}^{d^{\prime} \times d^{\prime}} \otimes \mathbb{C}^{d \times d}$ satisfying ${\textrm{Tr}}_1 [X] = {\unicode{x1D7D9}}_{d}$ . Let $X^{\prime} = \sum_{i,j \in [d]} X^\prime_{i,j} \otimes | \, i \rangle \langle j \, |$ . Then since ${\textrm{Tr}}[ | \, i \rangle \langle j \, |] = \delta_{i,j}$ , we have ${\Large{\forall}}_{i,j} {\textrm{Tr}} [X^\prime_{i,j}] = \delta_{i,j}$ , and thus for any ρ it holds that:

$\begin{align} {\textrm{Tr}} \left[ X^{\prime} \cdot \left( {\unicode{x1D7D9}}_{d^{\prime}} \otimes \rho^\mathrm{T} \right) \right] = {\textrm{Tr}}_2 \left[ \sum_{i,j \in \left[d\right]} {\textrm{Tr}}_1 \left[ X^\prime_{i,j} \otimes \left( | \, i \rangle \langle j \, | \cdot \rho^\mathrm{T} \right) \right] \right] = {\textrm{Tr}}_2 \left[ \sum_{i \in \left[d\right]} | \, i \rangle \langle i \, | \cdot \rho^\mathrm{T} \right] = {\textrm{Tr}}\left[\rho\right]. \end{align} \tag{ 141 }$

It is possible to employ the Choi–Jamiołkowski isomorphism to express other properties of quantum channels. Consider a channel $\mathcal{E}_{A^{\prime} B^{\prime} \leftarrow AB}$ transforming states states on $\mathbb{C}^{n_A \times n_A} \otimes \mathbb{C}^{n_B \times n_B}$ to states on $\mathbb{C}^{n_{A^{\prime}} \times n_{A^{\prime}}} \otimes \mathbb{C}^{n_{B^{\prime}} \times n_{B^{\prime}}}$ , for some n_A , $n_{A^{\prime}}$ , n_B , and $n_{B^{\prime}}$ , with the Choi matrix $J(\mathcal{E}_{A^{\prime} B^{\prime} \leftarrow AB}) \in \mathbb{C}^{n_{A^{\prime}} \times n_{A^{\prime}}} \times \mathbb{C}^{n_{B^{\prime}} \times n_{B^{\prime}}} \otimes \mathbb{C}^{n_A \times n_A} \otimes \mathbb{C}^{n_B \times n_B}$ . One usually interprets A and B as inputs and A' and B' as outputs of Alice and Bob, respectively. For instance, one can show that the marginal output state of Alice is a result of a fixed operation on the marginal input state of Alice, or, in other words, the channel is non-signaling from Bob to Alice [78] if and only if ${\textrm{Tr}}_{B^{\prime}} \left[ J(\mathcal{E}_{A^{\prime} B^{\prime} \leftarrow AB}) \right] = {\textrm{Tr}}_{BB^{\prime}} \left[ J(\mathcal{E}_{A^{\prime} B^{\prime} \leftarrow AB}) \right] \otimes {\unicode{x1D7D9}}_B$ [187, equation (22)]; similar relation holds for channels non-signaling from Alice to Bob. Suppose that Alice and Bob are controlling ancillary subsystems, $\tilde{A}$ and $\tilde{B}$ , respectively. A channel is called PPT-preserving when it transforms a bipartite PPT state $\rho_{A \tilde{A} B \tilde{B}}$ , i.e. a state satisfying $\rho_{A \tilde{A} B \tilde{B}}^{\mathrm{T}_{B \tilde{B}}} \succeq 0$ , to a biparitites PPT state [265, 266]. In [266] it was shown that $\mathcal{E}_{A^{\prime} B^{\prime} \leftarrow AB}$ is PPT-preserving if and only if $\left[ J(\mathcal{E}_{A^{\prime} B^{\prime} \leftarrow AB}) \right]^{\mathrm{T}_{B B^{\prime}}} \succeq 0$ . We refer to [187] for a detailed discussion of other properties of quantum channels possible to be expressed with constraints in SDPs.

In summary, the PSD condition is essential in the Choi–Jamiołkowski isomorphism as it ensures the validity of the represented quantum channels. It provides a mathematical framework to analyze and manipulate quantum processes, allowing for the exploration of various properties and applications in quantum information theory. The discussed methods are, in particular, used to express bounds on various types of channel capacities as SDPs [96, 296, 322] or for channels discrimination [330].

4.4. SoS decomposition of polynomials

The SoS technique is a powerful tool used in SDP to represent nonnegative polynomials as sums of squares of other polynomials. Recall that in SDP, the goal is to optimize a linear objective function subject to LMI constraints. However, many optimization problems involve nonnegative polynomials, and checking the nonnegativity of a polynomial can be challenging. The SoS technique provides a way to approximate these nonnegative polynomials using sums of squares, which can be readily handled in SDP. This technique is useful in many areas of mathematics, including optimization, control theory, and signal processing. One application of the SoS decomposition is in optimization problems. In particular, it can be used to determine whether a polynomial is non-negative over a given domain. This is important in optimization because many optimization problems involve minimizing or maximizing a polynomial subject to certain constraints. By using the SoS decomposition, one can determine whether the polynomial is non-negative over the feasible region, which can help in finding the optimal solution. The method found various applications in multiple areas of science and is particularly useful for polynomial optimization, robust control, and polynomial system analysis, as it allows for tractable representation and computation of nonnegative polynomials in SDP frameworks [67, 157, 195, 243–246, 258].

The main idea behind the SoS technique is to express a nonnegative polynomial as an SoSs of lower-degree polynomials. This is achieved by introducing additional variables and using semidefinite constraints. Specifically, the nonnegative polynomial is decomposed into a SoSs of polynomials, where each polynomial is multiplied by a semidefinite matrix. The positivity of the original polynomial is then guaranteed by the positivity of the semidefinite matrices. Let us consider a homogeneous polynomial h(x), where $x \in \mathbb{R}^n$ , and all terms of h(x) have a degree of 2m. We say that h(x) is a SoS polynomial if and only if, for some k there exists a set $\{g_{i} \}_{i \in [k]}$ , where each g_i is a polynomial of degree m and $h(x) = \sum_{i \in [k]} g_{i}(x)^{2}$ . Obviously, any SoS polynomial is always positive. Moreover, the interesting property of the SoS polynomials is that any real non-negative polynomial can be approximated arbitrarily closely by a sequence of SoS polynomials, known as the SoS hierarchy [185]. For any polynomial $g_i(x)$ , we can express it as the inner product of a vector v_i and the basis of monomials $x^{(m)}$ of degree m. Mathematically, this can be written as $g_i(x) = v_i^\mathrm{T} \cdot x^{(m)}$ . The basis $x^{(m)}$ consists of monomials of degree m, and the dimension of this basis is given by $d \equiv {{n+m-1} \choose {m}}$ , thus $v_i \in \mathbb{R}^d$ . If SoS exist, then a set $\mathcal{S} \equiv \{v_i\}_{i = 1}^{k}$ also exist and:

$\begin{align} \begin{aligned} h\left(x\right) & = \sum_{i \in \left[k\right]} g_{i}\left(x\right)^{2} = \sum_{i \in \left[k\right]} \left(v_i^\mathrm{T} \cdot x^{\left(m\right)}\right)^2 = \sum_{i \in \left[k\right]} \left(x^{\left(m\right) \mathrm{T}} \cdot v_i\right) \left(v_i^\mathrm{T} \cdot x^{\left(m\right)}\right) \\ & = \sum_{i \in \left[k\right]} x^{\left(m\right) \mathrm{T}} \left(v_i v_i^\mathrm{T}\right) x^{\left(m\right)} = x^{\left(m\right) \mathrm{T}} \left(\sum_{i \in \left[k\right]} v_i v_i^\mathrm{T}\right) x^{\left(m\right)} \equiv x^{\left(m\right) \mathrm{T}} M x^{\left(m\right)}. \end{aligned} \end{align} \tag{ 142 }$

Hence, even if the set $\mathcal{S}$ is not explicitly known, it is evident that $M \equiv \sum_{i \in [k]} v_i v_i^\mathrm{T} \succeq 0$ , $M \in \mathbb{S}^{d \times d}$ . This observation implies that verifying whether a polynomial h(x) is SoS is equivalent to determining the existence of a PSD matrix $M \succeq 0$ that satisfies the relation $h(x) = x^{(m) \mathrm{T}} M x^{(m)}$ .

Now, let us consider a symmetric matrix $H \in \mathbb{S}^d$ , not necessarily PSD, such that $h(x) = x^{(m)\mathrm{T}} H x^{(m)}$ . Constructing such a matrix is straightforward, as it suffices to assign the relevant coefficients from h(x) onto the diagonal elements of H. Let $\{N_i\}_{i \in [D]}$ , for some D, ${\Large{\forall}}_{i} N_i \in \mathbb{S}^d$ , be a basis of the space of all symmetric d by d matrices satisfying the equation $x^{(m)\mathrm{T}} N_i x^{(m)} = 0$ . The dimension of this space depends on m, i.e. the degree of the polynomial h(x), as well as n, the number of variables. The objective now is to verify the feasibility of the so-called Gram representation of the SoS polynomial. This representation is expressed as $H + \sum_{i} y_i N_{i} \succeq 0$ , where y_i are coefficients. Finally, the feasibility of the Gram representation of the SoS polynomial is examined by checking if the matrix $H + \sum_{i} y_i N_{i}$ is PSD. This assessment allows for the determination of whether the given polynomial can be represented as an SoS.

The non-commutative analog of SoS called sum of Hermitian squares or non-commutative SoS [204] was introduced in [144]. We say that a Hermitian polynomial p(X) in non-commutative variables $\mathbf{X} = (X_i)_i$ is a non-commutative SoS (or, simply, SoS) when there exist polynomials $(r_j(X))_j$ such that $p(X) = \sum_j r_j^{\dagger} r_j$ . For operators used as non-commuting variables, being SoS means that the polynomial of the operators is PSD, meaning, in particular, that its expectation value is non-negative for all quantum states. The non-commutative polynomial is weighted SoS (WSoS) generated by a collection of Hermitian polynomials in non-commutative variables $\mathcal{P}$ when it is of the form [85]:

$\begin{align} \underbrace{\sum_j r_j^{\dagger} r_j}_{\text{SoS}} + \underbrace{\sum_k \sum_l s_{k,l}^{\dagger} p_k s_{k,l}}_{\text{weighted term}} \end{align} \tag{ 143 }$

for $p_i \in \mathcal{P}$ , and some polynomials $(r_j(X))_j$ and $(s_{k,l}(X))_{k,l}$ . An algorithm for finding a sum of Hermitian squares decompositions for Hermitian polynomials in non-commuting variables based on SDP was given in [173]. The SoS decompositions were used to provide the so-called self-testing of quantum states and measurements in [21, 202, 339], see [295] for a review and discussion.

As a direct example of the application of SoS in quantum information, we will analyze the so-called quantum moment problem. In this problem, we ask whether, for a given probability distribution, there exists a quantum state and measurements that produce such distribution (see section 1.4). A complementary problem is to decide whether a given instance of the quantum moment problem is unsatisfiable. A crucial tool for this purpose is one of the versions of the Positivstellensatz. Positivstellensatze are theorems in real algebraic geometry that provide a way to determine whether a polynomial is positive on a semi-algebraic set, or if it can be written as an SoS; see [34, chapter 4] of an introduction for the commutative variables case, and [143, 278] for the non-commutative variables case.

Suppose that the quantum setup of interest involves the measurement operators $\mathbf{X} = (X_i)_i$ . We briefly show that many of the desired properties of measurements can be expressed as a requirement that certain polynomials of these operators vanish. One of the requirements is that the measurements over different subsystems commute. Indeed, the condition $[X_i, X_j] = 0$ is equivalent to statement the Hermitian polynomial $i [X_i, X_j]$ is equal zero. If for a certain set $\mathcal{M}$ the operators $\{X_i\}_{i \in \mathcal{M}}$ are constituting a projective measurement, then the normalization condition is expressed as the requirement that the Hermitian polynomial ${\unicode{x1D7D9}} - \sum_{i \in \mathcal{M}} X_i$ is equal zero; similarly the idempotency condition of X_i is expressed as $X_i^2 - X_i$ is equal zero. Another simple constraint possible to be expressed with vanishing Hermitian polynomials is the requirement that X has eigenvalues in $\{+1,-1\}$ ; this is equivalent to the requirement that ${\unicode{x1D7D9}} -X^2$ is equal zero. Let us define $\mathcal{P}$ to be the set of all Hermitian polynomials of one of these forms, as well as their negatives, where each of the polynomials imposes a constraint that is desired in a given scenario.

Consider an expression $G \equiv \sum_{\mathbf{a}, \mathbf{x}} \alpha_{\mathbf{a}|\mathbf{x}} p_{\mathbf{a}|\mathbf{x}}[\mathbf{X}]$ . We are interested in finding the Tsirelson bound [72] of such an expression. If for any quantum state $| \, \Phi \rangle$ and measurement operators X it holds $\langle \Phi \, | G[\mathbf{X}] | \, \Phi \rangle \unicode{x2A7D} q$ for some $q \in \mathbb{R}$ , then trivially $q \cdot {\unicode{x1D7D9}} - G \succeq 0$ . In [85, theorem 4.3] the following form of Positivstellensatz was given: if $\left[ {\Large{\forall}}_{\mathbf{X}} \left( {\Large{\forall}}_{p \in \mathcal{P}} p[\mathbf{X}] = 0 \implies {\Large{\forall}}_{| \, \Phi \rangle} \langle \Phi \, | (q \cdot {\unicode{x1D7D9}} - G) | \, \Phi \rangle \gt 0 \right) \right]$ , then $v \cdot {\unicode{x1D7D9}} - G$ is WSoS. In other words, if the Hermitian polynomial $q \cdot {\unicode{x1D7D9}} - G$ in non-commuting variables X is a PD operator (expressed as ${\Large{\forall}}_{| \, \Phi \rangle} \langle \Phi \, | (q \cdot {\unicode{x1D7D9}} - G) | \, \Phi \rangle \gt 0$ ) under the assumption that X are quantum measurements (expressed as ${\Large{\forall}}_{p \in \mathcal{P}} p[\mathbf{X}] = 0$ ), then it can be written in the WSoS form. Thus, this Positivstellensatz says that if there exists no quantum state and measurements attaining the value q, then $q \cdot {\unicode{x1D7D9}} - G$ is WSoS.

Now, the question is, how to derive the value of q using SDP. We do not have information regarding the degree of polynomials $(r_j(X))_j$ and $(s_{k,l}(X))_{k,l}$ occurring in (143). The result of [85, section 5] is a hierarchy of relaxations allowing to get a sequence q_n such that $\lim_{n \to \infty} q_n = q$ , with $q_n \unicode{x2A7E} q$ . For the level n of the relaxation, we require that r_j and $s_{k,l}$ are of degree at most n and n − 1, respectively.

Consider the first level, n = 1 in the CHSH scenario [73], with $G = A_1 B_1 + A_1 B_2 + A_2 B_1 - A_2 B_2$ , where $E^a_x$ and $F^b_y$ are commuting projective measurement operators of Alice and Bob, respectively, see section 1.4, and $A_x \equiv E^0_x - E^1_x$ and $B_y \equiv F^0_y - F^1_y$ . The basis of polynomials of degree 1 in variables occurring in G is e.g. $x^{(1)} = [A_1; A_2; B_1; B_2]$ . The SoS part of WSoS (143) is thus $\sum_j r_j^{\dagger} r_j = x^{(1)T} M x^{(1)}$ , see (142), for some $M \succeq 0$ . The only constraint expressed in $\mathcal{P}$ are those imposing that A_x and B_y have eigenvalues in $\{+1,-1\}$ ; let us denote the relevant constraint polynomials as $p^{(A)}_x \equiv ({\unicode{x1D7D9}} - A_x^2) \in \mathcal{P}$ and $p^{(B)}_y \equiv ({\unicode{x1D7D9}} - B_y^2) \in \mathcal{P}$ , respectively. Indeed, the polynomials $p^{(A)}_x$ and $p^{(B)}_y$ vanish if and only if these eigenvalue constraints on A_x and B_y are satisfied. Since $s_{k,l}$ are at this level all of degree $n-1 = 0$ , they are real numbers; let us denote them as $\gamma^{(A)}_x \in \mathbb{R}$ and $\gamma^{(B)}_y \in \mathbb{R}$ , respectively. The WSoS decomposition (143) is now of the form:

$\begin{align} q_1 \cdot {\unicode{x1D7D9}} - \left(A_1 B_1 + A_1 B_2 + A_2 B_1 - A_2 B_2\right) = x^{\left(1\right)\mathrm{T}} M x^{\left(1\right)} + \sum_{x \in \left[2\right]} \gamma^{\left(A\right)}_x A_x + \sum_{y \in \left[2\right]} \gamma^{\left(B\right)}_y B_y. \end{align} \tag{ 144 }$

To find the value q₁ we can solve the problem of minimizing q₁ subject to (144) and $M \succeq 0$ ; this clearly is an SDP. The result is

$\begin{align} \begin{aligned} & 2 \sqrt{2} \cdot {\unicode{x1D7D9}} - \left(A_1 B_1 + A_1 B_2 + A_2 B_1 - A_2 B_2\right) \\ & \quad = \frac{1}{2 \sqrt{2}} \left( r_1^{\dagger} r_1 + r_2^{\dagger} r_2 \right) + \frac{1}{\sqrt{2}} \left( p^{\left(A\right)}_1 + p^{\left(A\right)}_2 + p^{\left(B\right)}_1 + p^{\left(B\right)}_2 \right), \end{aligned} \end{align} \tag{ 145 }$

where $r_1 \equiv A_1 + A_2 - \sqrt{2} B_1$ and $r_2 \equiv A_1 - A_2 - \sqrt{2} B_2$ . The Tsirelson bound is thus $2 \sqrt{2}$ .

4.5. Lovász theta and contextuality

The concept of zero-error capacity plays a significant role in information theory as it pertains to the flawless transmission of information through a communication channel. This notion is of utmost importance as it guarantees the reliability and precision of data transfer, which holds immense significance in diverse domains, including telecommunications, computer networking, and cryptography. The notion of a zero-error capacity of a channel represented by a graph was introduced by Shannon in 1956 in the paper The zero error capacity of a noisy channel [283], as defined below. The calculation of this entity, unfortunately, poses significant challenges. Lovász addressed this problem in [196] by formulating an SDP relaxation known as the Lovász θ function, or Lovász number. The introduction of this function had a profound influence not only in classical and quantum information theories [71, 76, 88] but also in related fields such as graph theory [164, 171]. Its impact transcends disciplinary boundaries, highlighting its importance and wide-ranging implications.

Consider an alphabet consisting of n letters that need to be communicated through an erroneous channel. We can represent this communication scenario using a graph G, where each vertex corresponds to a letter from the alphabet. The edges of the graph indicate the possible confusion between letters, based on the communication model being considered. It is evident that the count of one-letter messages that are guaranteed not to be confused is equivalent to the size of the largest independent set in the graph, denoted as $\alpha(G)$ .

In the context of zero-error communication in the asymptotic limit, where multiple uses of the communication channel and non-trivial coding schemes are allowed, it often becomes possible to transmit a higher average number of letters per channel use. To illustrate this concept, let us consider the number of k-letter messages that can be transmitted without confusion, denoted as $\alpha(G^{k})$ . It is observed that $\alpha(G^{k}) \unicode{x2A7E} \alpha(G)^{k}$ , indicating that the size of the largest independent set, $\alpha(G)$ , raised to the power of k provides a lower bound on the number of distinct messages that can be encoded without the risk of confusion. For instance, if a single-letter message allows for $l = \alpha(G)$ different messages without confusion, then with k letters, we can encode at least l^k distinct messages without the risk of confusion. As an example, consider the cycle graph C₅ with five vertices, where $\alpha(C_5) = 2$ and $\alpha(C_5^2) = 5$ .

The Shannon capacity of a graph G is a measure defined as follows. It is denoted by $\Theta(G)$ and is given by the supremum over all values of k of the expression $\alpha(G^{k})^{\frac{1}{k}}$ , i.e. over all possible lengths of messages encoding a chunk of information:

$\begin{align} \Theta\left(G\right) = \sup_{k} \alpha\left(G^{k}\right)^{\frac{1}{k}}. \end{align} \tag{ 146 }$

Here, $\alpha(G^{k})$ represents the size of the largest independent set in the graph G^k, which is obtained by taking the strong product of G with itself k times. The Shannon capacity provides an important characterization of the graph's ability to transmit information without errors, and it is widely used in the field of information theory.

To provide the formulation of SDP relaxation of the Shannon capacity, and thus also of the independence number of a graph, introduce the concept of the strong product of two graphs. Let us consider a pair of graphs, G and H, and define their strong product as $G \boxtimes H$ . The vertex set of $G \boxtimes H$ , denoted as $V(G \boxtimes H)$ , is the Cartesian product of the vertex sets of G and H, i.e. $V(G \boxtimes H) = V(G) \times V(H)$ . In this construction, a vertex $(x_1, y_1)$ in $G \boxtimes H$ is adjacent to another vertex $(x_2, y_2)$ if and only if one of the following conditions holds:

vertex x₁ is adjacent to vertex x₂ in G, and vertex y₁ is adjacent to vertex y₂ in H;
vertex x₁ is equal to vertex x₂, and vertex y₁ is adjacent to vertex y₂ in H; or
vertex x₁ is adjacent to vertex x₂ in G, and vertex y₁ is equal to vertex y₂.

This construction is known as the strong product of graphs. To extend this notion, we define $G^1 = G$ , and for k + 1, we have $G^{k+1} = G^k \boxtimes G$ .

An orthonormal representation (OR) of G in $\mathbb{R}^d$ for some d is a set of vectors $\{| \, u_i \rangle\}_{i \in V(G)} \subset \mathbb{R}^d$ satisfying $\langle u_i , u_j \rangle = 0$ for all pairs of non-adjacent vertices $i,j \in V(G)$ . We denote by $\mathcal{OR}(G)$ the set of all OR of G in any dimension. The value of the OR is

$\begin{align} \min_{\substack{| \, \Psi \rangle \in \mathbb{R}^d, \\ \left| | \, \Psi \rangle = 1 \, \right|}} \max_{i \in V\left(G\right)} \frac{1}{\left| \langle \Psi , u_i \rangle \, \right|^2}. \end{align} \tag{ 147 }$

Any Ψ for that the minimum in (147) is attained, is called a handle of OR. If $\{| \, u_i \rangle\}_{i \in V(G)}$ and $\{| \, v_j \rangle\}_{j \in V(H)}$ are OR of graphs G and H, then $\{| \, u_i \rangle \otimes | \, v_j \rangle\}_{i \in V(G), j \in V(H)}$ is an OR of the graph $G \boxtimes H$ [196, p 2]. If the vectors are in $\mathbb{C}^d$ instead of $\mathbb{R}^d$ , then the term orthogonal embedding is used instead of OR. The orthogonal rank of G, denoted $\xi(G)$ is the smallest positive integer d such that there exists an orthogonal embedding [44, 59]. Lovász's function, denoted as $\theta(G)$ , is defined as the minimum of (147) over $\mathcal{OR}(G)$ .

Lovász's $\theta(G)$ plays an important role in the study of the Shannon capacity. It possesses the property that the Shannon capacity $\Theta(G)$ , is bounded from above by $\theta(G)$ , $\Theta(G) \unicode{x2A7D} \theta(G)$ . To provide an SDP formulation of $\theta(G)$ , let us consider a set $\mathcal{A}$ consisting of all symmetric matrices $\{A_i\}_i$ that satisfy the following conditions: for any two nodes i and j in the graph G, if i = j or if i and j are not adjacent in G, then the entry A_ij is set to 1. The remaining entries of these matrices are left unconstrained. The value of $\theta(G)$ is equal to the minimum of the largest eigenvalue among all matrices in the set $\mathcal{A}$ [196, theorem 3]. This relaxation technique provides an effective approach to approximate the Shannon capacity of a graph and is possible to be expressed as an SDP. Indeed, the constraints defining the set $\mathcal{A}$ are linear, and the SDP is

$\begin{align} \begin{split} \text{minimize } & \lambda \\ \text{subject to } & X \in \mathcal{A} \\ & \lambda \cdot {\unicode{x1D7D9}} - X \succeq 0. \end{split} \end{align} \tag{ 148 }$

Alternatively, it can be shown [196, theorem 5] that

$\begin{align} \theta\left(G\right) = \max_{\left\{| \, u_i \rangle\right\}_i \in \mathcal{OR}\left(\bar{G}\right)} \max_{\substack{| \, \Psi \rangle \in \mathbb{R}^d, \\ \left| | \, \Psi \rangle \, \right| = 1}} \sum_{i \in V\left(G\right)} \left| \langle \Psi , u_i \rangle \, \right|^2, \end{align} \tag{ 149 }$

where $\bar{G}$ is the complementary graph of G.

In [129] Grötschel, Lovász and Schrijver introduced the weighted version of θ function, intending to derive the maximum weight independent sets in perfect graphs. Let $\mathbf{w} = (w_i)_{i \in V(G)}$ be weights of nodes in a weighted graph G. The generalization of the θ function (147) to weighted graphs is defined as [131, p 4]:

$\begin{align} \theta\left(G, \mathbf{w}\right) \equiv \min_{\left\{| \, u_i \rangle\right\}_i \in \mathcal{OR}\left(G\right)} \min_{\substack{| \, \Psi \rangle \in \mathbb{R}^d, \\ \left| | \, \Psi \rangle \, \right| = 1}} \max_{i \in V\left(G\right)} \frac{w_i}{\left| \langle \Psi , u_i \rangle \, \right|^2}. \end{align} \tag{ 150 }$

A direct generalization of (149) is [131, theorem 2.3]:

$\begin{align} \theta\left(G, \mathbf{w}\right) = \max_{\left\{| \, u_i \rangle\right\}_i \in \mathcal{OR}\left(\bar{G}\right)} \max_{\substack{| \, \Psi \rangle \in \mathbb{R}^d, \\ \left| | \, \Psi \rangle \, \right| = 1}} \sum_{i \in V\left(G\right)} w_i \left| \langle \Psi , u_i \rangle \, \right|^2. \end{align} \tag{ 151 }$

The value of (151) can also be expressed as SDP [129].

The fundamental role of the Lovász θ for quantum correlations and contextuality was recognized in [57] and developed in [58]. The results state that the maximal value of correlations allowed by quantum mechanics is given by the Lovász number of the so-called exclusivity graph. Here we briefly sketch the results, and we refer to [3] for a discussion and further advancements.

The exclusivity graph of a multi-partite correlation experiment represents the possible events with vertices and the exclusion of pairs of events by edges. We say that the events e₁ and e₂ are exclusive if and only if there exist two jointly measurable observables (tests) µ_i and µ_j that distinguish between them. The experiments with space-like separated tests are Bell inequalities [25, 73] as discussed in section 1.4. More general scenarios are non-contextual inequalities, which distinguish between theories in which outcomes are predefined from contextual theories, including quantum mechanics [120, 176, 287]. For instance, in the CHSH Bell experiment [73] there are four tests, each providing binary results, viz. two measurements performed by Alice, and two measurements performed by Bob.

Consider a positive linear combination of events, or positive non-contextual game expression, of the form $\sum_i w_i P(e_i)$ , with all $w_i \gt 0$ . The CHSH Bell inequality [73] can be rewritten in this form as

$\begin{align} \sum_{a \in \left\{0,1\right\}} P\left(a,a|0,0\right) + P\left(a,a|0,1\right) + P\left(a,a|1,0\right) + P\left(a,\neg a|1,1\right)\unicode{x2A7D} 3. \end{align} \tag{ 152 }$

The exclusivity graph of the positive non-contextual game expression is the induced subgraph of the exclusivity graph of the experiment, see figure 1. In [58] it was shown that from (151) it follows that the attainable upper bound on the positive non-contextual game expression in quantum mechanics is exactly $\theta(G, \mathbf{w})$ , where G is the exclusivity graph of the positive non-contextual game expression [273].

**Figure 1.** Exclusivity graph for the two-partite Bell scenario with two settings and two outcomes for each of the parties. The events are labeled as $ab|xy$ , with the settings of Alice and Bob denoted as x and y, and their outcomes as a and b. Sets of pairwise exclusive events are those lying on the same line or in the same circle. The exclusivity graph for the positive non-contextual game expression (152) is the induced subgraph containing only the black nodes.
Download figure:
Standard image High-resolution image

**Figure 1.** Exclusivity graph for the two-partite Bell scenario with two settings and two outcomes for each of the parties. The events are labeled as $ab|xy$ , with the settings of Alice and Bob denoted as x and y, and their outcomes as a and b. Sets of pairwise exclusive events are those lying on the same line or in the same circle. The exclusivity graph for the positive non-contextual game expression (152) is the induced subgraph containing only the black nodes.
Download figure:
Standard image High-resolution image

The work [82] reveals another association between the Lovász number and fundamental quantum phenomena, viz. the uncertainty relations [139, 167, 270], which characterize the limitations in precisely predicting outcomes of simultaneous measurements in quantum mechanics. The findings have practical implications and can be applied to formulate entropic uncertainty relations, separability criteria, and entanglement witnesses.

The mentioned above orthogonal rank $\xi(G)$ has a direct relation to the one-round quantum communication complexity of calculation of a function $f(x,y)$ , which is equal to $\lceil \log_2(\xi(G)) \rceil$ , with the promise that the joint input $(x,y) \in \mathcal{D} \subset V(G) \times \mathcal{Y}$ . The vertices i and j in G are connected if and only if ${\Large{\exists}}_{y \in \mathcal{Y}} (i,y) \in \mathcal{D} \wedge (j,y) \in \mathcal{D}$ [83, theorem 8.5.2]. As discussed in [225], the so-called almost quantum or $\mathfrak{Q}_{1+AB}$ SDP relaxation of the set of quantum behaviors, see section 4.6, is also closely related to the Lovász θ.

4.6. Correlation matrices, moment matrices, and optimization over non-commuting variables

Consider a sequence of real-valued random variables $\mathcal{S} = (x_1, \ldots x_n)$ . The covariance matrix of $\mathcal{S}$ is defined as the matrix whose entries are given by relevant covariations of pairs of variables, $\textrm{Cov}[\mathcal{S}] = \left[ \textrm{cov}[x_i, x_j] \right]_{x_i, x_j \in \mathcal{S}} \in \mathbb{R}^{\mathcal{S} \times \mathcal{S}}$ . The rows and columns, in this case, could be indexed with the numbers of the variables, and thus belong to $[n]$ and then it would be $\textrm{Cov}[\mathcal{S}] \in \mathbb{R}^{n \times n}$ . Instead, we take a more generic approach and index the rows and columns with the variables themselves. Such matrices, indexed by labels or other expressions are called moment matrices. Now, consider a vector of constant coefficients $v = (v_1, \ldots v_n) \in \mathbb{R}^{\mathcal{S} \times 1}$ . We have

$\begin{align} v^\mathrm{T} \textrm{Cov}\left[\mathcal{S}\right] v = \sum_{i,j \in \left[n\right]} v_i \cdot \textrm{cov}\left[x_i, x_j\right] \cdot v_j = \textrm{cov} \left[ \sum_{i \in \left[n\right]} x_i, \sum_{j \in \left[n\right]} x_j \right] = \textrm{var} \left[ \sum_{i \in \left[n\right]} x_i \right] \unicode{x2A7E} 0, \end{align} \tag{ 153 }$

and thus the covariance matrix is PSD. The correlation matrix $\textrm{corr}(\mathcal{S})$ is the matrix whose entries are given by relevant correlation of pairs of variables, i.e. $\textrm{Corr}[\mathcal{S}] = \left[ \textrm{corr}[x_i, x_j] \right]_{x_i \in \mathcal{S}, x_j \in \mathcal{S}} \in \mathbb{R}^{\mathcal{S} \times \mathcal{S}}$ . Equivalently, the correlation matrix is equal to the covariance matrix of all variables rescaled to have variance 1, $\textrm{Corr}[\mathcal{S}] = \textrm{Cov}[\bar{\mathcal{S}}]$ , for $\bar{\mathcal{S}} = (x_1 / {\boldsymbol{\sigma}} \left[ x_1 \right], \ldots x_n / {\boldsymbol{\sigma}} \left[ x_n \right])$ . In consequence, all diagonal values of the correlation matrix are 1 and the matrix is also PSD. If variables are linearly independent, then it is PD. A sample numerical calculation with a correlation matrix is given in appendix B.2. The concept of moment matrices is used in the methods discussed further in this section.

The methods discussed in this work using moment matrices find a major application in optimizing linear functions of probabilities in various quantum scenarios, such as entangled parties with separated sharing or communication of a quantum system with a given dimension. These methods have gained significant popularity due to their effectiveness and applicability in quantum optimization tasks. It is worth noting that earlier studies in 2004 focused on optimizing success probabilities without utilizing moment matrices [94]. Instead, they relied on the formalism of Lagrangian in close connection with the concept of convex optimization. Particularly, researchers explored the problem of determining optimal success probabilities for static linear optics quantum gates, and intriguingly, it was found to be related to convex optimization theory. Through this connection, they successfully derived upper bounds for the success probability of networks implementing single-mode gates. Moreover, the concept of Lagrange duality played a crucial role in providing rigorous proofs for these derived bounds.

4.6.1. The NPA hierarchy.

Now, we provide a brief introduction to the optimization over non-commuting variables, with a concentration on the so-called NPA method. The NPA method is based on SDP, as presented in the paper Bounding the set of quantum correlations by Navascués et al (2007) [226, 227]. Moment matrices are a basic element of the technique. The concept was inspired by the seminal work by Lasserre [184].

The problem at hand is to find a way to characterize, at least approximately, the class of all quantum behaviors $\mathfrak{Q}$ without resorting to the formalism of quantum theory. To this end, the NPA method introduces a hierarchy of SDP problems $\left\{\mathfrak{Q}_k\right\}_{k = 1}^{\infty}$ . Each level of the hierarchy corresponds to a specific SDP problem, where higher levels yield more accurate solutions. In other words, as we increase the level k, the set $\mathfrak{Q}_{k+1}$ becomes a subset of $\mathfrak{Q}_k$ , $\mathfrak{Q}_{k+1} \subset \mathfrak{Q}_k$ , providing a progressively better approximation of $\mathfrak{Q}$ . However, as the level increases, the SDPs become more complex and computationally demanding. It is important to note that the hierarchy of SDP problems converges to the quantum set $\mathfrak{Q}$ , meaning that the intersection of all sets in the hierarchy is equal to $\mathfrak{Q}$ , viz. $\cap_{k = 1}^{\infty} \mathfrak{Q}_k = \mathfrak{Q}$ . By considering all levels of the hierarchy, we can accurately capture the entire quantum set $\mathfrak{Q}$ without explicit involvement of the formalism of Hilbert spaces, and when we restrict considerations to a particular level, then we can effectively approximate the optimization over the set of all quantum behaviors $\mathfrak{Q}$ using SDP.

A sequence of operators, which is formed by concatenating projective measurement operators, plays a crucial role in the context of quantum systems. Consider an illustrative example of such a sequence, denoted as $E^1_2 E^3_2 F^2_1 E^1_1$ , consisting of four operators. The operators associated with Alice, denoted as $E^a_x$ , commute with the operators corresponding to Bob, denoted as $F^b_y$ . This allows us to rearrange the sequence without altering the original action of the operators. Thus, we can rewrite the sequence as $E^1_2 E^3_2 E^1_1 F^2_1$ by interchanging the operators while respecting the commutation relationship between Alice's and Bob's measurements. This reordering is made possible by the commutativity property exhibited by the operators belonging to Alice's and Bob's measurements.

In a sequence of operators, we can exploit the orthogonality property $E^a_x E^{a^{^{\prime}}}_x = 0$ and $F^b_y F^{b^{^{\prime}}}_y = 0$ for $a \neq a^{^{\prime}}$ and $b \neq b^{^{\prime}}$ . Applying this property and utilizing the commutation property, we can rearrange operators within the sequence. For instance, let us consider the expression $E^2_1 F^3_3 E^1_1$ . By utilizing the commutation property, we can rearrange the operators as $E^2_1 E^1_1 F^3_3$ , which equals zero due to the orthogonality between $E^2_1$ and $E^1_1$ . Additionally, it is worth noting that since $E^a_x$ and $F^b_y$ are projectors, we have the property $(E^a_x)^k = E^a_x$ for any $k \unicode{x2A7E} 1$ , and the same holds for $F^b_y$ . This property further aids in simplifying the expressions involving repeated application of the projectors. To characterize the length of a sequence of operators, we define it as the minimum number of projectors required to represent the sequence. In this context, we consider the identity operator ${\unicode{x1D7D9}}$ as the null sequence, denoting no projectors, and its length is defined to be zero. This notion of length provides a measure of the complexity or number of steps involved in a sequence of operators.

We will now be considering sets of sequences of operators from the reduced set of operators discussed in section 1.4. Using the NPA method, we can construct a hierarchy of relaxations by choosing different sets of sequences. Specifically, we define a set $\mathcal{S}_k$ to be the set of all sequences of operators $\{E^{\tilde{a}}_x, F^{\tilde{b}}_y\}_{\tilde{a},\tilde{b},x,y}$ of the length at most k (including the sequence of length 0, i.e. ${\unicode{x1D7D9}}$ ), with the indices $\tilde{a}$ and $\tilde{b}$ covering all values excluding the last one. We also define the so-called intermediate sets of sequences, where only specific sequences are included, for instance

$\begin{align} \mathcal{S}_{1+AB} = \mathcal{S}_1 \cup \left\{E^a_x F^b_y \right\}_{\tilde{a},\tilde{b},x,y} = \left\{{\unicode{x1D7D9}}, E^{\tilde{a}}_x, F^{\tilde{b}}_y, E^{\tilde{a}}_x F^{\tilde{b}}_y \right\}_{\tilde{a},\tilde{b},x,y}. \end{align} \tag{ 154 }$

The hierarchy of level $\mathfrak{Q}_{2}$ means that the set $\mathcal{S}$ consists of all sequences of measurement operators of length 2, whereas in level $\mathfrak{Q}_{1+AB}$ , $\mathcal{S}$ is a set of all sequences of length 1 and sequences with one operator of Alice and one of Bob. $\mathfrak{Q}_{1+AB}$ revealed to be so efficient that it is called an almost quantum set of behaviors [225].

The key idea of the NPA method can be summarized as follows. Consider a behavior $\{P(a,b|x,y) \}$ and suppose that it is quantum. This means that there exists a specific realization involving a quantum state $| \, \psi \rangle$ and projective measurements $\{E^a_x, F^b_y \}$ such that, for all settings x and y, and outcomes $\tilde{a}$ and $\tilde{b}$ , the relation (8) holds, and expresses the probability of obtaining outcomes a and b when measurements x and y are performed on the quantum state $| \, \psi \rangle$ . In the context of the NPA method, the notion of moment matrices is used as follows. For any operators O_i and O_j belonging to the set $\mathcal{S}$ of size n, we define the element of the moment matrix as:

$\begin{align} \Gamma_{O_i, O_j} \equiv \langle \psi \, | O_i^{\dagger} O_j | \, \psi \rangle. \end{align} \tag{ 155 }$

This equation establishes a connection between certain elements of the moment matrix and joint probability distributions. Specifically, we have

$\begin{align} \Gamma_{E^{\tilde{a}}_x, F^{\tilde{b}}_y} = P\left(\tilde{a},\tilde{b}|x,y\right), \end{align} \tag{ 156 }$

which demonstrates that the elements of the moment matrix correspond to the probabilities of obtaining outcomes $\tilde{a}$ and $\tilde{b}$ for the measurements x and y. Additionally, we have the element $\Gamma_{{\unicode{x1D7D9}}, {\unicode{x1D7D9}}} = 1$ , which represents the identity operator, indicating that its contribution to the moment matrix is unity, and $| \, \psi \rangle$ is normalized. This definition results in an n × n moment matrix, where the rows and columns are indexed by the elements of the set $\mathcal{S}$ . Hence, the moment matrix serves as a representation of the moments associated with the considered behavior. For instance, the sequence of operators with a length of at most 1 in the case of the reduced set (9) is represented by $\mathcal{S}_1 = \{{\unicode{x1D7D9}}, E^0_1, E^0_2, F_1^0, F_2^0\} \equiv \{O_1, O_2, O_3, O_4, O_5\}$ . From (155) and (156) we have $P_A(0|0) = \Gamma_{{\unicode{x1D7D9}},E^0_0} \equiv y_1$ , $P_A(0|1) = \Gamma_{{\unicode{x1D7D9}},E^0_1} \equiv y_2$ , $P_B(0|0) = \Gamma_{{\unicode{x1D7D9}},F^0_0} \equiv y_4$ , $P_B(0|1) = \Gamma_{{\unicode{x1D7D9}},F_1^0} \equiv y_7$ , $P(0,0|0,0) = \Gamma_{E^0_0,F^0_0} \equiv y_5$ , $P(0,0|1,0) = \Gamma_{E^0_1,F^0_0} \equiv y_6$ , $P(0,0|0,1) = \Gamma_{E^0_0,F_1^0} \equiv y_8$ , and $P(0,0|1,1) = \Gamma_{E^0_1,F_1^0} \equiv y_9$ . This leads to the following formula for the Γ matrix, which is in this case a $5 \times 5$ real matrix:

$\begin{align} \begin{bmatrix} 1 & y_1 & y_2 & y_4 & y_7 \\ y_1 & y_1 & y_3 & y_5 & y_8 \\ y_2 & y_3 & y_2 & y_6 & y_9 \\ y_4 & y_5 & y_6 & y_4 & y_{10} \\ y_7 & y_8 & y_9 & y_{10} & y_7 \end{bmatrix}. \end{align} \tag{ 157 }$

The probabilities not occurring directly in the matrix can be derived from the non-signaling constraints (7). For instance, from the matrix (157) we have $P(0,1|0,1) = P_A(0|0) - P(0,0|0,1) = y_1 - y_8$ .

We can observe that the elements of the moment matrix Γ are subject to the following linear constraints. For any indices i, j, k, and l, the equality $\Gamma_{O_i,O_j} = \Gamma_{O_k,O_l}$ holds whenever the corresponding operators satisfy $O_i^{\dagger} O_j = O_k^{\dagger} O_l$ , i.e. $O_i^{\dagger} O_j = O_k^{\dagger} O_l \implies \Gamma_{O_i,O_j} = \Gamma_{O_k,O_l}$ . This constraint ensures that the inner products of identical operator sequences yield equal moments. Similarly, if $O_i^{\dagger} O_j$ results in the zero operator, then it implies that $\Gamma_{O_i,O_j} = 0$ . This condition ensures that the moments associated with operator sequences resulting in the null operator are also zero. These linear constraints provide necessary relations between the elements of the moment matrix, allowing us to impose consistency and capture important properties of the behavior. Positive semi-definiteness of Γ is a direct consequence of (155). Indeed, let $v \in \mathbb{C}^n$ . For $V = \sum_j v_j O_j$ we have

$\begin{align} v^{\dagger} \Gamma v = \sum_{i,j} v_i^{*} \Gamma_{O_i,O_j} v_j = \sum_{i,j} v_i^{*} \langle \psi \, | O_i^{\dagger} O_j | \, \psi \rangle v_j = \langle \psi \, | V^{\dagger} V | \, \psi \rangle = \left| V | \, \psi \rangle \, \right|^2 \unicode{x2A7E} 0, \end{align} \tag{ 158 }$

and thus $\Gamma \succeq 0$ . To summarize, we observe the following:

The sets $\mathcal{S}_1 \subset \mathcal{S}_2 \cdots \subset \mathcal{S}_{\infty}$ form an increasing sequence, where each set contains longer sequences of measurement operators.
The hierarchy levels $\mathfrak{Q}_1 \supset \mathfrak{Q}_2 \cdots \supset \mathfrak{Q}$ form a decreasing sequence, indicating a refinement of the approximation to the quantum set.
The quantum set $\mathfrak{Q}$ is equal to the intersection of all levels Q_k , i.e. $\mathfrak{Q} = \bigcap_{k = 1}^{\infty} Q_k$ .

The final equality, which pertains to convergence to the quantum set, has been proven in [227]. The sizes of the sets $\mathcal{S}_k$ and, consequently, the sizes of the Γ matrices, grow exponentially, specifically as $O \left( ( \left| A \, \right| \cdot \left| X \, \right| + \left| B \, \right| \cdot \left| Y \, \right|)^k \right)$ . In practice, sets beyond $\mathfrak{Q}_3$ are rarely utilized, and due to finite precision arithmetics large matrices may cause numerical issues. The set $\mathfrak{Q}_{1+AB}$ is generally sufficient for most purposes and is often referred to as the almost quantum set. The set $\mathfrak{Q}_1$ is often called the macroscopic locality set [229]. A comparison of the primal and dual approaches for imposing the aforementioned operator constraints is presented in table 1. A comprehensive discussion on this topic can be found in section 2.3.1 of [211]. It is worth noting that when expressing the constraint of NPA optimizations, the parameter m representing the size of the canonical SDP form in (80) and (81) is significantly smaller if we opt for the latter approach. It can be seen that the above example (157) is written in the form $C - \sum_i y_i A_i$ of the canonical dual formulation (81), and can be easily expressed in other forms discussed in section 3.2.

There are multiple variants and extensions of NPA. In [254] the inventors of NPA showed how to apply their techniques to the general problem of polynomial optimization over non-commuting variables. In [163] it was shown how to use NPA to analyze the so-called extended non-local games. These games involve three parties, viz. Alice, Bob, and a referee. Initially, Alice and Bob share a tripartite quantum state with the referee. In these games, the conditions for Alice and Bob to win may depend not only on their answers to randomly selected questions but also on the outcomes of measurements performed by the referee on its portion of the shared quantum state. In a recent work [257] a method for analysis of classical and quantum correlations in networks with causally independent parties was introduced, providing a way to use NPA in complex quantum networks. Another work analyzing generalizations of NPA for characterization of the quantum network correlations, together with convergence results was given in [269], see [301] for an overview.

The almost quantum correlations are applied and discussed in [149, 277]. Their great theoretical importance stems from their role in axiomatics of quantum mechanics [225]. In [225] it was proposed to consider the set of behaviors allowed in the almost quantum relaxation level of NPA as a physical theory and hypothesized that the actual physics of the real world is not the quantum theory but the newly proposed almost quantum theory. It was shown that non-trivial communication complexity, no advantage for non-local computation, and local orthogonality are weaker in determining which behaviors are physical than the almost quantum theory. It was also shown that the almost quantum set is closed under various classical operations, including post-selection, composition, grouping of parties, and so-called wiring operation [11]. In [191] a quantitative comparison of several sets of super-quantum behaviors is given.

In [277, appendix A] the NPA method has been modified to express a relaxation of the set of quantum assemblages [62], i.e. sets of unnormalized states toward which a multipartite state can be steered to [331]. In this method, the Γ matrix' entries are not numbers but matrices themselves allowing for introduction in particular the so-called almost quantum assemblages, see [276] for a physical definition and discussion for its applications. A hierarchy for analysis of quantum steering was also given in [177]. The proposed method enables the derivation of steering witnesses for arbitrary families of quantum states. A framework for the analytical derivation of non-linear steering criteria was also presented.

The work [221] introduces the so-called Moroder's hierarchy, where in addition to the NPA constraints, more restrictions of a certain form regarding entanglement can be imposed on the quantum state. From Moroder's hierarchy a further variant, allowing to imposing of constraints on Bob's measuring devices was given in [260].

A prominent application of NPA is in device-independent quantum cryptography, in particular in quantum randomness certification. The initial methods used a single parameter, the Bell inequality violation, as the certificate for this task [101, 252, 253]. In [22, 239] it was shown how to modify NPA so that the full experimental statistics can be imposed as SDP constraints with the method called more randomness from the same data or the Nieto–Silleras hierarchy. What is more, the method allows to use of the dual optimization task to derive a new Bell functional suited to provide the most randomness from the particular experimental realization. To this end, the method described in section 3.4 is used, where the variable β provides the coefficients of the Bell functional, and the equations $v_j = q_j \cdot x$ express the behavior constraints.

An alternative approach to optimization over non-commutative polynomials is given in [55]. In that work, the problem of minimization of a trace of a given polynomial function in non-commuting variables using SDP is considered. Next, in [174] a method for constrained trace and eigenvalue optimization of noncommutative polynomials was introduced. The results were used in [298] to characterize the classical and quantum correlations that arise in prepare-and-measure experiments when communication is informationally restricted.

4.6.2. Optimization of von Neumann entropy.

In particular, the NPA method can be used to calculate the lower bound of the von Neumann conditional entropy given any kind of knowledge (classical or quantum) that an eavesdropper may possess if is subject to the laws of quantum mechanics. The method uses the SDP representations of the logarithm function discussed in section 4.1 together with the Gauss–Radau quadrature rule for the lower bound. Let w_i and t_i be the nodes and weights defined by this quadrature. Specifically, this method can be employed to compute a lower bound on the von Neumann conditional entropy under the presence of an eavesdropper, considering both classical and quantum knowledge, while adhering to the principles of quantum mechanics. Let $\mathcal{A}$ , $\mathcal{B}$ , and $\mathcal{E}$ denote the Hilbert spaces corresponding to Alice's, Bob's, and the eavesdropper's devices, respectively. The quadrature rule provides a set of nodes, $\{t_i\}_i$ , and weights, $\{w_i\}_i$ , that are utilized in the computation. The lower bound formula for the settings selection $x^{*}$ and $y^{*}$ is given as [49]:

$\begin{align} \sum_{i} c_i \left( \sum_{a,b = 0,1} \inf_{\substack{Z_{a,b} \in B\left(Q_\mathcal{E}\right), \\ \text{cond}\left(P\right)}} \left( 1 + \phi\left[E^a_{x^{*}}, F^b_{y^{*}}, Z_{a,b}, t_i\right] \right) \right), \end{align} \tag{ 159 }$

where $\phi[E^a_{x^{*}}, F^b_{y^{*}}, Z_{a,b}, t_i]$ is defined as

$\begin{align} {\textrm{Tr}} \left[ \rho_{\mathcal{A}\mathcal{B}\mathcal{E}} \left( E^a_{x^{*}} \otimes F^b_{y^{*}} \otimes \left( Z_{a,b} + Z_{a,b}^{\dagger} + \left(1 - t_i\right) Z_{a,b} Z_{a,b}^{\dagger} \right) + t_i \left( {\unicode{x1D7D9}}_{\mathcal{A}\mathcal{B}} \otimes Z_{a,b} Z_{a,b}^{\dagger} \right) \right) \right]. \end{align} \tag{ 160 }$

The expression $\text{cond}(P)$ means that the behavior $\{P(a,b|x,y)\}$ satisfies a set of linear constraints defined by the protocol, and c_i are coefficients calculated from Gauss–Radau quadrature as $c_i \equiv w_i / (t_i \log(2))$ . The i index in the sum (159) assumes values that index nodes in quadrature, skipping the last one. An example of the implementation of a method is given in [48] and described in detail in [47].

4.6.3. Self-testing with SWAP method.

The phenomenon of self-testing is characterized by the ability to assess both states and measurements of certain quantum devices in a black-box setting, relying solely on observed statistics without the need for prior device calibration. However, before the work [340] the existing examples of self-testing are limited in their applicability, as they only provide meaningful assessments for devices that closely resemble the ideal case. In [340] these limitations were overcome by adopting a novel approach to self-testing, utilizing an SDP hierarchy for the characterization of quantum correlations. This approach allows for a more comprehensive and robust assessment of quantum devices, enabling meaningful evaluations even in scenarios where the devices deviate from the ideal case.

We illustrate the method with a bi-partite Bell scenario. Suppose we have gathered an experimental description of the behavior of a device $\{P(a,b|x,y)\}$ , and that we expect that these statistics should have been obtained using a particular quantum state $| \, \bar{\psi} \rangle_{\mathcal{A}\mathcal{B}}$ and projective measurements $\{\bar{E}^a_x, \bar{F}^b_y \}$ . We would like to have a quantitative way of estimating, how close is the actual state that was prepared in the laboratory, considered as a black box described only by the statistics, to the theoretical one $| \, \bar{\psi} \rangle_{\mathcal{A}\mathcal{B}}$ . The work [340] proposed a method to perform such self-testing where the content of the black box is hypothesized to be swapped with a trusted system in a thought experiment; the method itself is called SWAP. Suppose that it is possible to formulate four linear functions $\mathcal{F}_{E,\sigma_x}$ , $\mathcal{F}_{E,\sigma_z}$ , $\mathcal{F}_{F,\sigma_x}$ , and $\mathcal{F}_{F,\sigma_z}$ , such that $\mathcal{F}_{E,\sigma_x}[\{\bar{E}^a_x \}] = \sigma_x$ , $\mathcal{F}_{E,\sigma_z}[\{\bar{E}^a_x \}] = \sigma_z$ , $\mathcal{F}_{F,\sigma_x}[\{\bar{F}^b_y \}] = \sigma_x$ , and $\mathcal{F}_{F,\sigma_z}[\{\bar{F}^b_y \}] = \sigma_z$ . Recall the SWAP operator given in (131). For $d_S = 2$ it takes the form

$\begin{align} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\0 & 0 & 0 & 1 \end{bmatrix} = U\left[\sigma_x\right] \cdot V\left[\sigma_z\right] \cdot U\left[\sigma_x\right], \end{align} \tag{ 161 }$

where

$\begin{align} U\left[X\right] \equiv {\unicode{x1D7D9}}_{2} \otimes | \, 0 \rangle \langle 0 \, | + X \otimes | \, 1 \rangle \langle 1 \, |, \text{and} \end{align} \tag{ 162a }$

$\begin{align} V\left[X\right] \equiv \frac{1}{2} \left( \left({\unicode{x1D7D9}}_{2} + X\right) \otimes {\unicode{x1D7D9}}_{2} + \left({\unicode{x1D7D9}}_{2} - X\right) \otimes \sigma_x \right). \end{align} \tag{ 162b }$

The black box is possibly using some other quantum state $| \, \psi \rangle_{\mathcal{A}\mathcal{B}}$ and measurements $\{E^a_x, F^b_y \}$ to implement $P(a,b|x,y)$ , but if the considered scenario possesses the self-testing property, then one can expect that the state and measurements will be not very far from the theoretical ones $| \, \bar{\psi} \rangle$ and $\{\bar{E}^a_x, \bar{F}^b_y \}$ . Now, let us perform a thought experiment with hypothesized swapping of the black-box state $| \, \psi \rangle_{\mathcal{A}\mathcal{B}}$ with trusted ancillary states $| \, \bar{\phi}_1 \rangle_{\mathcal{A}^{\prime}}$ and $| \, \bar{\phi}_2 \rangle_{\mathcal{B}^{\prime}}$ using the black-box measurements $\{E^a_x, F^b_y \}$ together with trusted operations on the ancillas, where we denote by $\mathcal{A}$ and $\mathcal{B}$ their black-box subsystems, and by $\mathcal{A}^{\prime}$ and $\mathcal{B}^{\prime}$ the subsystems of their trusted ancillas. Let us consider the ancillas to be qubits, so that we can apply (161) to each party, Alice and Bob. Since the operations $\{E^a_x\}$ on the black-box subsystem of Alice are expected to approximate to some extent $\{\bar{E}^a_x\}$ , we may expect that $\mathcal{F}_{E,\sigma_x}[\{E^a_x \}] \approx \sigma_x$ and $\mathcal{F}_{E,\sigma_z}[\{E^a_x \}] \approx \sigma_z$ ; and similarly for Bob, viz. $\mathcal{F}_{F,\sigma_x}[\{F^b_y \}] \approx \sigma_x$ and $\mathcal{F}_{F,\sigma_z}[\{F^b_y \}] \approx \sigma_z$ . Thus for:

$\begin{align} \mathcal{S}_{\mathcal{A} \mathcal{A}^{\prime}} \equiv U_{\mathcal{A} \mathcal{A}^{\prime}}\left[\mathcal{F}_{E,\sigma_x}\left[\left\{E^a_x \right\}\right]\right] \cdot V_{\mathcal{A} \mathcal{A}^{\prime}}\left[ \mathcal{F}_{E,\sigma_z}\left[\left\{E^a_x \right\}\right] \right] \cdot U_{\mathcal{A} \mathcal{A}^{\prime}}\left[\mathcal{F}_{E,\sigma_x}\left[\left\{E^a_x \right\}\right]\right], \end{align} \tag{ 163a }$

$\begin{align} \mathcal{S}_{\mathcal{B} \mathcal{B}^{\prime}} \equiv U_{\mathcal{B} \mathcal{B}^{\prime}}\left[\mathcal{F}_{F,\sigma_x}\left[\left\{F^b_y \right\}\right]\right] \cdot V_{\mathcal{B} \mathcal{B}^{\prime}}\left[ \mathcal{F}_{F,\sigma_z}\left[\left\{F^b_y \right\}\right] \right] \cdot U_{\mathcal{B} \mathcal{B}^{\prime}}\left[\mathcal{F}_{F,\sigma_x}\left[\left\{F^b_y \right\}\right]\right], \end{align} \tag{ 163b }$

we have $\mathrm{SWAP}(\mathcal{A}, \mathcal{A}^{\prime}) \approx \mathcal{S}_{\mathcal{A} \mathcal{A}^{\prime}}$ and $\mathrm{SWAP}(\mathcal{B}, \mathcal{B}^{\prime}) \approx \mathcal{S}_{\mathcal{B} \mathcal{B}^{\prime}}$ . Define $\mathcal{S}_{\mathcal{A}\mathcal{B}\mathcal{A}^{\prime}\mathcal{B}^{\prime}} \equiv \mathcal{S}_{\mathcal{A} \mathcal{A}^{\prime}} \otimes \mathcal{S}_{\mathcal{B} \mathcal{B}^{\prime}}$ (with proper ordering of the subsystems), and consider the hypothetical state:

$\begin{align} \rho_\mathrm{SW AP} \equiv {\textrm{Tr}}_{\mathcal{A}\mathcal{B}} \left[ \mathcal{S}_{\mathcal{A}\mathcal{B}\mathcal{A}^{\prime}\mathcal{B}^{\prime}} \cdot \left( | \, \psi \rangle \langle \psi \, |_{\mathcal{A}\mathcal{B}} \otimes | \, \bar{\phi}_1 \rangle \langle \bar{\phi}_1 \, |_{\mathcal{A}^{\prime}} \otimes | \, \bar{\phi}_2 \rangle \langle \bar{\phi}_2 \, |_{\mathcal{B}^{\prime}} \right) \cdot \mathcal{S}_{\mathcal{A}\mathcal{B}\mathcal{A}^{\prime}\mathcal{B}^{\prime}}^{\dagger} \right]. \end{align} \tag{ 164 }$

Knowing the explicit form of the trusted ancillas $| \, \bar{\phi}_1 \rangle_{\mathcal{A}^{\prime}}$ and $| \, \bar{\phi}_2 \rangle_{\mathcal{B}^{\prime}}$ and applying algebraic calculations with (163) on $| \, \psi \rangle_{\mathcal{A}\mathcal{B}}$ we can derive that the formula for $\rho_\mathrm{SWAP}$ depending on the elements possible to be retrieved from the NPA moment matrix, see section 4.6. This way e.g. the fidelity of the state or other linear functions of its element can be optimized with the same technique as NPA, as proposed in the seminal paper [340], or NV (see section 4.7.2) as shown in [299].

Without loss of generality, we can assume that the functions $\mathcal{F}_{\cdot,\cdot}$ are acting on the reduced set of operators $\{{\unicode{x1D7D9}}, E^{\tilde{a}}_x, F^{\tilde{b}}_y\}_{\tilde{a},\tilde{b},x,y}$ . To illustrate the above concept, we hypothesize that for a particular behavior certain linear combinations of the operators of Alice and Bob express their operators σ_x and σ_z, i.e.:

$\begin{align} \begin{aligned} \bar{A}_0 &\equiv \mathcal{F}_{E,\sigma_z}\left[\left\{\bar{E}^{\tilde{a}}_x \right\}\right] = \sigma_z, \\ \bar{A}_1 &\equiv \mathcal{F}_{E,\sigma_x}\left[\left\{\bar{E}^{\tilde{a}}_x \right\}\right] = \sigma_x, \\ \bar{B}_0 &\equiv \mathcal{F}_{F,\sigma_z}\left[\left\{\bar{F}^{\tilde{b}}_y \right\}\right] = \sigma_z, \\ \bar{B}_1 &\equiv \mathcal{F}_{F,\sigma_x}\left[\left\{\bar{F}^{\tilde{b}}_y \right\}\right] = \sigma_x. \end{aligned} \end{align} \tag{ 165 }$

Using the formulae (162) and (163) we get the following expression for Alice:

$\begin{align} \begin{aligned} \mathcal{S}_{\mathcal{A} \mathcal{A}^{\prime}} & = U\left[\bar{A}_1\right] \cdot V\left[\bar{A}_0\right] \cdot U\left[\bar{A}_1\right] = \frac{1}{2} \left({\unicode{x1D7D9}} \otimes | \, 0 \rangle \langle 0 \, | + \bar{A}_1 \otimes | \, 1 \rangle \langle 1 \, |\right) \cdot \\ & \quad \cdot \left(\left({\unicode{x1D7D9}} + \bar{A}_0\right) \otimes {\unicode{x1D7D9}} + \left({\unicode{x1D7D9}} - \bar{A}_0\right) \otimes \sigma_x\right) \cdot \left({\unicode{x1D7D9}} \otimes | \, 0 \rangle \langle 0 \, | + \bar{A}_1 \otimes | \, 1 \rangle \langle 1 \, |\right) \\ & = \frac{1}{2} \begin{bmatrix} {\unicode{x1D7D9}} + \bar{A}_0 & \bar{A}_1 - \bar{A}_1 \bar{A}_0 \\ \bar{A}_1 - \bar{A}_0 \bar{A}_1 & {\unicode{x1D7D9}} + \bar{A}_1 \bar{A}_0 \bar{A}_1 \end{bmatrix}, \end{aligned} \end{align} \tag{ 166 }$

and similarly for Bob. Let us take the ancillas $| \, \bar{\phi}_1 \rangle_{\mathcal{A}^{\prime}} = | \, \bar{\phi}_2 \rangle_{\mathcal{B}^{\prime}} = | \, 0 \rangle \langle 0 \, |$ . Then, from (164) we have

$\begin{align} \begin{aligned} \rho_\mathrm{SWAP} &\equiv {\textrm{Tr}}_{\mathcal{A}\mathcal{B}} \left( \left( \mathcal{S}_{\mathcal{A} \mathcal{A}^{\prime}} \otimes \mathcal{S}_{\mathcal{B} \mathcal{B}^{\prime}} \right) \cdot \begin{bmatrix} | \, \bar{\psi} \rangle \langle \bar{\psi} \, |_{\mathcal{A}\mathcal{B}} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix} \cdot \left( \mathcal{S}_{\mathcal{A} \mathcal{A}^{\prime}}^{\dagger} \otimes \mathcal{S}_{\mathcal{B} \mathcal{B}^{\prime}}^{\dagger} \right) \right) \\ & = \frac{1}{4} {\textrm{Tr}}_{\mathcal{A}\mathcal{B}} \left( \begin{bmatrix} \left({\unicode{x1D7D9}} + \bar{A}_0\right) \left({\unicode{x1D7D9}} + \bar{B}_0\right) | \, \bar{\psi} \rangle \langle \bar{\psi} \, |_{\mathcal{A}\mathcal{B}} & 0 & 0 & 0 \\ \left({\unicode{x1D7D9}} + \bar{A}_0\right) \left(\bar{B}_1 - \bar{B}_0 \bar{B}_1\right) | \, \bar{\psi} \rangle \langle \bar{\psi} \, |_{\mathcal{A}\mathcal{B}} & 0 & 0 & 0 \\ \left(\bar{A}_1 - \bar{A}_0 \bar{A}_1\right) \left({\unicode{x1D7D9}} + \bar{B}_0\right) | \, \bar{\psi} \rangle \langle \bar{\psi} \, |_{\mathcal{A}\mathcal{B}} & 0 & 0 & 0 \\ \left(\bar{A}_1 - \bar{A}_0 \bar{A}_1\right) \left(\bar{B}_1 - \bar{B}_0 \bar{B}_1\right) | \, \bar{\psi} \rangle \langle \bar{\psi} \, |_{\mathcal{A}\mathcal{B}} & 0 & 0 & 0 \end{bmatrix} \cdot \right. \\ & \quad \cdot \left. \begin{bmatrix} \left({\unicode{x1D7D9}} + \bar{A}_0\right) \left({\unicode{x1D7D9}} + \bar{B}_0\right) & \cdots & \cdots & \cdots \\ \left({\unicode{x1D7D9}} + \bar{A}_0\right) \left(\bar{B}_1 - \bar{B}_1 \bar{B}_0\right) & \cdots & \cdots & \cdots \\ \left(\bar{A}_1 - \bar{A}_1 \bar{A}_0\right) \left({\unicode{x1D7D9}} + \bar{B}_0\right) & \cdots & \cdots & \cdots \\ \left(\bar{A}_1 - \bar{A}_1 \bar{A}_0\right) \left(\bar{B}_1 - \bar{B}_1 \bar{B}_0\right) & \cdots & \cdots & \cdots \end{bmatrix}^\mathrm{T} \right), \end{aligned} \end{align} \tag{ 167 }$

where we omitted the unnecessary terms in the last matrix. One can easily see that $\rho_\mathrm{SWAP}$ is represented by a $4 \times 4$ matrix and acts on $\mathcal{A}^{\prime} \otimes \mathcal{B}^{\prime}$ . Direct calculations show that for instance:

$\begin{align} \left(\rho_\mathrm{SWAP}\right)_{2,1} = \frac{1}{4} \langle \bar{\psi} \, |_{\mathcal{A}\mathcal{B}} \left({\unicode{x1D7D9}} + \bar{A}_0\right)^2 \left({\unicode{x1D7D9}} + \bar{B}_0\right) \left(\bar{B}_1 - \bar{B}_0 \bar{B}_1\right) | \, \bar{\psi} \rangle_{\mathcal{A}\mathcal{B}}. \end{align} \tag{ 168 }$

To be more specific, let us focus on the so-called elegant Bell expression [119] which has a property that whenever its Tsirelson bound is attained, then Alice's operators satisfy $\sigma_z = -{\unicode{x1D7D9}} + 2 \bar{E}^0_3 = \bar{A}_0$ and $\sigma_x = -{\unicode{x1D7D9}} + 2 \bar{E}^0_1 = \bar{A}_1$ ( $x = 1,2,3$ ), and another linear functional can express σ_x and σ_z using the operators of Bob from the reduced set. This allows us to calculate e.g. lower bound on the fidelity of $\rho_\mathrm{SWAP}$ when the value of the Bell expression is slightly less than the Tsirelson bound [286].

The SWAP method has found multiple applications, in particular in the analysis of experimental data. The work [75] used the SWAP method to show that for every bipartite entangled quantum state in arbitrary dimension, there exists a behavior $\{P(a,b|x,y) \}$ (see section 1.4) allowing for self-testing of the state. In [321] the method was applied to self-test arbitrary qutrit states of the form $(2 + \gamma^2)^{-\frac{1}{2}} ( | \, 00 \rangle + \gamma | \, 11 \rangle + | \, 22 \rangle)$ and used for the analysis of a large scale quantum optical circuitry.

4.7. Non-commuting variables with dimension constraints

In this section, we will explore two distinct approaches that leverage moment matrices for optimizing over states and operators of a fixed dimension. The first approach, discussed in section 4.7.1, builds directly upon the NPA method and incorporates the dimension constraint as additional linear constraints. On the other hand, the second technique, presented in section 4.7.2, shares a similar structure with NPA but adopts a randomized approach to construct the basis of the space of SDP variables.

4.7.1. Dimension constraints imposed on NPA hierarchy.

We present our method, which was introduced and developed in our previous works [189, 212], and is referred to as MLP hierarchy. This method enables the analysis of semi-device-independent [248] scenarios using the powerful techniques of SDP by reducing the problem to a device-independent [203] framework modeled in the NPA hierarchy.

To introduce the SDP relaxation, we first consider a device D0, consisting of two distinct black boxes assigned to Alice and Bob, respectively. We know the dimension of the messages exchanged between the two parts. The box of Alices generates and emits quantum states from some $\{\rho_{x}\}_{x \in \bar{X}}$ . Bob is provided with a separate device that includes settings corresponding to measurement choices involving measurements denoted as $\left\{\{M^b_y\}_{b \in \bar{B}} \right\}_{y \in \bar{Y}}$ . We denote the conditional probability of obtaining an outcome b when settings x and y are selected as $P_\textbf{{D0}}(b|x,y)$ . Suppose we are provided with a dimension witness W in the form presented in (10) earlier. Assume this dimension witness yields an average value of W₀ in experiments conducted on the D0 device.

Although we do not know the specification of the device D0, we can consider an alternative device denoted as D1, as follows. The device D1 comprises two components, each equipped with buttons labeled the same way as in D0. In D1 we assume that both parts share a singlet state of dimension d. Alice's component performs a projective measurement with outcomes 0 (indicating successful projection) or 1 (otherwise), depending on the chosen input x. This measurement projects Alice's part of the singlet onto the state ρ_x, which corresponds to the relevant state in the device D0. If the projection succeeds, which occurs with a probability of $\frac{1}{d}$ , the device returns a = 0 and transforms Alice's side into the state ρ_x. Otherwise, it returns a = 1. Since the shared state is a singlet, this measurement prepares the same d-dimensional state on Bob's side. Bob subsequently performs the same measurements $\{M^b_y\}_{b}$ as the device D0 would perform, and he returns the outcome b. Let us denote the probability that Alice obtains outcome a with setting x, while Bob obtains outcome b with setting y, as $P_\textbf{{D1}}(a,b|x,y)$ . The case where a = 1 giving the probabilities $P_\textbf{{D1}}(1,b|x,y)$ are not utilized in our relaxation. Trivially, we have:

$\begin{align} P_{\textbf{D0}}\left(b|x,y\right) = d \cdot P_{\textbf{D1}}\left(0,b|x,y\right). \end{align} \tag{ 169 }$

Let us now turn our attention to a third device, denoted as D2, which shares the same interface as D1. In contrast to the previous devices, the internal workings of D2 are unrestricted, meaning that we make no assumptions about the measurements performed. Both Alice's and Bob's parts are now allowed to be in an arbitrary state ρ of any dimension. For this device, we give an additional constraint

$\begin{align} \forall_{x} P_{\textbf{D2}}\left(0|x\right) = \frac{1}{d}, \end{align} \tag{ 170 }$

where $P_\textbf{{D2}}(a|x)$ is the probability of getting the outcome a by Alice with the setting x with the device D2. It is important to highlight that the description of this device falls under the category of device-independent scenarios. Consequently, we can utilize the NPA method to mathematically represent its behavior, specifically the behavior exhibited by device D2. In the case of the third device, we can represent the probability of obtaining outcomes a and b for a given combination of settings x and y for Alice and Bob, respectively, as $P_\textbf{{D2}}(a,b|x,y)$ .

It is evident that all the behaviors achievable by the device D1, and equivalently by device D0, can also be obtained by device D2. Furthermore, as device D2 is a relaxed variant of the original device D0. It is important to note that one of the key characteristics of the set of behaviors is its efficient approximation using the NPA hierarchy. An advantageous aspect of this method is that it provides a bound for any dimension of the communicated system, with the linear bound being the only parameter that needs adjustment. We conclude that using the relation $P(b|x,y) = d \cdot P(0,b|x,y)$ , we can impose a dimension constraint to an existing implementation of the NPA as in (170), i.e. $P(0|x) = 1/d$ .

4.7.2. NV hierarchy.

Navascués and Vertesi (2014) proposed [224, 228] a hierarchy of SDPs aimed at upper bounding quantum correlation and behaviors in scenarios with dimension constraint, where similarly as in NPA, the improved accuracy is obtained when increasing the hierarchy level. The NV hierarchy is particularly useful in dimension-bounded scenarios, where the number of dimensions of the quantum systems involved is limited. In such scenarios, the full characterization of quantum correlations becomes computationally challenging due to the exponential growth of the dimension. NV provides a systematic and tractable approach to approximate and quantify quantum correlations in these scenarios.

Consider a sequence of operators representing states and measurements, denoted as $\mathcal{S} = (O_1, \ldots, O_n)$ , where each operator corresponds to a specific quantum state or measurement or their polynomial function, for some N. For instance, we can have a sequence that includes identity operator ${\unicode{x1D7D9}}$ , pure states ρ₀₀, ρ₀₁, ρ₁₀, ρ₁₁, and measurement projectors $P^1_0$ , $P^1_1$ , $P^2_0$ , $P^2_1$ . It is important to note that all states in the sequence are pure and all measurements are projectors with fixed rank. In the NV method, a crucial component is a moment matrix denoted as $M = [M]_{\mathcal{S},\mathcal{S}}$ . This matrix is indexed by the sequence $\mathcal{S}$ and is defined similarly to the moment matrix used in the NPA. The entries of the moment matrix are given by the inner products of the operators in the sequence, specifically $M_{O_i, O_j} = {\textrm{Tr}}(O_i^{\dagger} O_j)$ . It is worth noting that when the sequence $\mathcal{S}$ contains both states and measurements, the moment matrix will have entries that correspond to the probabilities $P(b|x,y)$ of obtaining outcome b when performing measurement $M^y_b$ on the state ρ_x. This allows the moment matrix to capture the statistical information of the correlations between states and measurements, providing a framework for characterizing and quantifying the quantum correlations present in the system.

The implementation of the NV method involves a randomized approach to construct a set of moment matrices. The first step is to randomize the moment matrices, which will define the optimization problem in the SDP framework. The goal is to optimize a linear combination G of the entries $P(b|x,y)$ , $G = \sum_{b,x,y} \beta_{b,x,y} P(b|x,y)$ for some fixed $\{\beta_{b,x,y} \} \subset \mathbb{R}$ , which may, for instance, correspond to the average success probability of a quantum random access code. To provide more detailed steps, the implementation begins with an initialization phase where a basis of moment matrices is created:

(i)
The set $\mathcal{M}$ is initialized as an empty set to store the moment matrices, $\mathcal{M} = \emptyset$ .
(ii)
Operators in the sequence, specific to the given hierarchy level, are randomized to form the sequence $\mathcal{S}$ .
(iii)
The moment matrix $\tilde{\Gamma}$ is constructed by evaluating the inner products of the operators in the randomized sequence $\mathcal{S}$ , $\tilde{\Gamma} \equiv \left[ {\textrm{Tr}}(O_i^{\dagger} O_j) \right]_{\mathcal{S},\mathcal{S}}$ .
(iv)
The matrix $\tilde{\Gamma}^{\perp}$ is obtained by projecting $\tilde{\Gamma}$ onto the subspace orthogonal to the span of the moment matrices in $\mathcal{M}$ .
(v)
If $\tilde{\Gamma}^{\perp}$ is a zero matrix, the process of extending the basis of moment matrices $\mathcal{M}$ is terminated.
(vi)
Otherwise, the orthogonalized moment matrix $\tilde{\Gamma}^{\perp}$ is added to the set $\mathcal{M}$ , $\mathcal{M} = \mathcal{M} \cup \{\tilde{\Gamma}^{\perp}\}$ . The process returns to step 2 to generate the next randomized sequence.

By iteratively adding orthogonalized moment matrices to $\mathcal{M}$ , the NV implementation constructs a basis of moment matrices that captures the possible correlations in the considered system. These moment matrices form the foundation for the subsequent SDP optimization, where the objective is to find the optimal values for the entries $P(b|x,y)$ in order to maximize the value of G. The optimization has the form:

$\begin{align} \begin{split} \text{maximize } & {\textrm{Tr}} \left[ \hat{\mathcal{B}} \Gamma \right] \\ \text{subject to } & \Gamma \in \text{span}\left( \mathcal{M} \right), \\ & \left(\Gamma\right)_{{\unicode{x1D7D9}}, {\unicode{x1D7D9}}} = 1, \\ & \Gamma \succeq 0, \end{split} \end{align} \tag{ 171 }$

where we call $\hat{\mathcal{B}}$ the game matrix of the expression G, and construct it to select from Γ the relevant values ${\textrm{Tr}}[\rho_x M^y_b]$ with coefficients defined by probability functional we are considering.

The hierarchy can get a significant boost of performance when symmetries of G are exploited [7, 302].

4.8. The see-saw iterative non-linear optimization

The see-saw method is an iterative optimization technique used in SDP to find approximate solutions for certain optimization problems [242, 328]. It is particularly effective for problems involving quantum states and measurements. The method involves alternating optimization over states and measurements, refining the solutions at each iteration until convergence is achieved. In the see-saw method, the optimization problem is first formulated as an SDP, where the objective function and constraints are expressed in terms of quantum states and measurements. The method starts with an initial guess for either the states or the measurements, which is usually a simple randomization. In each iteration, the method optimizes over the states while keeping the measurements fixed, and then optimizes over the measurements while keeping the states fixed. This alternating optimization continues until a convergence criterion is met, such as a small change in the objective function or constraints. The see-saw method leverages the interplay between states and measurements in quantum systems. By iteratively optimizing over states and measurements, the method explores different combinations that lead to improved solutions. When a set of states is given, the expression (10) can be optimized using a similar approach as in QSD [20, 142, 153] by employing SDPs, see appendix B.3. Similarly, if measurements are provided, SDP can be utilized with states as variables to optimize the expression. In cases where neither states nor measurements are known, SDP can be employed to simultaneously find optimal solutions for both. The see-saw method operates based on the following outline:

(i)
Initially, a set of states is chosen randomly as an initial guess.
(ii)
Using SDP, the method optimizes over measurements while keeping the states fixed. The objective is to find measurements that maximize the target expression.
(iii)
Next, the method optimizes over states while keeping the measurements fixed. It uses SDP to find states that maximize the target expression.
(iv)
The process iterates by returning to step 2 if certain stopping criteria are not satisfied. The stopping criteria could be based on the convergence of the objective function or other specified conditions.

It is important to note that the see-saw method does not guarantee finding the global optimum of the optimization problem. Instead, it provides an approximate solution that can be improved iteratively. To enhance the chances of finding better solutions, the method suggests restarting the process multiple times with different initial states. By repeating the see-saw method with various initial states, the hope is to explore different regions of the solution space and potentially find better solutions. Although the method may not guarantee the global optimum, it offers a practical approach for approximating the optimal solution to the optimization problem at hand.

In a related results [259] a toolbox designed to determine the optimal discrimination of optical modes in two distinct scenarios has been introduced. The first scenario, typical of metrology, involves the verifier controlling the light source and establishing a reference frame for the phase. The second scenario, more relevant to cryptography, considers cases where the verifier only observes states diagonal in the photon-number basis. The toolbox utilizes LP and SDP methods to deliver rigorous bounds for the discrimination process, enhancing the understanding and practicality of the methods in both applications.

An example of a see-saw implementation in Matlab is given in appendix B.5.

5. Conclusions

In conclusion, this paper has delved into the realm of SDP within the context of quantum information, offering a comprehensive exploration of their mathematical foundations and practical applications. By elucidating the concepts of convex optimization, duality, and SDP formulations, the study has equipped researchers and practitioners with powerful tools to address optimization challenges in quantum systems. The insights gained from the research of SDP have proven invaluable in advancing the field of quantum information, enabling the characterization and manipulation of quantum correlations, optimization of quantum states, and design of efficient quantum algorithms and protocols. The practical implementation of SDP discussed in the paper, has empowered researchers to effectively formulate and solve optimization problems in quantum systems, fostering the development of more efficient quantum communication protocols, self-testing methods, and a deeper understanding of quantum entanglement.

Acknowledgments

The work on this review started in 2019. The current support from the Knut and Alice Wallenberg Foundation through the Wallenberg Centre for Quantum Technology, the Swedish Research Council, and NCBiR QUANTERA/2/2020 (www.quantera.eu) an ERA-Net co-fund in Quantum Technologies under the project eDICT is acknowledged. In the period 2019–2022 the work was partially conducted at the Department of Algorithms and System Modeling at Gdańsk University of Technology and was also supported by the Foundation for Polish Science (IRAP project, ICTQT, Contract No. 2018/MAB/5, co-financed by EU within Smart Growth Operational Programme).

Recently we learned about other two independent works about SDP in quantum information [284, 300]. The content of these works and the current work are to a large extent complimentary. The current review concentrates on providing the mathematical and implementation background, by a detailed presentation of the general optimization framework, it covers the discussion of implementations of solvers and is intended primarily for active researchers in quantum information looking for the answer to the question why the methods work.

Data availability statement

No new data were created or analyzed in this study.

Appendix A: Proof of the decoupling lemma

We will now provide a proof of the decoupling lemma used in section 2.5. We follow the line of [37, 38]. The lemma is used as a constraint qualification condition, i.e. it provides the strong duality sufficient criteria of the Fenchel–Rockafellar scheme. To this end, we first introduce the concept of convex series.

A.1. Convex series

The notion of convex series was introduced in [154]; see [37, p 113nn] for a detailed discussion. Convex series of C are series of the form $\sum_{i = 1}^{+\infty} \lambda_i x_i$ with ${\Large{\forall}}_{i} x_i \in C$ , ${\Large{\forall}}_{i} \lambda_i \unicode{x2A7E} 0$ and $\sum_{i = 1}^{+\infty} \lambda_i = 1$ . C is defined to be convex series closed if the sum of every convergent convex series of C is contained in C. One can check that if C is convex series closed, then [37, p 116]:

$\begin{align} \textrm{int }{C} = \textrm{core }{C}. \end{align} \tag{ A.1 }$

C is defined to be convex series compact if every convex series of C converges to an element of C. It can be shown that C is convex series compact if and only if, it is convex series closed and bounded. Thus, ${B}_X$ is convex series compact.

A.2. Preliminary comment

F defined as (52) is also $\textrm{lsc}$ , since A is continuous. Without loss of generality, we also assume that $f(0) = g(0) = 0$ , since if it is not the case, we can trivially transform the primal problem (53) shifting it by a constant value. From this it follows $F(0,0) = 0$ and

$\begin{align} {\Large{\forall}}{\lambda \in \left[0,1\right], x \in X, y \in Y} F\left(\lambda x, \lambda y\right) \unicode{x2A7D} \lambda F\left(x,y\right). \end{align} \tag{ A.2 }$

A.3. Step 1: define the convex set S

Let us define [37, p 127]

$\begin{align} S \equiv \bigcup_{x \in {B}_X} \left\{y \in Y : F\left(x,y\right) \unicode{x2A7D} 1 \right\}. \end{align} \tag{ A.3 }$

Let $y_0, y_1 \in S$ . Then there exist $x_0, x_1 \in {B}_X$ such that $F(x_i, y_i) \unicode{x2A7D} 1$ for $i = 0,1$ . For any $\lambda \in [0,1]$ from the convexity of F we have $F(\lambda x_0 + (1-\lambda) x_1, \lambda y_0 + (1-\lambda) y_1) \unicode{x2A7D} \lambda F(x_0, y_0) + (1-\lambda) F(x_1, y_1) \unicode{x2A7D} 1$ and thus $\lambda y_0 + (1-\lambda) y_1 \in S$ implying convexity of S.

A.4. Step 2: show that $0 \in \textrm{core}\ {S}$

From the definition (15) and the assumption given in (61) we get ${\Large{\forall}}_{y \in Y} {\Large{\exists}}_{T_y \gt 0} (T_y y) \in \textrm{dom}{\theta} = \textrm{dom}{g} - A \textrm{dom}{f}$ . Consider arbitrary $y \in Y$ and take any $x_y \in \textrm{dom}{f}$ such that $T_y y + A x_y \in \textrm{dom}{g}$ . Then $k_y \equiv F(x_y, T_y y) \lt +\infty$ . We want to ensure that ${\Large{\exists}}_{\alpha_y \gt 0} \alpha_y y \in S$ so that S is absorbing. Indeed, let $\alpha_y = \min(1/{k_y}, 1/ \left| \left| x_y \, \right| \right|)$ . This implies that $F(\alpha_y x_y, \alpha_y y) \unicode{x2A7D} 1$ (by (A.2)) and that $\alpha_y x_y \in {B}_X$ , and thus $\alpha_y y \in S$ . Since y represents arbitrary direction in Y, so $0 \in \textrm{core}{S}$ (and, what is more, S is an absorbing set).

A.5. Step 3: show that $\textrm{core }{S} = \textrm{int }{S}$

We will use (A.1). To this end, we check that S is convex series closed. Indeed, consider any convergent convex series of S, $\sum_{i = 1}^{+\infty} \lambda_i y_i$ with ${\Large{\forall}}_{i} y_i \in S$ , summing to some y. It suffices to show that $y \in S$ . From (A.3) it follows that ${\Large{\forall}}_{i} {\Large{\exists}}_{x_i \in {B}_X} F(x_i, y_i) \unicode{x2A7D} 1$ . The series $\sum_{i = 1}^{+\infty} \lambda_i x_i$ converges to some x since $\textrm{B}_X$ is convex series compact. We have $F(x, y) = \sum_{i = 1}^{+\infty} F(\lambda_i x_i, \lambda_i y_i) \unicode{x2A7D} \sum_{i = 1}^{+\infty} \lambda_i F(x_i, y_i) \unicode{x2A7D} \sum_{i = 1}^{+\infty} \lambda_i = 1$ , where for the first inequality we used the assumption that F is $\textrm{lsc}$ and (A.2). Thus, $y \in S$ , and any convergent convex series of S has sum y contained in S meaning that S is convex series closed. Using (A.1) we conclude that for S we have $\textrm{core }{S} = \textrm{int }{S}$ .

A.6. Step 4: show that θ is continuous in the neighborhood of 0

It can be shown [37, p 112] that for a Banach space Z a convex function $f: Z \rightarrow \mathbb{R} \cup \{+\infty\}$ , locally bounded above at $z \in \textrm{int}(\textrm{dom}{f})$ is also locally Lipschitz at z. Thus it is also a fortiori continuous at z. From (54) and (A.3) it follows that ${\Large{\forall}}_{y \in S} \theta(y) \unicode{x2A7D} 1$ , so θ is continuous at $0 \in \textrm{int}{S}$ .

Appendix B: Code samples in Matlab

B.1. Illustration of simple problem formulations using YALMIP

We start code discussions with a trivial example of SDP finding the largest eigenvalue of a matrix. The purpose of this example is to provide a short overview of the syntax characteristic of usage of the YALMIP modeling toolbox [193]. To install it one should follow the instructions from the repository and also install one of the supported solvers, e.g. SDPT3 [309] or SeDuMi [290, 291].

We start with randomizing a Hermitian 3 by 3 matrix X using the ordinary Matlab syntax. We see in listing 1 that in this instance the eigenvalues were $-0.086\,472$ , $0.684\,28$ , and 3.227.

Next, we define a 3 by 3 Hermitian variable in listing 2. The function sdpvar is used for this purpose. The first two parameters are the number of rows $n_\mathrm{r}$ and columns $n_\mathrm{c}$ of the matrix. The third parameter specifies the structure of the variable. One of the possibilities includes 'full' when all entries of the matrix are parameterized independently, meaning $n_\mathrm{r} n_\mathrm{c}$ parameters for real, and $2 n_\mathrm{r} n_\mathrm{c}$ parameters for complex matrices. Another possibility for the parametrization when $n_\mathrm{r} = n_\mathrm{c} = n$ is 'symmetric' meaning that the element at ith row and jth column is exactly equal to the one at jth row and ith column, using $n (n+1) / 2$ parameters for real, and n² parameters for complex matrices. A third possibility is 'hermitian' meaning that the element at ith row and jth column is equal to the complex conjugate of the element at jrow and ith column. Other possible structures are 'diagonal' for diagonal matrices, 'toeplitz' for symmetric Toeplitz matrices, 'hankel' for unsymmetric Hankel matrices, 'rhankel' for symmetric Hankel matrices, and 'skew' for skew-symmetric matrices. When a real square matrix variable is to be created, an abbreviated form sdpvar(n) can be used to create n by n real symmetric matrix.

We can notice that the coefficient range is $\{1\}$ , meaning that all coefficients in the variable S are equal to 1. It is possible to use the parameterized variables of the type sdpvar with various coefficient ranges. Usually, if the coefficients are spread by several orders of magnitude, meaning that the program is mixing large and small coefficients, this leads a solver to get into numerical problems. Similar problems may happen if the coefficients are very large or very small. The variables that occur with very small coefficients usually do not influence significantly the value of the solution and can be removed using the clean function from the YALMIP, as shown in listing 3.

**Listing 3.** `sdpvar` with large range of coefficients, and removing small coefficients with `clean` function.
Download figure:
Standard image High-resolution image

Now, we will execute the optimization with the command optimize(F = [S > = 0; trace (S) = = 1], target = -trace(X * S)). The first argument of this function specifies the constraints of the optimization, and the second is the target of the optimization. We assigned the constraints to the variable F. Let us investigate this variable as shown in listing 4. We have specified the positive semi-definiteness constraint S > = 0 on the complex 3 by 3 matrix S, i.e. S > = 0. This is described as Matrix inequality (complex) 3x3. The second imposed constraint of trace normalization, trace (S) = = 1, is described as Equality constraint 1x1.

**Listing 4.** Investigation of a sample constraint in SDP: positive semi-definiteness of `S`, i.e. `S > = 0`, and trace normalization, i.e. `trace` `(S) = = 1`.
Download figure:
Standard image High-resolution image

An optional third argument to optimize can specify additional optimization parameters, like the selection of the solver, specification of how much information during the execution of the optimization should be printed to the screen, the maximal number of iterations (if the solver allows for it). For instance, to specify that from all available solvers, YALMIP should use SDPT3, not print any information, and limit the number of iterations to 20 one can provide a setting sdpsettings('solver', 'sdpt3', 'verbose', 0, 'sdpt3.maxit', 20). Another useful option 'showprogress' allows us to see the progress of YALMIP, which is useful for debugging purposes for very large problems. Recall that with a modeling tool, before the optimization the problem is being converted to the form suitable for the solver, usually the canonical form discussed section 3.2.1, or SDPA form discussed in section 3.2.2. This formulation in some cases may take more time than the actual solver time. The option 'removeequalities' specifies how constraints of the equalities should be preprocessed before passing to the solver, as discussed in section 3.6. The option 'dualize' tells YALMIP to fit the problem formulation to the primal instead of the dual form. If multiple optimizations are to be executed we recommend storing the settings in a separate variable and providing it as the third optional argument to optimize, especially for smaller problems. The reason for this is the fact that if this argument is not provided explicitly, then YALMIP will re-create its content for each execution of optimize, which requires additional computational time. Turning on the option 'showprogress' shows that the stages with the default settings (with YALMIP version 20210331) are as shown in listing 5.

**Listing 5.** Stages of YALMIP processing in the discussed example.
Download figure:
Standard image High-resolution image

At this stage, the control is passed to the specified solver. Recall from the discussion in section 1.3 that YALMIP provides to the solver the problem framed in the dual form (81). The solver usually prints the values describing the size parameters of the problem passed to the solver, as shown in listing 6 for the case of the solver SDPT3. The listing 6 shows that the dimension of the SDP variable is 6. This stems from the fact, that in the stage Converting to real constraints YALMIP has reformulated the n by n complex variable to 2n by 2n real variable, as discussed in section 3.5; in the considered example n = 3. This reformulation as a real variables problem, requires stating a requirement that the dual SDP variable Z has the form (95). It is easy to see that this requires $\frac{n \cdot (n + 1)}{2}$ matrices A_i to express that the two blocks containing $B^\mathrm{R}$ , and $\frac{n \cdot (n - 1)}{2}$ matrices A_i to express the relation $B^\mathrm{I} = -B^\mathrm{I}$ in the off-diagonal block. This gives in total n² matrices A_i , what yields 9 in this case, displayed as number of constraints = 9. Each of the problem's constraints framed in the dual form corresponds to an additional free variable in the primal problem, as discussed in section 3.6. The fact that there is only one equality trace (S) = = 1 in the case, is expressed as dim. of free var = 1 in listing 6.

**Listing 6.** Sample information about size of the problem to be passed to the solver as printed by SDPT3.
Download figure:
Standard image High-resolution image

We will briefly analyze the two of the settings, viz. 'removeequalities' and 'dualize'. Their default values are both 0, and this is the case analyzed in the previous paragraph. If we set 'removeequalities' to 1, then the discussed stages will result in output given in listing 7. We notice that before calling the solver, two additional stages of YALMIP processing are taken, viz. Solving equalities and Converting problem to new basis, both responsible for the reduction of the number of variables and reformulation of the dual form, as discussed in section 3.6. Comparing with listing 6 we see that the effect is the reduction of the number of matrices A_i from m = 9 to $8 = m - n_\mathrm{F}$ and, at the same time, removal of the $n_\mathrm{F} = 1$ free variables. On the test platform, the problem size reduction resulted in a drop of the solver time from 0.48s to 0.24s, at the cost of additional YALMIP processing time increased to 0.14 s from 0.10 s. If we set 'dualize' to 1, then the problem will be framed in the primal form (80) instead. Again, the complex SDP of size n will be expressed as a real SDP of size 2n, as explained above and in section 3.5. From listing 8 we see that indeed the size of the SDP variable is 6, and there is only one constraint $\{A_i\}$ to express the requirement trace (S) = = 1.

**Listing 7.** Size of the sample problem modelled with YALMIP with `'removeequalities'` set to 1 framed in the canonical dual form.
Download figure:
Standard image High-resolution image

**Listing 8.** Size of the sample problem modeled with YALMIP with `'dualize'` set to 1 in order to frame the optimization problem in the canonical primal form.
Download figure:
Standard image High-resolution image

The SDPT3 solver will provide also information about the chosen algorithm, as shown in listing 9. In this case, the algorithm uses the HKM search direction, see (117). Other input parameters of SDPT3 are gam and expon, and they are used to calculate the value of the step-length $\alpha_\mathrm{P}$ and ${\text{expon}}\_{\text{used}}$ in (120) for the predictor–corrector mechanism, as discussed in section 3.9.

The core part of solving an SDP is an iterative procedure of gradual improvement of the solution $(X^{(i)},y^{(i)},Z^{(i)})$ , with a sample progress report shown in listing 10. The first column is the iteration number, here the solver finished after the 14th iteration. The second and the third are primal and dual step-lengths, see (119), taken in each iteration. When the step-lengths are close to 1, it means that the search direction allowed for a large change in the values of $(X^{(i)},y^{(i)},Z^{(i)})$ , what usually indicates a significant improvement of the solution in the iteration. The fourth and fifth columns are residuals norms of the primal and dual solutions, as discussed in section 3.8. The residual norms are expected to be close to 0 when the solution is feasible, i.e. satisfies all the imposed constraints. The sixth column is proportional to the gap (112) and also should be close to 0 when the iterates approach the solution. Due to numerical inaccuracy, one usually considers values of the gap and the residual norms close to 10⁻⁷ as satisfactory. The 7th columns mean (obj) is the mean value of the primal ${\textrm{Tr}}(C X^{(i)})$ and dual $b^\mathrm{T} \cdot y^{(i)}$ solutions of the current iterate. The 8th column provides the time passed til the current iteration (in the example it passed less than one second). The following three columns, kap, tau, theta provide information about specific parameters used in the determination of the step-length, and the last column provides information about the Cholesky factorization taken in calculation of the search direction; these information are beyond the scope of this work.

**Listing 10.** Iterations of interior-point method in SDPT3.
Download figure:
Standard image High-resolution image

The SDPT3 solver provides a brief summary of its execution, as shown in listing 11. The data include the number of iterations, the value of the primal solution ${\textrm{Tr}}(C X^{(i)})$ , and the value of the dual solution $b^\mathrm{T} \cdot y^{(i)}$ , the value of the gap (112), and the relative gap (i.e. the gap divided by 1 plus the mean value of the primal and dual solutions), the parameters measuring infeasibility (which should be close to 0), the norms of the solution in the last iteration $(X^{(i)},y^{(i)},Z^{(i)})$ , the norms of the matrices defining the problem $\{A_i\}_i$ , b, and C, the total CPU time, and the CPU time per iteration, the termination code. The successful termination code is 0. The values −1,−5, and −9 indicate a lack of progress, when the improvements are too slow; 1 and 2 indicate dual or primal infeasibility of the solution, −6 indicates that the maximal number of iteration has been reached before the desired quality of the solution was obtained; there is also a couple of value indicating various numerical problems. The last information contains the so-called DIMACS statistics for standardized benchmarking purposes [213].

B.2. Correlation matrix

Now, we provide an illustration of the concept of SDP optimization over correlation matrices, as discussed in section 4.6. Let us consider a set $\mathcal{S} = \{x_1, x_2, x_3\}$ . Suppose we have obtained information, possibly from experimental data, that $\textrm{corr}(x_1, x_2) = 0.7 \pm 0.03$ and $\textrm{corr}(x_1, x_3) = 0.8 \pm 0.01$ . The question of interest is: what is the range of possible values for $\textrm{corr}(x_2, x_3)$ ? To answer this, we construct a real 3 by 3 correlation matrix with diagonal elements equal to 1 and impose constraints on its entries based on the specified ranges. We then perform maximization and minimization of the entry corresponding to $\textrm{corr}(x_2, x_3)$ to determine its possible range in listing 12. The results, as shown in listing 13, indicate that $\textrm{corr}(x_2, x_3)$ lies within the interval $[0.074\,153, 0.995\,73]$ .

**Listing 12.** Correlation matrix example in YALMIP.
Download figure:
Standard image High-resolution image

**Listing 13.** Results of the correlation matrix example calculations.
Download figure:
Standard image High-resolution image

B.3. Quantum state discrimination

In the task of discriminating N non-orthogonal quantum states, the goal is to employ a measurement strategy using operators $M_1, M_2, \ldots, M_N$ in order to maximize the average success of the discrimination. This average success is quantified by the expression $\frac{1}{N} \sum_{i \in [N]} {\textrm{Tr}} \left( \rho_i M_i \right)$ , where ρ_i represents the quantum state and M_i corresponds to the measurement operator for the ith state. By optimizing this expression, one can effectively differentiate between the given non-orthogonal states.

The Matlab code provided in listing 14 demonstrates quantum state discrimination using the YALMIP optimization toolbox. The code begins by generating three random quantum states of dimension 3: state1, state2, and state3. These states represent the quantum systems to be discriminated. Next, the code declares three measurement variables, meas1, meas2, and meas3, using the sdpvar function from YALMIP. These variables are Hermitian matrices of size 3 by 3, representing the measurement operators corresponding to each state. The optimization problem is formulated using the optimize function, which takes an objective function and a set of constraints as inputs. The objective function aims to maximize the average success rate of discrimination, given by the expression -trace(state1*meas1 + state2*meas2 + state3*meas3)/3. This expression calculates the average trace of the product of the state and measurement operators. The division by three accounts for the number of states being discriminated. The constraints include ensuring that each measurement operator is PSD (meas1 > = 0, meas2 > = 0, meas3 > = 0) and that the sum of all measurement operators equals the identity matrix (meas1 + meas2 + meas3 = = eye (3)). These constraints guarantee that the measurement operators are valid and form a valid measurement scheme. The output of the optimization will provide the optimal measurement operators that achieve the highest discrimination performance.

We note that in the discrimination process, it is sometimes useful to consider scenarios where the weights of the different states are specified. By assigning specific weights to each state, the discrimination problem can be formulated in a more structured manner. This approach allows for a more targeted optimization of the average success metric, leading to enhanced discrimination capabilities. By leveraging knowledge about the weights of the states, researchers can design measurement schemes and strategies that are tailored to maximize the overall success rate in discriminating the non-orthogonal quantum states.

B.4. Implementation of the Doherty–Parillo–Spedalieri method

The code from listing 15 defines a header of a function called GetDPS, which is designed to create YALMIP variables for quantum states, along with associated constraints, specifically tailored for applying the relaxed separability criteria of the DPS method discussed in section 4.2. These variables and constraints can be used for conducting optimizations involving separable quantum states. The function takes inputs such as the dimensions of subsystems, the number of copies, and the configuration of PPT tests. The bNormalize parameter allows to choose whether the trace of the resulting state should be exactly 1 or not greater than 1. The function outputs the YALMIP variable rho, the corresponding YALMIP constraints F, and the symmetric extension variable rhoSymExt.

The part of the code from listing 16 performs some initial calculations and checks based on the provided input. It ensures that the boolean variable bNormalize is set to true if not explicitly specified, confirming that trace normalization is the default behavior. The code then asserts that the dimensions of subsystems and the number of copies match and that the total number of copies aligns with the number of columns in the provided PPTs matrix. The number of subspaces is determined, considering the input dimensions and the number of copies. The dimensions of the symmetric extension are computed, including those for subsystem copies, which are identified. Finally, the total dimension of the symmetric extension is calculated by taking the product of the individual dimensions. These preliminary steps ensure the consistency and validity of the input.

The code given in listing 17 focuses on creating a basis for the symmetric extension matrices. It begins with an initial basis vector [1], which will accumulate the vectors spanning the symmetric extension subspace. For each subspace specified in the input, it computes the eigenvalues and eigenvectors of a symmetric projection matrix constructed based on the dimensions and the number of copies using a relevant subroutine SymmetricProjection from the package QETLAB [162]. Note that the eigenvalues of the symmetric projection are either 0 or 1, and thus the symmetric subspace can be determined by selecting the eigenvectors corresponding to eigenvalues greater than, for instance, 0.5. The basis of all vectors spanning symmetric extension subspace stored in the variable basis is updated accordingly using the Kronecker product. This process continues for all subspaces, ultimately generating a basis with vectors that span the symmetric extension space. The variable nBasis records the number of basis vectors.

The code from listing 18 is responsible for creating the rhoSymExt variable, which represents the symmetric extension of a quantum state. It starts by defining rhoSymBasis, which is an nBasis by nBasis Hermitian complex matrix. The symmetric extension matrix rhoSymExt is then constructed by performing matrix operations involving the basis vectors. After constructing rhoSymExt, a cleaning operation is applied to remove small numerical artifacts that could affect the numerical calculations during SDP optimizations. The variable F is used to store constraints associated with rhoSymBasis. If the bNormalize flag is set, the trace constraint enforces that the trace of rhoSymExt equals 1, indicating a normalized quantum state.

The part of the code from listing 19 is responsible for creating the rho variable, which represents the quantum state constrained to satisfy the selected DPS criteria. It is derived from its previously created symmetric extension rhoSymExt by tracing out the copies of subsystems (as specified by the variable subsystemsCopiesPos). Following that, the code iterates over the specified on the input PPT checks (passed to the function in the variable PPTs). Each PPT check corresponds to a constraint that enforces the partial transpose of the symmetric extension of the state to be PSD. The partial transposition is performed using the function Tx from the package Quantinf [77]. The constraints are added to the variable F, which collects all the constraints to be used in later optimization problems.

Let us now use the defined function to the example from section VII.B from the paper by Doherty et al [87]. We define auxiliary lambda functions ket and bra that are creating vectors $| \, i \rangle \otimes | \, j \rangle$ and $\langle i \, | \otimes \langle j \, |$ in two ququart spaces. Next, we use them to express the witness W which is non-negative on all product states, as derived in [87]. Next, we use the function GetDPS to create a variable under DPS constraints, where the second subsystem has two copies and two PPT constraints are imposed: for partial transposition of the second and third subsystems. The optimization shows that the maximal value of the witness possible to be obtained by states satisfying these constraints is 0, as expected. A unit testing example is shown in listing 20.

**Listing 20.** Illustration of working of the function creating variable with DPS constraints applied on a separability witness .
Download figure:
Standard image High-resolution image

The second example involves the NTV method discussed in section 4.2, as shown in the unit test from listing 21. We use the Bell functional $P(0,0|1,1) + P(0,0|1,2) + P(0,0|2,1) - P(0,0|2,2) - P_A(0|1) - P_B(0|1)$ with the Tsirelson bound $\frac{1}{\sqrt{2}} - \frac{1}{2}$ [52]. This Bell functional is a form of the CHSH functional [73]. The GetDPS function is used to prepare a variable W that satisfies the relaxed separability criteria DPS method. The dimensions of the four subsystems are specified as [4 2 2]. The first two subsystems are passed jointly as a four dimensional subsystem to reflect that they represent an entangled two-qubit state. The remaining two subsystems represent single qubit measurements. This code creates operators for performing measurements on the entangled state in the following way. The PA1 operator represents SWAP operators between Alice's qubit and Alice's first measurement stored in the third subsystem. Similarly PB1 is the SWAP between Bob's qubit and Bob's measurement contained in the fourth subsystem of W. They are created using the sysexchange function from the mentioned Quantinf package. PA2 and PB2 are the computational basis measurements for Alice and Bob, respectively. The target variable is calculated as the trace of the operator W times the CHSH operator. The code uses YALMIP's optimize function to solve the SDP problem, with constraints defined in F, and the goal is to maximize the target value. Finally, it asserts that the computed value of the target is close to the expected value.

B.5. Implementation of the see-saw method

We will demonstrate a simple implementation of the see-saw technique discussed in section 4.8.

To this end, let us now delve into the concept of a random access code (RAC), which aims to compress a uniformly randomized n-digit string into a single digit, allowing Bob to recover any of the n digits with a high probability [12, 13]. In this scenario, Alice receives a uniformly distributed random input string x comprising n digits. She computes the value a using the function $f[\mathbf{x}]$ and transmits it to Bob. Upon receiving a and another uniformly distributed input y, Bob employs the function $g(a, y)$ to compute the value b. The protocol is considered successful if b is equal to the yth digit of x.

Let us explore a specific case of a $2^2 \rightarrow 1$ RAC. Here, Alice aims to encode two bits into a single bit. The mappings providing the optimal success probability of the task are as follows: $f(00) = 0, f(01) = 0, f(10) = 1, f(11) = 1$ . On Bob's side, the decoding is defined as follows: $g(a,1) = a$ and $g(a,2) = 0$ . The success probability of this classical RAC is $\frac{3}{4}$ .

In the quantum scenario, Alice tries to encode two bits into a qubit system. Here an encoding allowing to get the optimal quantum value is as follows. Take

$\begin{align} f\left(00\right) = \begin{bmatrix} \cos \frac{\pi}{8} \\ \sin \frac{\pi}{8} \end{bmatrix}, f\left(01\right) = \begin{bmatrix} \cos \frac{\pi}{8} \\ -\sin \frac{\pi}{8} \end{bmatrix}, f\left(10\right) = \begin{bmatrix} \sin \frac{\pi}{8} \\ \cos \frac{\pi}{8} \end{bmatrix}, f\left(11\right) = \begin{bmatrix} \sin \frac{\pi}{8} \\ -\cos \frac{\pi}{8} \end{bmatrix}. \end{align} \tag{ B.1 }$

Bob performs his decoding using the projectors

$\begin{align} P^1_0 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, P^1_1 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, P^2_0 = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}, P^2_1 = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}. \end{align} \tag{ B.2 }$

The optimal success probability is $\frac{1}{2} \left(1 + \frac{\sqrt{2}}{2}\right) \approx 0.853\,55$ .

The first step in the implementation of see-saw in Matlab with the YALMIP toolbox is to provide a proper definition of the variables used to express the quantum states for the encoding, see (B.1). We provide such code in listing 22. Frho is a critical component in the formulation of constraints that guarantee the validity of the operators stored within the cell-type data structure rhoCellVar as quantum qubit states. These constraints collectively ensure that the stored density matrices meet the essential criteria for a valid quantum state: they are Hermitian, PSD, and normalized. Note that even though the optimal states in (B.1) are pure, the form of SDP constraints allows for expressing them only as density matrices.

**Listing 22.** Preparation of SDP variables to store the optimized quantum states in the $2^2 \rightarrow 1$ quantum RAC for see-saw implementation with YALMIP.
Download figure:
Standard image High-resolution image

**Listing 22.** Preparation of SDP variables to store the optimized quantum states in the $2^2 \rightarrow 1$ quantum RAC for see-saw implementation with YALMIP.
Download figure:
Standard image High-resolution image

The next code segment from listing 23 focuses on defining variables and constraints related to measurements. The mCellVar cell array is employed to store measurement operators, and the Fmeas set of constraints is used to ensure the validity of these operators. The constraints stipulate that the measurement operators must be Hermitian, PSD, and sum to the identity matrix, i.e. form a POVM. The nested loops over y and b create these measurement operators and add the corresponding constraints to the Fmeas list variable. The constraint of summation to the identity in this example is given outside the loops.

The code snippet from listing 24 is setting up the remaining initialization for the see-saw optimization. The Succ variable is defined as a success probability function for the QRAC. This function is defined as the so-called lambda expression taking two parameters, and it computes the success probability based on input density matrices rhoCellIn and measurement operators mCellIn for the given quantum states and measurements. The nested loops initialize quantum states, i.e. rhoCell with random density matrices using a RandomState function, ensuring that the dimension is set to 2. Additionally, an empty cell array mCell is created to store the current measurement operators in the 'saw' step. Finally, we choose the solver settings, with SDPT3 solver in this example.

The major element of the see-saw is shown in listing 25 which represents the process involving three iterations. The objective of every step is to maximize the success probability of the quantum RAC by alternately optimizing measurement operators and quantum states. The loop consists of two main steps: the 'see' step and the 'saw' step. In the 'see' steps, the code optimizes the measurement operators mCellVar while keeping the quantum states fixed in rhoCell. After each optimization, the optimized measurements are stored as constants in the mCell variable. Then, the success probability with these updated measurements is computed and recorded in the progressTab array. In the 'saw' step, the code optimizes the quantum states rhoCellVar while keeping the measurements fixed in mCell. Similar to the 'see' step, the optimized states are stored as constants in the rhoCell variable. The success probability with these updated states is again computed and recorded in the progressTab array. The process continues for just three iterations as specified. The stopping criteria for the process can be customized based e.g. on progress in optimizing the target value. Already the second iteration provides the optimal result in this simple case.