Justifying quasiparticle self-consistent schemes via gradient optimization in Baym–Kadanoff theory

Sohrab Ismail-Beigi

doi:10.1088/1361-648X/aa7803

1. Introduction

Single-particle approaches for computing the electronic structure of materials have proven very useful for understanding and predicting the properties of materials, particularly ab initio methods such as density functional theory (DFT) [2, 3]. The local density (LDA) or generalized gradient (GGA) approximations [3–5] for DFT provide practical computational approaches that are the de facto workhorses for obtaining total energies, atomic geometries, vibrational modes, thermodynamic data, chemical properties, kinetic barriers, etc of a great variety of materials. Aside from practical usefulness, the single-particle nature of these approaches permits one to straightforwardly analyze the link between the atomic-scale structure of the material and the resulting electronic structure, e.g., via tight-binding or nearly free-electron models. The relative straightforwardness of a single-particle framework permits one to then propose materials design principles whereby one can tune or engineer desirable materials properties. Nevertheless, there are some shortcomings to such an approach. One can categorize the main drawbacks of single-particle schemes such as DFT for electronic structure predictions into two broad categories.

The first is fundamental to the single-particle approach itself when it is applied to strongly correlated electronic systems. When the basic behavior of electrons is determined by strong and localized electronic repulsions, it is difficult to properly describe such a situation using single-particle approaches where each particle moves separately in an effective potential [6, 7]. A number of methods have been proposed to date to deal with such situations, and at present Dynamical Mean Field Theory [6, 7] represents a workable scheme with the requisite compromise between reasonable computational complexity (obtained by approximating the many-body correlated problem in certain ways) and realistic description of actual materials. Even in such cases, however, building a many-body description of the correlated system for a method such as DMFT requires inclusion of important single-particle terms that reflect the structure, local chemistry and bonding; the strong interactions are added on top of this, as exemplified by the canonical Hubbard model and its various extensions. Thus one needs an 'optimal' single-particle description to begin the process.

A second drawback is due to the ground-state nature of DFT approaches and the use of a local effective potential: even without strong correlations, a theory designed to describe the ground state with a local potential will have a difficult time predicting excited state properties such as band energies and band gaps [8–10]. In a number of cases, one can correct the main faults with self-interaction corrected approaches [4] or explicit inclusion of a degree of Fock exchange in hybrid approaches [11–13]. The popular LDA+U approach [14] falls into this category where Fock-type corrections are included for a subspace of states spanned by pre-chosen localized atomic-like orbitals. The main idea in all these methods is to add more complexity to the effective potential in order to better incorporate the important physics of Fock exchange and to remove the closely related problem of electron self-interaction that plagues the canonical DFT approximations. A more ab initio approach that does not require pre-determined localized basis sets or pre-chosen physical effects is to use the many-body perturbation theory of Green's functions. The most successful to date is the GW approximation to the self-energy [15–19]. The GW approximation delivers high quality band structures of many band insulators and simple metals and automatically includes many physical effects such as exact Fock exchange, localized Coulomb repulsion, dynamic screening, and dispersion forces. In addition, LDA + U is a static and localized approximation to GW [14], and the effective potentials used in hybrid methods generally include a subset of the physics in GW (mainly Fock exchange that is screened in some manner) [11–13]. Most GW calculations are performed perturbatively: they compute corrections to an input DFT-like electronic structure. The final result in turn depends on the input description: in cases where the LDA provides a decent starting point, GW corrections provide a good description of the electronic structure [20–25]. But in other situations, the inadequacy of the input DFT description can create quantitative errors [1, 24, 26, 27].

Ideally, one would like to overcome the starting-point dependence by doing a self-consistent calculation within the GW approach itself. One would aim to have an approach that does not assume any particular basis set or set of parameters. Among many possibilities, two methods have been used by a number of researchers. One is the quasiparticle self-consistent GW (QSGW) [1], and the other is the self-consistent COHSEX (scCOHSEX) [27]. Both move one away from having to use DFT as the starting point for a GW calculation by finding a non-interacting band structure approximately but self-consistently. What this means is that one has a parameter-free method to automatically include static and dynamic screening, Fock exchange, and certain aspects of localized Coulombic physics in a single calculation.

While these methods represent exciting developments, they are based on physical insight and/or approximation of the GW self-energy operator to yield workable schemes. A key question is if there is some theoretical sense in which one can derive an optimal non-interacting band structure for any electronic system, and what such a description would look like. Namely, do these schemes, or various modifications of them, have an a priori theoretical justification?

In this work, we answer this question positively by showing that within the appropriate total-energy scheme appropriate for Green's function methods, namely, the Baym–Kadanoff approach [28, 29] together with the Klein functional [30], quasiparticle self-consistent approaches are the most theoretically justified in the sense that they are 'closest' to the stationary point of the energy functional. The quantitative meaning of closeness is based on gradient minimization, namely the length of the gradient of the energy functional is minimized within the search space considered. Our results thus justify the use of the QSGW approach. We emphasize, however, that our main result is not only applicable to the GW approximation alone but to any self-energy operator. The QS scheme is the optimal one for generating a non-interacting band structure (in the sense of gradient optimization).

The remainder of this paper is organized as follows. Section 2 describes our notation and definitions. Section 3 reviews the Baym–Kadanoff approach specifically within the framework of finding optimal non-interacting band structures. Section 4 describes the small parameter used to organize the gradient optimization process. Section 5 describes how gradient optimization leads to quasiparticle self-consistent equations. Section 6 describes an alternative approach for optimization that one might consider based on optimizing the value of the energy functional itself, but it is shown that this approach, while tempting and easy to state, does not seem to lead to further insights or useful results. Section 7 shows, by numerical solving a simple many-body system as well as by providing more general analytical arguments, that the gradient optimization approach requires choosing the quasiparticle solution with largest weight (Z) when deciding among multiple solutions to the Dyson equation that all correspond to a single non-interacting state. Section 8 has a brief summary of the results and their implications.

2. Definitions and notation

We assume that our electronic system is governed by a time-independent and time-reversal invariant many-body Hamiltonian which means that many key physical quantities such as wave functions are real-valued and all time-dependent quantities depend only on time differences. We use atomic units so $\hbar =1$ , the unit of elementary charge $e=1$ , and the electron mass ${{m}_{\text{e}}}=1$ . Hence, energies and frequencies are interchangeable. Wherever sensible, we use matrix notation for compactness. For example, the one-particle electron Green's function for a time-independent system in the frequency ω domain, $G\left(x,{{x}^{\prime}},\omega \right)$ , is a function of three arguments. The x and ${{x}^{\prime}}$ arguments include both spatial coordinates and spin: $x=\left(\vec{r},\sigma \right)$ where $\vec{r}$ is a three-vector and $\sigma =\pm 1$ labels the two spin projections. In matrix notation, we write the matrix $G(\omega )$ whose matrix elements are $\langle x|G(\omega )|{{x}^{\prime}}\rangle =G\left(x,{{x}^{\prime}},\omega \right)$ .

We begin with the non-interacting system. It is specified by the chemical potential μ and the static (frequency-independent) Hermitian single-particle Hamiltonian H₀ with orthonormal eigenstates $|n\rangle$ and real eigenvalues ${{\epsilon}_{n}}$

$\begin{eqnarray}&&{{H}_{0}}|n\rangle ={{\epsilon}_{n}}|n\rangle.\end{eqnarray}$

The unoccupied (conduction) states have ${{\epsilon}_{n}}>\mu$ and the occupied (valence) states have ${{\epsilon}_{n}}<\mu$ . For direct comparison to the interacting Green's function and its self-energy, we separate from H₀ the part that deals with exchange and correlation ${{U}_{xc}}$ . We define

$\begin{eqnarray}&&{{H}_{0}}=T+{{U}_{\text{ion}}}+{{\phi}_{H}}+{{U}_{xc}}.\end{eqnarray}$

Here, $T=-{{\nabla}^{2}}/2$ is the electron kinetic energy operator, U_ion is the electron-ion interaction potential, the Hartree potential ${{\phi}_{H}}$ is determined by the electron density $n(x)$

$\begin{eqnarray}&&n(x)=\underset{n}{\sum}\,\theta \left(\mu -{{\epsilon}_{n}}\right)|{{\psi}_{n}}(x){{|}^{2}}\end{eqnarray}$

via

$\begin{eqnarray}&&{{\phi}_{H}}(x)=\underset{{{\sigma}^{\prime}}}{\sum}\,{\int}^{}\text{d}{{r}^{\prime}}~\frac{n\left({{x}^{\prime}}\right)}{||\vec{r}-{{{\vec{r}}}^{\prime}}||}.\end{eqnarray}$

The static and Hermitian ${{U}_{xc}}$ aims to approximate exchange and correlation effects. In DFT, ${{U}_{xc}}$ is a local potential. Here, we consider a more general non-local ${{U}_{xc}}$ , ${{U}_{xc}}\left(x,{{x}^{\prime}}\right)\ne 0$ when $x\ne {{x}^{\prime}}$ .

The frequency domain non-interacting Green's function ${{G}_{0}}(\omega )$ is given by

$\begin{eqnarray}&&{{G}_{0}}{{(\omega )}^{-1}}=\omega I-{{H}_{0}}=\omega I-T-{{U}_{\text{ion}}}-{{\phi}_{H}}-{{U}_{xc}}.\end{eqnarray}$

We note that for a fixed nuclear configuration and thus fixed U_ion, ${{U}_{xc}}$ determines G₀ and vice versa. This is a useful parameterization of G₀ that we will use below. Finally, the non-interacting one-particle density matrix ${{\rho}_{0}}$ is given by

$\begin{eqnarray}&&{{\rho}_{0}}\left(x,{{x}^{\prime}}\right)=\underset{n}{\sum}\,\theta \left(\mu -{{\epsilon}_{n}}\right){{\psi}_{n}}(x){{\psi}_{n}}{{\left({{x}^{\prime}}\right)}^{\ast}}\end{eqnarray}$

whose diagonal is the density $n(x)={{\rho}_{0}}\left(x,{{x}^{\prime}}\right)$ .

In the frequency domain, the interacting Green's function is given compactly by the Dyson equation

$\begin{eqnarray}&&G{{(\omega )}^{-1}}=\omega I-T-{{U}_{\text{ion}}}-{{\phi}_{H}}-{{ \Sigma }_{xc}}(\omega )\end{eqnarray} \tag{ 1 }$

and the self-energy ${{ \Sigma }_{xc}}(\omega )$ is frequency-dependent (dynamic) and non-Hermitian and encodes the complex exchange and correlation effects of the many-body system. The Dyson equation can be rewritten as

$\begin{eqnarray}&&{{G}^{-1}}(\omega )=G_{0}^{-1}(\omega )-\left[{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}\right].\end{eqnarray} \tag{ 2 }$

Therefore, to find the true interacting Green's function, we must replace the static, Hermitian ${{U}_{xc}}$ by the dynamic, non-Hermitian ${{ \Sigma }_{xc}}(\omega )$ .

The interacting density matrix ρ is most compactly specified by taking the zero-time value of time-domain Green's function $\mathcal{G}\left(x,{{x}^{\prime}},t\right)$

$\begin{eqnarray}&&\rho \left(x,{{x}^{\prime}}\right)=-\text{i}\mathcal{G}\left(x,{{x}^{\prime}},t={{0}^{-}}\right).\end{eqnarray}$

Here, $\mathcal{G}(t)$ is the Fourier transform of $G(\omega )$ and ${{0}^{-}}$ is a negative infinitesimal. This density matrix appears in the expression for the Fock exchange operator.

Finally, for immediate use below, we define two trace operators. For any matrix A, we let $\text{tr}\{A\}$ denote the standard definition

$\begin{eqnarray}&&\text{tr}\{A\}\equiv {\int}^{}\text{d}x\,A(x,x)=\underset{n}{\sum}\,\langle n|A|n\rangle.\end{eqnarray}$

Given a matrix that is a function of frequency, $B(\omega )$ , we define the shorthand $\text{Tr}\{B\}$ to stand for the integral

$\begin{eqnarray}&&\text{Tr}\{B\}\equiv {\int}_{-\infty}^{\infty}\frac{\text{d}\omega}{2\pi \text{i}}{{\text{e}}^{\text{i}\omega {{0}^{+}}}}\text{tr}\left\{B(\omega )\right\}\end{eqnarray}$

where one can convert the integral to a closed contour integral by going over the upper complex ω plane by using the convergence factor ${{\text{e}}^{\text{i}\omega {{0}^{+}}}}$ where ${{0}^{+}}$ is a positive infinitesimal.

3. Baym–Kadanoff approach

The basic point of the approach of Baym and Kadanoff [28, 29] is that both the ground-state total energy and the interacting Green's function $G(\omega )$ can be obtained by finding the stationary point of an energy functional of G. For simplicity, we concentrate on the Klein functional [30], a functional of both G and the non-interacting G₀ given by

$\begin{eqnarray}&&\begin{array}{*{35}{l}} F\left[G,{{G}_{0}}\right]={{E}_{H}}[n]+{{ \Phi }_{xc}}[G] \\ +\text{Tr}\left\{{{H}_{0}}{{G}_{0}}+I-G_{0}^{-1}G+\ln \left[G_{0}^{-1}G\right]-\left(\phi _{H}^{0}+{{U}_{xc}}\right)G\right\}. \\\end{array}\end{eqnarray} \tag{ 3 }$

where $\phi _{H}^{0}$ is the Hartree potential for the electron density corresponding to G₀. In the above expression, the frequency dependence of $G(\omega )$ and ${{G}_{0}}(\omega )$ has been suppressed for clarity. ${{E}_{H}}[n]$ is the Hartree energy for electron density $n(x)$ :

$\begin{eqnarray}&&{{E}_{H}}[n]=\frac{1}{2}{\int}^{}\text{d}x~n(x){{\phi}_{H}}(x).\end{eqnarray}$

The functional ${{ \Phi }_{xc}}[G]$ is the exchange-correlation energy functional for the Baym–Kadanoff approach and, as in DFT, is a complicated and unknown functional of G. Formally, it has a well-defined diagrammatic expansion [31]. As in DFT, choosing an approximate form for ${{ \Phi }_{xc}}$ corresponds to including a certain approximate treatment of exchange-correlation effects. The Klein functional is not the only possible variational functional in Baym–Kadanoff theory: the Luttinger–Ward functional [31] is more widely known, is known to have better variational properties [32] but has a much more complex form. Hence, we will be focusing on the simpler Klein functional.

At the stationary point of F (for both the Klein and Luttinger–Ward forms), the value of F is the ground-state total energy, and the stationary G is the true one-particle Green's function [15, 31]. Unlike DFT, one can obtain, in principle, not just the ground state total energy and electron density but also excited state properties such as quasiparticle wave functions and band energies. Much like the Kohn–Sham DFT approach, there is a functional derivative relation between the exchange-correlation energy functional and the self-energy

$\begin{eqnarray}&&{{ \Sigma }_{xc}}(\omega )=2\pi \text{i}\frac{\delta {{ \Phi }_{xc}}[G]}{\delta G(\omega )}.\end{eqnarray} \tag{ 4 }$

To make practical progress in the Baym–Kadanoff framework, two separate types approximations are necessary. The first is the same as that encountered in DFT: one must choose some approximate ${{ \Phi }_{xc}}$ . The second challenge is that, unlike DFT where N-presentability conditions are known [33, 34], similar conditions for the Green's function $G\left(x,{{x}^{\prime}},\omega \right)$ are unknown. Namely, one does not know which subset of functions $G\left(x,{{x}^{\prime}},\omega \right)$ correspond to physically realizable Green's functions for the standard interacting electronic many-body Hamiltonian. Therefore, one can try to directly tabulate and work with an arbitrary function $G\left(x,{{x}^{\prime}},\omega \right)$ to locate the stationary point of F, which will hopefully correspond to the physical G, but such an approach is very demanding and computationally expensive. Alternatively, one can make some simplifying assumptions on the types of Green's functions considered.

Here, we restrict ourselves to using non-interacting Green's functions for G. Since it is known that F does not in fact depend on G₀ (for fixed G) [35], once we restrict G to be non-interacting, we might as well set G₀ equal to G for convenience without any change to F. This also simplifies the functional significantly to

$\begin{eqnarray}&&F\left[{{G}_{0}},{{G}_{0}}\right]=tr\left\{\left[T+{{U}_{\text{ion}}}\right]{{\rho}_{0}}\right\}+{{E}_{H}}[n]+{{ \Phi }_{xc}}\left[{{G}_{0}}\right].\end{eqnarray} \tag{ 5 }$

This energy functional contains familiar terms: the noninteracting kinetic, electron-ion, and Hartree energy plus the exchange-correlation contribution. The first three terms are identical to their DFT counterparts and depend only on ${{\rho}_{0}}$ . Only ${{ \Phi }_{xc}}$ depends on the added dynamical information in the Green's function G₀.

Due to the stationary nature of F about the optimal G, $F\left[{{G}_{0}},{{G}_{0}}\right]$ provides a variational estimate of the ground-state energy with the error being smallest for the 'best' G₀. The main problem, which we have tried to address previously [35] and which we address in this work, is how to choose a best G₀ and what this means. A tempting idea is to try to minimize or optimize $F\left[{{G}_{0}},{{G}_{0}}\right]$ over various trial G₀ or equivalently over various trial ${{U}_{xc}}$ . However, this program is highly problematic because $F\left[{{G}_{0}},{{G}_{0}}\right]$ does not have any stationary points [35]. Figure 1 illustrates the situation graphically and schematically. The stationary point of F representing the true Green's function that solves the Dyson equation (2) must correspond to a saddle point and to a dynamic and non-Hermitian self-energy, whereas if we constrain ourselves to static and Hermitian self-energies ${{U}_{xc}}$ , then F has non-zero derivatives in the whole subspace.

**Figure 1.** A schematic of the simplest scenario for the Baym–Kadanoff energy functional F. The thick solid lines are level curves of F. The self-energy ${{ \Sigma }_{xc}}(\omega )$ that parameterizes the trial Green's function $G(\omega )$ is divided into two distinct contributions: a static and Hermitian part ${{U}_{xc}}$ that parameterizes the non-interacting Green's functions G₀, and a remaining dynamical, non-Hermitian part ${{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}$ . These two independent contributions are high-dimensional matrices but are schematically shown as independent axes. The black circle represents the stationary point of the energy functional which corresponds to the true self-energy that self-consistently solves the Dyson equation (2). The horizontal axis represents the space of all non-interacting Green's functions. We see that the level curves cross this axis with no interruption reflecting the known fact that F has no stationary point when sampled along the horizontal axis [35]. As the figure illustrates, this also implies that the stationary point of F must be a saddle point.
Download figure:
Standard image High-resolution image

**Figure 1.** A schematic of the simplest scenario for the Baym–Kadanoff energy functional F. The thick solid lines are level curves of F. The self-energy ${{ \Sigma }_{xc}}(\omega )$ that parameterizes the trial Green's function $G(\omega )$ is divided into two distinct contributions: a static and Hermitian part ${{U}_{xc}}$ that parameterizes the non-interacting Green's functions G₀, and a remaining dynamical, non-Hermitian part ${{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}$ . These two independent contributions are high-dimensional matrices but are schematically shown as independent axes. The black circle represents the stationary point of the energy functional which corresponds to the true self-energy that self-consistently solves the Dyson equation (2). The horizontal axis represents the space of all non-interacting Green's functions. We see that the level curves cross this axis with no interruption reflecting the known fact that F has no stationary point when sampled along the horizontal axis [35]. As the figure illustrates, this also implies that the stationary point of F must be a saddle point.
Download figure:
Standard image High-resolution image

We discuss three avenues for avoiding this pathological situation. The first is to constrain the types of G₀ or ${{U}_{xc}}$ being considered based on physical knowledge or intuition. For example, it is known that forcing ${{U}_{xc}}$ to be a local potential is sufficient to create a minimum for the Klein functional [32, 36, 37]. However, a local potential can not directly and consistently describe a number of simple non-local effects, e.g., Fock exchange which automatically removes problematic electron self-interaction effects. Physically motivated non-local forms for ${{U}_{xc}}$ are exemplified by the QSGW or scCOHSEX approaches which are based on incorporating GW-level self-energy effects into ${{U}_{xc}}$ .

A second idea is to recast the optimization process by working with a different variable. For example, an interesting recent work [38] shows that, in principle, one can optimize the Baym–Kadanoff energy functional by using the density matrix ρ as the fundamental variable (instead of G) and have a minimum principle for the resulting energy functional. The main difficulty is that one can not avoid the fact that the stationary point of the Baym–Kadanoff functional is a saddle point, so that the optimization process involving ρ still requires an internal search at fixed ρ for a saddle point [38]. To avoid this complexity, an 'NDE2' approximation has been proposed [38] which removes this saddle point search: it basically consists of evaluating ${{ \Phi }_{xc}}$ at G₀ instead of at G (very similar in spirit to equation (5)). It is our belief that the NDE2 will suffer from this same problems discussed above: ${{ \Phi }_{xc}}\left[{{G}_{0}}\right]$ has no lower bound and the minimization will drive the system to an unphysical minimum with negative infinite energy [35]. However, future studies are needed to carefully evaluate this matter. Overall, the density matrix approach to Baym–Kadanoff theory is a promising new idea in the field.

A third approach is to come up with a different mathematical definition of the 'best' G₀ which does not rely on naïve optimization of F. We follow this more mathematical approach which will yield a self-consistent quasiparticle scheme. Instead of trying to satisfy the impossible condition $\delta F/\delta {{G}_{0}}=0$ , we will minimize the length of the gradient of F. Specifically, we minimize the square length of the gradient $|\delta F/\delta {{ \Sigma }_{t}}{{|}^{2}}$ where ${{ \Sigma }_{t}}$ is a trial self-energy. Our results will be general as we will not assume any specific approximation for the self-energy so that the main results will hold for any chosen form of ${{ \Phi }_{xc}}$ . Our approach has similarities to the optimized effect potential (OEP) method [39–41] in DFT. In OEP, one finds the optimal local potential corresponding to the minimum of some general and possibly orbital dependent exchange-correlation energy functional ${{E}_{xc}}$ : one has a formalism to find the self-consistent local Kohn–Sham potential corresponding to the total energy functional. Hence, both approaches are looking for an optimized and simplified single-particle description (a local potential in OEP and a non-local H₀ in Baym–Kadanoff theory). However, whereas in OEP one can find a local potential that minimizes the total energy functional so the functional derivative of the total energy versus the potential is zero, in the Baym–Kadanoff setting one must settle for a non-zero derivative so that the structure and interpretation of the resulting equations for the non-local single-particle Hamiltonian are more complex.

4. Key small parameter γ

As explained above, this paper is focused primarily on the use of non-interacting Green's functions G₀. We will be working primarily in frequency space, which is the natural representation when discussing quasiparticle energies and how to deal with the frequency dependence of ${{ \Sigma }_{xc}}(\omega )$ . In what follows, we will be employing time-ordered non-interacting Green's functions ${{G}_{0}}(\omega )$ of the following form:

$\begin{eqnarray}&&{{G}_{0}}(\omega )=\underset{n}{\sum}\,\frac{|n\rangle \langle n|}{\omega -{{\epsilon}_{n}}-\text{i}\gamma {{s}_{n}}}\end{eqnarray} \tag{ 6 }$

where ${{s}_{n}}={\rm sgn} \left(\mu -{{\epsilon}_{n}}\right)$ . We choose the real-valued quantity $\gamma >0$ to be small and fixed. Note that we are choosing to use this class of G₀.

In standard textbook treatments [42, 43], one finds the form of equation (6) where the symbol η takes the place of γ, and $\eta \to {{0}^{+}}$ is understood or imposed at some point. The quantity η in these standard treatments is a mathematical quantity: an infinitesimal positive that is needed to ensure that Fourier transformations are absolutely convergent when Fourier transforming from time to frequency domain. Often, it is directly included or absorbed into the definition of the Heaviside θ function for the time-ordering operation [43].

In our case, however, γ is positive and small but always finite. We do not send it to zero. It is a fixed number imposed by us manually with a simple physical meaning: it is a damping rate or inverse lifetime for the ${{\epsilon}_{n}}$ energy states. We are giving our non-interacting states in our input Green's function G₀ of equation (6) a finite but very small decay rate γ which is equivalent to a very large but finite lifetime ${{\gamma}^{-1}}$ .

Mathematically, we will see that γ acts as a small parameter that (i) regularizes the mathematical expressions we compute by ensuring that they are finite (i.e., avoiding division by zero in energy denominators), and (ii) permits rank-ordering of dominant versus subdominant contributions. Hence, from a mathematical viewpoint, it is necessary to keep γ finite.

However, from a physical viewpoint, the finite value of γ is hardly a restriction. We can take γ to be very small so that the imposed lifetime ${{\gamma}^{-1}}$ can be quite long (a day, a month, a year) and certainly longer than any contemplated experiment which would measure the electronic response. Physically, we need to make γ small enough to resolve and not spoil intrinsic energetic shifts, broadenings and lifetimes stemming from electron interactions and scattering. For example, if the system has a quasiparticle energy gap $\Delta$ , then $\gamma \ll \Delta$ is required to resolve this gap precisely. Or, if a low-energy quasiparticle state has a lifetime τ due to electron–electron scattering, then $\gamma \ll 1/\tau$ is needed in order to correctly obtain this lifetime via a calculation of the Green's function. Finally, turning on electron–electron interactions modifies the spectrum of the Green's function (the spectral function) away from its non-interacting analogue. As per equation (2), the energy scale of such changes is determined by the magnitude of the matrix elements of ${{ \Sigma }_{xc}}-{{U}_{xc}}$ , so we require γ to be smaller than these matrix elements to ensure well-converged spectra. This last requirement is directly reflected in the key equations of our analysis below: ratios of matrix elements to γ

$\begin{eqnarray}&&\frac{\langle n|{{ \Sigma }_{xc}}-{{U}_{xc}}|m\rangle}{\gamma}\end{eqnarray}$

appear below and represent the large quantities that must be minimized to obtain the optimal ${{U}_{xc}}$ .

5. Shortest gradient of F

In this section, we implement the standard idea of minimizing the length of the gradient vector: the smaller the gradient of F, the closer we should be to the stationary point. Specifically, we seek the optimum non-interacting G₀ that delivers the shortest gradient of F.

We begin with the following expression for the variation of F versus G (for fixed G₀ and arbitrary G) that is derived by differentiating equation (3):

$\begin{eqnarray}\begin{array}{*{35}{l}} \delta F & = & \text{Tr}\left\{\left[{{G}^{-1}}-G_{0}^{-1}+{{ \Sigma }_{xc}}-{{U}_{xc}}\right]\delta G\right\}. \\\end{array}\end{eqnarray} \tag{ 7 }$

As a reminder, ${{ \Sigma }_{xc}}=2\pi \text{i}\delta {{ \Phi }_{xc}}/\delta G$ . As a matrix derivative, this is equivalent to

$\begin{eqnarray}&&2\pi \text{i}\frac{\delta F}{\delta G(\omega )}=G{{(\omega )}^{-1}}-{{G}_{0}}{{(\omega )}^{-1}}+{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}.\end{eqnarray} \tag{ 8 }$

Setting this matrix derivative to zero locates the saddle point and automatically yields the Dyson equation (2) for the Green's function.

In what follows, it will be convenient to change variables. Instead of varying G directly, we will vary G via variation of a trial self-energy ${{ \Sigma }_{t}}$ :

$\begin{eqnarray}&&{{G}^{-1}}(\omega )=G_{0}^{-1}(\omega )-\left[{{ \Sigma }_{t}}(\omega )-{{U}_{xc}}\right]\end{eqnarray} \tag{ 9 }$

Choosing ${{ \Sigma }_{t}}$ to coincide with the self-energy ${{ \Sigma }_{xc}}$ that solves the Dyson equation locates the saddle point of F as per equation (7). Matrix differentiation of G gives

$\begin{eqnarray}&&\delta G(\omega )=G(\omega )\delta {{ \Sigma }_{t}}(\omega )G(\omega ).\end{eqnarray}$

Using the cyclicity of the trace, the variation of F is

$\begin{eqnarray}&&\delta F=\text{Tr}\left\{G\left[{{G}^{-1}}-G_{0}^{-1}+{{ \Sigma }_{xc}}-{{U}_{xc}}\right]G\,\delta {{ \Sigma }_{t}}\right\}\end{eqnarray} \tag{ 10 }$

which corresponds to the matrix derivative

$\begin{eqnarray}&&D(\omega )\equiv 2\pi \text{i}\frac{\delta F}{\delta {{ \Sigma }_{t}}(\omega )}=G\left({{G}^{-1}}-G_{0}^{-1}+{{ \Sigma }_{xc}}-{{U}_{xc}}\right)G.\end{eqnarray} \tag{ 11 }$

We are interested in the case when G is non-interacting, so we set $G={{G}_{0}}$ and arrive at the simpler derivative

$\begin{eqnarray}&&{{D}_{0}}(\omega )\equiv D(\omega ){{|}_{G={{G}_{0}}}}={{G}_{0}}\left({{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}\right){{G}_{0}}.\end{eqnarray} \tag{ 12 }$

Our objective will be to minimize the squared length of the matrix ${{D}_{0}}(\omega )$

$\begin{eqnarray}&&||{{D}_{0}}|{{|}^{2}}\equiv {\int}_{-\infty}^{\infty}\text{d}\omega \,\text{tr}\left\{{{D}_{0}}{{(\omega )}^{\dagger}}{{D}_{0}}(\omega )\right\}\end{eqnarray}$

and thereby find a gradient-optimal ${{U}_{xc}}$ and associated G₀. The situation is shown schematically in figure 2. Among the set of non-interacting Green's functions parameterized by ${{U}_{xc}}$ , we are searching for the ${{U}_{xc}}$ that makes the gradient ${{D}_{0}}(\omega )$ shortest. Figuratively, we have constrained ourselves to be along the horizontal axis of the figure, and we scan along that axis to find the shortest gradient.

**Figure 2.** Schematic showing the stationary point of $F\left[G,{{G}_{0}}\right]$ (black circle), level curves of F (solid lines), and the gradient vector $D(\omega )$ from equation (11) (purple arrows). The axes represent the choices of trial self-energies ${{ \Sigma }_{t}}(\omega )$ in equation (9) that parameterize the Green's function G. The saddle point corresponds to choosing ${{ \Sigma }_{t}}$ to be the self-energy ${{ \Sigma }_{xc}}$ that generates the true Green's function *via* the Dyson equation. The gradients evaluated along the horizontal axis ${{U}_{xc}}$ , representing the subspace of non-interacting Green's functions, are ${{D}_{0}}(\omega )$ of equation (12) which are the lowest row of arrows (pointing mainly leftwards). The objective is to find the ${{U}_{xc}}$ that gives the shortest gradient along the ${{U}_{xc}}$ axis which in this example is indicated by the dashed circle.
Download figure:
Standard image High-resolution image

We note that we are not seeking the shortest gradient vector projected into the subspace of non-interacting Green's functions. That would correspond to examining how F changes due to variations $\delta {{U}_{xc}}$ which is different from the much larger set of self-energy variations $\delta {{ \Sigma }_{t}}(\omega )$ . We are varying the Green's function both along and away from the non-interacting axis and thus in any arbitrary direction. This is indicated in figure 2 by the fact that the arrows representing the gradient have components both along and perpendicular to the horizontal axis.

We aim to minimize $|\delta F/\delta {{ \Sigma }_{t}}|$ which is the length of the gradient of F versus the trial self-energy ${{ \Sigma }_{t}}$ . As we will see below, it is generally impossible to make the length zero when restricting ourselves to non-interacting Green's functions G₀. Hence, the choice of variable for the derivative will, in principle, change the resulting optimum. For example, one could try to minimize $|\delta F/\delta G|$ from equation (8) instead, and one would arrive at a different set of conditions. Therefore, choosing the variable for the derivative requires some physical motivation. Given the choice between $G(\omega )$ and ${{ \Sigma }_{t}}(\omega )$ , ${{ \Sigma }_{t}}$ is a much better choice: choosing G gives useless or nonsense results as detailed in appendix C.

To progress from these graphical ideas to analytic formulae, we calculate the squared norm $||{{D}_{0}}|{{|}^{2}}$ in the basis of the orthonormal eigenstates $|n\rangle$ by inserting complete sets of states:

$\begin{eqnarray}&&||{{D}_{0}}|{{|}^{2}}\,={\int}_{-\infty}^{\infty}\text{d}\omega \,\underset{n,m}{\sum}\,|\langle n|{{D}_{0}}(\omega )|m\rangle {{|}^{2}}.\end{eqnarray}$

Since G₀ is diagonal in this basis, this turns into

$\begin{eqnarray}&&\begin{array}{*{35}{l}} ||{{D}_{0}}|{{|}^{2}}={\int}_{-\infty}^{\infty}\text{d}\omega \underset{n,m}{\sum}\,\frac{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}}{|\omega -{{\epsilon}_{n}}\pm \text{i}\gamma {{|}^{2}}|\omega -{{\epsilon}_{m}}\pm \text{i}\gamma {{|}^{2}}} \\ \\ \qquad \;={\int}_{-\infty}^{\infty}\text{d}\omega \underset{n,m}{\sum}\,\frac{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}}{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]\left[{{\left(\omega -{{\epsilon}_{m}}\right)}^{2}}+{{\gamma}^{2}}\right]}. \\\end{array}\end{eqnarray} \tag{ 13 }$

As expected, this is a sum of strictly positive terms. Our aim is to choose a ${{U}_{xc}}$ so that $||{{D}_{0}}|{{|}^{2}}$ is as small as possible.

We will be using contour integration techniques to evaluate the frequency integral in equation (13). To do this, we need to examine the self-energy ${{ \Sigma }_{xc}}(\omega )$ in more detail. Most generally, the self-energy along the real ω axis can be written as a static term plus a sum over poles

$\begin{eqnarray}&&{{ \Sigma }_{xc}}(\omega )={{ \Sigma }_{x}}+\underset{a}{\sum}\,\frac{{{\sigma}_{a}}}{\omega -{{\xi}_{a}}}\end{eqnarray} \tag{ 14 }$

where ${{ \Sigma }_{x}}$ is the static bare exchange (Fock) operator

$\begin{eqnarray}&&{{ \Sigma }_{x}}\left(x,{{x}^{\prime}}\right)=-\rho \left(x,{{x}^{\prime}}\right)/||\vec{r}-{{\vec{r}}^{\prime}}||.\end{eqnarray}$

The energies ${{\xi}_{a}}$ locate the poles of the self-energy which have residues given by the matrices ${{\sigma}_{a}}$ . Physically, a pole of ${{ \Sigma }_{xc}}(\omega )$ occurs at an energy where there is strong quasiparticle scattering by electronic excitations, strongly reduced quasiparticle lifetimes, and a strongly damped spectral function. For example, within the GW approximation, these poles correspond to charge fluctuation excitations such as single or multiple electron–hole pairs and plasmons [15, 44]. For a finite system, such as a molecule, the energies ${{\xi}_{a}}$ and associated index a are a discrete and countable set (below the ionization threshold). Above the ionization threshold or in a solid-state system, there are continuous energy bands and thus a continuum of excitations so the sum over a will be an integral and the self-energy will have branch cuts as a function of ω. We note that it is possible to have discrete (bound) states embedded in continuum of states [45], and we will return to these distinctions below when comparing different contributions to the integral of equation (13).

Since ${{ \Sigma }_{xc}}(\omega )$ either remains finite or grows sublinearly with ω as $|\omega |\to \infty$ (see appendix D), and the denominator of equation (13) grows as ${{\omega}^{4}}$ , we can safely turn the integral in equation (13) into a closed contour integral which we choose to go over a half circle at infinity in the upper half plane (the lower half plane gives the same results). Therefore, the quantity we aim to study and minimize is

$\begin{eqnarray}&&||{{D}_{0}}|{{|}^{2}}=\underset{n,m}{\sum}\,{{S}_{nm}}\end{eqnarray} \tag{ 15 }$

where

$\begin{eqnarray}&&{{S}_{nm}}\equiv {\oint}^{}\text{d}\omega \frac{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}}{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]\left[{{\left(\omega -{{\epsilon}_{m}}\right)}^{2}}+{{\gamma}^{2}}\right]}.\end{eqnarray} \tag{ 16 }$

We now perform the contour integral separately for diagonal and off-diagonal contributions to equation (15) since they will scale differently as a function of γ. As a technical aside, we keep in mind that the numerator $|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}$ in equation (15) begins as a function defined along the real ω axis: when extending it to complex ω, we simply substitute in complex ω into the original functional form (i.e., it is only a function of ω and not of ${{\omega}^{\ast}}$ ).

5.1. Diagonal terms

Consider a diagonal contribution ${{S}_{nn}}$ to equation (15)

$\begin{eqnarray}&&{{S}_{nn}}={\oint}^{}\text{d}\omega \frac{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|n\rangle {{|}^{2}}}{{{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]}^{2}}}.\end{eqnarray} \tag{ 17 }$

There will be two distinct classes of poles contributing to ${{S}_{nn}}$ .

The first and obvious one is the pole coming from the zero of the denominator at $\omega ={{\epsilon}_{n}}+\text{i}\gamma$ in the upper half plane. More precisely, we have

$\begin{eqnarray}&&{{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]}^{2}}={{\left[\omega -{{\epsilon}_{n}}-\text{i}\gamma \right]}^{2}}{{\left[\omega -{{\epsilon}_{n}}+\text{i}\gamma \right]}^{2}}\end{eqnarray}$

and by using the standard Cauchy integral formula

$\begin{eqnarray}&&{{f}^{\prime}}(a)=\frac{1}{2\pi \text{i}}{\oint}^{}\frac{f(z)\text{d}z}{{{(z-a)}^{2}}}\end{eqnarray}$

the contribution from the pole at ${{\epsilon}_{n}}+\text{i}\gamma$ is

$\begin{eqnarray}\begin{array}{*{35}{l}} {} & {} & 2\pi \text{i}\frac{d}{\text{d}\omega}\left\{\frac{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|n\rangle {{|}^{2}}}{{{\left[\omega -{{\epsilon}_{n}}+\text{i}\gamma \right]}^{2}}}\right\}{{|}_{\omega ={{\epsilon}_{n}}+\text{i}\gamma}} \\ {} & {} & \quad \quad \quad \quad \quad \quad \quad =\frac{\pi}{2{{\gamma}^{3}}}|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|n\rangle {{|}^{2}}{{|}_{\omega ={{\epsilon}_{n}}+\text{i}\gamma}} \\ {} & {} & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad -\frac{\pi i}{2{{\gamma}^{2}}}\frac{d}{\text{d}\omega}\left\{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|n\rangle {{|}^{2}}\right\}{{|}_{\omega ={{\epsilon}_{n}}+\text{i}\gamma}}. \\\end{array}\end{eqnarray}$

We now series expand in the small parameter γ, and find that the imaginary terms scaling as ${{\gamma}^{-2}}$ cancel as they must since the integral is manifestly positive and real-valued. We end with

$\begin{eqnarray}&&\frac{\pi}{2{{\gamma}^{3}}}|\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}|n\rangle {{|}^{2}}\,+\,O\left({{\gamma}^{-1}}\right)\end{eqnarray}$

and note that the dominant contribution scales as ${{\gamma}^{-3}}$ .

The second set of contributions to ${{S}_{nn}}$ will be from poles of ${{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}$ located above the real axis, i.e., those with $\text{Im}~{{\xi}_{a}}>0$ . The analysis of the contributions from these poles is straightforward but somewhat long-winded and is detailed in appendix A. The result is that the contributions from these poles are subleading: for systems containing discrete energy levels the contributions scale as ${{\gamma}^{-1}}$ while for systems with only continuous energy bands they scale as ${{\gamma}^{0}}$ .

All together, we have

$\begin{eqnarray}&&{{S}_{nn}}=A{{\gamma}^{-3}}+O\left({{\gamma}^{-1}}\right)\end{eqnarray}$

where A originates from the pole at ${{\epsilon}_{n}}+\text{i}\gamma$ and is proportional to

$\begin{eqnarray}&&A\propto |\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}|n\rangle {{|}^{2}}.\end{eqnarray}$

We focus on reducing the magnitude of the large ratio $\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}|n\rangle /\gamma$ to make ${{S}_{nn}}$ as small as possible since it is dominated by the leading $A{{\gamma}^{-3}}$ term.

In the most general case, the self-energy ${{ \Sigma }_{xc}}$ will have both real and imaginary parts. By our assumption of time-reversal symmetry, ${{U}_{xc}}$ and the eigenstates of H₀ are real-valued. So we split the matrix element into real and imaginary parts

$\begin{eqnarray}&&\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}|n\rangle =R+iI\end{eqnarray}$

where

$\begin{eqnarray}&&R=\langle n|\text{Re}~{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}|n\rangle \end{eqnarray}$

and

$\begin{eqnarray}&&I=\langle n|\text{Im}~{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|n\rangle.\end{eqnarray}$

Thus the leading term in ${{S}_{nn}}$ is

$\begin{eqnarray}&&A{{\gamma}^{-3}}=\frac{\pi}{2{{\gamma}^{3}}}\left[{{R}^{2}}+{{I}^{2}}\right].\end{eqnarray}$

Mathematically, it is unlikely that one can set $A=0$ since the only free variable is the single real number $\langle n|{{U}_{xc}}|n\rangle$ while there are two functions R and I to set to zero. Physically, this corresponds to the fact that ${{U}_{xc}}$ is Hermitian so it can adjust real-valued energies but never give an imaginary part to the energy which would correspond to a quasiparticle lifetime; such lifetimes are described by non-zero $\text{Im}~{{ \Sigma }_{xc}}$ . Thus we can set $R=0$ by choosing $\langle n|{{U}_{xc}}|n\rangle =\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|n\rangle$ while we must settle for a non-zero I in the general case.

Luckily, there are many cases where either $I=0$ or it is not the main quantity of physical consideration. In systems with discrete energy eigenstates, the lifetimes of excitations will be infinite and thus $I=0$ . In solid state systems with continuous energy bands that have an energy gap, quasiparticles whose energies are within one energy gap of either the valence or condition band edges also have infinite lifetimes due to electron–electron interactions since there are no lower-energy states for them to decay into while conserving overall energy. Finally, in most practical GW calculations, one focuses on the real part of ${{ \Sigma }_{xc}}$ in order to correct DFT band energies (e.g., [1, 18, 27]). In all these cases, when we choose $\langle n|{{U}_{xc}}|n\rangle =\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|n\rangle$ , A becomes zero and this reduces the scaling of ${{S}_{nn}}$ from $O\left({{\gamma}^{-3}}\right)$ to $O\left({{\gamma}^{-1}}\right)$ .

Summarizing this section, for small γ, choosing the matrix element of ${{U}_{xc}}$ to obey $\langle n|{{U}_{xc}}|n\rangle =\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|n\rangle$ is the choice that will make ${{S}_{nn}}$ as small as possible. For cases where the imaginary part of ${{ \Sigma }_{xc}}$ is zero because (i) one has discrete energy spectra, (ii) the system has an energy gap and one is focused on states near the band edges, or (iii) one is ignoring the imaginary part because one is focused on the real part of ${{ \Sigma }_{xc}}$ in order to predict band energies, this choice leads to a significant reduction of ${{S}_{nn}}$ from scaling as ${{\gamma}^{-3}}$ to scaling as ${{\gamma}^{-1}}$ . In situations where we also include $\text{Im}~{{ \Sigma }_{xc}}$ , this choice is the most sensible and obvious. However, the reduction is more modest: the coefficient of the ${{\gamma}^{-3}}$ scaling is reduced and becomes $\sim |\langle n|\text{Im}~{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|n\rangle {{|}^{2}}$ . Either way, this choice requires a self-consistent process because the energy ${{\epsilon}_{n}}$ and the self-energy ${{ \Sigma }_{xc}}$ both depend on ${{U}_{xc}}$ .

5.2. Off-diagonal terms

For a general off-diagonal contribution with $n\ne m$

$\begin{eqnarray}&&{{S}_{nm}}={\oint}^{}\text{d}\omega \frac{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}}{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]\left[{{\left(\omega -{{\epsilon}_{m}}\right)}^{2}}+{{\gamma}^{2}}\right]}\end{eqnarray} \tag{ 18 }$

we have two simple poles above the real ω axis at ${{\epsilon}_{n}}+\text{i}\gamma$ and ${{\epsilon}_{m}}+\text{i}\gamma$ as well as the poles of the numerator stemming from ${{ \Sigma }_{xc}}(\omega )$ . The two simple poles contribute the following term

$\begin{eqnarray}&&\frac{\pi}{\gamma}\centerdot \frac{|\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}|m\rangle {{|}^{2}}+|\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{m}}\right)-{{U}_{xc}}|m\rangle {{|}^{2}}}{{{\left({{\epsilon}_{n}}-{{\epsilon}_{m}}\right)}^{2}}}\end{eqnarray} \tag{ 19 }$

which scales as ${{\gamma}^{-1}}$ . For the moment, we ignore the additional contributions coming from the poles of ${{ \Sigma }_{xc}}(\omega )$ and instead focus on minimizing the above contribution from the two simple poles. Specifically, our objective is to choose the ${{U}_{xc}}$ that minimizes the above expression. We envisage this as a self-consistent process where (i) we hold all quantities fixed except for $\langle n|{{U}_{xc}}|m\rangle$ which is allowed to vary to optimize the expression, (ii) we update all quantities using the new ${{U}_{xc}}$ , and (iii) we iterate to convergence.

To simplify the algebra, we define $z=\langle n|{{U}_{xc}}|m\rangle$ , ${{\kappa}_{n}}=\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|m\rangle$ , and ${{\kappa}_{m}}=\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{m}}\right)|m\rangle$ . Keeping in mind that z is real-valued due to our assumption of time-reversal invariance, the expression to be optimized is quadratic in z:

$\begin{eqnarray}&&\frac{\pi}{\gamma}\centerdot \frac{|{{\kappa}_{n}}{{|}^{2}}+|{{\kappa}_{m}}{{|}^{2}}-2z\text{Re}\left({{\kappa}_{n}}+{{\kappa}_{m}}\right)+2{{z}^{2}}}{{{\left({{\epsilon}_{n}}-{{\epsilon}_{m}}\right)}^{2}}}\end{eqnarray}$

Setting the derivative of this quadratic versus z to zero, we find the optimum

$\begin{eqnarray}&&z=\text{Re}\left({{\kappa}_{n}}+{{\kappa}_{m}}\right)/2\end{eqnarray}$

or in other words

$\begin{eqnarray}&&\langle n|{{U}_{xc}}|m\rangle =\text{Re}\left\{\frac{\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|m\rangle +\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{m}}\right)|m\rangle}{2}\right\}.\end{eqnarray}$

This choice is guaranteed to minimize the contributions to ${{S}_{nm}}$ scaling as ${{\gamma}^{-1}}$ that originate from the simple poles at ${{\epsilon}_{n}}+\text{i}\gamma$ and ${{\epsilon}_{m}}+\text{i}\gamma$ . For a system with only continuous energy bands, this is a good choice since the contributions coming from the numerator (i.e., the poles of ${{ \Sigma }_{xc}}$ ) are subleading and scale as ${{\gamma}^{0}}$ as shown in appendix A. However, for a system containing discrete energy levels, the contributions from the poles of the numerator also scale as ${{\gamma}^{-1}}$ so that the above considerations do not provide an air-tight argument. In appendix E, expressions for these contributions and estimates of their size are provided in the context of optimizing the gradient once self-consistency (QS) is achieved. Whether one should ignore them or include them in the optimization is a subject for future investigation. Finally, as we saw in the case of the diagonal contribution ${{S}_{nn}}$ , the optimal Hermitian ${{U}_{xc}}$ is only determined by the real part of ${{ \Sigma }_{xc}}$ .

5.2.1. Off-diagonal case with degeneracy

The above discussion of the off-diagonal case assumed that ${{\epsilon}_{n}}\ne {{\epsilon}_{m}}$ . Specifically, in going from the contour integral of equation (18) to the result of equation (19) it was assumed that $|{{\epsilon}_{n}}-{{\epsilon}_{m}}|>\gamma$ so that we had two distinct poles contributing. The correct way of proceeding in the case of degeneracy ${{\epsilon}_{n}}={{\epsilon}_{m}}$ is to return to equation (18) and notice that the denominator becomes identical in structure to that of the diagonal case in equation (17). In fact, the only difference with the diagonal case is the replacement of $\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|n\rangle$ by $\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle$ while the remainder of the analysis remains identical. We end up with the optimal choice $\langle n|{{U}_{xc}}|m\rangle =\text{Re}\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|m\rangle$ in this degenerate case. It is gratifying that this is identical to the optimal ${{U}_{xc}}$ in the non-degenerate off-diagonal case where we simply set ${{\epsilon}_{n}}={{\epsilon}_{m}}$ .

5.3. Discussion

The main result is that the length of the gradient of the Klein energy functional F is minimized when ${{U}_{xc}}$ is chosen to satisfy

$\begin{eqnarray}&&\langle n|{{U}_{xc}}|m\rangle =\text{Re}\,\,\left\{\frac{\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)|m\rangle +\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{m}}\right)|m\rangle}{2}\right\}\end{eqnarray} \tag{ 20 }$

when γ is small.

This choice of ${{U}_{xc}}$ is identical to that of the QSGW method. In QSGW, one approximates the self-energy to its GW form and one sets the imaginary part of the self-energy to zero. The QSGW has successfully described the band structure of a wide variety of solid state systems within the GW approximation for the self-energy [1]. Therefore, in addition to its practical successes, we can say that the QSGW is also mathematically well-founded as it is the choice for ${{U}_{xc}}$ that minimizes the length of the gradient of the energy functional when approximated within GW. It is 'closest' to the interacting G obeying Dyson's equation.

A critical point of the above derivation is that it is not dependent on the GW approximation itself: the optimum choice of equation (20) holds for any self-energy ${{ \Sigma }_{xc}}(\omega )$ at any level of approximation as long as it is derived from some ${{ \Phi }_{xc}}[G]$ via ${{ \Sigma }_{xc}}=2\pi \text{i}\delta {{ \Phi }_{xc}}/\delta G$ . Namely, if we assume that the shortest gradient of the energy functional is best, the recipe of equation (20) is a general answer to the problem of choosing the best non-interacting Green's function to describe an interacting system.

We also note the significant difference between diagonal and off-diagonal elements. Having the diagonal elements obey equation (20) reduces the length of the gradient by factors of $O\left({{\gamma}^{-3}}\right)$ . If the imaginary part of the self-energy is zero or set to zero, then the reduction is very strong: the diagonal contributions in fact become reduced to $O\left({{\gamma}^{-1}}\right)$ . On the other hand, obeying equation (20) for the off-diagonal elements doesn't actually change the scaling—off-diagonals always contribute $O\left({{\gamma}^{-1}}\right)$ to the length of the gradient—but it does reduce the magnitude of the coefficients of the terms scaling as ${{\gamma}^{-1}}$ . Therefore, from a practical point of view, obeying equation (20) for the diagonal elements is of primary importance while obeying it for off-diagonals is of secondary importance. This is a way of rationalizing the observation, dating back to the earliest fully ab initio GW calculations [18], that in many (but not all) cases the most critical corrections to the quasiparticle properties are handled by the diagonal terms of the self-energy.

6. Smallest energy change $\Delta \boldsymbol{F}$

A alternative approach to quantifying which non-interacting Green's function G₀ is 'best' is to try to find the G₀ that generates the smallest deviation of F from its value at the saddle point. Namely, when scanning along the horizontal axis of figure 1, one looks for the ${{U}_{xc}}$ that generates the G₀ so that $F\left[{{G}_{0}},{{G}_{0}}\right]$ is as close as possible to the true total energy $F\left[\bar{G},{{G}_{0}}\right]$ where $\bar{G}$ solves the Dyson equation (2). Specifically, what we would like to minimize is the magnitude of the energy difference $\Delta F$ defined as

$\begin{eqnarray} &&\Delta F\left[{{G}_{0}}\right]\equiv F\left[{{G}_{0}},{{G}_{0}}\right]-F\left[\bar{G},{{G}_{0}}\right].\end{eqnarray} \tag{ 21 }$

To make headway analytically, we will assume that the 'best' G₀ is sufficiently close to the saddle point so that the difference $\bar{G}-{{G}_{0}}$ or equivalently ${{ \Sigma }_{xc}}-{{U}_{xc}}$ is small enough for a quadratic approximation of F to be accurate. With the quadratic assumption, we can use the general fact that the value of a quadratic function is given by half the dot product of its gradient times the displacement from its stationary point. From equation (10), the gradient of F is $G\left[{{G}^{-1}}-G_{0}^{-1}+{{ \Sigma }_{xc}}-{{U}_{xc}}\right]G$ . The displacement is ${{U}_{xc}}-{{ \Sigma }_{xc}}$ . We are evaluating all these expressions at $G={{G}_{0}}$ . The situation is graphically illustrated in figure 3. Therefore, the quadratic approximation form for $\Delta F$ is¹

$\begin{eqnarray} &&\Delta F=-\frac{1}{2}Tr\left\{{{G}_{0}}\left[{{ \Sigma }_{xc}}-{{U}_{xc}}\right]{{G}_{0}}\left({{ \Sigma }_{xc}}-{{U}_{xc}}\right)\right\}.\end{eqnarray} \tag{ 22 }$

**Figure 3.** Please see the caption of figure 2 for the meaning of the axes, *etc*. The green vector ${{U}_{xc}}-{{ \Sigma }_{xc}}$ represents the deviation of ${{U}_{xc}}$ from the stationary self-energy ${{ \Sigma }_{xc}}$ and connects the saddle point to the chosen ${{U}_{xc}}$ on the horizontal axis. The dashed circle represents the most desirable ${{U}_{xc}}$ since at that point the energy F has the same value as it does at the saddle point.
Download figure:
Standard image High-resolution image

**Figure 3.** Please see the caption of figure 2 for the meaning of the axes, *etc*. The green vector ${{U}_{xc}}-{{ \Sigma }_{xc}}$ represents the deviation of ${{U}_{xc}}$ from the stationary self-energy ${{ \Sigma }_{xc}}$ and connects the saddle point to the chosen ${{U}_{xc}}$ on the horizontal axis. The dashed circle represents the most desirable ${{U}_{xc}}$ since at that point the energy F has the same value as it does at the saddle point.
Download figure:
Standard image High-resolution image

An explicit expression in terms of integrals and matrix elements is

$\begin{eqnarray} &&\Delta F=-\frac{1}{2}{\int}_{-\infty}^{\infty}\frac{\text{d}\omega}{2\pi \text{i}}{{\text{e}}^{\text{i}\omega {{0}^{+}}}}\underset{n,m}{\sum}\,\\ &&\qquad \ \times \,\frac{\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle \langle m|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|n\rangle}{\left(\omega -{{\epsilon}_{n}}-\text{i}\gamma {{s}_{n}}\right)\left(\omega -{{\epsilon}_{m}}-\text{i}\gamma {{s}_{m}}\right)}.\end{eqnarray} \tag{ 23 }$

One can proceed by closing the integrals over the upper complex half plane and computing the residues of the integral with separate contributions from the poles in the denominator as well as the poles of ${{ \Sigma }_{xc}}(\omega )$ in the numerator. Appendix B contains the details which produce algebraic expressions that do not—for this author—provide insight into how one should proceed.

Aside from the algebraic complexities, there are two other higher level challenges with this approach. First, one is trying to reduce the magnitude of $\Delta F$ or equivalently make it as close to zero as possible. However, since we are close to a saddle point, $\Delta F$ will take both positive and negative values which makes the optimization much more challenging than the minimization of a function bounded from below. Second, unlike the previous approach of minimizing the length of the gradient, there is no obvious small parameter such as γ to permit us to perform the optimization process in an organized fashion and to identify the largest terms that must be handled first. Hence, either this smallest- $\Delta F$ -approach is inherently difficult, or a new idea is needed that will take it in a more successful direction. This is an open question.

7. Application of the shortest gradient

In this section, we apply the shortest gradient approach and its associated quasiparticle self-consistent (QS) scheme to understand its behavior in interacting electronic systems. Since this approach justifies practical schemes such as the QSGW [1], the success of QSGW in predicting realistic electronic properties of a wide class of materials may be viewed as an 'application' of the method. Of course, at its best, QSGW is only as accurate as the underlying GW approximation for the self-energy.

We apply the shortest gradient approach to a solvable model many-body system for which we know the exact self-energy. In this way, we can make some conclusions about how this approach and its associated QS scheme describe a many-body system. We focus on a situation where there are multiple quasiparticle solutions corresponding to a single non-interacting state. The existence of multiple solutions is a hallmark of an interacting many-body system, typically with a multi-determinant ground state. A number of realizations of this multiplicity are found in the literature for model systems as well as realistic molecules and materials [46–49].

We study a two-site Hubbard model with a single orbital per site and single electron per site: this Hamiltonian has also been found useful in studying multiple solution situations in prior work [47]. The Hamiltonian is

$\begin{eqnarray}&&\hat{H}=-t\underset{\sigma \in \left\{\uparrow,\downarrow \right\}}{\sum}\,\left(\hat{c}_{1,\sigma}^{\dagger}{{\hat{c}}_{2,\sigma}}+\hat{c}_{1,\sigma}^{\dagger}{{\hat{c}}_{2,\sigma}}\right)+U\underset{i=1,2}{\sum}\,{{\hat{n}}_{i,\uparrow}}{{\hat{n}}_{i,\downarrow}}\end{eqnarray}$

where ${{\hat{n}}_{\text{i}\sigma}}=\hat{c}_{i,\sigma}^{\dagger}{{\hat{c}}_{i,\sigma}}$ , σ is the spin index, and $i\in \left\{1,2\right\}$ labels the two sites. Due to its simplicity, high symmetry, and small Hilbert space, this Hamiltonian is readily diagonalized by hand for any number of electrons $N\in \left\{0,1,2,3,4\right\}$ . The ground state for $N=2$ is a spin singlet. With some minor effort, one can compute the exact one particle Green's functions analytically.

As expected from symmetry considerations and the singlet ground state, the Green's function is diagonal in spin and also diagonal in the basis of bonding (b) and anti-bonding (a) single-particle states,

$\begin{eqnarray}&&|b,\sigma \rangle =\frac{1}{\sqrt{2}}\left(|1,\sigma \rangle \,+\,|2,\sigma \rangle \right)\,,\,\,|a,\sigma \rangle =\frac{1}{\sqrt{2}}\left(|1,\sigma \rangle \,-\,|2,\sigma \rangle \right),\end{eqnarray}$

with non-interacting energy eigenvalues $\epsilon _{b}^{0}=U/2-t$ and $\epsilon _{a}^{0}=U/2+t$ (the constant $U/2$ is the Hartree potential which we include in the non-interacting eigenvalue as per section 2). The quantity of interest to us is the exact self-energy for exchange and correlation which has two separate forms (bonding and anti-bonding):

$\begin{eqnarray}&&{{ \Sigma }_{b}}(\omega )=\frac{{{U}^{2}}/4}{\omega -U/2-3t-\text{i}\gamma}~,~~{{ \Sigma }_{a}}(\omega )=\frac{{{U}^{2}}/4}{\omega -U/2+3t+\text{i}\gamma}.\end{eqnarray} \tag{ 24 }$

Due to the diagonal nature of the problem in the $a,b$ basis, the shortest gradient is achieved when the QS condition for the self-consistent single-particle ${{\epsilon}_{i,n}}$ is satisfied:

$\begin{eqnarray}&&{{\epsilon}_{j,n}}-\epsilon _{j}^{0}=U_{xc}^{j,n}=\text{Re}{{ \Sigma }_{j}}\left({{\epsilon}_{j,n}}\right).\end{eqnarray}$

Here $j\in \left\{a,b\right\}$ , and $n\in \left\{1,2\right\}$ labels the two solutions to this equation. The solutions are never at the poles of ${{ \Sigma }_{j}}(\omega )$ so when solving the above equation we set $\gamma =0$ . This can be rewritten as the Dyson equation for the eigenvalue:

$\begin{eqnarray}&&{{\epsilon}_{j,n}}=\epsilon _{j}^{0}+\text{Re}{{ \Sigma }_{j}}\left({{\epsilon}_{j,n}}\right).\end{eqnarray}$

Figure 4 shows the usual graphical solution to this equation for weak interaction $U/t=1$ and strong interaction $U/t=10$ . For each of a and b, there are two solutions: one solution occurs where the slope of ${{ \Sigma }_{j}}(\omega )$ is large, and the other solution where the slope is small. Hence, the quasiparticle weights ${{Z}_{j,n}}=1/\left(1- \Sigma _{j}^{\prime}\left({{\epsilon}_{j,n}}\right)\right)$ for the solutions are small and large, respectively.

**Figure 4.** Graphical solution to the QS self-consistency condition for the two-site single-orbital Hubbard model at half filling for $U/t=1$ (left) and $U/t=10$ (right). An energy solution occurs when the thin blue straight line crosses the thick red curves.
Download figure:
Standard image High-resolution image

Figure 5 shows the evolution of the QS energies and quasiparticle weights as a function of interaction strength $U/t$ . As expected, for small $U/t$ , one set of solutions has Z close to unity with energies evolving smoothly into the non-interacting $\epsilon _{j}^{0}$ as $U/t\to 0$ ; the second set of solutions have very small Z in this limit. For large $U/t$ , the system is strongly correlated and does not resemble a single-particle system: all solutions show $Z\to 1/2$ which means that the system becomes impossible to describe with a single Slater determinant. For $U/t\to \infty$ , the QS energies tend to zero or U which are the 'Hubbard bands' for this simple system (the lower and upper Hubbard bands each have half the spectral weight in each a or b channel).

**Figure 5.** Evolution of the QS energy eigenvalues ${{\epsilon}_{j,n}}$ and their associated quasiparticle weights ${{Z}_{j,n}}$ versus $U/t$ for the two-site single-band Hubbard model at half filling. Top panels show Z and bottom panels show , and the colors of the curves correspond for a pair of vertical panels (e.g., the large Z_b bonding solution in solid blue in the upper left corresponds to the lower energy bonding energy ${{\epsilon}_{b}}$ in solid blue in the lower left).
Download figure:
Standard image High-resolution image

Due to the existence of multiple (here two) QS solutions for each non-interacting state (b or a), the QS scheme itself does not provide us with a unique solution for the energy states. For each spin channel, any of the four possible combinations of solutions are self-consistent in the QS sense. We now use the criterion of shortest gradient to make a unique choice. We compute the square length of the gradient from equation (13) given by the integral

$\begin{eqnarray}&&||{{D}_{0}}|{{|}^{2}}=\underset{j,\sigma}{\sum}\,{\int}_{-\infty}^{\infty}\text{d}\omega \frac{|{{ \Sigma }_{j}}(\omega )-U_{xc}^{j,n}{{|}^{2}}}{{{\left[{{\left(\omega -{{\epsilon}_{j,n}}\right)}^{2}}+{{\gamma}^{2}}\right]}^{2}}}\end{eqnarray}$

Due to the high symmetry of the system, in addition to spin degeneracy, the integrals for bonding $j=b$ and anti-bonding $j=a$ have identical values when one uses corresponding solutions (high Z or low Z). Therefore, we omit the spin index and set $j=b$ . Furthermore, since the integrals will scale as $1/\gamma$ for small γ, we focus on

$\begin{eqnarray}&&{{I}_{b,n}}=\gamma {\int}_{-\infty}^{\infty}\frac{|{{ \Sigma }_{b}}(\omega )-U_{xc}^{b,n}{{|}^{2}}}{{{\left[{{\left(\omega -{{\epsilon}_{b,n}}\right)}^{2}}+{{\gamma}^{2}}\right]}^{2}}}.\end{eqnarray}$

We numerically evaluate the integrals and tabulate the results in table 1. The table shows that asking for the shortest gradient is a non-ambiguous procedure that clearly favors choosing the large Z solution for any $U/t$ .

Table 1. Scaled squared length of the gradient ${{I}_{b,n}}$ for each choice of QS solution associated with quasiparticle weight ${{Z}_{b,n}}$ . Values of ${{I}_{b,n}}$ listed in the table are found by first numerically integrating for a set of finite and decreasing values of γ and then extrapolating to $\gamma =0$ .

$U/t$	${{Z}_{b,1}}$	${{I}_{b,1}}$	${{Z}_{b,2}}$	${{I}_{b,2}}$
1	0.985	1.44 $\times {{10}^{-3}}$	0.0149	2.74 $\times {{10}^{4}}$
4	0.854	0.185	0.146	214
8	0.724	0.912	0.276	43.07
12	0.658	1.70	0.342	23.3

An order magnitude estimate of the integral for small γ is provided by the linearization ${{ \Sigma }_{b}}(\omega )-U_{xc}^{b,n}\approx \Sigma _{b}^{\prime}\left({{\epsilon}_{b,n}}\right)\left(\omega -{{\epsilon}_{b,n}}-\text{i}\gamma \right)$ near the QS energy solution, where the term in $\text{i}\gamma$ comes from series expansion of ${{ \Sigma }_{b}}(\omega )$ in γ. The resulting integrals are ratios of simple polynomials and yield $\pi | \Sigma _{b}^{\prime}\left({{\epsilon}_{b,n}}\right){{|}^{2}}=\pi |1-1/{{Z}_{b,n}}{{|}^{2}}$ . However, this underestimates the tabulated value by a factor of two. This estimate has neglected the contribution from the pole of ${{ \Sigma }_{b}}(\omega )$ which, as explained in section 5, also contributes a term scaling as $1/\gamma$ . In this particular model system, the two contributions are equal. Appendix E presents arguments for general cases, beyond this model system, showing that the gradient should be shortest when the solution is chosen that has maximum Z for each non-interacting state.

In this section, we have shown that for systems which present multiple QS solutions corresponding to a single non-interacting state, gradient optimization will choose the solution with largest quasiparticle weight Z. This is sensible since one is asking for the best non-interacting description of an interacting problem, and states with the largest Z are the most non-interacting (i.e., they are the ones best described by a single Slater determinant ground state). This is the best we can do with with a non-interacting description: a non-interacting single-particle Hamiltonian H₀ will have one eigenvalue for each eigenvector and can not describe multiple solutions for the same vector. The gradient optimization approach gives a rational basis for choosing solutions with largest Z for each non-interacting state.

8. Summary

The Baym–Kadanoff approach provides a total energy functional of trial one-particle Green's functions that has a stationary point at the physically correct Green's function that solves the Dyson equation. In addition, it provides a recipe for computing the self-energy via differentiation of an exchange-correlation energy functional. In practice, dealing with arbitrary Green's functions is computationally complex and also conceptually difficult as the representability criteria for physical Green's functions are not known. One way forward is to restrict oneself to simpler non-interacting Green's functions.

We have described two approaches to finding the 'best' non-interacting Green's function. The first is based on minimizing the magnitude of the derivative of the Klein energy functional, and this approach produces definite results that form the main body of this paper. The second approach is based on minimizing the error in total energy of the Klein functional between the trial state and the exact state, but this idea needs further development to be useful.

The gradient minimization approach yields a set of equations for the non-interacting Green's function that are identical to those of the quasiparticle self-consistent GW (QSGW) scheme [1]. This means that this type of approach has a firm, a priori foundation. In addition, we have shown that the resulting quasi-particle self-consistent equations are not unique to the GW approximation but hold for any exchange-correlation functional. Namely, the equations are the same for any self-energy in Baym–Kadanoff theory when using the Klein functional.

Separately, by applying the gradient optimization method to a simple but non-trivial many-body system as well as providing more general arguments, we described how this approach chooses the 'best' non-interacting Green's function in cases where there are multiple quasiparticle solutions corresponding to the same non-interacting state. Specifically, gradient optimization favors choosing the solution with largest quasiparticle weight Z: this is intuitively sensible when approximating an interacting system with non-interacting single-particle states since the states with largest Z have the largest non-interacting component.

Finally, not only does our work justify quasiparticle self-consistent schemes, but it also provides theoretical insight and justification as to why a 'diagonal-only' approach for self-energy calculations is a correct starting point and yields good results whereas inclusion of off-diagonal contributions of the self-energy, while physically important in some cases, is of a subdominant nature. Both findings correlate well with practical experience and observations in the field for ab initio predictions of electronic properties.

Acknowledgments

This work has been supported primarily by the National Science Foundation under Grant No. MRSEC DMR 1119826.

Appendix A. Contribution from poles of Σxc

As stated in the text, when performing the contour integral of equation (16), which is reproduced here

$\begin{eqnarray}&&{{S}_{nm}}={\oint}^{}\text{d}\omega \frac{|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}}{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]\left[{{\left(\omega -{{\epsilon}_{m}}\right)}^{2}}+{{\gamma}^{2}}\right]},\end{eqnarray} \tag{ A.1 }$

a dominant contribution results from the poles above the real axis associated with the denominator at energies ${{\epsilon}_{n}}+\text{i}\gamma$ and ${{\epsilon}_{m}}+\text{i}\gamma$ . The remaining contributions come from the poles of the numerator $|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}$ , and in this appendix we examine these poles and the scaling of their contributions to equation (16). We find that the scaling is either ${{\gamma}^{-1}}$ or ${{\gamma}^{0}}$ depending on whether the spectrum of poles contains discrete states or only continua (bands), respectively.

Taking the matrix element of the self-energy from equation (14) between two states $|n\rangle$ and $|m\rangle$ , we have

$\begin{eqnarray}\begin{array}{*{35}{l}} \langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle & = & \langle n|{{ \Sigma }_{x}}-{{U}_{xc}}|m\rangle +\underset{a}{\sum}\,\frac{\langle n|{{\sigma}_{a}}|m\rangle}{\omega -{{\xi}_{a}}} \\ {} & = & r+\underset{a}{\sum}\,\frac{{{s}_{a}}}{\omega -{{\xi}_{a}}} \\\end{array}\end{eqnarray}$

where for compactness we have defined r and s_a and suppressed the $n,m$ indices since we analyze the contribution of a single $(n,m)$ pair.

The pole energies ${{\xi}_{a}}$ can either have a positive imaginary part $\text{Im}~{{\xi}_{a}}\geqslant \gamma >0$ or a negative one $\text{Im}\,{{\xi}_{a}}\leqslant -\gamma <0$ . A positive imaginary part represents a process moving backwards in time and thus involves holes, while a negative imaginary part means a forward moving process and thus involves electrons. The imaginary part can be larger in magnitude than γ if the associated excitation is physically damped with a significant decay rate $|\text{Im}~{{\xi}_{a}}|\gg \gamma$ , but here we take the worst-case scenario for a conservative analysis by setting $\text{Im}\,{{\xi}_{a}}=\pm \gamma$ .

We split the poles ${{\xi}_{a}}$ into the set with positive imaginary part identified with index b and superscript $+$ and the remaining ones with negative imaginary part identified with index c and superscript $-$ . We also separate out the real part of the pole energies $\xi _{a}^{\pm }$ and define them to be $\omega _{b}^{+}$ and $\omega _{c}^{-}$ . Hence, we have

$\begin{eqnarray}&&\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle =r+\underset{b}{\sum}\,\frac{s_{b}^{+}}{\omega -\omega _{b}^{+}-\text{i}\gamma}+\underset{c}{\sum}\,\frac{s_{c}^{-}}{\omega -\omega _{c}^{-}+\text{i}\gamma}\end{eqnarray} \tag{ A.2 }$

The main physical assumption we will make on the energies $\omega _{b}^{+}$ and $\omega _{c}^{-}$ is that for any b and c, we always have $\omega _{b}^{+}<\omega _{c}^{-}$ . Disobeying this inequality would mean that a scattering process for electrons ( $-$ excitations which move forward in time) has the same or smaller energy as some other scattering process for holes ( $+$ excitations which move backwards in time) which would imply an electron can scatter into a state below the Fermi energy and/or a hole can scatter into a state above the Fermi energy. In fact, one expects the opposite: an electron scatters into another electron state above the Fermi energy and emits an excitation (positive energy); a hole scatters into another hole state below the Fermi energy minus some excitation (negative energy). For an explicit example, within the GW approximation [50, 51] the inequality is never violated because $\omega _{b}^{+}={{\epsilon}_{o}}- \Omega$ where ${{\epsilon}_{o}}<\mu$ is an occupied state and $\Omega >0$ is some charge excitation such as a plasmon or electron–hole pair while $\omega _{c}^{-}={{\epsilon}_{u}}+{{ \Omega }^{\prime}}$ where ${{\epsilon}_{u}}>\mu$ is an unoccupied state and ${{ \Omega }^{\prime}}>0$ is some other charge excitation.

The integral of equation (A.1) has the square of the matrix element, and expanding out the square we have

$\begin{eqnarray}\begin{array}{*{35}{l}} {} & |\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}=|r{{|}^{2}} \\ {} & \quad \quad \quad \quad \quad \quad \quad +{{r}^{\ast}}\left[\underset{b}{\sum}\,\frac{s_{b}^{+}}{\omega -\omega _{b}^{+}-\text{i}\gamma}+\underset{c}{\sum}\,\frac{s_{c}^{-}}{\omega -\omega _{c}^{-}+\text{i}\gamma}\right] \\ {} & \quad \quad \quad \quad \quad \quad \quad +r\left[\underset{{{b}^{\prime}}}{\sum}\,\frac{s{{_{{{b}^{\prime}}}^{+}}^{\ast}}}{\omega -\omega _{{{b}^{\prime}}}^{+}+\text{i}\gamma}+\underset{{{c}^{\prime}}}{\sum}\,\frac{s{{_{{{c}^{\prime}}}^{-}}^{\ast}}}{\omega -\omega _{{{c}^{\prime}}}^{-}-\text{i}\gamma}\right] \\ {} & \quad \quad \quad \quad \quad \quad \quad +\underset{b}{\sum}\,\underset{{{b}^{\prime}}}{\sum}\,\frac{s_{b}^{+}s{{_{{{b}^{\prime}}}^{+}}^{\ast}}}{\left(\omega -\omega _{b}^{+}-\text{i}\gamma \right)\left(\omega -\omega _{{{b}^{\prime}}}^{+}+\text{i}\gamma \right)} \\ {} & \quad \quad \quad \quad \quad \quad \quad +\underset{b}{\sum}\,\underset{{{c}^{\prime}}}{\sum}\,\frac{s_{b}^{+}s{{_{{{c}^{\prime}}}^{-}}^{\ast}}}{\left(\omega -\omega _{b}^{+}-\text{i}\gamma \right)\left(\omega -\omega _{{{c}^{\prime}}}^{-}-\text{i}\gamma \right)} \\ {} & \quad \quad \quad \quad \quad \quad \quad +\underset{b}{\sum}\,\underset{c}{\sum}\,\frac{s_{c}^{-}s{{_{{{b}^{\prime}}}^{+}}^{\ast}}}{\left(\omega -\omega _{c}^{-}+\text{i}\gamma \right)\left(\omega -\omega _{{{b}^{\prime}}}^{+}+\text{i}\gamma \right)} \\ {} & \quad \quad \quad \quad \,\,\quad \quad +\underset{c}{\sum}\,\underset{{{c}^{\prime}}}{\sum}\,\frac{s_{c}^{-}s{{_{{{c}^{\prime}}}^{-}}^{\ast}}}{\left(\omega -\omega _{c}^{-}+\text{i}\gamma \right)\left(\omega -\omega _{{{c}^{\prime}}}^{-}-\text{i}\gamma \right)} \\\end{array}\end{eqnarray} \tag{ A.3 }$

We note that only a subset of these terms have poles above the real axis. Our task it to find the scaling versus γ of the contributions from such poles to the integral of equation (A.1). To avoid excessively long expressions, we define

$\begin{eqnarray}&&g(\omega )\equiv \frac{1}{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]\left[{{\left(\omega -{{\epsilon}_{m}}\right)}^{2}}+{{\gamma}^{2}}\right]}.\end{eqnarray}$

The contributions to the integral from the poles of $|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}$ are $2\pi \text{i}$ times

$\begin{eqnarray}\begin{array}{*{35}{r}} & {{r}^{\ast}}\underset{b}{\sum}\,g\left(\omega _{b}^{+}\right)s_{b}^{+}+r\underset{{{c}^{\prime}}}{\sum}\,g\left(\omega _{{{c}^{\prime}}}^{-}\right)s{{_{{{c}^{\prime}}}^{-}}^{\ast}}+\underset{b,{{b}^{\prime}}}{\sum}\,\frac{g\left(\omega _{b}^{+}\right)s_{b}^{+}s{{_{{{b}^{\prime}}}^{+}}^{\ast}}}{\omega _{b}^{+}-\omega _{{{b}^{\prime}}}^{+}+2\text{i}\gamma} \\ & +\underset{b,{{c}^{\prime}}}{\sum}\,\frac{\left[g\left(\omega _{b}^{+}\right)-g\left(\omega _{{{c}^{\prime}}}^{-}\right)\right]s_{b}^{+}s{{_{{{c}^{\prime}}}^{-}}^{\ast}}}{\omega _{b}^{+}-\omega _{{{c}^{\prime}}}^{-}} \\ & +\underset{c,{{c}^{\prime}}}{\sum}\,\frac{g\left(\omega _{{{c}^{\prime}}}^{-}\right)s_{c}^{-}s{{_{{{c}^{\prime}}}^{-}}^{\ast}}}{\omega _{{{c}^{\prime}}}^{-}-\omega _{c}^{-}+2\text{i}\gamma}. \\\end{array}\end{eqnarray}$

Generally, there is no reason to expect that any of the quasiparticle energies ${{\epsilon}_{n}}$ should precisely equal to any of the excitation energies $\omega _{b}^{+}$ or $\omega _{c}^{-}$ so that $g\left(\omega _{b}^{+}\right)$ and $g\left(\omega _{{{c}^{\prime}}}^{-}\right)$ are finite and well behaved as $\gamma \to 0$ . This is rigorously true at low energies for quasiparticle energies close to the Fermi level for a system with an energy gap where the excitations energies $\Omega$ are greater than or equal to the energy gap. Therefore, the first two terms above are not expected to scale strongly with γ. The third $\left(b,{{b}^{\prime}}\right)$ and fifth ( $c,{{c}^{\prime}}$ ) terms scale as ${{\gamma}^{-1}}$ for the case when the double sums are discrete and when the real part of their denominators vanish, while the fourth term $\left(b,{{c}^{\prime}}\right)$ is finite both by our assumption that $\omega _{b}^{+}<\omega _{c}^{-}$ as well as the fact that the summand is mathematically well behaved as $\omega _{b}^{+}\to \omega _{{{c}^{\prime}}}^{-}$ . Hence, for discrete sums, the entire contribution has leading scaling behavior ${{\gamma}^{-1}}$ . For systems supporting continuous energy bands where the b and ${{c}^{\prime}}$ sums are continuous integrals, the scaling of the third and fifth terms is reduced to ${{\gamma}^{0}}$ from those bands due to the fact that under an integral

$\begin{eqnarray}&&\frac{1}{x+2\text{i}\gamma}=P\frac{1}{x}-\text{i}\pi \delta (x)\end{eqnarray}$

when γ is very small.

In brief, in this appendix we have shown that the contributions to equation (A.1) that stem from the poles of the numerator $|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}$ generically scale as ${{\gamma}^{-1}}$ for any discrete poles and as ${{\gamma}^{0}}$ if there are only continuous bands of poles in the self energy (i.e., branch cuts).

Appendix B. Energy change ΔF

Here we provide more details for the explicit expression for $\Delta F$ in equation (23). For simplicity, let us focus on a single diagonal contribution where $n=m$ :

$\begin{eqnarray} &&\Delta {{F}_{nn}}\equiv -\frac{1}{2}{\int}_{-\infty}^{\infty}\frac{\text{d}\omega}{2\pi \text{i}}{{\text{e}}^{\text{i}\omega {{0}^{+}}}}\frac{{{\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|n\rangle}^{2}}}{{{\left(\omega -{{\epsilon}_{n}}-\text{i}\gamma {{s}_{n}}\right)}^{2}}}.\end{eqnarray}$

The exponential factor in the integrand allows us to turn this into a contour integral by closing the integral contour over the upper complex ω half plane. We then obtain two sets of contributions: residues that originate from the pole at ${{\epsilon}_{n}}+\text{i}\gamma {{s}_{n}}$ which only contribute when ${{s}_{n}}>0$ (i.e., n is an occupied state ${{\epsilon}_{n}}<\mu$ ), and residues originating from the poles of ${{ \Sigma }_{xc}}(\omega )$ in the numerator. Inserting the form for ${{ \Sigma }_{xc}}$ from equation (A.2) into the above integral and performing the contour integral, one arrives at a first expression

$\begin{eqnarray}\begin{array}{*{35}{l}} {} & {} & \Delta {{F}_{nn}}=-\theta \left(\mu -{{\epsilon}_{n}}\right)\langle n|{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}|n\rangle \langle n|\frac{\text{d}{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)}{\text{d}\omega}|n\rangle \\ {} & {} & \quad \quad \quad \quad \quad \quad \quad -r\underset{b}{\sum}\,\frac{s_{b}^{+}}{{{\left(\omega _{b}^{+}-{{\epsilon}_{n}}\right)}^{2}}}+\underset{b}{\sum}\,\frac{{{\left(s_{b}^{+}\right)}^{2}}}{{{\left(\omega _{b}^{+}-{{\epsilon}_{n}}\right)}^{3}}} \\ {} & {} & \quad \quad \quad \quad \quad \quad \quad -\underset{b<{{b}^{\prime}}}{\sum}\,\frac{s_{b}^{+}s_{{{b}^{\prime}}}^{+}}{\omega _{b}^{+}-\omega _{{{b}^{\prime}}}^{+}}\left[\frac{1}{{{\left(\omega _{b}^{+}-{{\epsilon}_{n}}\right)}^{2}}}-\frac{1}{{{\left(\omega _{{{b}^{\prime}}}^{+}-{{\epsilon}_{n}}\right)}^{2}}}\right] \\ {} & {} & \quad \quad \quad \quad \quad \quad \quad -\underset{b,c}{\sum}\,\frac{s_{b}^{+}s_{c}^{-}}{{{\left(\omega _{b}^{+}-{{\epsilon}_{n}}\right)}^{2}}\left(\omega _{b}^{+}-\omega _{c}^{-}\right)}. \\\end{array}\end{eqnarray}$

The first contribution for occupied states comes from the pole of the denominator at ${{\epsilon}_{n}}$ while the remaining terms come from the poles of ${{ \Sigma }_{xc}}$ at $\omega _{b}^{+}$ . For unoccupied states ${{\epsilon}_{n}}>\mu$ , only the second set of terms contribute. For occupied states, we can use equation (23) for the first term and perform some algebra to simplify. The final results are for an unoccupied state $n=u$ we have

$\begin{eqnarray}\begin{array}{*{35}{l}} {} & {} & \Delta {{F}_{uu}}=-r\underset{b}{\sum}\,\frac{s_{b}^{+}}{{{\left(\omega _{b}^{+}-{{\epsilon}_{u}}\right)}^{2}}}+\underset{b}{\sum}\,\frac{{{\left(s_{b}^{+}\right)}^{2}}}{{{\left(\omega _{b}^{+}-{{\epsilon}_{u}}\right)}^{3}}} \\ {} & {} & \quad \quad \quad -\underset{b<{{b}^{\prime}}}{\sum}\,\frac{s_{b}^{+}s_{{{b}^{\prime}}}^{+}}{\omega _{b}^{+}-\omega _{{{b}^{\prime}}}^{+}}\left[\frac{1}{{{\left(\omega _{b}^{+}-{{\epsilon}_{u}}\right)}^{2}}}-\frac{1}{{{\left(\omega _{{{b}^{\prime}}}^{+}-{{\epsilon}_{u}}\right)}^{2}}}\right] \\ {} & {} & \quad \quad \quad -\underset{b,c}{\sum}\,\frac{s_{b}^{+}s_{c}^{-}}{{{\left(\omega _{b}^{+}-{{\epsilon}_{u}}\right)}^{2}}\left(\omega _{b}^{+}-\omega _{c}^{-}\right)}. \\\end{array}\end{eqnarray}$

while for an occupied state $n=o$ we have

$\begin{eqnarray}\begin{array}{*{35}{l}} {} & {} & \Delta {{F}_{oo}}=r\underset{c}{\sum}\,\frac{s_{c}^{-}}{{{\left(\omega _{c}^{-}-{{\epsilon}_{o}}\right)}^{2}}}-\underset{c,{{c}^{\prime}}}{\sum}\,\frac{s_{c}^{-}s_{{{c}^{\prime}}}^{-}}{\left(\omega _{c}^{-}-{{\epsilon}_{o}}\right){{\left(\omega _{{{c}^{\prime}}}^{-}-{{\epsilon}_{o}}\right)}^{2}}} \\ {} & {} & \quad \quad \quad -\underset{b,c}{\sum}\,s_{b}^{+}s_{c}^{-}\left[\frac{1}{\left(\omega _{b}^{+}-{{\epsilon}_{o}}\right){{\left(\omega _{c}^{-}-{{\epsilon}_{o}}\right)}^{2}}}+\frac{1}{{{\left(\omega _{b}^{+}-{{\epsilon}_{o}}\right)}^{2}}\left(\omega _{c}^{-}-{{\epsilon}_{o}}\right)} \right. \\ {} & {} & \left. \quad \quad \quad \,+\frac{1}{{{\left(\omega _{b}^{+}-{{\epsilon}_{o}}\right)}^{2}}\left(\omega _{b}^{+}-\omega _{c}^{-}\right)}\right] \\\end{array}\end{eqnarray}$

Longer expressions with similar structures can be derived for non-diagonal elements $n\ne m$ . However, the main problem is that we have no hint as to how to proceed toward minimizing the magnitude of $\Delta F$ based on such expressions.

Appendix C. Unsuitability of minimizing δ F / δ G

We describe the unfavorable consequences of choosing to minimize $|\delta F/\delta G|$ instead of $|\delta F/\delta {{ \Sigma }_{t}}|$ . The objective is to show that the variable G is a poor choice to generate the derivative whose length we aim to minimize.

Beginning with the expression of equation (7) for $\delta F/\delta G(\omega )$ and evaluating it for non-interacting Green's functions $G={{G}_{0}}$ , we follow the steps in section 5 to find the derivative

$\begin{eqnarray}&&{{\mathcal{D}}_{0}}(\omega )\equiv 2\pi \text{i}\frac{\delta F}{\delta G(\omega )}{{|}_{G={{G}_{0}}}}={{ \Sigma }_{xc}}(\omega )-{{U}_{xc}},\end{eqnarray}$

the squared length to be minimized

$\begin{eqnarray}&&||{{\mathcal{D}}_{0}}|{{|}^{2}}={\int}_{-\infty}^{\infty}\text{d}\omega \,\text{tr}\left\{{{\left({{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}\right)}^{\dagger}}\left({{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}\right)\right\},\end{eqnarray}$

and an explicit expression in the non-interacting eigenbasis

$\begin{eqnarray}&&||{{\mathcal{D}}_{0}}|{{|}^{2}}={\int}_{-\infty}^{\infty}\text{d}\omega \,\underset{n,m}{\sum}\,|\langle n|{{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}|m\rangle {{|}^{2}}.\end{eqnarray}$

Using the expression for the squared matrix element in equation (A.3) and performing the ω integral, we find

$\begin{eqnarray}&&||{{\mathcal{D}}_{0}}|{{|}^{2}}=\underset{n,m}{\sum}\,\left\{|r{{|}^{2}}\left({\int}_{-\infty}^{\infty}\text{d}\omega \right)\,\,+2\pi \text{Im}\left(r\underset{b}{\sum}\,s{{_{b}^{+}}^{\ast}}+{{r}^{\ast}}\underset{c}{\sum}\,s_{c}^{-}\right)\right. \\ &&\left. \qquad \quad +\underset{b,{{b}^{\prime}}}{\sum}\,\frac{s_{b}^{+}s{{_{{{b}^{\prime}}}^{+}}^{\ast}}}{\omega _{b}^{+}-\omega _{{{b}^{\prime}}}^{+}+2\text{i}\gamma}+\underset{c,{{c}^{\prime}}}{\sum}\,\frac{s_{c}^{-}s{{_{{{c}^{\prime}}}^{-}}^{\ast}}}{\omega _{{{c}^{\prime}}}^{-}-\omega _{c}^{-}+2\text{i}\gamma}\right\},\end{eqnarray}$

remembering that $r=\langle n|{{ \Sigma }_{x}}-{{U}_{xc}}|m\rangle$ is the static part of the matrix element involving the static and nonlocal bare Fock operator ${{ \Sigma }_{x}}$ and ${{U}_{xc}}$ .

Unlike the situation in the main text where the squared length of the derivative is always finite for any $\gamma >0$ (i.e., it is a regularized expression since γ acts as an infrared cutoff), the above expression for $||{{\mathcal{D}}_{0}}|{{|}^{2}}$ is manifestly infinite and nonsensical unless $r=0$ for all choices of $n=m$ . The choice $r=0$ for all $n,m$ means ${{ \Sigma }_{x}}={{U}_{xc}}$ or that our non-interacting G₀ must correspond to the Hartree–Fock one regardless of the level of exchange and correlation we decide to include in the exchange-correlation functional ${{ \Phi }_{xc}}$ or the self-energy ${{ \Sigma }_{xc}}$ .

This is a very poor situation indeed: by choosing to minimize $|\delta F/\delta G|$ , we are forced to adopt the Hartree–Fock solution for the non-interacting Green's function G₀ not because it necessarily represents an optimum choice but simply to avoid literal infinities in our mathematical description. By contrast, minimizing $|\delta F/\delta {{ \Sigma }_{t}}|$ yields finite answers when $\gamma >0$ which permits us to proceed in our analysis and find an optimum choice of G₀.

Appendix D. Behavior of self-energy as w → ∞

If we assume that equation (14) is true and that the pole energies ${{\xi}_{a}}$ are finite, then it is clear that ${{\lim}_{\omega \to \infty}}{{ \Sigma }_{xc}}(\omega )$ is finite and equal to ${{ \Sigma }_{x}}$ . Separately, for such forms of self-energy, any excitation energies such as ${{\xi}_{a}}$ will be finite for a finite physical system solved in a finite basis set.

For the more general case, we can show that ${{ \Sigma }_{xc}}(\omega )$ grows at most sublinearly with ω as $\omega \to \infty$ without any assumptions about the functional form of ${{ \Sigma }_{xc}}(\omega )$ . This fact follows directly from the Lehmann representation of the zero-temperature, time-ordered, many-body Green's function for an N-electron system:

$\begin{eqnarray}&&G\left(x,{{x}^{\prime}},\omega \right)=\underset{s}{\sum}\,\frac{{{f}_{s}}(x){{f}_{s}}{{\left({{x}^{\prime}}\right)}^{\ast}}}{\omega -{{e}_{s}}+i{{0}^{+}}\text{sgn}\left({{e}_{s}}-\mu \right)}\end{eqnarray}$

where

$\begin{eqnarray}&&{{f}_{s}}(x)=\langle N,0|\hat{\psi}(x)|N+1,s\rangle \,\text{if} ~ {{e}_{s}}>\mu \,;\,\\ &&{{f}_{s}}(x)=\langle N-1,s|\hat{\psi}(x)|N,0\rangle \,\text{if} ~ {{e}_{s}}<\mu \end{eqnarray}$

and $|M,s\rangle$ labels the exact many-body eigenstate with M electrons and energy ${{E}_{M,s}}$ where $s=0$ is the ground state. For electron-like excitations ${{e}_{s}}={{E}_{N+1,\,s}}-{{E}_{N,0}}>\mu$ while for hole-like excitations ${{e}_{s}}={{E}_{N,0}}-{{E}_{N-1,s}}<\mu$ . The electron annihilating field operator is $\hat{\psi}(x)$ . When we send $\omega \to \infty$ , completeness of the eigenbasis and the canonical commutation relation $\left\{\hat{\psi}(x),{{\hat{\psi}}^{\dagger}}\left({{x}^{\prime}}\right)\right\}=\delta \left(x,{{x}^{\prime}}\right)$ lead to

$\begin{eqnarray}&&\underset{\omega \to \infty}{{\lim}}\,G\left(x,{{x}^{\prime}},\omega \right)=\frac{\delta \left(x-{{x}^{\prime}}\right)}{\omega}.\end{eqnarray}$

In this limit, one can easily invert the $G(\omega )$ matrix to find

$\begin{eqnarray}&&\underset{\omega \to \infty}{{\lim}}\,G{{(\omega )}^{-1}}=\omega I.\end{eqnarray}$

Comparing this with the Dyson equation (1) shows that ${{ \Sigma }_{xc}}(\omega )/\omega$ must vanish as $\omega \to \infty$ . Hence, at most, ${{ \Sigma }_{xc}}(\omega )$ grows sublinearly with ω as $\omega \to \infty$ .

Appendix E. Choosing largest Z in the multi-pole and multi-band cases

Here we describe analytical results and bounds for optimizing the length of the gradient in situations where there are multiple solutions to the QS equations. Once the self-consistency conditions in equation (20) are obeyed, since we work in the diagonal basis of the non-interaction Hamiltonian H₀ determined by this condition, the information content of the non-interacting eigenvalue equation is given by the diagonal QS energy-only Dyson-type equation

$\begin{eqnarray}&&{{\left({{H}_{0}}\right)}_{n,n}}={{\epsilon}_{n}}={{\left(T+{{U}_{\text{ion}}}+{{\phi}_{H}}+{{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)\right)}_{n,n}}.\end{eqnarray}$

For a self-energy given by the form of equation (14), which we write more explicitly here for the case of p poles with real excitation energies ${{\xi}_{a}}$ ,

$\begin{eqnarray}&&{{ \Sigma }_{xc}}(\omega )={{ \Sigma }_{x}}+\underset{a=1}{\overset{p}{\sum}}\,\frac{{{\sigma}^{a}}}{\omega -{{\xi}_{a}}\pm \text{i}\gamma}.\end{eqnarray}$

the above energy-only QS condition requires finding the roots ${{\epsilon}_{n}}$ of a polynomial of order $p+1$ (we are only interested in the real-valued solutions to describe a real-valued non-interacting energy ${{\epsilon}_{n}}$ ). The question is which of the possible $p+1$ solutions has the shortest gradient of the energy functional. We analyze the single-band and multi-band cases separately. We will be assuming time-reversal symmetry and making γ very small to simplify the mathematics.

E.1. Single-band case

Here we only have a single band n to worry about, and the question is which solution ${{\epsilon}_{n}}$ minimizes

$\begin{eqnarray}&&{{S}_{nn}}={\int}_{-\infty}^{\infty}\text{d}\omega \frac{| \Delta {{(\omega )}_{nn}}{{|}^{2}}}{{{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]}^{2}}}\end{eqnarray}$

where we define the shorthand

$\begin{eqnarray} &&\Delta (\omega )={{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}\end{eqnarray}$

and the QS condition means $\Delta {{\left({{\epsilon}_{n}}\right)}_{nn}}=0$ .

One contribution to ${{S}_{nn}}$ for very small γ comes from the pole at $\omega ={{\epsilon}_{n}}\pm i\gamma$ : near this pole, $\Delta {{(\omega )}_{nn}}\approx \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}\centerdot \left(\omega -{{\epsilon}_{n}}\pm \text{i}\gamma \right)$ , and plugging this in permits analytical evaluation of the integrals for γ very small (using ${\int}^{}\text{d}x\,1/\left(1+{{x}^{2}}\right){{}^{2}}={\int}^{}\text{d}x\,{{x}^{2}}/\left(1+{{x}^{2}}\right){{}^{2}}=\pi /2$ ). The second set of contributions to ${{S}_{nn}}$ for very small γ come from the poles of the self-energy near ${{\xi}_{a}}$ : for those parts of the integral, the denominator of the integrand is essentially constant and the numerator has a classic Lorentzian form which is easily integrated. We arrive at

$\begin{eqnarray}&&{{S}_{nn}}=\frac{\pi}{\gamma}\centerdot \left(| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}{{|}^{2}}+\underset{a=1}{\overset{p}{\sum}}\,\frac{|\sigma _{nn}^{a}{{|}^{2}}}{{{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)}^{4}}}\right).\end{eqnarray}$

Plugging in an explicit form for $\Sigma _{xc}^{\prime}\left({{\epsilon}_{n}}\right)$ gives

$\begin{eqnarray}&&{{S}_{nn}}=\frac{\pi}{\gamma}\centerdot \left(|\underset{a=1}{\overset{p}{\sum}}\,\frac{\sigma _{nn}^{a}}{{{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)}^{2}}}{{|}^{2}}+\underset{a=1}{\overset{p}{\sum}}\,\frac{|\sigma _{nn}^{a}{{|}^{2}}}{{{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)}^{4}}}\right).\end{eqnarray}$

If we have a single pole ( $p=1$ ), then both terms are equal and ${{S}_{nn}}=(2\pi /\gamma )| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}{{|}^{2}}$ so that the solution with smallest $| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}|$ (i.e., largest Z) will have the smallest gradient.

For multiple poles ( $p>1$ ), we will be assuming that the matrices ${{\sigma}^{a}}$ are positive definite and Hermitian (they are such in the GW approximation regardless of whether we use the RPA approximation for W or the exact W [35]). In addition, this is a sensible assumption since it guarantees the diagonal elements of the self energy to have decreasing slopes with ω so quasiparticle weights $Z=1/\left(1-\text{d}{{ \Sigma }_{xc}}/\text{d}\omega \right)$ fall in the physical range $0\leqslant Z\leqslant 1$ . Thus, $\sigma _{nn}^{a}=\langle n|{{\sigma}^{a}}|n\rangle \geqslant 0$ will be assumed.

Given that $\sigma _{nn}^{a}\geqslant 0$ , the above expression for ${{S}_{nn}}$ has sums of p positive quantities being squared separately or squared after summation, and we can bound the values of the sum. We define a vector ${{\vec{v}}_{n}}$ with p real-valued component ${{v}_{an}}=\sigma _{nn}^{a}/\left({{\epsilon}_{n}}-{{\xi}_{a}}\right){{}^{2}}$ . Then we can write ${{S}_{nn}}$ as

$\begin{eqnarray}&&\gamma {{S}_{nn}}/\pi ={{\left({{\vec{v}}_{n}}\centerdot \vec{d}\,\right)}^{2}}+\,\parallel {{\vec{v}}_{n}}{{\parallel}^{2}}\equiv {{P}_{n}}+{{Q}_{n}}\end{eqnarray}$

where $\vec{d}=(1,1,\ldots,1){{}^{T}}$ . The first term P_n is the square of the projection ${{\vec{v}}_{n}}$ onto $\vec{d}$ , and the second term Q_n is the Euclidean squared length of ${{\vec{v}}_{n}}$ . If we fix P_n, we are forcing ${{\vec{v}}_{n}}$ to lie on a hyperplane perpendicular to $\vec{d}$ . For fixed P_n, the minimum value of Q_n occurs when ${{\vec{v}}_{n}}$ is parallel to $\vec{d}$ (i.e., all components of ${{\vec{v}}_{n}}$ are equal), in which case ${{Q}_{n}}/{{P}_{n}}=1/p$ . For fixed P_n, the maximum value of Q_n happens on the extreme corners of the hyperplane in the allowed region ${{v}_{an}}\geqslant 0$ , which means there is only one non-zero component of ${{\vec{v}}_{n}}$ ; in this case, ${{Q}_{n}}/{{P}_{n}}=1$ . Since ${{P}_{n}}=\,| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}{{|}^{2}}$ , we have the bound

$\begin{eqnarray}&&(1+1/p)\centerdot | \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}{{|}^{2}}\,\leqslant \,\gamma {{S}_{nn}}/\pi \,\leqslant \,2| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}{{|}^{2}}.\end{eqnarray}$

${{S}_{nn}}$ is clearly controlled directly by $| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}{{|}^{2}}$ , so all else being equal, smaller $| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}|$ gives smaller ${{S}_{nn}}$ and hence shorter gradients. Specifically, if any two solutions have a ratio of their $\Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}$ larger than $\sqrt{2}$ , then choosing the smaller $\Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}$ will definitely lead to a shorter gradient. It is difficult to say much more a priori without knowing more about the distribution of the p components ${{v}_{an}}$ .

E.2. Multi-band case

For multi-band cases, one must also consider the minimization of off-diagonal contributions ${{S}_{nm}}$ to the length of the gradient for $n\ne m$ . We will be focusing on the non-degenerate case ${{\epsilon}_{n}}\ne {{\epsilon}_{m}}$ since, based on the discussion in the main text, the degenerate case reverts to the diagonal $n=m$ case. Reproducing equation (18), we have

$\begin{eqnarray}&&{{S}_{nm}}={\int}_{-\infty}^{\infty}\text{d}\omega \frac{|{{\left({{ \Sigma }_{xc}}(\omega )-{{U}_{xc}}\right)}_{nm}}{{|}^{2}}}{\left[{{\left(\omega -{{\epsilon}_{n}}\right)}^{2}}+{{\gamma}^{2}}\right]\left[{{\left(\omega -{{\epsilon}_{m}}\right)}^{2}}+{{\gamma}^{2}}\right]}.\end{eqnarray}$

For small γ, we have contributions from the two poles in the denominator (see equation (19)) and contributions from the poles of ${{ \Sigma }_{xc}}(\omega )$ . Together they give

$\begin{eqnarray}&&\gamma {{S}_{nm}}/\pi =\frac{|{{\left({{ \Sigma }_{xc}}\left({{\epsilon}_{n}}\right)-{{U}_{xc}}\right)}_{nm}}{{|}^{2}}+|{{\left({{ \Sigma }_{xc}}\left({{\epsilon}_{m}}\right)-{{U}_{xc}}\right)}_{nm}}{{|}^{2}}}{{{\left({{\epsilon}_{n}}-{{\epsilon}_{m}}\right)}^{2}}}\\ &&\qquad \quad \ +\underset{a=1}{\overset{p}{\sum}}\,\frac{|\sigma _{nm}^{a}{{|}^{2}}}{{{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)}^{2}}{{\left({{\epsilon}_{m}}-{{\xi}_{a}}\right)}^{2}}}.\end{eqnarray}$

The QS condition of equation (20) optimizes the first term in the sum above. Plugging in the optimal choice, and then performing some algebraic rearrangements using the fact that ${{\left({{U}_{xc}}\right)}_{nm}}$ is real by time-reversal invariance, we have

$\begin{eqnarray}\begin{array}{*{35}{l}} \gamma {{S}_{nm}}/\pi =\frac{{{\left(\text{Im}{{ \Sigma }_{xc}}{{\left({{\epsilon}_{n}}\right)}_{nm}}\right)}^{2}}+{{\left(\text{Im}{{ \Sigma }_{xc}}{{\left({{\epsilon}_{m}}\right)}_{nm}}\right)}^{2}}+{{\left[\text{Re}{{ \Sigma }_{xc}}{{\left({{\epsilon}_{n}}\right)}_{nm}}-\text{Re}{{ \Sigma }_{xc}}{{\left({{\epsilon}_{m}}\right)}_{nm}}\right]}^{2}}/2}{{{\left({{\epsilon}_{n}}-{{\epsilon}_{m}}\right)}^{2}}} & {} & \\ \qquad \quad \ +\underset{a=1}{\overset{p}{\sum}}\,\frac{|\sigma _{nm}^{a}{{|}^{2}}}{{{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)}^{2}}{{\left({{\epsilon}_{m}}-{{\xi}_{a}}\right)}^{2}}}. & {} & \\\end{array}\end{eqnarray}$

As discussed in the main text, the optimization over ${{U}_{xc}}$ can not help us with the imaginary parts since ${{U}_{xc}}$ is Hermitian: we are 'stuck' with whatever these values may be. In what follows, we will ignore the contributions from the imaginary part by assuming they are either zero or small. Dropping these terms and plugging in the explicit formula for the self-energy, we arrive at

$\begin{eqnarray}&&\gamma {{S}_{nm}}/\pi =\frac{1}{2}{{\left(\underset{a=1}{\overset{p}{\sum}}\,\frac{\text{Re}\,\sigma _{nm}^{a}}{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)\left({{\epsilon}_{m}}-{{\xi}_{a}}\right)}\right)}^{2}}\\ &&\qquad \quad \ \;+\underset{a=1}{\overset{p}{\sum}}\,\frac{|\sigma _{nm}^{a}{{|}^{2}}}{{{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)}^{2}}{{\left({{\epsilon}_{m}}-{{\xi}_{a}}\right)}^{2}}}.\end{eqnarray}$

Our assumptions of time-reversal invariance and zero part of the imaginary part of the self-energy mean that $\Sigma \left(r,{{r}^{\prime}},\omega \right)$ will be real valued: hence the matrix elements $\sigma _{nm}^{a}$ are also real valued. Thus we have

$\begin{eqnarray}&&\gamma {{S}_{nm}}/\pi =\frac{1}{2}{{\left(\underset{a=1}{\overset{p}{\sum}}\,\frac{\sigma _{nm}^{a}}{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)\left({{\epsilon}_{m}}-{{\xi}_{a}}\right)}\right)}^{2}}+\underset{a=1}{\overset{p}{\sum}}\,\frac{{{\left(\sigma _{nm}^{a}\right)}^{2}}}{{{\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)}^{2}}{{\left({{\epsilon}_{m}}-{{\xi}_{a}}\right)}^{2}}}\equiv \frac{1}{2}{{P}_{nm}}+{{Q}_{nm}}.\end{eqnarray}$

To put bounds on this expression, we will require all contributions to be positive inside the squared term for ${{P}_{nm}}$ . Taking absolute values, we have

$\begin{eqnarray}&&{{P}_{nm}}\leqslant {{\left(\underset{a=1}{\overset{p}{\sum}}\,\frac{|\sigma _{nm}^{a}|}{|\left({{\epsilon}_{n}}-{{\xi}_{a}}\right)\left({{\epsilon}_{m}}-{{\xi}_{a}}\right)|}\right)}^{2}}.\end{eqnarray}$

At this point, we invoke the Cauchy-Schwarz inequality which says that $\left(\sigma _{nm}^{a}\right){{}^{2}}\leqslant \sigma _{nn}^{a}\sigma _{mm}^{a}$ : this follows from defining an inner product with the matrix ${{\sigma}^{a}}$ as the metric, and the positive definiteness of ${{\sigma}^{a}}$ leads to this bound. So

$\begin{eqnarray}&&{{P}_{nm}}\leqslant {{\left(\underset{a=1}{\overset{p}{\sum}}\,\frac{\sqrt{\sigma _{nn}^{a}\sigma _{mm}^{a}}}{|{{\epsilon}_{n}}-{{\xi}_{a}}|\centerdot |{{\epsilon}_{m}}-{{\xi}_{a}}|}\right)}^{2}}={{\left(\underset{a=1}{\overset{p}{\sum}}\,\sqrt{{{v}_{an}}}\sqrt{{{v}_{am}}}\right)}^{2}}\end{eqnarray}$

where we have reintroduced the vectors ${{\vec{v}}_{n}}$ from the previous sections. Viewing the last term as an inner product and applying Cauchy-Schwarz this time to vectors with components $\sqrt{{{v}_{an}}}$ , we have

$\begin{eqnarray}&&{{P}_{nm}}\leqslant \sqrt{{{P}_{n}}{{P}_{m}}}\end{eqnarray}$

where ${{P}_{n}}$ and Q_n were defined in the previous section. Similar logic shows that

$\begin{eqnarray}&&{{Q}_{nm}}\leqslant \sqrt{{{Q}_{n}}{{Q}_{m}}}.\end{eqnarray}$

Since we know how P_n and Q_n are related, we can put an upper bound on ${{S}_{nm}}$ which is

$\begin{eqnarray}&&{{S}_{nm}}\leqslant \frac{3}{2}\sqrt{{{P}_{n}}{{P}_{m}}}=| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}|\centerdot | \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{m}}\right)}_{mm}}|.\end{eqnarray}$

Again, choosing the solution for each state to have the smallest $| \Sigma _{xc}^{\prime}{{\left({{\epsilon}_{n}}\right)}_{nn}}|\,=\sqrt{{{P}_{n}}}$ will ensure ${{S}_{nm}}$ is as small as possible.

Justifying quasiparticle self-consistent schemes via gradient optimization in Baym–Kadanoff theory

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Definitions and notation

3. Baym–Kadanoff approach

4. Key small parameter γ

5. Shortest gradient of F

5.1. Diagonal terms

5.2. Off-diagonal terms

5.2.1. Off-diagonal case with degeneracy

5.3. Discussion

6. Smallest energy change $\Delta \boldsymbol{F}$

7. Application of the shortest gradient

8. Summary

Acknowledgments

Appendix A. Contribution from poles of Σxc

Appendix B. Energy change ΔF

Appendix C. Unsuitability of minimizing δ F / δ G

Appendix D. Behavior of self-energy as w → ∞

Appendix E. Choosing largest Z in the multi-pole and multi-band cases

E.1. Single-band case

E.2. Multi-band case

Footnotes

Justifying quasiparticle self-consistent schemes via gradient optimization in Baym–Kadanoff theory

Article metrics

Submit

Permissions

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Definitions and notation

3. Baym–Kadanoff approach

4. Key small parameter γ

5. Shortest gradient of F

5.1. Diagonal terms

5.2. Off-diagonal terms

5.2.1. Off-diagonal case with degeneracy

5.3. Discussion

6. Smallest energy change \Delta \boldsymbol{F}

7. Application of the shortest gradient

8. Summary

Acknowledgments

Appendix A. Contribution from poles of Σxc

Appendix B. Energy change ΔF

Appendix C. Unsuitability of minimizing δ F / δ G

Appendix D. Behavior of self-energy as w → ∞

Appendix E. Choosing largest Z in the multi-pole and multi-band cases

E.1. Single-band case

E.2. Multi-band case

Footnotes

6. Smallest energy change $\Delta \boldsymbol{F}$