Kohn-Sham computation and the bivariate view of density functional theory

Informed by an abstraction of Kohn-Sham computation called a KS machine, a functional analytic perspective is developed on mathematical aspects of density functional theory. A natural semantics for the machine is bivariate, consisting of a sequence of potentials, each paired with a ground density. Although the question of when the KS machine can converge to a solution (where the potential component matches a designated target) is not resolved here, a number of related ones are. For instance: Can the machine progress toward a solution? Barring presumably exceptional circumstances, yes in an energetic sense, but using a potential-mixing scheme rather than the usual density-mixing variety. Are energetic and function space distance notions of proximity-to-solution commensurate? Yes, to a significant degree. If the potential components of a sequence of ground pairs converge to a target potential, do the density components cluster on ground densities thereof? Yes, barring particle number drifting to infinity.


INTRODUCTION
Density functional theory (DFT) has developed into a ubiquitous tool in physics, chemistry, materials science, and beyond [1][2][3][4][5][6], overwhelmingly in the specific form of Kohn-Sham [7] (KS) computation. The two distinguishing features of KS computation are (i) a splitting of the intrinsic energy functional into noninteracting, Hartree, and exchange-correlation contributions, and (ii) an idiosyncratic procedure of iterating to so-called self-consistency. Meanwhile, the functional analytic approach initiated by Lieb [8] has had little [9][10][11][12][13] to say about these things. Working in the functional analytic tradition, this paper aims both at filling that gap and at developing a more physical interpretation of KS computation. Pursuit of these goals is synergetic, as the following sketch of major themes shows.

A. Appetizer
What is the physical interpretation of intermediate stages of a KS computation, i.e., before self-consistency is achieved? The course of the computation can be cast (Sec. 5 E) as a sequence of ground pairs, i.e., pairs consisting of a potential and a corresponding (interacting) ground density. This transparent framing is a promising basis for both theoretical analysis and algorithmic development. Thinking of potential and density as simultaneously variable, we have moved into a bivariate perspective: the action takes place in the product of potential and density space.
For an iterative procedure to find a ground density of a given (target) potential, it must first make progress toward that goal from one iteration to the next. A scheme using just the usual Kohn-Sham computational resources is described (Sec. 6 E) which makes progress in the sense of finding a density with lower energy in the target potential, barring exceptional circumstances (hitting a potential with a degenerate ground state or none at all, or lack of an exchange-correlation potential). The proposed scheme involves potential mixing, in contrast to the usual density-mixing ones.
Such progress is far short of convergence. However, as already noted, we can cast all KS computations as sequences of ground pairs. Suppose, optimistically, that we have such a sequence $(v_n, \rho_n)$ for which $v_n$ converges to the target potential. Does it follow that the densities $\rho_n$ converge to a target ground density? The pleasant answer (Sec. 14) is that the sequence of densities $(\rho_n)$ clusters, with respect to the $L^1$ metric, at target ground densities, as long as it does not have nonzero particle number drifting to infinity. If the target ground density is unique, this means the sequence converges. In motto form: look after the potential and the density will take care of itself.
Together with the finding about a potential-mixing scheme, this supports the idea that current ways of thinking are too density-centric. Some trends in computational practice, such as the use of hybrid functionals [14], are in harmony with this thought.
It has been known since Lieb's seminal work [8] that the intrinsic energy functional $F$ (a.k.a. Levy or Levy-Lieb functional, see Section 2 C 3) is not continuous. Indeed, it is unbounded above on every neighborhood with respect to natural topologies. Should the practitioner be worried about that? A surprisingly encouraging answer emerges (Secs. 10 and 13). Restricted to the set of ground pairs, $F$ is continuous (with respect to the product topology, that is). Living on this subset of ground pairs within the product space, KS computation is, in a sense, insulated from the discontinuity.
Section 2 traces the reduction of quantum mechanics to density functional theory, characterizing DFT as an observable/state theory. This primitive physical framework must be the touchstone for all mathematical, in particular topological, refinement. Section 4 presents a version of unilateral functional differentiation for real-valued functions. To avoid explicitly introducing topological considerations at this stage, derivatives are defined in an unusual way, relative to a dual pair. Section 5 analyzes the basic operations of KS computation and the ways they can be combined, and abstracts these resources in the form of a Kohn-Sham machine. The bivariate view and the excess energy ($\Delta$) make their appearance here. For an exact functional, $\Delta(v, \rho)$ is the lowest energy achievable by a quantum state of density $\rho$ in potential $v$, relative to the ground energy, thus quantifying the "mismatch" of $v$ and $\rho$. This is a very natural function to work with in the bivariate view. Section 6 examines the possibility of guaranteed progress in the sense of reducing $\Delta(v_\odot, \rho)$, where $v_\odot$ is the target potential, verifiable by the resources of a KS machine. A proposed scheme is argued to usually (vide infra for the meaning of this) be able to progress. It is a potential-mixing scheme, in contrast to the usual density-mixing schemes, which we are unable to meaningfully analyze.
A general discussion of semimetrics and metrics in Section 8 prepares the way to bring topology into consideration. This is essential for discussing convergence and approximation in density-potential space. Observable-state duality is the basis for the development here. Because we take the question of what kind of metrical structure is appropriate on these spaces to be a physical question, the mathematics we are pushed into is possibly more sophisticated than might seem natural from a purely mathematical point of view, which would simply declare densities to live in a particular Banach space and get on with the theorems. Sections 10 and 13 are concerned with the character of the main functionals, namely intrinsic energy, ground energy (Section 2 C), and excess energy (Section 5 C), on the product space $V \times D$ of potential-density pairs. One of the more notable findings is that, although the intrinsic energy $F$ is unbounded above on every neighborhood, its degree of discontinuity, in a certain sense, decreases as we restrict it to subsets of smaller and smaller excess energy. Section 11 examines how proximity in density-potential space to a ground pair compares to small excess energy, showing that a pair of low excess energy is close to a ground pair, and that slightly perturbing the potential component of a ground pair increases the excess energy only a little. Finally, Section 14 shows that convergence to $v$ of the potential components of a sequence of ground pairs guarantees that the density components accumulate on ground densities of $v$, as long as particle number does not drift to infinity. Throughout the paper, axioms are introduced one by one, verified on the standard interpretation, and their consequences traced. This helps to make their specific significance clearer. Interludes serve to motivate steps of the development.
The reader should not hesitate to skip proofs and demonstrations on a first reading.The short Interlude sections are intended as guideposts, indicating where the development is going and why.For readers wishing a modest-length introduction to mainstream DFT, Ref. [5] is suggested.

C. Notational notes
Parentheses are used for pairs, e.g., $(v, \rho)$, and also for sequences, e.g., $(x_n)_{n\in\mathbb{N}}$. However, the index is usually obvious, so we can write $((v_n, \rho_n))$ for a sequence of potential-density pairs, or even just $(v_n, \rho_n)$, since sequences of this specific type are ubiquitous here. Limit inferior is denoted $\liminf$ or $\underline{\lim}$; correspondingly, limit superior is denoted $\limsup$ or $\overline{\lim}$. The abbreviation "iff" is used for "if and only if". Functions on a function space are sometimes called functionals, sometimes not.

FROM QM TO DFT
This section sketches a view of DFT as a common sort of state/observable theory. Classical as well as quantum theories can be framed this way. However, we develop the theme only to an extent which can reasonably ground the subsequent development, obtaining DFT proper by a contraction of the full quantum mechanical description of an $N$-particle system. A motto for this section is: density is not an observable.
The viewpoint of this section on the relation between DFT and QM is analogous to that between thermodynamics and statistical mechanics. Thermodynamics is a theory, in the sense of giving an autonomous description of certain aspects of the world and having its own proper vocabulary and concepts. It is weak in the sense that it does not have the resources to compute equations of state or free energy functions. For that, one relies on statistical mechanics. However, thermodynamics proper imposes constraints; for instance, a free energy must be convex in certain of its variables, and concave in the others. One of the aims is to formulate DFT as a theory in an analogous way.

A. General quantum mechanics
The setting for general quantum mechanics is a Hilbert space $\mathcal{H}$. Observables (Obs) are represented by bounded hermitian linear operators on $\mathcal{H}$:
$$\mathrm{Obs} := \{A \in \mathcal{B}(\mathcal{H}) : A^\dagger = A\}. \tag{2.1}$$
States (Sts) are represented by normalized, positive, trace-class operators:
$$\mathrm{Sts} := \{\Gamma \in \mathcal{B}_1(\mathcal{H}) : \Gamma \ge 0,\ \mathrm{Tr}\,\Gamma = 1\}. \tag{2.2}$$
Finally, there is a canonical pairing between observables and states given by
$$\langle A, \Gamma\rangle = \mathrm{Tr}\,\Gamma A. \tag{2.3}$$
This represents the expectation value of observable $A$ in the state $\Gamma$. The notation on the LHS may seem gratuitous; however, it represents a general idea of pairing observables and states which may have different operational formulas (RHS) in different contexts. This occurs in particular for DFT. Pointy brackets are also a common notation in functional analysis for dual pairings of vector spaces. Obs is a vector space over $\mathbb{R}$. Sts is not a vector space, but it is a subset of the vector space $\mathcal{B}_1(\mathcal{H})$ of trace-class operators. Moreover, the real linear span of Sts, denoted span Sts, consists of the hermitian elements of $\mathcal{B}_1(\mathcal{H})$. The pairing naturally extends from a mapping $\mathrm{Obs}\times\mathrm{Sts} \to \mathbb{R}$ to a mapping $\mathrm{Obs}\times\mathrm{span\,Sts} \to \mathbb{R}$, which is bilinear. In this sense, we can say that our observables are linear.
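As a concrete, hedged illustration of (2.3), the sketch below evaluates $\langle A, \Gamma\rangle = \mathrm{Tr}\,\Gamma A$ for a two-level system ($\mathcal{H} = \mathbb{C}^2$), using plain Python lists for $2\times 2$ matrices. The Pauli matrices and the particular states are illustrative choices, not taken from the text; the final check shows linearity of the pairing in the state argument.

```python
# Sketch: the observable-state pairing <A, Gamma> = Tr(Gamma A) on H = C^2.
# Matrices are nested lists; the Pauli matrices and states are illustrative choices.

def matmul(X, Y):
    """Matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def combine(a, X, b, Y):
    """Linear combination a*X + b*Y of two matrices."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def pairing(A, Gamma):
    """<A, Gamma> = Tr(Gamma A); real for hermitian A and Gamma."""
    return complex(trace(matmul(Gamma, A))).real

sigma_x = [[0, 1], [1, 0]]        # hermitian observable
sigma_z = [[1, 0], [0, -1]]       # hermitian observable
mixed = [[0.75, 0], [0, 0.25]]    # positive, trace one: a mixed state
plus = [[0.5, 0.5], [0.5, 0.5]]   # pure state |+><+|

print(pairing(sigma_z, mixed))    # 0.5
print(pairing(sigma_x, plus))     # 1.0

# Linearity in the state: the pairing of a convex mixture is the mixture of pairings.
t = 0.3
lhs = pairing(sigma_z, combine(t, mixed, 1 - t, plus))
rhs = t * pairing(sigma_z, mixed) + (1 - t) * pairing(sigma_z, plus)
assert abs(lhs - rhs) < 1e-12
```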
The more specific context that interests us is a system of $N$ identical particles in three-dimensional space $\mathbb{R}^3$, or more generally, a three-dimensional riemannian manifold $\mathcal{M}$. The case of a three-torus shows that the more general situation is of genuine interest. For a single particle on $\mathcal{M}$, the relevant Hilbert space is
$$\mathcal{H}_1 = L^2(\mathcal{M}).$$
For the $N$-particle system, $\mathcal{H}$ is the symmetrized (bosons) or antisymmetrized (fermions) $N$-fold tensor product of $\mathcal{H}_1$. Everything we do will be valid for both cases.

B. Function spaces defined by integrability conditions
This subsection is not called on until (2.7), but placed here to minimize disruption of the flow.
For measurable functions, we use the following standard notation for $1 \le p < \infty$:
$$\|f\|_p := \Big(\int_{\mathcal{M}} |f(x)|^p\,dx\Big)^{1/p}.$$
Actually, to be accurate, $f$ above should be considered an equivalence class of functions, any two of which differ only on a set of measure zero. However, it is common to gloss over the distinction, and we will follow that custom. In addition, we define $\|f\|_\infty \in [0, \infty]$ to be the smallest number such that $\{x : |f(x)| > \|f\|_\infty\}$ has measure zero. For a bounded continuous function, this is just the supremum of $|f|$, but more generally we again must accommodate measure-zero exceptional sets. Now, we define the vector spaces
$$\mathcal{L}^p(\mathcal{M}) := \{f : \|f\|_p < \infty\}, \qquad 1 \le p \le \infty.$$
$L^p(\mathcal{M})$ is $\mathcal{L}^p(\mathcal{M})$, equipped with $\|\cdot\|_p$ as a norm. At this stage of the development, we are using $\|\cdot\|_p$ only as a selection mechanism. That is, there is no defined distance between members of $\mathcal{L}^p(\mathcal{M})$. Topological considerations (norms, seminorms and so forth) are deferred to Section 8.
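We need spaces a little more complicated than the pure $\mathcal{L}^p(\mathcal{M})$. The intersection $\mathcal{L}^p(\mathcal{M}) \cap \mathcal{L}^q(\mathcal{M})$ of the two spaces $\mathcal{L}^p(\mathcal{M})$ and $\mathcal{L}^q(\mathcal{M})$ is again a vector space, as is the sum $\mathcal{L}^p(\mathcal{M}) + \mathcal{L}^q(\mathcal{M})$, consisting of all sums of a function from each of the summand spaces.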

C. DFT
Our development of DFT begins with a contraction of the general QM observables, although we shall later expand to a set which is neither subset nor superset of B(H).

Contracting QM
We put subscripts on Obs and Sts to help avoid confusion, as there will be more than one set. Start with
$$\mathrm{Obs}_0 := \{\mathrm{Num}(U) : U \subseteq \mathcal{M} \text{ open}\}.$$
Here, $\mathrm{Num}(U)$ is the operator reporting the number of particles in the set $U$. (Choosing to start specifically with open sets is a somewhat arbitrary choice.) Appealing to well-known facts about QM, the map $U \mapsto \langle \mathrm{Num}(U), \Gamma\rangle$ extends to a Borel measure, which is, moreover, absolutely continuous with respect to Lebesgue measure (Fubini's theorem is helpful here). This implies that there is an integrable function $\rho : \mathcal{M} \to \mathbb{R}$ such that, for any Lebesgue-measurable set $U$,
$$\langle \mathrm{Num}(U), \Gamma\rangle = \int_U \rho(x)\,dx.$$
The measure theory just deployed is no cause for anxiety. The main point is that, while in some other contexts (e.g., classical statistical mechanics) we might want to consider Dirac measures, the underlying QM precludes that here. We give the QM-state-to-density mapping the name dens. Then, $\rho$ in the preceding integral is $\mathrm{dens}\,\Gamma$.
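Where there is a measure, there is an integral. Indeed, we can write the preceding formula as an integral of the indicator function $\mathbf{1}(A)$, equal to one on $A$ and zero elsewhere:
$$\langle \mathrm{Num}(U), \Gamma\rangle = \int_{\mathcal{M}} \mathbf{1}(U)(x)\,(\mathrm{dens}\,\Gamma)(x)\,dx. \tag{2.6}$$
This extends to some measurable functions $f$ in the usual way, i.e., by approximating $f$ by linear combinations of indicator functions of sets. However, which functions $f$ are legitimate here? If we are dealing with the densities associated with general quantum mechanical states, the answer is the (essentially) bounded ones, denoted $\mathcal{L}^\infty(\mathcal{M})$, because otherwise we are not assured that the integral in (2.6), with $\mathbf{1}(U)$ replaced by $f$, exists. Thus, we pass to a second stage with
$$\mathrm{Obs}_1 := \mathcal{L}^\infty(\mathcal{M}). \tag{2.7}$$
Relative to this class of observables, the state simply is a density, specifically, $\mathrm{dens}\,\Gamma$ in (2.6). The states are now non-negative integrable functions with total integral $N$,
$$\mathrm{Sts}_1 := \Big\{\rho \in \mathcal{L}^1(\mathcal{M}) : \rho \ge 0,\ \int \rho\,dx = N\Big\},$$
and the observable-state pairing is
$$\langle f, \rho\rangle = \int_{\mathcal{M}} f(x)\,\rho(x)\,dx, \tag{2.8}$$
which satisfies $\langle f, \rho\rangle = \langle \mathrm{Num}(f), \Gamma\rangle$ whenever $\Gamma \in \mathrm{dens}^{-1}\rho$. The pairings on the LHS and RHS are not literally the same thing, but are realizations of the same abstract idea in two different settings.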
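A hedged numerical sketch of the density pairing and its relation to $\mathrm{Num}(U)$: a one-dimensional density on $[0, 10]$ with total integral $N = 2$, built from two normalized Gaussian bumps. The grid, the bumps, and the midpoint-rule quadrature are illustrative choices. Pairing with the indicator of an interval counts the particles in that interval.

```python
import math

# Illustrative 1D grid on [0, L]; integrals use the midpoint rule.
L, M = 10.0, 10000
dx = L / M
xs = [(i + 0.5) * dx for i in range(M)]

def bump(x, center, width):
    """Gaussian normalized to integrate to one."""
    return math.exp(-((x - center) / width) ** 2) / (width * math.sqrt(math.pi))

# A density with total particle number N = 2 (one bump near x = 3, one near x = 7).
rho = [bump(x, 3.0, 0.5) + bump(x, 7.0, 0.5) for x in xs]

def pairing(f, rho):
    """<f, rho> = integral of f(x) rho(x) dx, for a bounded function f."""
    return sum(f(x) * r for x, r in zip(xs, rho)) * dx

def indicator(a, b):
    """Indicator function 1(U) of the open interval U = (a, b)."""
    return lambda x: 1.0 if a < x < b else 0.0

print(round(pairing(lambda x: 1.0, rho), 6))        # total particle number: 2.0
print(round(pairing(indicator(0.0, 5.0), rho), 6))  # particles in (0, 5): 1.0
```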
For DFT, we want to modify this structure somewhat, by restricting to densities coming from states of finite kinetic energy, and considering nonlinear observables.

Finite kinetic energy
In the general QM context, a single-particle wavefunction which is, for example, a nonzero constant over a cubical region and zero outside, is legitimate, but it has infinite kinetic energy. (Physically, this is pretty clear. Mathematically, we extend the expectation of the kinetic energy operator outside its ordinary domain by saying it is $+\infty$ there. This is unambiguous because kinetic energy is bounded below.) However, we want to insist on finite kinetic energy, and this entails a state space smaller than $\mathrm{Sts}_1$. We will denote this set of densities by $D$; Lieb [8] calls it $\mathcal{I}_N$. Precisely, the additional requirement for $\rho$ to be in $D$ is that $\nabla\sqrt{\rho}$ be square integrable, so
$$D := \{\rho \in \mathrm{Sts}_1 : \nabla\sqrt{\rho} \in \mathcal{L}^2(\mathcal{M})\}. \tag{2.9}$$
Correspondingly, the space of observables can be expanded. In fact, the integral $\int v\rho\,dx$ is well-defined for every $\rho \in D$ not only when $v$ is essentially bounded, but also when $|v|^{3/2}$ is integrable (for $\mathcal{M} = \mathbb{R}^3$, the Sobolev inequality places every $\rho \in D$ in $\mathcal{L}^1 \cap \mathcal{L}^3$, and Hölder's inequality then applies). Thus,
$$\mathrm{Obs}_2 := \mathcal{L}^\infty(\mathcal{M}) + \mathcal{L}^{3/2}(\mathcal{M}). \tag{2.10}$$
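A standard worked example, taking $\mathcal{M} = \mathbb{R}^3$, shows why the sum in (2.10) is needed: the Coulomb potential $1/|x|$ lies in neither $\mathcal{L}^\infty$ (it is unbounded near the origin) nor $\mathcal{L}^{3/2}$ (since $\int_{|x|\ge 1}|x|^{-3/2}\,dx = 4\pi\int_1^\infty r^{1/2}\,dr$ diverges), but it splits into one piece of each kind:

```latex
\frac{1}{|x|}
  = \underbrace{\frac{\mathbf{1}(|x|<1)}{|x|}}_{v_1}
  + \underbrace{\frac{\mathbf{1}(|x|\ge 1)}{|x|}}_{v_2},
\qquad
\int_{\mathbb{R}^3} |v_1|^{3/2}\,dx
  = 4\pi \int_0^1 r^{-3/2}\, r^2\,dr
  = 4\pi\cdot\frac{2}{3} = \frac{8\pi}{3} < \infty,
\qquad
\|v_2\|_\infty = 1 .
```

Hence $v_1 \in \mathcal{L}^{3/2}(\mathbb{R}^3)$, $v_2 \in \mathcal{L}^\infty(\mathbb{R}^3)$, and $1/|x| \in \mathrm{Obs}_2$.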

Nonlinear observables and intrinsic energy
Now suppose A is any bounded operator.We can certainly associate the set {Tr ΓA : dens Γ = ρ} with ρ.
In case $A$ represents an energy, it is physically well-motivated to associate the infimum of this set to $\rho$. That works even if $A$ is only bounded below, like kinetic energy. Define, therefore,
$$F_0(\rho) := \inf\{\mathrm{Tr}\,\Gamma\hat{T} : \mathrm{dens}\,\Gamma = \rho\}. \tag{2.11}$$
This makes sense for all densities.
For some $\rho$, $F_0(\rho) = +\infty$ by this definition. The set of densities for which $F_0 < +\infty$ is called the effective domain, denoted $\mathrm{dom}\,F_0$. It is exactly $D$. Because dens is a linear map, it follows from (2.11) that $F_0$ is convex:
$$F_0(\lambda\rho + (1-\lambda)\rho') \le \lambda F_0(\rho) + (1-\lambda) F_0(\rho'), \qquad 0 \le \lambda \le 1. \tag{2.12}$$
Because $F_0$ is bounded below, it does not matter whether $\rho$ and $\rho'$ are in $\mathrm{dom}\,F_0$ or not; the inequality is meaningful in either case. $F_0$ is the noninteracting intrinsic energy (functional). If $\hat{W}$ is an interaction between the particles, then we can analogously define an interacting intrinsic energy ($F$, no subscript) with $\hat{T}$ in (2.11) replaced by $\hat{T} + \hat{W}$. Assuming $\hat{W}$ is relatively bounded with respect to $\hat{T}$ (e.g., the Coulomb interaction), $\mathrm{dom}\,F = \mathrm{dom}\,F_0$.

Constrained search and Legendre-Fenchel transform
Now, suppose $v$ is some external one-body potential. The minimum energy of states with density $\rho$ in the presence of $v$ is $F(\rho) + \int v\rho\,dx$. Thus, if there is a ground state, the ground state energy is
$$E(v) = \min_{\rho}\Big\{F(\rho) + \int v\rho\,dx\Big\}. \tag{2.13}$$
This embodies the central, appealing idea of the constrained-search formulation [8,15]. The minimum will not exist, and $E(v)$ will not be defined, if there are no ground states. This is not an exotic possibility; it occurs for a constant potential on $\mathbb{R}^3$. That problem is easily fixed by replacing min by inf. Even so, what is the domain of $E$? Consider a trap potential, i.e., one that is bounded below and satisfies $v(x) \to \infty$ as $|x| \to \infty$. The integral $\int v\rho\,dx$ is well-defined for such $v$, but for some densities it has the value $+\infty$, while for others it is finite. We will rule out such potentials, however, requiring the integral to be finite for every $\rho$. One might regard this as a valid physical requirement as it stands. Another reason, discussed below, is that potentials play the role of derivatives of $F$. They should therefore be unambiguously integrable against differences of densities. Thus, we arrive at the conclusion that the sensible space of potentials is precisely $\mathrm{Obs}_2$ (2.10), times a unit of energy. For notational simplicity (and forgetting about the energy unit), we give this space a new name:
$$V := \mathcal{L}^\infty(\mathcal{M}) + \mathcal{L}^{3/2}(\mathcal{M}). \tag{2.14}$$
In a more abstract context, we continue to use the symbol $V$ to represent whatever vector space plays this role. The integral in (2.13) is thus our previously introduced pairing, giving us the final form:
Definition 2.1 (ground energy). The ground energy of $v \in V$ is
$$E(v) := \inf_{\rho \in D}\{F(\rho) + \langle v, \rho\rangle\}. \tag{2.15}$$
As the infimum of a collection of affine functionals of $v$, the ground energy is automatically concave, i.e., $-E$ is convex. For a concave functional, the effective domain is defined oppositely from that for a convex functional (i.e., as the set where it is greater than $-\infty$). Although not obvious on its face, $E(v) > -\infty$ for every $v \in V$. So, $\mathrm{dom}\,E = V$. Now, in case there is a ground state for $v$, a basic idea of calculus suggests that the minimum of the RHS of (2.15) should have a differential characterization, e.g.,
$$-v \in \underline{D}F(\rho),$$
for some kind of derivative $D$. Therefore, we turn next to the problem of differentiation of functions in the context of a dual pair of vector spaces. Although $D$ is not a vector space, if we are interested not only in densities but in arbitrary multiples of differences of densities, the vector space generated by $D$, denoted $\mathrm{Vec}\,D$, comes in naturally. Then, we also need to extend the intrinsic energy to $\mathrm{Vec}\,D$. Some elements of $\mathrm{Vec}\,D$ which are not in $D$ are densities for which $F$ and $F_0$ are already defined as $+\infty$. Now we give that value to all the others, as well. This maintains convexity of those functionals.
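The concavity of $E$ and the meaning of the infimum in (2.15) can be checked in a deliberately simple, hypothetical finite-dimensional model that is not DFT: densities are probability vectors on $n$ sites ($N = 1$), potentials are vectors in $\mathbb{R}^n$, the pairing is the dot product, and $F(\rho) = \sum_i \rho_i \log \rho_i$ stands in for a convex intrinsic energy. For this model the infimum is attained at $\rho_i \propto e^{-v_i}$ and $E(v) = -\log\sum_i e^{-v_i}$. All function names below are illustrative.

```python
import math
import random

# Hypothetical toy model (not DFT): n sites, densities are probability vectors.

def F(rho):
    """Toy convex 'intrinsic energy' F(rho) = sum_i rho_i log rho_i."""
    return sum(r * math.log(r) for r in rho if r > 0)

def pairing(v, rho):
    return sum(a * b for a, b in zip(v, rho))

def ground_density(v):
    """Minimizer of F(rho) + <v, rho> on the simplex: rho_i proportional to exp(-v_i)."""
    m = min(v)  # shift for numerical stability; the result is unchanged
    w = [math.exp(-(vi - m)) for vi in v]
    z = sum(w)
    return [wi / z for wi in w]

def E(v):
    """Toy ground energy, inf over rho of F(rho) + <v, rho> = -log sum_i exp(-v_i)."""
    m = min(v)
    return m - math.log(sum(math.exp(-(vi - m)) for vi in v))

def random_density(n, rng):
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

rng = random.Random(1)
v = [0.0, 1.0, -0.5, 2.0]
w = [1.5, -1.0, 0.3, 0.0]

# The infimum is attained at the ground density ...
rho_v = ground_density(v)
assert abs(F(rho_v) + pairing(v, rho_v) - E(v)) < 1e-12
# ... and no other density does better.
for _ in range(1000):
    rho = random_density(len(v), rng)
    assert F(rho) + pairing(v, rho) >= E(v) - 1e-12

# Concavity of E along the segment from w to v.
for k in range(11):
    t = k / 10
    mix = [t * a + (1 - t) * b for a, b in zip(v, w)]
    assert E(mix) >= t * E(v) + (1 - t) * E(w) - 1e-12
```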

INTERLUDE: INTERPRETATIONS
The previous section sketched a view of standard DFT, and introduced most of the main characters: $F_0$, $E_0$, $F$, and $E$. Others derived from these, such as the Hartree-exchange-correlation energy $\Phi$, will be added later. These are all unambiguously, though not accessibly, defined in terms of the quantum mechanics of many-body systems in Euclidean space. They are, in the jargon, exact.
However, approximations, or perhaps better to say, models, are inevitable in this business, and we would like to draw conclusions applicable to computations with these models which do not depend in some uncontrolled way on their being close enough to exact, in some vague sense. This motivates an axiomatic approach. We identify key properties of the standard ("exact") interpretation and proceed assuming only that our functionals, and the spaces they are defined on, have those properties. The list of assumptions (axioms/postulates) will grow in a couple of stages, so that later sections assume more. In addition to approximate exchange-correlation energy functionals, this makes room for other kinds of deviations from the standard interpretation, such as a system living on a torus rather than Euclidean space, a background confining potential, nonzero temperature, or extra degrees of freedom (e.g., spin DFT, current DFT). A concrete system satisfying the axioms is referred to as an interpretation. So, for example, the exact noninteracting functional $F_0$ with the local density approximation for the exchange-correlation energy defines an interpretation.

A UNILATERAL DERIVATIVE
Starting in this section, we begin to work in a relatively abstract way. For instance, instead of the specific vector spaces $\mathrm{Vec}\,D$ and $V$ in (2.14), we say simply that we have a pair of vector spaces $V$ and $\mathcal{X}$ with a nondegenerate pairing $\langle\cdot,\cdot\rangle$ (see Def. 4.1).

A. Motivation
We pursue here the idea that the essence of a derivative is some sort of (local) linear approximation, what kind of approximation being open to discussion. For example, the derivative of a smooth function $f : \mathbb{R}^2 \to \mathbb{R}$ is packaged as a linear functional through its gradient. The graph of the affine function
$$x' \mapsto f(x) + \nabla f(x)\cdot(x' - x)$$
is tangent to the graph of $f$ at $x$, and in that sense constitutes the best affine approximation to $f$ near $x$. The dot product is a pairing of $\mathbb{R}^2$ with itself, $\langle x, y\rangle = x\cdot y$, so we might also write this as $f(x) + \langle \nabla f(x), x' - x\rangle$. Now, suppose that $f$ is not smooth, for example, $f(x) = |x|$ at the origin. If our interest is in minimization, a one-sided kind of approximation can be perfectly suitable. If $|n| \le 1$, the graph of the linear functional $x \mapsto \langle x, n\rangle$ touches that of $f$ at $x = 0$ and is nowhere above it. In that weak sense, it is a kind of linear approximation. Because there is not just a single $n$ which works here, we see that when we relax our notion of approximation in this way, we can end up with derivatives which are set-valued.
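The claim about $f(x) = |x|$ can be probed numerically. The sketch below samples points of the square $[-1,1]^2$ (an arbitrary illustrative choice) and tests whether $x \mapsto \langle x, n\rangle$ ever rises above $|x|$. Sampling can only refute, not prove, the support property; for $|n| \le 1$ it follows from the Cauchy-Schwarz inequality $\langle x, n\rangle \le |n|\,|x|$.

```python
import math
import random

def f(x):
    """Euclidean norm |x| on R^2: convex, not differentiable at the origin."""
    return math.hypot(x[0], x[1])

def pairing(a, b):
    """Dot product, the pairing of R^2 with itself."""
    return a[0] * b[0] + a[1] * b[1]

def supports_at_origin(n, samples=2000, seed=0):
    """Sampled check that x -> <x, n> is nowhere above f (it touches f at x = 0)."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if pairing(x, n) > f(x) + 1e-12:
            return False
    return True

print(supports_at_origin((0.6, 0.8)))   # True:  |n| = 1
print(supports_at_origin((0.3, -0.2)))  # True:  |n| < 1
print(supports_at_origin((1.2, 0.0)))   # False: |n| > 1
```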

B. Lower and upper semiderivatives
We will define derivatives for dual systems.
Definition 4.1 (dual system). A dual system consists of a pair of vector spaces $V$ and $\mathcal{X}$ and a map $\langle\cdot,\cdot\rangle : V \times \mathcal{X} \to \mathbb{R}$ which is linear in each variable with the other held fixed, and such that for every nonzero $x \in \mathcal{X}$ there is $v \in V$ (and for every nonzero $v \in V$ there is $x \in \mathcal{X}$) such that $\langle v, x\rangle \ne 0$. For a compact notation, we denote this dual system by $\langle V, \mathcal{X}\rangle$.
Nondegeneracy is the new concept in this definition. Essentially, it means neither space involved is "too small", since it says that $x \in \mathcal{X}$ can be unambiguously identified by the values of $\langle v, x\rangle$ as $v$ ranges over $V$, and vice versa. Now we can define a unilateral notion of derivative relative to a dual system. Putting a bar above or below '$\mathbb{R}$' indicates augmentation by $+\infty$ or $-\infty$; for instance, $\underline{\overline{\mathbb{R}}} = \mathbb{R} \cup \{-\infty, +\infty\}$, and $\underline{\lim}$ denotes limit inferior ($\liminf$).
Definition 4.2. The lower semiderivative of $f : \mathcal{X} \to \underline{\overline{\mathbb{R}}}$ at $x \in \mathcal{X}$ is a subset of $V$, denoted $\underline{D}f(x)$.
The upper semiderivative, $\overline{D}f(x)$, is defined by an analogous equation with $\underline{\lim}$ replaced by $\overline{\lim}$, and $\ge$ by $\le$.
Similarly, exchanging the roles of X and V , we obtain semiderivatives of functions on V with respect to the same pairing.
C. Remarks
1. Geometrically, $u \in \underline{D}f(x)$ means that the hyperplane $y \mapsto f(x) + \langle u, y - x\rangle$ in $\mathcal{X} \times \underline{\overline{\mathbb{R}}}$ is asymptotically not above the graph of $f$ as $y \to x$.
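3. For a convex function $f : \mathcal{X} \to \overline{\mathbb{R}}$, $\underline{D}f$ has a much simpler characterization, and it does not involve limits at all: $v \in \underline{D}f(x)$ precisely when, for all $y$, $f(x) - \langle v, x\rangle \le f(y) - \langle v, y\rangle$, i.e., $f(y) \ge f(x) + \langle v, y - x\rangle$.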
For application to DFT, this would suffice for $F_0$, $F$, $E_0$ and $E$. However, in the Kohn-Sham approach, we deal with $\Phi$, defined by $F = F_0 + \Phi$, and this cannot be assumed to be either convex or concave.
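4. $\underline{D}f(x) + \underline{D}g(x) \subseteq \underline{D}(f + g)(x)$. This follows since the limit inferior of a sum is at least as large as the sum of the limits inferior, but the geometric description of the first item might be an easier way to see it. Beware! This does not work with subtraction, that is, $\underline{D}f(x) - \underline{D}g(x) \subseteq \underline{D}(f - g)(x)$ fails in general.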
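For convex functions on $\mathbb{R}$ (with the pairing $\langle u, y\rangle = uy$), the convex characterization makes the sum and subtraction behavior easy to test. The hedged sketch below uses the standard subgradient form $h(y) \ge h(x) + u(y - x)$, matching the geometric picture of remark 1, and checks it on a finite grid, which is a numerical sanity check rather than a proof. With $f = g = |\cdot|$, the sum rule holds at $x = 0$, while for $f - g \equiv 0$ the difference $0.5 - (-0.9) = 1.4$ of elements of $\underline{D}f(0)$ and $\underline{D}g(0)$ is not in $\underline{D}(f - g)(0) = \{0\}$.

```python
# Numerical sanity check (not a proof) on X = R with <u, y> = u*y.

GRID = [k / 100.0 for k in range(-300, 301)]  # test points y in [-3, 3]

def is_subgradient(h, x, u, ys=GRID):
    """For convex h: u is a subgradient at x iff h(y) >= h(x) + u*(y - x) for all y."""
    return all(h(y) >= h(x) + u * (y - x) - 1e-12 for y in ys)

f = abs
g = abs

def f_plus_g(y):
    return f(y) + g(y)

def f_minus_g(y):
    return f(y) - g(y)  # identically zero

u_f, u_g = 0.5, -0.9
print(is_subgradient(f, 0.0, u_f), is_subgradient(g, 0.0, u_g))  # True True

# Sum rule: u_f + u_g is a subgradient of f + g at 0.
print(is_subgradient(f_plus_g, 0.0, u_f + u_g))                   # True

# Subtraction fails: u_f - u_g = 1.4, but the only subgradient of f - g at 0 is 0.
print(is_subgradient(f_minus_g, 0.0, u_f - u_g))                  # False
```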

KOHN-SHAM MACHINES
The top level of a Kohn-Sham computation involves densities and potentials alone, with no explicit reference to quantum mechanics. This section abstracts that top-level perspective as a Kohn-Sham machine, offering a limited menu of operations on potentials and densities, provided by modules which are regarded as black boxes. The following section then analyzes the question: given an external potential $v_\odot$, how can those operations be harnessed to make progress toward finding an interacting ground density for $v_\odot$? This will be given an abstract phrasing, and we will have to find an appropriate sense of progress to deal with it.

A. Postulates
We abstract the situation described in the preceding section in the form of the following assumptions.
pairing of a second real vector space V with Vec D.
These are just the beginning. Additional axioms refining the set-up will be added in Sections 10 and 13 as their desirability becomes clear. In all cases, they reflect properties of exact functionals in the standard interpretation. Postulates A1, A2, and A3 are descriptive. In this section and the next, we assume that there is a second function $F_0$ satisfying A2. Moreover, we will have computational/procedural assumptions on $F_0$ and $\Phi := F - F_0$, as specified in Section 5 D. Those will have no direct relevance for the development following Section 5 D. The functions $F_0$, $F$, and $\Phi$ are extended to all of $\mathrm{Vec}\,D$ by setting them equal to $+\infty$ off of $D$. This is a matter of convention, designed to maintain convexity of $F$ and $F_0$ and the equality $F = F_0 + \Phi$, while creating no barriers to lower semi-differentiability.

B. Standard interpretation
The standard interpretation is that $D$ is the set of densities of finite intrinsic energy introduced in Section 2 C, i.e., $D = \mathrm{dom}\,F_0$, where $F_0$ is the noninteracting (and $F$ the interacting) intrinsic energy. That A1-A3 are satisfied in the standard interpretation was already established in Section 2 C. Ground energy $E$ is defined from $D$ and $F$ by (2.15). In these abstract terms, the Standard Problem of finding a ground density for potential $v_\odot$ can be phrased as: find $\rho \in \overline{D}E(v_\odot)$. We will find a different formulation more useful and enlightening.
However, whereas what follows Section 6 has some interest in the case of the standard interpretation, where everything is exact, that is hardly true of this section and the next. These two sections are so closely tied to the mechanics and possibilities of Kohn-Sham computation that the realistic attitude with which to read them is that $F_0$ is exact, while $\Phi$ is a model Hartree-exchange-correlation energy, constrained only by the requirement that $F$ satisfy A2. The term Hartree-exchange-correlation energy indicates that $\Phi$ is considered physically (interpreted) to consist of three parts: the classical Coulomb interaction energy of the given charge density (Hartree), and two quantum effects, exchange and correlation. Often $\Phi$ is split explicitly into the Hartree energy, which is simple, unambiguous and explicit, and the exchange-correlation (XC) energy, the part which is really approximated. For our purposes, however, it makes more sense to take it as a unit.
Intrinsic energy $F$ is a function of density, ground energy $E$ of potential. Underlying our approach is the idea that it is fruitful to think in terms of both density and potential simultaneously. This means that we mostly think of things as functions on the product space $V \times D$, and package $F$ and $E$ together into the excess energy
$$\Delta(v, \rho) := F(\rho) + \langle v, \rho\rangle - E(v). \tag{5.2}$$
By (2.15), $\Delta \ge 0$. $\Delta(v, \rho)$ answers the question, "how close to the ground energy $E(v)$ can one get with states of density $\rho$?", and is convex in each variable, holding the other fixed. The zero set
$$Z := \{(v, \rho) \in V \times D : \Delta(v, \rho) = 0\}$$
contains all possible solutions of all possible ground density problems. If $(v, \rho)$ is in $Z$, we call it a ground pair.
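In the hypothetical entropy toy model (not DFT: probability vectors on $n$ sites, dot-product pairing, $F(\rho) = \sum_i \rho_i\log\rho_i$), the excess energy has a closed form: $\Delta(v, \rho)$ equals the Kullback-Leibler divergence of $\rho$ from the ground density $\rho_v \propto e^{-v}$. This makes nonnegativity and the zero set (ground pairs) explicit, and shifting $v$ by a constant leaves $\Delta$ unchanged. The sketch below checks these facts; names are illustrative.

```python
import math

# Hypothetical toy model (not DFT): densities are probability vectors on n sites,
# F(rho) = sum rho_i log rho_i, and the pairing is the dot product.

def F(rho):
    return sum(r * math.log(r) for r in rho if r > 0)

def pairing(v, rho):
    return sum(a * b for a, b in zip(v, rho))

def E(v):
    m = min(v)
    return m - math.log(sum(math.exp(-(vi - m)) for vi in v))

def ground_density(v):
    m = min(v)
    w = [math.exp(-(vi - m)) for vi in v]
    z = sum(w)
    return [wi / z for wi in w]

def excess(v, rho):
    """Excess energy Delta(v, rho) = F(rho) + <v, rho> - E(v) >= 0."""
    return F(rho) + pairing(v, rho) - E(v)

def kl(p, q):
    """Kullback-Leibler divergence of p from q (q strictly positive)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

v = [0.0, 1.0, -0.5, 2.0]
rho = [0.1, 0.2, 0.3, 0.4]

print(abs(excess(v, ground_density(v))) < 1e-12)                 # True: a ground pair
print(abs(excess(v, rho) - kl(rho, ground_density(v))) < 1e-12)  # True
```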
Noninteracting versions, $E_0$, $\Delta_0$, and $Z_0$, are defined from $F_0$ in the same way as $E$, $\Delta$ and $Z$ from $F$. In distinguishing between the two, we prefer the more neutral designations reference/perturbed to noninteracting/interacting. Fig. 1 depicts, in a cartoon way, the zero sets in the product space $V \times D$.

D. Primitive operations and feasibility
Some of the functions of the theory listed above, e.g., $F$, are not provided in modular form by ordinary DFT software. This is why it is interesting to ask about strategies to solve the basic problem. The menu of primitive operations consists of: solution of the noninteracting problem, computation of the HXC energy and potential, and calculation of the integral $\int v(x)\rho(x)\,dx$. In our more neutral language, they are given in Table I. The primitive operations, as well as anything achievable by a finite combination of them, are feasible. We will mostly be engaged in demonstrating feasibility by exhibiting appropriate such combinations. Section 6 D makes a soft claim of infeasibility, but it must be recognized that such claims are significantly trickier, and potentially subject to criticism on the grounds that our list of primitive operations is incomplete. Certainly, nothing here should be construed as making claims about what completely different methods, such as quantum Monte Carlo, can do.
Fig. 1. Schematic representation of the bivariate perspective in the product space $V \times D$. The zero excess energy sets $Z_0$ and $Z$ are indicated, along with some of the functions listed in Table II. Of course, the picture is unfaithful in some aspects: $V$ and $D$ are generally infinite-dimensional, and $Z$ and $Z_0$ are not likely to be smooth, or even graphs of (single-valued) functions.

E. Generating ground pairs
From the primitive operations (Table I), we will now synthesize some new feasible operations which allow generation of reference and perturbed ground pairs, and which may be useful in solving the Standard Problem. They are listed in Table II, and some are illustrated schematically in Fig. 1.
Let us consider these operations. $Z_0$ is a trivial rephrasing of $\overline{D}E_0$; it merely pairs a potential with a corresponding reference system ground density. It is not a map into densities $D$, but into the subset $Z_0$ of the product space $V \times D$. KS puts $\underline{D}\Phi$ to work, and is more interesting.
, by remark 3 of Section 4 C. An analogous statement holds for $F$ and $Z$. Since The reverse implication is not valid. Summing up: given $v \in V$, $Z_0 v$ is a reference ground pair, and To see the point of this, recall that our Standard Problem is to find a point on $Z$ with specified first component $v_\odot$, so we are naturally interested in how close $v$ is to $v_\odot$. The map $R_{v_\odot}$ supplies that information. Usually, we will suppress $v_\odot$ for notational simplicity. These functions are all partial, which is why the Table contains '⇀' rather than '→' in the type column. Certainly, some potentials have no ground density, for example, the uniformly zero potential in $\mathbb{R}^3$. Given that partiality, there is no benefit to us in assuming that $[D\Phi]$ is total. Computationally, our assumption is that an exception, rather than garbage, is returned in case there is no value.
The perspective revealed here is different from the usual one. Ground pairs are the only points in $V \times D$ which are usefully accessible. Reference ground pairs can be feasibly selected by their first component, but perturbed ground pairs only in a distorted kind of way. The common talk of "self-consistency" seems inappropriate from this perspective. Points on $Z$ generated by using the basic operations are certainly not inconsistent in any sense. Their only possible defect is not being the one that we want. The question, then, is how to use the expanded stock of basic operations in Table II to find a suitable pair, that is, one solving the Standard Problem. The next section takes up the question of how to make progress toward that goal. First, we discuss the last row of the table.

F. HK maps
The Table II and therefore The superscript HK, standing for 'Hohenberg-Kohn', is there because this is much closer to the original [1] intrinsic energy ("universal functional") definition of Hohenberg and Kohn than the later constrained-search formulation [15,16]. The point is that auxiliary data, consisting of a potential partner in the reference system, is needed to obtain F(ρ). Since (ṽ, ρ) is a perturbed ground pair, once we have F(ρ), the interacting ground energy of ṽ is obtainable as E(ṽ) = F(ρ) + ⟨ṽ, ρ⟩, because the excess energy ∆(ṽ, ρ) vanishes on Z.
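Assuming the Kohn-Sham splitting F = F_0 + Φ (Φ the Hartree-exchange-correlation part) and writing Δ_0(v, ρ) = F_0(ρ) + ⟨v, ρ⟩ − E_0(v) for the reference excess energy, the role of the reference partner can be spelled out as a short derivation. It is a sketch of how the HK values can be computed from a reference ground pair (v, ρ) ∈ Z_0 and a perturbed ground pair (ṽ, ρ) ∈ Z, not a reproduction of the omitted table rows.

```latex
\begin{align*}
\Delta_0(v,\rho) = 0
  &\;\Longrightarrow\; F_0(\rho) = E_0(v) - \langle v, \rho\rangle, \\
F(\rho) &= F_0(\rho) + \Phi(\rho) = E_0(v) - \langle v, \rho\rangle + \Phi(\rho), \\
\Delta(\tilde v,\rho) = 0
  &\;\Longrightarrow\; E(\tilde v) = F(\rho) + \langle \tilde v, \rho\rangle
   = E_0(v) + \langle \tilde v - v, \rho\rangle + \Phi(\rho).
\end{align*}
```

Only E_0(v), the pairings, and Φ(ρ) enter, and all of these are available from the reference computation.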

G. Reduced KS-machine
Generally, the term Kohn-Sham machine refers to any collection of feasible operations, such as those in Table II. It is easier to focus on the essentials, though, if we consider a reduced KS-machine offering the single operation This is straightforwardly constructed from those in Table II. E(ṽ) and F(ρ) come from the HK maps. One use of the reduced KS-machine gives us a perturbed ground pair in Z, together with its essential characteristics. The only problem is that it is unclear how to control either its potential or its density component.
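A reduced KS-machine can be sketched on a finite-dimensional toy model, under assumptions that are not from the paper: densities are positive vectors on a few sites summing to N; the reference intrinsic energy is the convex entropy-like functional F_0(ρ) = Σ ρ_i log ρ_i, whose ground density is N·softmax(−v) and whose ground energy is E_0(v) = N log N − N log Σ e^{−v_i}; and Φ(ρ) = (G/2) Σ ρ_i² is a hypothetical local HXC term, so DΦ(ρ) = Gρ. Potentials matter only up to an additive constant. The function `reduced_ks` is illustrative and returns (ṽ, ρ, E(ṽ), F(ρ)).

```python
import math

N = 2.0  # toy particle number (assumption)
G = 1.0  # strength of the hypothetical local term Phi(rho) = (G/2) * sum(rho_i**2)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rho0(v):
    """Toy reference ground density: the minimizer N * softmax(-v) of F0 + <v, .>."""
    m = min(v)  # shift for numerical stability; potentials matter only up to a constant
    w = [math.exp(-(x - m)) for x in v]
    s = sum(w)
    return [N * x / s for x in w]


def E0(v):
    """Toy reference ground energy N log N - N log(sum exp(-v_i)), in shifted form."""
    m = min(v)
    return N * math.log(N) + N * m - N * math.log(sum(math.exp(-(x - m)) for x in v))


def F0(rho):
    """Toy reference intrinsic energy (used below only for checking)."""
    return sum(r * math.log(r) for r in rho)


def phi(rho):
    return 0.5 * G * sum(r * r for r in rho)


def reduced_ks(v):
    """Hypothetical reduced KS-machine: v -> (vt, rho, E(vt), F(rho))."""
    rho = rho0(v)                             # (v, rho) is a reference ground pair
    vt = [x - G * r for x, r in zip(v, rho)]  # vt = v - DPhi(rho): (vt, rho) lies on Z
    F = E0(v) - dot(v, rho) + phi(rho)        # HK map: F0(rho) read off the reference pair
    E = F + dot(vt, rho)                      # zero excess energy on Z
    return vt, rho, E, F
```

Note that the caller chooses v, not ṽ or ρ: the output lands on Z, but neither component is under direct control.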

VERIFIABLE PROGRESS
Essentially, the only feasible access to Z is via Z_0. The picture of the previous section suggests the following approach to the Standard Problem. Pick a potential v (somehow) and obtain Z_0 v. Compare its first component to v⊙; if the difference R v is not satisfactorily small, choose a new input to Z_0 based on that experience. Repeat until satisfied. This section is concerned with how to choose the next input so that some form of progress is assured.

A. Progress
Suppose we generate a sequence of points (v n , ρ n ) on Z .How would we ascertain that we were making progress toward the solution to the Standard Problem?One interpretation would be that v n and ρ n are converging to the target potential and density.However, the latter is unknown.We could ask if v ⊙ − v n is becoming small, but that requires a quantitative measure of the "size" of a potential difference.We defer such topological considerations to the following sections, in order to see what can be done without them.
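Fortunately, the basic feasible operations in hand already provide the means to assess whether one density is energetically better than another, provided we have them in the form of components of points on Z_0 or Z. The energetic measure of how close ρ is to a ground density of v⊙ is the excess energy ∆(v⊙, ρ), and to compare two densities we look at ∆(v⊙, ρ) − ∆(v⊙, ρ′). If this is less than zero, ρ is a "better" density than ρ′, indicating that going from ρ′ to ρ is progress of a sort. The important point is that

∆(v⊙, ρ) − ∆(v⊙, ρ′) = F(ρ) − F(ρ′) + ⟨v⊙, ρ − ρ′⟩,

in which the unknown ground energy E(v⊙) has cancelled, while F(ρ) and F(ρ′) are supplied by the HK maps. Evidently, this is feasible. It is the measure of progress we will use in this section.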

B. Conventional fixed-point formulation
Given v_0 as input, the KS-machine produces (barring exceptions) a reference ground pair (v_0, ρ_0) = Z_0 v_0 and a perturbed ground pair (ṽ_0, ρ_0) = KS v_0 as output. For purposes of comparing with the usual formulation of KS iteration, it may be helpful to refer to ρ_0 and ṽ_0 as the output density and output potential, respectively. Now, in that situation, a simple idea for the next input is

v_1 = v_0 + R v_0. (6.3)

The pattern can be continued to an entire sequence Unpacking definitions shows that this is equivalent to and (6.3) is thereby revealed to be the usual naive iteration step. This is labelled "naive" because it is a well-known empirical fact that this scheme is subject to problems which can be ameliorated by mixing. "Charge sloshing", for instance, is a situation where, from iteration to iteration, the density gets stuck alternating between two fairly well-defined but distinct densities. In the bivariate perspective being built here, mixing in general would be expressed as the idea that (6.3) is a good "direction" in which to shift the input potential, but that maybe a more cautious step is advisable:

v_{n+1} = v_n + λ R v_n, with 0 < λ < 1.

Conventionally, the same rough idea is implemented differently. An auxiliary ingredient, an input density, is introduced to parametrize the input potential, as and mixing is done on the auxiliary quantity: This kind of parametrization gives rise to the apparently common view that Kohn-Sham theory intrinsically involves a fixed-point problem, i.e., of the map ρ_n^in ↦ ρ_n. From the bivariate perspective, that is entirely incidental. It is unclear what advantages it may have over working directly with potentials as in (6.5). Most importantly for the present work, I am unable to prove anything about such schemes, whereas favorable results will be obtained for something like (6.5).
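A minimal sketch of potential mixing, on a finite-dimensional toy model that is an assumption and not from the paper: the reference ground density is N·softmax(−v), corresponding to F_0(ρ) = Σ ρ_i log ρ_i, and Φ(ρ) = (G/2) Σ ρ_i² is a hypothetical local HXC term, so the KS output potential is ṽ = v − Gρ. The update is v_{n+1} = v_n + λ R v_n with R v = v⊙ − ṽ; λ = 1 gives the naive step. The loop also records F(ρ_n) + ⟨v⊙, ρ_n⟩, which differs from ∆(v⊙, ρ_n) only by the constant E(v⊙). The function name `potential_mixing` is illustrative.

```python
import math

N, G = 2.0, 1.0  # toy particle number and strength of the hypothetical Phi


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rho0(v):
    """Toy reference ground density N * softmax(-v) (F0 = sum rho log rho)."""
    m = min(v)
    w = [math.exp(-(x - m)) for x in v]
    s = sum(w)
    return [N * x / s for x in w]


def F(rho):
    """Toy intrinsic energy F = F0 + Phi with Phi(rho) = (G/2) sum rho^2."""
    return sum(r * math.log(r) + 0.5 * G * r * r for r in rho)


def potential_mixing(v_target, v_start, lam, steps):
    """Iterate v <- v + lam * R v, where R v = v_target - vt and vt = v - G*rho0(v)."""
    v = list(v_start)
    history = []  # (max |R v|, F(rho) + <v_target, rho>) at each step
    for _ in range(steps):
        rho = rho0(v)
        vt = [x - G * r for x, r in zip(v, rho)]  # output potential of the KS step
        R = [t - x for t, x in zip(v_target, vt)]
        history.append((max(abs(r) for r in R), F(rho) + dot(v_target, rho)))
        v = [x + lam * r for x, r in zip(v, R)]
    return v, history


v_target = [1.0, -1.0, 0.5, 0.0]
v_final, hist = potential_mixing(v_target, [0.0] * 4, lam=0.5, steps=100)
```

For this particular toy (N = 2, G = 1) one can check that the damped update with λ = 0.5 is a contraction, so the residual is driven to zero; the sketch says nothing about how the naive step behaves in realistic systems.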

C. Utilities
We collect some useful identities, proven by straightforward manipulation, which will be used in Sections 6 D and 6 E. Items 1–3 hold for either the reference system (in which case subscripts 0 should be attached) or the perturbed system. They are entirely elementary and depend only on convexity properties of F and E. Recall the definition of excess energy: ∆(v, ρ) = F(ρ) + ⟨v, ρ⟩ − E(v). Each of v, v′, ρ, and ρ′ appears once in a ∆ on either side. Upon substituting the definition of ∆, all F's and E's cancel out, leaving only potential-density pairings.
Expand ∆(v′, ρ) − ∆(v, ρ) using the definition of ∆. 4. Assuming ρ ∈ dom DΦ, and with D_ρ denoting the semiderivative with respect to ρ at fixed v, According to the definition of excess energy, the conclusion follows.
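Directly from the definition ∆(v, ρ) = F(ρ) + ⟨v, ρ⟩ − E(v), a cross-difference identity of the kind described above, and the expansion of ∆(v′, ρ) − ∆(v, ρ), can be written out. This is a reconstruction of the idea; the exact form and labels in the original (such as (6.8)) may differ.

```latex
\begin{align*}
\Delta(v,\rho) + \Delta(v',\rho')
  &= \Delta(v,\rho') + \Delta(v',\rho) + \langle v - v', \rho - \rho' \rangle, \\
\Delta(v',\rho) - \Delta(v,\rho)
  &= \langle v' - v, \rho \rangle - E(v') + E(v).
\end{align*}
```

In the first line, F(ρ), F(ρ′), E(v), and E(v′) each occur once on each side and cancel, and the leftover pairings combine as ⟨v, ρ⟩ + ⟨v′, ρ′⟩ − ⟨v, ρ′⟩ − ⟨v′, ρ⟩ = ⟨v − v′, ρ − ρ′⟩.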

D. An infeasible strategy
Given v_0, define v_1 as in (6.3), i.e., v_1 = v_0 + R v_0. Corresponding densities are defined by the conditions

(v_0, ρ_0), (v_1, ρ_1) ∈ Z_0. (6.12)

Now, we consider two ideas for interpolation. The first is defined by a linear interpolation in density. Assuming ρ_1 ≠ ρ_0, it makes sense to define

ρ_λ = (1 − λ)ρ_0 + λρ_1 (6.13)

for 0 ≤ λ. A form of the following Proposition appears to have been first given by Wagner et al. [20], later corrected and made rigorous by Laestadius et al. [10]. At first sight, it appears quite consequential. The difficulty in applying it is discussed after the proof.
whenever the derivative exists.
Unfortunately, there is a serious problem with this as a basis of a strategy.To be able to use it in a non-blind way, we must be able to test the value of ∆(v ⊙ , ρ λ )−∆(v ⊙ , ρ 0 ).As previously discussed, the only evident feasible way to do that is to obtain ρ λ as the second component of a point on Z 0 , which means we need to know a potential having ρ λ as a ground density.

E. A feasible strategy
A second attempt to find a method of feasibly making progress involves linear interpolation of the potential according to

v_λ = (1 − λ)v_0 + λv_1.

Corresponding densities ρ_λ are defined implicitly via (v_λ, ρ_λ) ∈ Z_0. Caution: we are recycling notation here! Although ρ_λ interpolates between ρ_0 and ρ_1, this interpolation is generally nonlinear, unlike in (6.13).
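In a finite-dimensional toy (an assumption, not from the paper: the reference ground density of v is N·softmax(−v) on four sites, with N = 2), one can see that interpolating potentials, v_λ = (1 − λ)v_0 + λv_1, does not interpolate densities linearly. The endpoints agree, but at λ = 1/2 the density of v_λ differs from the linear mix (1 − λ)ρ_0 + λρ_1. The helper names are hypothetical.

```python
import math

N = 2.0  # toy particle number


def rho0(v):
    """Toy reference ground density N * softmax(-v), standing in for Z_0."""
    m = min(v)
    w = [math.exp(-(x - m)) for x in v]
    s = sum(w)
    return [N * x / s for x in w]


def interpolate(a, b, lam):
    """Componentwise (1 - lam) * a + lam * b."""
    return [(1 - lam) * x + lam * y for x, y in zip(a, b)]


v0 = [0.0, 0.0, 0.0, 0.0]
v1 = [1.0, 0.0, 0.0, 0.0]
rho_a, rho_b = rho0(v0), rho0(v1)

lam = 0.5
rho_from_potential = rho0(interpolate(v0, v1, lam))  # ground density of v_lambda
rho_linear = interpolate(rho_a, rho_b, lam)          # linear density interpolation
```

Here the first component of the density of v_λ is about 0.336, while the linear mix gives about 0.359.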
This is bounded above by either of the following: ) With the preceding notation, assuming Recall that ∆ 0 and ∆ are everywhere non-negative.The remarkable, and encouraging, aspect of the inequality (6.21) is the extra factor λ −1 in the negative term; more about this in the next subsection.
This relies on ( v λ , ρ λ ) ∈ Z .In the resulting expression, each of E( v λ ), E(v ⊙ ), and E( v 0 ) occurs once with a plus and once with a minus sign, cancelling to leave . By (6.17), this is Drop the negative first term here to obtain the upper bound (6.20a).
The second form (6.20b) of the upper bound follows upon the substitution Returning to the previous display, use the cross-difference identity (6.8) once for the reference system and once for the perturbed system to rewrite that display as Equating to the LHS of (6.22) yields (6.19).
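A toy check of the mechanism behind the feasible strategy, under assumptions that are not from the paper (reference ground density N·softmax(−v), corresponding to F_0(ρ) = Σ ρ_i log ρ_i, plus a hypothetical local term Φ(ρ) = (G/2) Σ ρ_i²). Along v_λ = v_0 + λ R v_0, the feasible quantity F(ρ_λ) + ⟨v⊙, ρ_λ⟩, which equals ∆(v⊙, ρ_λ) + E(v⊙), drops below its value at λ = 0 for small λ. In this toy its λ-derivative at 0 works out to −N·Var_p(R v_0) with p = ρ_0/N, so it is negative unless R v_0 is constant; for the values below it is −1.09375.

```python
import math

N, G = 2.0, 1.0  # toy particle number and strength of hypothetical Phi = (G/2) sum rho^2


def rho0(v):
    """Toy reference ground density N * softmax(-v)."""
    m = min(v)
    w = [math.exp(-(x - m)) for x in v]
    s = sum(w)
    return [N * x / s for x in w]


def objective(rho, v_target):
    """F(rho) + <v_target, rho>, i.e. Delta(v_target, rho) + E(v_target) in the toy."""
    return sum(r * math.log(r) + 0.5 * G * r * r + t * r for r, t in zip(rho, v_target))


v_target = [1.0, -1.0, 0.5, 0.0]
v0 = [0.0, 0.0, 0.0, 0.0]
rho_start = rho0(v0)
vt0 = [x - G * r for x, r in zip(v0, rho_start)]  # output potential of the KS step
R0 = [t - x for t, x in zip(v_target, vt0)]       # residual R v_0


def progress(lam):
    """Change in the feasible measure when moving to v_lambda = v_0 + lam * R v_0."""
    v_lam = [x + lam * r for x, r in zip(v0, R0)]
    return objective(rho0(v_lam), v_target) - objective(rho_start, v_target)
```

Because the decrease is first order in λ while the excess energies entering the bound are second order, a small enough step yields verifiable progress in this example.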

F. Analyticity
The question now is: under what circumstances is the RHS of the inequality (6.21) negative for some range of λ? If both ∆(ṽ_0, ρ_λ) and ∆_0(v_0, ρ_λ) are O(λ²), that would be more than enough. Recall that v_λ = (1 − λ)v_0 + λv_1 and ρ_λ is a noninteracting ground density for v_λ. Both ∆(ṽ_0, ρ_λ) and ∆_0(v_0, ρ_λ) certainly have a minimum (zero) at λ = 0. If ρ_λ varies at all smoothly, we would expect both these excess energies to be quadratic in λ near the minimum, exactly as needed.
Supposing F is an exact functional, so that ∆ comes from a well-defined quantum mechanical problem, the following can be proved [21]: if the noninteracting problem for v_0 and the interacting problem for ṽ_0 both have a nondegenerate ground state with nonzero spectral gap, then both these excess energies are not just O(λ²), but analytic at λ = 0. On the other hand, if the nondegeneracy and gap conditions are not satisfied, we should not be at all surprised if the excess energies behave in a way which dashes our hopes. The strategy of section 6 E is therefore conditionally vindicated.
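The quadratic behavior can be seen numerically in a toy model (assumptions, not from the paper: reference ground density N·softmax(−v), F_0(ρ) = Σ ρ_i log ρ_i, and a hypothetical Φ(ρ) = (G/2) Σ ρ_i²). Every map in the toy is smooth, so this is no test of the theorem's hypotheses, only an illustration: ∆_0(v_0, ρ_λ)/λ² and ∆(ṽ_0, ρ_λ)/λ² settle to finite positive limits as λ → 0. The direction vector is a hypothetical stand-in for R v_0.

```python
import math

N, G = 2.0, 1.0  # toy particle number and strength of the hypothetical Phi


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rho0(v):
    """Toy noninteracting ground density N * softmax(-v)."""
    m = min(v)
    w = [math.exp(-(x - m)) for x in v]
    s = sum(w)
    return [N * x / s for x in w]


def E0(v):
    m = min(v)
    return N * math.log(N) + N * m - N * math.log(sum(math.exp(-(x - m)) for x in v))


def F0(rho):
    return sum(r * math.log(r) for r in rho)


def F(rho):
    return F0(rho) + 0.5 * G * dot(rho, rho)


v0 = [0.2, -0.3, 0.0, 0.4]
direction = [1.0, -0.5, 0.3, 0.0]  # hypothetical stand-in for R v_0
rho_start = rho0(v0)
vt0 = [x - G * r for x, r in zip(v0, rho_start)]  # (vt0, rho_start) lies on Z
E_vt0 = F(rho_start) + dot(vt0, rho_start)       # E(vt0) via the HK map


def scaled_excess(lam):
    """(Delta_0(v0, rho_lam) / lam^2, Delta(vt0, rho_lam) / lam^2) along v0 + lam*direction."""
    rho = rho0([x + lam * d for x, d in zip(v0, direction)])
    delta_ref = F0(rho) + dot(v0, rho) - E0(v0)
    delta_pert = F(rho) + dot(vt0, rho) - E_vt0
    return delta_ref / lam**2, delta_pert / lam**2
```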

INTERLUDE: TOWARD TOPOLOGY
The rest of this paper develops a functional analytic picture which is not directly dependent on our analysis of Kohn-Sham machines, but very much influenced by it. The questions asked and the hypotheses imposed are chosen for their relevance to that analysis. For instance, in asking about limits of a sequence ((v_n, ρ_n)) of ground pairs, we will decide it is reasonable to ask that the F(ρ_n) be bounded, on the grounds that the KS machine provides this information when it generates a ground pair.
We saw that the course of a Kohn-Sham computation can be distilled into a sequence (v_n, ρ_n) of ground pairs. Moreover, barring exceptions, the computation can be done such that ∆(v⊙, ρ_{n+1}) < ∆(v⊙, ρ_n). This we called "progress", but maybe we should call it ∆-progress, as there may be other sorts. Since we do not know any ground densities of v⊙ (else we would not be doing the computation), deciding whether ρ_{n+1} is closer to such than is ρ_n certainly cannot be done directly, at least. Surely, though, we could see whether v_{n+1} is closer to v⊙ than v_n? Only if we know what "closer" means. If we had a metric d′ on V, that would provide one answer, and we could speak of "d′-progress".
This brings us to the issue of topologies on our function spaces, which we have so far deliberately avoided.The next section contains a review of relevant ideas, tailored to our needs.For us, equipping V and D with topologies is not merely a matter of mathematical convenience, but has physical significance, and will be done based on the considerations of section 2 C.After all, how do we distinguish one state (i.e., density) from another?By finding an observable which takes differing values for them.Thus arises the most physically-grounded notion of neighborhood of a density.

TOPOLOGICAL NOTIONS
This section reviews some important topological concepts and relates them to the physical state-observable duality. Because of the latter, even readers already comfortable with the mathematics may want to skim it. By topology, I refer to the classical idea of defining neighborhoods of points in a point set, closely related to approximation. Actually, we do not deal with general topologies, but only with metrics and semimetrics. One may wonder whether even that is excessive. For that reason, it bears emphasizing at the outset that we will do this in order to ground the mathematics physically. Following the development in section 2, the fundamental means at our disposal to distinguish densities and define neighborhoods is via the observables. There are infinitely many of these, and they naturally give a system of seminorms. If we choose to work with a norm, for convenience, it is desirable that it have some justification tracing back to the observables. If X is a vector space, metrics which are compatible with the linear structure are of most interest. This means d(x + z, y + z) = d(x, y) (translation invariance) and, for c ∈ R, d(cx, cy) = |c| d(x, y) (homogeneity). A corresponding norm can then be defined as the distance ∥x∥ = d(0, x) from the origin. Such a metric is recovered from the corresponding norm as d(x, y) = ∥x − y∥.

A. Metrics, norms, semimetrics, seminorms
The last two listed defining conditions for a metric pertain to its role in distinguishing points. The third condition shows how it does that, and the fourth, separation, says that the metric can distinguish any distinct points. Dropping the separation condition yields the definition of a semimetric. A single semimetric may fail to separate points, but a collection {d_i : i ∈ I} of semimetrics can collectively separate, even if none does so individually. That is, for each x ≠ y, there is some i ∈ I such that d_i(x, y) > 0. A sequence (x_n) converges to x with respect to the system of semimetrics {d_i : i ∈ I} if and only if d_i(x_n, x) → 0 for each i. Extending our comparison (≾) to systems of semimetrics has a slight subtlety. One way to proceed is to use open balls again. The "size" of the open ball B(J, r; x) = {y : d_i(y, x) < r, ∀i ∈ J} is parameterized not only by a radius r, but also by a selection J ⊂ I of finitely many semimetrics. Then, {d_i : i ∈ I} ≾ {d′_j : j ∈ I′} if and only if for any d-size (J, r), there is a d′-size (J′, r′) such that B′(J′, r′; x) ⊆ B(J, r; x) for every x. As concerns convergence, a collectively separating finite system {d_1, ..., d_n} can be replaced by the single metric d*(x, y) = max(d_1(x, y), ..., d_n(x, y)). Hence, only infinite systems of semimetrics are really of interest.
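A small concrete case, assumed purely for illustration: on R³ the coordinate semimetrics d_i(x, y) = |x_i − y_i| each fail to separate points, but together they do. For a finite selection J, the ball B(J, r; x) coincides with the ball of radius r for the single metric max over i ∈ J of d_i, which is the replacement described above. The helper names are hypothetical.

```python
def d(i, x, y):
    """Coordinate semimetric d_i(x, y) = |x_i - y_i| on R^n; alone it does not separate."""
    return abs(x[i] - y[i])


def separating_index(x, y):
    """Some i with d_i(x, y) > 0, or None; the whole family separates distinct points."""
    for i in range(len(x)):
        if d(i, x, y) > 0:
            return i
    return None


def in_ball(y, x, J, r):
    """Membership of y in the open ball B(J, r; x) = {y : d_i(y, x) < r for all i in J}."""
    return all(d(i, y, x) < r for i in J)


def d_star(x, y, J):
    """Single metric replacing the finite system {d_i : i in J}."""
    return max(d(i, x, y) for i in J)
```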
Just as for the passage from metric to norm, to make a semimetric respect the linear structure of a vector space, one imposes translation invariance and homogeneity. Such a compatible semimetric amounts to a seminorm, via p(x) = d(0, x). (Terminological note: seminorm is standard. Accepting that, semimetric seems natural. However, what we are calling a semimetric is called a pseudometric by some.)

B. Seminorms and dual pairs
Seminorms have been lurking all along in our pairing maps. Suppose X and V form a dual system (Def. 4.1). Each v ∈ V defines a proper seminorm p_v on X, defined by p_v(x) = |⟨v, x⟩|. (8.1) Similarly, each x ∈ X defines a seminorm on V. No single p_v separates, but the entire system of seminorms separates collectively. For X = Vec D and V our spaces of states and observables, respectively, this is something we should insist on. If two states cannot be distinguished by any observable, on what ground would we say they are distinct? Relatedly, if we admit the physical meaningfulness of a set of observables, it is very unclear on what grounds we could reject the physical meaningfulness of the corresponding seminorms and the topology which they generate. Since systems of seminorms arising this way are of great importance to us, we introduce a notation.
Definition 8.1.For a dual pair ⟨V , X ⟩, the system {p v : v ∈ V } of seminorms defined in (8.1) is denoted σ(X , V ).Swapping the roles of X and V gives the system σ(V , X ) V .If D is a subspace of X , we write σ(D, V ) for the system of semimetrics induced from σ(X , V ).
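In a finite-dimensional stand-in (R³ with the Euclidean pairing, an assumption for illustration), each seminorm p_v(x) = |⟨v, x⟩| vanishes on the plane orthogonal to v, so no single p_v separates points, while the p_v for a basis of v's separate collectively. The helper names are hypothetical.

```python
def pairing(v, x):
    """Euclidean pairing <v, x>, standing in for the observable-state pairing."""
    return sum(a * b for a, b in zip(v, x))


def p(v, x):
    """Seminorm p_v(x) = |<v, x>| induced by a single v."""
    return abs(pairing(v, x))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def blind_spot(v):
    """A nonzero x with p_v(x) = 0, for v != 0 in R^3: p_v alone does not separate."""
    big = max(abs(c) for c in v)
    helper = (1.0, 0.0, 0.0) if abs(v[0]) < 0.9 * big else (0.0, 1.0, 0.0)
    return cross(v, helper)  # nonzero because helper is not parallel to v
```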

C. Norm compatibility with a dual system
Once X has a topology, we have a new criterion with which to distinguish linear functionals, namely, those which are continuous.It turns out that the linear functionals on X continuous with respect to σ(X , V ) are those (and only those) of the form x → ⟨v , x⟩ for v ∈ V .Physically, this makes sense: the linear observables ought to be exactly the continuous linear functionals on states, or something has been chosen incorrectly.
It is not as easy to work with a seminorm system such as σ(X, V) as with a simple norm, at either the level of general results or that of specific spaces. This motivates us to equip X with a norm, but it also raises the question of potential grounds for considering a norm to be "physical". I propose a principle based on the observation of the previous paragraph. A topology τ on X, defined by seminorms, is said to be compatible with the duality ⟨V, X⟩ if the set of linear functionals on X which are continuous with respect to τ is exactly the set of those of the form x ↦ ⟨v, x⟩ for v ∈ V. Then, the principle is that, to the extent that the choice V of observables is physical, topologies compatible with the duality ⟨V, X⟩ are the "more physical" ones.
This matter of topologies compatible with a given duality is a standard chapter of the theory of locally convex spaces.(Often literally, e.g., Chapter III of Horváth's book [22].)We list some relevant facts.Not only do all topologies compatible with a given duality have the same continuous linear functionals, but also (i) the same lower semicontinuous convex functions into R, (ii) the same closed convex subsets of X , (iii) the same bounded sets.
An important observation is that, if there is a norm on X compatible with the duality, it is essentially unique, and defined by the weakest seminorm dominating all the p_v for v ∈ V. Fortunately, we have such a case. V = L^∞(M) + L^{3/2}(M) continues to be the dual space of Vec D when the system σ(Vec D, V) is strengthened to the norm Vec D is not a Banach space under this norm. Its completion (see section 8 E) is the Banach space L^1(M) ∩ L^3(M). With the canonical norm, V becomes a Banach space. A norm which is equivalent to this canonical one, and possibly more convenient, or at least more explicit, is (8.4) However, we will not actually make any use of this concrete form.

D. Variations on continuity
We collect some variations on the concept of continuity for metric spaces.Recall that a function f : X → Y between metric spaces is continuous at x ∈ X iff, given δ > 0, there is an ϵ > 0 such that f carries the ball of radius ϵ centered at x into the ball of radius δ centered at f (x).
In section 10, we shall use a slightly stronger form of continuity, as follows.
Definition 8.2 (locally Lipschitz continuous). For a metric space X, a function f : X → R is locally Lipschitz continuous (for short, locally L-continuous) iff for each point x ∈ X, there is a neighborhood U ∋ x and K > 0 such that |f(y) − f(z)| ≤ K d(y, z) for all y, z ∈ U. Example: the function x ↦ √|x| is locally L-continuous on R ∖ {0}, but only continuous at zero.
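A rough numerical illustration of the difference between local Lipschitz continuity and mere continuity, using |x| (Lipschitz with constant 1 everywhere) and √|x| (continuous at 0, but not locally Lipschitz there). The hypothetical helper `max_quotient` estimates a local Lipschitz constant from adjacent grid points; near 0 the estimate for √|x| grows like 1/√h as the grid spacing h shrinks, while away from 0 it stays bounded.

```python
import math


def max_quotient(f, center, radius, m=500):
    """Largest |f(b) - f(a)| / (b - a) over adjacent grid points in [center - radius, center + radius].

    The grid contains `center` exactly; this only estimates a local Lipschitz constant.
    """
    pts = [center + radius * k / m for k in range(-m, m + 1)]
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(pts, pts[1:]))


def sqrt_abs(x):
    return math.sqrt(abs(x))
```

Shrinking the neighborhood of 0 by a factor of 100 multiplies the estimate for √|x| by about 10, so no single constant K works on any neighborhood of 0.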
Just as for the unilateral forms of derivative introduced in Def. 4.2, a unilateral form of continuity is relevant in optimization situations.
Definition 8.3 (lower/upper semicontinuity). A function f : X → R on a topological space X is lower semicontinuous (lsc) at x ∈ X when, for any level c < f(x), there is a neighborhood U of x such that f(y) > c for all y ∈ U. Lower semicontinuous without qualifier means lsc everywhere. f is upper semicontinuous (usc) if −f is lsc.
For a convergent sequence x_n → x, lower semicontinuity of f implies that lim inf_{n→∞} f(x_n) ≥ f(x). The value of f at the limit point x might be "smaller than anticipated", but not "larger than anticipated". A real-valued function is continuous at a point iff it is both lsc and usc there. The concept of lower semicontinuity is very important for us because F is lsc, but not usc (see section 9 C).
If S is a set of lsc functions, then their pointwise supremum, f(x) := sup {g(x) : g ∈ S}, is also lsc. In particular, if S consists of continuous functions, then the supremum is lsc, though there is no reason, in general, to suppose it continuous. For a pertinent example, consider E. It is the pointwise infimum of affine functionals v ↦ F(ρ) + ⟨v, ρ⟩, hence is usc if those are continuous, which they will be if V is equipped with a system of seminorms at least as strong as σ(V, D).
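A concrete instance of the first claim, with hypothetical names `g` and `f`: each g_n(x) = min(1, n|x|) is continuous, and their pointwise supremum is 1 for x ≠ 0 and 0 at x = 0. Along x_k = 1/k → 0 the values stay at 1, so lim inf f(x_k) ≥ f(0), as lower semicontinuity requires; but lim sup f(x_k) = 1 > f(0), so the supremum is not usc at 0 and hence not continuous.

```python
import math


def g(n, x):
    """Continuous functions g_n(x) = min(1, n|x|)."""
    return min(1.0, n * abs(x))


def f(x):
    """Pointwise supremum over n >= 1 of g_n(x).

    g_n(x) increases with n and equals 1 once n|x| >= 1, so the supremum is 1 for
    x != 0 and 0 at x = 0: lower semicontinuous, but not upper semicontinuous, at 0.
    """
    if x == 0:
        return 0.0
    n = math.ceil(1.0 / abs(x)) + 1  # large enough that g_n(x) has saturated at 1
    return g(n, x)
```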
Finally, we introduce a weakening of continuity which will be useful because it allows us to bound how discontinuous F can be in certain circumstances.
for any ϵ ′ > 0, there is δ such that 2. ϵ-almost continuous on X precisely if: f is ϵ-almost continuous at x for every x ∈ X .
3. g-almost continuous, where g :

Due to its importance in the investigation, we conclude this section with a brief review of the concept of completeness for metric spaces. Roughly, a metric space is complete if a sequence actually has a limit whenever it "appears to be converging" in the following sense.
The diameter of a set A, diam A, is sup {d(x, y) : x, y ∈ A}. A sequence (x_n) is a Cauchy sequence if diam {x_m : m ≥ n} → 0 as n → ∞.
Definition 8.6 (Complete). A metric space X is complete if every Cauchy sequence in X has a limit in X.
For a familiar example of a metric space which is not complete, consider the rational numbers Q with the ordinary distance function. If x_n is √2 truncated to n decimal places, then (x_n) is Cauchy, but does not converge to anything in Q, since √2 is not in Q. A Banach space is a complete normed space. There is a canonical, abstract way to complete any normed space V. The completion is a Banach space and V is dense in it. Then, any Cauchy sequence has a limit in the completion. This seems very convenient, but is not always appropriate. Later we will be interested in the metric d_1 on the space of densities D which derives from the L^1 norm. We will not use a completion because we will want to know that limits are in D itself.
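The √2 example can be checked in exact rational arithmetic. Python's `fractions.Fraction` and `math.isqrt` (Python 3.8+) give the truncations exactly; the checks confirm that all later truncations differ from the n-place truncation by less than 10^(−n), which is the Cauchy property, while no truncation squares to 2. The name `sqrt2_truncated` is illustrative.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncated(n):
    """sqrt(2) truncated to n decimal places, as an exact rational number."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)
```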
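Our partial order on metrics, ≾, interacts with completeness only partially. If d ≾ d′, then every d′-Cauchy sequence is also d-Cauchy, so when (X, d) is complete, such a sequence has a limit with respect to d; that it also converges with respect to d′ does not follow in general, and requires a separate argument.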

INTERLUDE: GENERAL STRATEGY
Suppose we have found a well-motivated metric on V × D, and return to the sequence ((v n , ρ n )) of ground pairs.Questions which naturally arise are: Does it converge if it is Cauchy (see section 8 E for this notion)?If it does, is the limit a ground pair?The following sections put together a topological perspective on DFT.We consider regularity properties of E and F , the relation between the energetic version of nearly a ground pair (small excess energy) and distance to Z 0 or Z , and convergence of sequences of ground pairs.

A. Room for error
Our analysis of Kohn-Sham machines assumed that they produce points exactly on Z, i.e., with zero excess energy ∆. Assuming only that ∆(v_n, ρ_n) < ϵ is an idealized model of a certain kind of error. In the following sections, therefore, we will be interested not only in Z, but also in sets of bounded ∆ in V × D, in order to understand how robust the conclusions are against such error.

B. The axiomatic approach
Additional axioms will be added to A1 -A3 already announced.They will be motivated by what we can deduce about the exact quantum mechanical situation, but are expressed at the DFT level.This style of working allows us to keep track of exactly what we have used from the underlying QM (not a lot), and gives room for the results to apply to model functionals.
There will only ever be a single F involved.However, it need not be an exact functional, clearly traceable to a quantum mechanical Hamiltonian.Any F which satisfies the axioms will do, so it could be F 0 , an exact F , or F 0 +Φ for a model HXC energy.The axioms reflect properties of exact functionals, but are not particularly constraining.
Final results are funnelled through the axioms, so to speak. There is work to be done both in proving that the axioms are satisfied in the standard interpretation, and in getting from them to claims formally stated as theorems. This is not always the most efficient approach. Two later axioms will supersede earlier ones. The choice of axioms aims for mathematical simplicity, physical transparency, and generality (hence flexibility in application).

C. F is very far from continuous
In the physics literature, it is often implicitly assumed that intrinsic energy F is well-behaved, continuous at least, and possibly smooth.This is not only unjustified, but incorrect.With respect to the norm ∥ • ∥ already mentioned, and dealt with in the next section, the exact functional F is lsc, but not usc.In fact, F 0 already has this problem.To see this, consider a density ρ, and select a region U and ϵ > 0. By adding oscillations of bounded amplitude but increasingly small wavelength to ρ in the region U , we can produce a sequence of densities ρ n such that ∥ρ n − ρ∥ < ϵ, but F 0 (ρ n ) > n. (See Section II of [23] for further discussion.)Hence, F 0 is unbounded above on every neighborhood.The excess energy ∆ inevitably inherits this problem.This is worth emphasizing because some of what follows, though by no means all, would be somewhat trivial if F were continuous.In addition, we also have E and ∆ to worry about.
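The oscillation argument can be mimicked numerically in a one-dimensional periodic toy, which is an assumption and not the setting of the paper. As a stand-in for F_0 we use the von Weizsäcker functional T_W(ρ) = (1/8)∫|ρ′|²/ρ, a standard lower bound on the noninteracting kinetic energy. The densities ρ_n = 1 + a sin(2πnx) on [0, 1) stay at a fixed L¹ distance (about 2a/π) from the uniform density, and their L³ distance is likewise independent of n, while T_W(ρ_n) grows like n². The function names are hypothetical.

```python
import math


def density(n, a, x):
    """Toy density 1 + a*sin(2*pi*n*x) on the periodic unit interval (needs |a| < 1)."""
    return 1.0 + a * math.sin(2 * math.pi * n * x)


def weizsacker(n, a, grid=20000):
    """von Weizsacker energy (1/8) * integral of rho'(x)^2 / rho(x), midpoint rule."""
    h = 1.0 / grid
    total = 0.0
    for k in range(grid):
        x = (k + 0.5) * h
        drho = 2 * math.pi * n * a * math.cos(2 * math.pi * n * x)
        total += drho * drho / density(n, a, x)
    return total * h / 8.0


def l1_distance(n, a, grid=20000):
    """L1 distance between the oscillating density and the uniform density 1."""
    h = 1.0 / grid
    return sum(abs(density(n, a, (k + 0.5) * h) - 1.0) for k in range(grid)) * h
```

Going from n = 4 to n = 64 multiplies T_W by roughly 256 while the distance to the uniform density does not change.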

STRUCTURE AND REGULARITY I

A. New postulates
In addition to A1–A3 from section 5 A, we now assume
B1. dom E ⊇ V.
B2. V is the topological dual of (Vec D, ∥ · ∥) with respect to the pairing ⟨ , ⟩.
Recall that 'dom' indicates the set on which a function takes a proper (noninfinite) value. All the potentials under consideration are in V, so B1 might reasonably have been written with "=" in place of "⊇". This version makes the point that it would not be a problem if E were well-defined and finite for something outside V. With B1, all our functions F, E, and ∆ take proper values over all of V × D. Axiom B3 is perhaps somewhat unsatisfactory insofar as it is not immediately clear what property of F as given implies that such an extension is possible, and one would prefer not to have to think outside D, or Vec D. This will be addressed in Section 13 A. For now we work with this fairly standard form.
Together with the pairing ⟨• , •⟩, the norm ∥•∥ on Vec D induces a canonical norm (8.3) ∥ • ∥ ′ on V , under which it is a Banach space.The corresponding metrics are denoted by d and d ′ , respectively.

B. Standard interpretation
The interpretation is [See Eqs.(8.2) and (8.4)] The pairing was already defined on a bigger set than V × D, so the extension described is not really necessary, but it is worth noting that the extension recovers the original pairing on the bigger set.

C. Structure theorem
In this section, we equip V × D with the metric (d′ + d)((v, ρ), (v′, ρ′)) = d′(v, v′) + d(ρ, ρ′). Until further notice, convergence will be considered with respect to d′, d and d′ + d in V, D, and V × D, respectively.
A subset of V × D over which ∆ is bounded (i.e., a subset of {∆ ≤ M } for some M < ∞) is called a ∆-bounded set.Later we will be interested in F -bounded sets, which are defined similarly.In that case, however, there is an ambiguity: {F ≤ M } could be a subset of D, or a subset of V × D with unrestricted V coordinate.Context will make clear which is intended.
F is lsc by assumption (B3), while E is usc by construction (see section 8 D).∆ is then the sum of lower semicontinuous functions of density, F (ρ), of potential, −E(v), and a separately continuous function (v, ρ) → ⟨v , ρ⟩. ∆ is therefore separately lsc.Just as for continuity, joint lower semicontinuity (i.e., as a function on V × D) is not in general a consequence of separate lower semicontinuity.Much of the force of the following Proposition 10.1 is in showing that the situation is actually better than just observed.The improvement is clear as regards E (conclusion 2).Conclusion 1, although stated in a somewhat raw form, implies that ∆ is lsc on V ×D, as is thoroughly explained in Section 13 A. Conclusions 3 and 4 show that F is better behaved in restriction to subsets of small excess energy.Beware of misinterpretation.Conclusion 3 does not mean that F is continuous with respect to d on the set of v-representable densities.Rather, we can rephrase it as: F (ρ ′ ) is close to F (ρ) if ρ ′ is close to ρ and a realizing potential for ρ ′ is close to one for ρ.The relevance of considering Z is that Kohn-Sham computation delivers points on Z , or, in a less-idealized version, on {∆ ≤ ϵ}.
In more concrete terms directly related to Kohn-Sham computation, Prop. 10.1 has the following immediate consequence. Suppose

Proof of B1. The problem is to show that, for fixed v, F(ρ) + ⟨v, ρ⟩ is bounded below with respect to ρ. The crucial facts are (i) ∥ρ∥_3 ≤ a + bF(ρ), where b > 0, and (ii) v can be split as v′ + v″, where v′ ∈ L^∞ and ∥v″∥_{3/2} is as small as desired. The second item is Lemma 10.2 below. Using normalization of ρ (N particles), Now, it is only necessary to choose v″ so that 1 − b∥v″∥_{3/2} > 0.
Actually, the first "crucial fact" here is the reason [8] for choosing the L 3 norm.
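One way to write out the estimate in the proof of B1, assuming fact (i) with constants a and b > 0, the splitting (ii), and ρ ≥ 0 with ∫ρ = N; the first step is Hölder's inequality with the conjugate exponents 3/2 and 3 for the v″ term. This is a reconstruction, not a quotation of the original display.

```latex
\begin{align*}
\langle v, \rho \rangle
  &\ge -\|v'\|_\infty \|\rho\|_1 - \|v''\|_{3/2} \|\rho\|_3 \\
  &\ge -N \|v'\|_\infty - \|v''\|_{3/2}\,\bigl(a + b F(\rho)\bigr), \\
F(\rho) + \langle v, \rho \rangle
  &\ge \bigl(1 - b \|v''\|_{3/2}\bigr) F(\rho) - N \|v'\|_\infty - a \|v''\|_{3/2}.
\end{align*}
```

When 1 − b∥v″∥_{3/2} > 0, the right-hand side is bounded below, because F itself is bounded below (A2).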
Proof of B2. Vec D is dense in the Banach space L^1 ∩ L^3, the topological dual of which is the Banach space L^∞ + L^{3/2} = V.

A. E is locally L-continuous.
Proof. The Cauchy sequence (v_n, ρ_n) ⊂ {∆ ≤ ϵ} has a limit (v, ρ) ∈ V × X, since V and X are complete under d′ and d, respectively. Now, ∆(v, ρ) = F(ρ) + ⟨v, ρ⟩ − E(v), and we need to show that ∆(v, ρ) ≤ lim inf ∆(v_n, ρ_n). This follows because F is lsc (B3), while E and ⟨ , ⟩ are continuous, as shown in items A and B.

D. F is locally L-continuous on Z.
Proof. Given the preceding, the proof is simple: F(ρ) = E(v) − ⟨v, ρ⟩ + ∆(v, ρ). The last term on the RHS is identically zero on Z, while the first and second are locally L-continuous by items A and B, respectively.
Proof. The first two terms on the RHS of F(ρ) = E(v) − ⟨v, ρ⟩ + ∆(v, ρ) are continuous functions, by preceding results, while the final term is in the interval [0, ϵ].

NEARLY-A-GROUND-PAIR VERSUS NEAR-A-GROUND-PAIR
If ∆(v, ρ) is small, then (v, ρ) is nearly a ground pair in an obvious sense. But does that imply that there is a genuine ground pair nearby in V × D? The latter occurrence, that (d + d′)((v, ρ), Z) is small, is naturally described as "(v, ρ) is near a ground pair". This section is concerned with the extent to which these two concepts are commensurate.
All the axioms announced so far, A1 -A3 and B1 -B3, are assumed here.

A. Nearly-a-ground-pair implies near-a-ground-pair
The low (excess) energy part of the product space V × D is metrically close to the ground pairs Z. Proposition 11.1. If (v, ρ) is nearly a ground pair, i.e., ∆(v, ρ) is small, then it is near some ground pair: Proof. See Thm. I. [27,28].

B. Perturbing the potential of a ground pair does not increase excess energy much
Nothing quite so straightforward or satisfactory is possible in the opposite direction. Here is why. We have Now, Lemma 11.5 below shows that the second bracketed term can be bounded as For the other bracketed term, The difference between F(ρ) and F(ρ′) is uncontrollable, even as ρ′ → ρ, because F is unbounded above on every neighborhood. The best we can hope for is that if (v, ρ) is a ground pair and v′ is near v, then ∆(v′, ρ) is small. Lemmas 11.5 and 11.6 give the best forms of this claim. We proceed to examine the situation in detail.
In crudest terms, the next few lemmas are concerned with comparing functionals on V × D and showing that when one of them is large somewhere, then another one is also. There are a lot of undetermined constants (a, b, c, etc.) in the statements, and they cannot be assumed to have the same value from one occurrence to the next, except within a proof, as indicated by context. In the course of the demonstration that B1 is satisfied by the standard interpretation (Section 10 D), Lemma 11.2 and part of Lemma 11.3 were effectively already shown to hold in that interpretation. Now we will see that, conversely, they are implied by the very simple B1, with help from the other axioms.
Proof. We first prove that the bound holds for v individually, and extend to a neighborhood afterward. For any M > 0, and either choice in ±, we have the inequality F(ρ) ± M⟨v, ρ⟩ ≥ E(±Mv). Together, they imply M|⟨v, ρ⟩| ≤ F(ρ) − min{E(Mv), E(−Mv)}. Appealing to finiteness of E(±Mv) (axiom B1) and lower-boundedness of F (A2), this gives |⟨v, ρ⟩| ≤ c + (1/M)F(ρ) for some c. Since M may be taken as large as desired, this suffices. Now, improve this to uniformity over U. Let ϵ > 0 be given. By Lemma 11.2 and the preceding paragraph, Here, ϵ′ can be chosen as small as desired at the potential cost of large c. Choose ϵ′ and r so that ϵ′ + br < ϵ. This ensures that whenever ∥v′ − v∥′ ≤ r, |⟨v′, ρ⟩| ≤ (c + ra) + ϵF(ρ).
With the aid of the preceding lemmas, we turn to examining Lipschitz constants for v → ∆(v, ρ).Here is a loose paraphrase.For v varying over U , ∆(v, ρ) is either uniformly large, or does not vary much, depending on ρ.In particular, the maps v → ∆(v, ρ) for ρ's which are ground densities of some potential in U all have a common Lipschitz constant over U .
Proof. Take U to satisfy Lemma 11.5 and Lemma 11.4 with ϵ = 1 (for instance). Lemma 11.5 implies that Lemma 11.4.

Finally, the preceding technical lemmas can be applied to obtain something more digestible. Recall that Prop. 10.1.1 implies that if (v_n, ρ_n) is a sequence of ground pairs such that v_n → v and ρ_n → ρ, then (v, ρ) is a ground pair. The next proposition shows that, if the assumption that (ρ_n) converges is dropped, we can still assert that the ρ_n are asymptotically nearly ground densities of v, in the sense of having small excess energy.

Now apply
(11.5) Proof. If we restrict our attention to some tail of the sequence (n ≥ N), all the v_n's are in a neighborhood U of v satisfying Lemma 11.6. Then, the finitely many terms with n < N can be accommodated in the same kind of bound, at the possible expense of increasing a + bϵ to some c.

INTERLUDE: IN PURSUIT OF COMPACTNESS
Prop. 11.7 demonstrates that, when (v_n, ρ_n) is a sequence of ground pairs with v_n → v⊙, the situation is good with respect to excess energy. The conclusion ∆(v⊙, ρ_n) → 0 is similar to energetic progress from section 6. If we use a weaker metric than d, it will be easier for the sequence of densities to converge, but the limit might then fail to be a ground density of v⊙. We turn our attention to finding a metric which is usefully weaker, yet strong enough that lim ρ_n will be a ground density of v⊙. We would be assured that the sequence at least had cluster points if we could guarantee that it was confined to a compact, or totally bounded, set. That remark calls for a review of the important topological notion of compactness, in a form suitable for our purposes. Although no overt appeal to this concept is made until section 14, it already begins to exert an influence on the direction of the development.

A. Compactness and total boundedness
A helpful slogan is, "a compact set is almost finite, in a topological sense". A metric space X is said to be totally bounded exactly if, for any specified ϵ > 0, there is a finite set of points x_1, ..., x_N ∈ X such that X is covered by the balls of radius ϵ centered at those points. A metric space is compact if and only if it is complete and totally bounded. Although this is not the usual definition of compactness, it is equivalent to it, and it immediately captures the significance for our purposes: if (y_i : i ∈ N) is any (not necessarily Cauchy!) sequence in a compact metric space X, then some subsequence converges to a point in X.
For example, any closed bounded interval [a, b] ⊂ R (−∞ < a ≤ b < ∞) is compact. The entire real line is not, since the sequence y_i = i has no convergent subsequence. So, unboundedness is one way to fail to be compact. Another is having infinitely many dimensions. For instance, the closed unit ball of an infinite-dimensional Hilbert space is not compact. If {ψ_i : i ∈ N} is an orthonormal basis, then ∥ψ_i − ψ_j∥ = √2 for i ≠ j, so the sequence i ↦ ψ_i has no norm-convergent subsequence. In an infinite-dimensional Banach space, a compact set is both bounded and "almost finite-dimensional", in being within any prescribed distance of some finite-dimensional affine subspace.
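The contrast between a bounded interval and an orthonormal family can be made concrete with finite samples. The sketch below is illustrative only: the helper names greedy_eps_net and unit_vector are hypothetical, and finite point lists stand in for the sets. A greedy ϵ-net for a finely sampled [0, 1] stays small however fine the sampling. For n orthonormal vectors and ϵ = 0.5 < √2/2, no ball of radius ϵ, wherever it is centered, can contain two of them, so every ϵ-net needs n centers, and this number grows without bound with the dimension.

```python
import math

def greedy_eps_net(points, eps):
    """Return centers such that every point lies within eps of some center.

    Greedy rule: scan the points in order and promote a point to a center
    whenever it is farther than eps from every center chosen so far.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers

def unit_vector(i, n):
    """i-th standard basis vector of R^n (orthonormal, pairwise distance sqrt(2))."""
    return tuple(1.0 if k == i else 0.0 for k in range(n))

# A finely sampled interval [0, 1]: the net size stays bounded as sampling is refined.
interval = [(k / 1000,) for k in range(1001)]
print(len(greedy_eps_net(interval, 0.1)))

# Orthonormal vectors with eps = 0.5 < sqrt(2)/2: one center per vector,
# so the net size equals the dimension n.
for n in (10, 100, 200):
    basis = [unit_vector(i, n) for i in range(n)]
    print(n, len(greedy_eps_net(basis, 0.5)))
```

Because centers chosen greedily are more than ϵ apart, the interval net has at most about 1/ϵ points. The orthonormal count is forced by the geometry, not by the greedy rule.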

B. Total variation metric is a physically grounded candidate
The weaker a metric on D, the more compact sets it has. We are thus motivated to consider metrics weaker than d, the metric induced by the norm ∥·∥. Focusing on the standard interpretation, there is a particularly attractive possibility, namely the metric d_1 induced by the L^1 norm.
Earlier, we argued that, topologically, one should start from σ(D, V). If ∥·∥_{1∩3} is physically motivated, then any metric strictly between these two is also. This is not quite true of d_1. As we shall see, d_1 is stronger than σ(D, V) on F-bounded sets, but not globally. However, d_1 has strong independent physical credentials.
First, the L^1 norm hews tightly to the very concept of density, as an instrument for telling us how much "stuff" is in any specified region, whereas the L^3 norm is, as observed, really a proxy for something else. Indeed, if N(A), respectively N′(A), is the number of particles in region A according to density ρ, respectively ρ′, then
$$ |N(A) - N'(A)| = \left| \int_A (\rho - \rho')\,dx \right| \le \|\rho - \rho'\|_1 \quad \text{for every region } A. $$
This metric has a privileged place in probability theory (a probability measure taking the place of ρ), where it is known, up to a conventional factor of 2, as the total variation metric.
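As a numerical sanity check of the particle-count bound, here is a sketch with two arbitrary densities on a one-dimensional grid. The grid, the random densities, and the helper names l1_distance and particle_number are hypothetical choices made for illustration. Regions are sets of grid cells, and the largest gap |N(A) − N′(A)| found over many random regions is compared with the discretized ∥ρ − ρ′∥_1.

```python
import random

def l1_distance(rho, rho_prime, dv):
    # Discretized ||rho - rho'||_1: sum of |rho - rho'| over cells times cell volume dv.
    return sum(abs(a - b) for a, b in zip(rho, rho_prime)) * dv

def particle_number(rho, region, dv):
    # N(A): integral of rho over region A, given as a set of grid-cell indices.
    return sum(rho[i] for i in region) * dv

random.seed(0)
n_cells, dv = 200, 0.05  # hypothetical 1D grid, for illustration only
rho = [random.random() for _ in range(n_cells)]
rho_prime = [random.random() for _ in range(n_cells)]
dist = l1_distance(rho, rho_prime, dv)

# |N(A) - N'(A)| <= ||rho - rho'||_1 for every region A; probe random regions.
worst = 0.0
for _ in range(1000):
    region = {i for i in range(n_cells) if random.random() < 0.5}
    gap = abs(particle_number(rho, region, dv) - particle_number(rho_prime, region, dv))
    worst = max(worst, gap)
print(f"L1 distance {dist:.3f}, largest particle-number gap found {worst:.3f}")
```

The region {ρ > ρ′} maximizes N(A) − N′(A). Even there the gap is only the positive part of ρ − ρ′, which accounts for the factor of 2 in some total variation conventions.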
Secondly, the map dens from quantum mechanical states (density matrices) with the natural trace norm to Vec D is continuous with respect to the total variation metric, but not with respect to the L^3 norm. The former therefore has a direct link to the underlying quantum mechanics as well.
The next section examines what it takes to replace d by a weaker metric.The motivation for this lies in the possibility of convenient compact sets, but that theme will be put aside for now.

STRUCTURE AND REGULARITY II
This section is concerned with conditions (C1–C3) under which we can replace d by a weaker metric d_1, so that the product space V × D continues to enjoy (nearly all of) the favorable properties listed in Proposition 10.1, now with respect to the metric d′ + d_1. In the standard interpretation, d_1 is the L^1 metric.

A. Complete lower semicontinuity
In referring to a completion of D, axiom B3 makes reference to points outside D. With a weaker metric, we would have even more of these. We would like to avoid that, due to the physically dubious status of those points, and phrase everything in terms of D. Here we identify the concept needed to do this, which turns out to be the same as the one appearing in item 1 of Prop. 10.1. Thus, we achieve some unification at the same time.
Definition 13.1 (completely lower semicontinuous). A function f : (X, d) → R on a metric space is completely lower semicontinuous precisely if, for each M < ∞, the metric subspace {f ≤ M} ⊆ X is complete (Def. 8.6).
Normally, we are interested in a fixed function on the set X and want to know whether f is completely lower semicontinuous with respect to d.If so, we say that d makes f completely lsc.
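A small example may help to separate the new notion from ordinary lower semicontinuity. It is our own illustration and not part of the development. Take X = (0, 1] with the usual distance d(x, y) = |x − y|:
$$ f(x) = \frac{1}{x}:\quad \{f \le M\} = [1/M,\, 1] \ (M \ge 1),\ \text{complete}; \qquad g(x) = x:\quad \{g \le M\} = (0,\, \min(M, 1)] \ (M > 0),\ \text{not complete}. $$
So d makes f completely lsc. By contrast, the Cauchy sequence x_n = 1/n has a tail in every sublevel set of g but no limit in X. Hence g is continuous (a fortiori lsc) on X without being completely lsc.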
Here is the fundamental fact about this concept.
c. If f is bounded above on the Cauchy sequence (x_n) ⊂ (X, d), then (x_n) has a limit x in X, and f(x) ≤ lim inf_{n→∞} f(x_n).
In particular, a completely lsc function is lsc.
Proof.b ⇔ c is elementary.
a ⇒ c: Assume a, and let (x_n) be a Cauchy sequence in (X, d) on which f is bounded by, say, M < ∞. It has a limit x in the completion (X̄, d), and by a, f(x) ≤ M, and therefore x ∈ X.
c ⇒ a: Let (x_n) be a Cauchy sequence in (X, d) with limit x ∈ (X̄, d), such that lim inf f(x_n) < ∞. (If this condition fails, there is nothing to show.) Take c > lim inf f(x_n). Then the subsequence consisting of those x_m for which f(x_m) < c is infinite, f is bounded above on it, and it also converges to x. Apply (c) to this subsequence to conclude f(x) ≤ c. Since c > lim inf f(x_n) was arbitrary, f(x) ≤ lim inf f(x_n).

By B3 and Lemma 13.1, the metric d itself satisfies these axioms. Of course, we have in mind a different, strictly weaker candidate for d_1, namely the L^1 metric, motivated by compactness properties to be discussed in later sections. These new axioms actually render B3 redundant, because the properties in C2 and C3 are stable under strengthening d_1.

C. Standard interpretation
The new ingredient here is d_1. The standard interpretation is that d_1 is the metric induced by the L^1 norm ∥·∥_1, as discussed in the Interlude. This is the motivation for the subscript 1 on 'd_1'.

D. Improved structure theorem
From now on, we assume A1–A3, B1, B2, C1–C3. The metric on D is d_1, and this will continue to be the metric of interest in the following sections. On the other hand, we continue to use the metric d′ on V. Note that even if d_1 comes from a norm (which we do not require), V is generally smaller than the dual of (Vec D, d_1). The statement of the main proposition in this section is similar to that of Prop. 10.1, apart from the use of the new terminology. The use of d_1 instead of d indicates that the proposition is stronger than Prop. 10.1, except for the minor point that we no longer obtain Lipschitz continuity of F on Z. The proof is given in Section 13 F.
Proof of C3. This is a consequence of lower semicontinuity of F as a function on L^1(R^3) when extended as +∞ off D. See Thm. 4.4 of Ref. 8 for details of the latter.

F. Proof of Proposition 13.3
A. E is locally L-continuous.
Proof.E depends only on v, and the norm on V has not changed, so this is the same as in section 10 E.
hence also σ(D, V)-bounded by C2. Therefore, by the Uniform Boundedness Principle, {∥ρ_n∥} is bounded.
Proof. The proof is formally just like that for item E in section 10 E.
E. F is continuous on Z .
Proof.Special case of item D.

DENSITY CLUSTERING AND TIGHTNESS
This section assumes A1–A3, B1, B2, C1–C3. We aim for a simple criterion guaranteeing that whenever (v_n, ρ_n) is a sequence of ground pairs and v_n → v, then ρ_n → ρ with respect to d_1. By Prop. 13.3.1, that would ensure that ρ is a ground density of v. Actually, this is asking too much. Instead of asking for convergence of the sequence (ρ_n), we ask only that it cluster on a nonempty set D. This means that every density in D is the limit of a subsequence of (ρ_n), and every subsequence of (ρ_n) has a further subsequence converging to something in D. An alternative way to say the same thing is the following. Denote the d_1 closure of {ρ_m}_{m≥n} by T_n. This is a decreasing (with n) sequence of closed sets, and the equivalent statement is that the limit (i.e., the intersection ∩_n T_n) is precisely D. The important thing is that, in this case, every density in D is a ground density for v.
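A one-dimensional toy example, our own and not taken from the text, may help fix the notion of clustering. The real sequence below does not converge, but it clusters on the two-point set {−1, 1}:
$$ x_n = (-1)^n\left(1 + \frac{1}{n}\right), \qquad T_n = \overline{\{x_m\}_{m \ge n}} = \{x_m\}_{m \ge n} \cup \{-1, 1\}, \qquad \bigcap_n T_n = \{-1, 1\}. $$
The even-indexed terms converge to 1 and the odd-indexed terms to −1. Any subsequence contains infinitely many terms of one parity, hence a further subsequence converging to 1 or to −1.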

A. A first attempt
We begin our search for a criterion with Lemma 14.1.
Thus, for some M < ∞, {ρ_n} ⊆ {F ≤ M}, which is a complete metric space under d_1 by C2.3. By hypothesis, {ρ_n} is d_1-totally bounded, so its closure in {F ≤ M} is compact. Therefore, there is a set D of densities such that (ρ_n) clusters on D. But now we are dealing with sequences such that both components, v_n and ρ_n, converge. Prop. 13.3.1 completes the proof. So, the density components cluster on ground densities of v if the sequence of densities is d_1-totally bounded. One is tempted to consider this condition the answer to our problem. Physically, though, it is not very simple or transparent. We will keep looking.
The first part of the Lemma says that the set {ρ_n} is necessarily F-bounded. Therefore, what we should look for is a property of sets of densities which guarantees that a set is d_1-totally bounded as soon as it is F-bounded. For brevity, we will call such a property a compactness test. This is too special to be enshrined as an official definition. So, to repeat: P is a compactness test if every F-bounded set which is P (we use it as an adjective) is d_1-totally bounded.

Proposition 14.2. Let P be a compactness test, and (v_n, ρ_n) ⊆ {∆ ≤ ϵ} a sequence such that v_n → v. Then, if {ρ_n} is P, the sequence (ρ_n) clusters on a set of ground densities for v.
Proof.Follows immediately from the definition of compactness test and Lemma 14.1.
The axioms do not seem very helpful in finding a compactness test, so we will look more closely at the special features of the standard interpretation.

B. Tightness
Let us approach the problem from a different angle. In the standard interpretation, if the sequence of densities (ρ_n) is to converge, it is certainly necessary that the following hold. Given arbitrary ϵ > 0, there is some ball such that, from some point in the sequence on, ρ_n puts particle number less than ϵ outside the ball. Otherwise the sequence is "leaky" or "lossy" in the sense that some nonzero particle number is inexorably moving off to infinity. This necessary condition is called tightness. Together with F-boundedness, which is guaranteed by Lemma 14.1, it is also sufficient: tightness and F-boundedness imply d_1-total boundedness. We pass to details. In using this notion, we are implicitly working in an interpretation, in particular of D, in which it makes sense. Tightness seems to be a property more easily reasoned about than d_1-total boundedness. It is especially so if we are content with just a fair level of confidence, since a lot of quantum mechanical intuition can be brought to bear on it.

Lemma 14.3. F-bounded tight subsets of D are d_1-totally bounded. In other words, tightness is a compactness test in the standard interpretation.
Proof. There are four ingredients.
1. The Rellich-Kondrachov theorem, a standard tool in the theory of Sobolev spaces. (See, for example, Thm. 9.16 of Ref. [29].) For our purposes, it says the following. If Ω is a bounded subset of R^n with sufficiently regular boundary (a ball, say), and K ⊂ L^1(Ω) is such that both ∥f∥_1 and ∥∇f∥_1 are bounded over K, then K is totally bounded in L^1(Ω).
2. If A is a set of densities in {F ≤ M}, then ∥∇ρ∥_1 is bounded over A. See (4.7) of Ref. [30].
3. To apply the Rellich-Kondrachov theorem, we need to be able to ignore the tails of the densities. This is the role of tightness. The general principle is this: a set A in a metric space is totally bounded if, for any ϵ > 0, there is a totally bounded set K such that A is in the ϵ-dilation of K (every point of A is within ϵ of K). Using this, given ϵ, take Ω to be the ball B(R) with R as in (14.1).
4. L^1(Ω) is isometrically embedded in L^1(R^n), so that a totally bounded subset of the former can be construed as a totally bounded subset of the latter.

This result has a nice semiclassical interpretation. The idea is that a volume h^N in phase space corresponds to one dimension in Hilbert space. Now, if A is tight, then densities in A come from states almost bounded in position, and the bound on F implies a bound on momentum. This gives us that dens^{−1}(A ∩ {F ≤ M}) is an "almost finite-dimensional" set of density matrices, i.e., it is compact. Since the map dens : L^1(H) → L^1(R^3) is continuous, the image of that compact set is compact.
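Ingredient 2 can be made plausible by a short calculation. This is a sketch under stated assumptions, not necessarily the argument of Ref. [30]. We assume, as for an exact functional with a nonnegative (repulsive) interaction, that F(ρ) bounds the kinetic energy from above. We work in atomic units and use the Hoffmann-Ostenhof inequality T ≥ ½∥∇√ρ∥_2². Here N = ∫ρ is the particle number:
$$ \nabla\rho = 2\sqrt{\rho}\,\nabla\sqrt{\rho} \;\Longrightarrow\; \|\nabla\rho\|_1 \le 2\,\|\sqrt{\rho}\|_2\,\|\nabla\sqrt{\rho}\|_2 = 2N^{1/2}\,\|\nabla\sqrt{\rho}\|_2 \le 2N^{1/2}\bigl(2F(\rho)\bigr)^{1/2} \le 2\,(2NM)^{1/2} \quad \text{on } \{F \le M\}. $$
The first inequality is Cauchy-Schwarz. The remaining inequalities combine the kinetic energy bound with F ≤ M.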
Finally, combining Lemma 14.3 and Lemma 14.1, we reach the objective of this section, and a major objective of the paper.

C. Interpretations with automatic tightness
There are at least a couple of interesting variations on the standard interpretation in which tightness is automatic, and therefore we require no condition on the sequence (ρ_n) in the above setting. One such is the case where M is not R^3 but a three-torus, or more generally a closed manifold. In that case, L^1(M) is isomorphic to L^1([0, 1]^3). If it is thought of that way, all sequences in D are tight, because the tail condition (14.1) holds trivially for any R exceeding the diameter of the bounded domain.
Another case leaves everything as in the standard interpretation except F. Here F is further specialized (beyond what the axioms say) to an exact functional with a repulsive interaction and a background trap potential tending to +∞ as |x| → ∞, for example, a harmonic potential. In this case, the condition (14.1) is implied by F-boundedness.
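The implication follows from a Markov-type estimate. As a sketch, assume (our assumption, for illustration) that the trap enters F additively, F(ρ) = F_0(ρ) + ∫Wρ, with F_0 ≥ 0 (kinetic plus repulsive interaction) and W ≥ 0. Set w(R) := inf_{|x|>R} W(x), which tends to +∞ by hypothesis. Then
$$ \int_{|x|>R} \rho\,dx \;\le\; \frac{1}{w(R)} \int_{|x|>R} W\rho\,dx \;\le\; \frac{1}{w(R)} \int W\rho\,dx \;\le\; \frac{F(\rho)}{w(R)} \;\le\; \frac{M}{w(R)} \qquad \text{whenever } F(\rho) \le M. $$
For a harmonic trap W(x) = ½ω²|x|², we have w(R) = ½ω²R². The tail is therefore at most 2M/(ω²R²), and (14.1) holds uniformly on {F ≤ M} once R > (2M/(ω²ϵ))^{1/2}.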

RECAPITULATION
Here is a very brief, and necessarily imprecise, recapitulation of the findings, with emphasis on the standard interpretation and exact functionals, hence cutting out the axiomatic middlemen.
Kohn-Sham computation can be viewed as a walk on ground pairs in V × D. Indeed, the entire development is based on a commitment to think, explicitly, in this bivariate way. A simple iterative scheme, focusing on potential, is shown to make progress (with caveats) in the sense of being able to move to a density with lower excess energy ∆(v⊙, ρ) in the presence of the target potential. Somewhat surprisingly, no metric on potential or density space is required to carry out that analysis. For a deeper treatment, in particular to discuss convergence questions, some metric or topological structure is necessary. With respect to the metric d′ + d_1 on V × D, the following hold. Ground energy E is continuous, while intrinsic energy F and excess energy ∆ are completely lower semicontinuous. F is also ∆-almost continuous. Thus, although F is unbounded above on every neighborhood, this phenomenon and its possible unpleasant consequences are strongly mitigated as long as we restrict attention to the low intrinsic energy subspace, and F is even continuous on Z. Low excess energy pairs are metrically close to the set Z of ground pairs. Conversely, ∆ increases only slightly when shifting the potential of a point in Z. (The corresponding statement with respect to density is absolutely not true, not even for the ∥·∥ metric.) If a sequence (v_n, ρ_n) of ground pairs is such that v_n → v⊙, then the densities automatically accumulate on ground densities of v⊙, as long as the density sequence does not have particle number drifting to infinity.

SOME CONCLUSIONS
This work aimed at bringing rigorous mathematical analysis of DFT a little closer to the computational practice of DFT, and in the process at getting a more physical picture of both. It is based on a few simple ideas. First, the procedures and operations of KS computation should be physically interpreted. Second, the topologies (norms) on potential and density spaces entering a functional analytic theory also require physical grounding. Third, one should work explicitly in the product of potential and density space as much as possible. These ideas, especially the last, are conclusions as well as starting points: they are vindicated by the results achieved in taking them seriously.
A number of the results in this paper point to the somewhat ironic conclusion that more attention should be paid to potential in density functional theory. These are, primarily, the demonstration in section 6 E that an iterative scheme focusing on potential can make progress, with provisos, and the result, Prop. 14.4, on automatic convergence of density.
d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
d(x, y) > 0 ⇒ x ≠ y
d(x, y) = 0 ⇒ x = y
The set together with the metric, (X, d), is a metric space. The open ball of radius r about x ∈ X is the set B(r; x) := {y ∈ X : d(x, y) < r} of points at distance less than r from x. If d and d′ are two metrics on the same space, d′ is stronger than d, written d ≾ d′ or d′ ≿ d, if for every r > 0 there is r′ > 0 such that B′(r′; x) ⊆ B(r; x) for every x. Here, B′ denotes an open ball for d′. d and d′ are equivalent, d ∼ d′, in case both d ≾ d′ and d′ ≾ d. These comparisons are significant for convergence of sequences. Three ways to express the same thing are: the sequence (x_n) converges to x with respect to d; lim_{n→∞} d(x, x_n) = 0; and, for any r > 0, some tail of the sequence is inside B(r; x). Hence, d ≾ d′ implies that every d′-convergent sequence is d-convergent.
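The comparison d ≾ d′ can be illustrated with two familiar metrics on finitely supported real sequences: the sup metric and the ℓ^1 metric. This is a generic sketch, not tied to V or D, and the helper names are hypothetical. Since the sup distance never exceeds the ℓ^1 distance, r′ = r works in the definition, so the ℓ^1 metric is stronger. The converse fails: spreading unit mass evenly over n sites drives the sup distance to 0 while the ℓ^1 distance stays 1.

```python
import random

def _pad(x, y):
    # Treat finite lists as finitely supported sequences: pad the shorter with zeros.
    n = max(len(x), len(y))
    return list(x) + [0.0] * (n - len(x)), list(y) + [0.0] * (n - len(y))

def d_sup(x, y):
    # Sup metric: max_k |x_k - y_k| (0 for two empty sequences).
    a, b = _pad(x, y)
    return max((abs(s - t) for s, t in zip(a, b)), default=0.0)

def d_one(x, y):
    # l^1 metric: sum_k |x_k - y_k|.
    a, b = _pad(x, y)
    return sum(abs(s - t) for s, t in zip(a, b))

# d_sup <= d_one pointwise, so every d_one-ball of radius r lies inside the
# d_sup-ball of radius r: d_one is stronger than d_sup.
random.seed(1)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(random.randint(0, 20))]
    y = [random.uniform(-1, 1) for _ in range(random.randint(0, 20))]
    assert d_sup(x, y) <= d_one(x, y)

# Converse fails: d_sup(spread, 0) = 1/n -> 0 while d_one(spread, 0) stays 1.
for n in (1, 10, 100, 1000):
    spread = [1.0 / n] * n
    print(n, d_sup(spread, []), d_one(spread, []))
```

The sequence of spread-out vectors converges to 0 for the weaker metric but not for the stronger one, which matches the final remark above.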

Lemma 11.5. Every potential has a neighborhood U on which the maps v ↦ ∆(v, ρ) are all locally L-continuous with local Lipschitz constants c + bF(ρ). Here, c may depend on the neighborhood, but b does not.
Proof. Using local L-continuity of E, choose a neighborhood U on which |E(v′) − E(v)| ≤ L∥v′ − v∥′. We confine our attention to U henceforth. By definition of excess energy, ∆(v, ρ) = F(ρ) + ⟨v, ρ⟩ − E(v), so for v, v′ ∈ U,
$$ |\Delta(v', \rho) - \Delta(v, \rho)| \le |\langle v' - v, \rho\rangle| + |E(v') - E(v)| \le \bigl(\|\rho\| + L\bigr)\,\|v' - v\|'. $$
Now apply Lemma 11.2 to bound ∥ρ∥ here by c + bF(ρ).

Lemma 11.6. Any given potential has a neighborhood U on which the maps v ↦ ∆(v, ρ) are all L-continuous with Lipschitz constants c + b inf_{v∈U} ∆(v, ρ).

Definition 14.1 (tight). A set F of integrable functions on R^d is tight if, for every ϵ > 0, there exists R such that for every f ∈ F,
$$ \int_{|x|>R} |f(x)|\,dx < \epsilon. \qquad (14.1) $$

TABLE II. Basic feasible functions/operations, described in the text. ∘ is the composition operator, π_V extracts the V component, and ⇀ indicates a partial (not everywhere defined) function. Entries discussed to this point use only DE_0 and [DΦ] from the primitives of Table I. The others (namely, Φ, E_0 and ⟨·, ·⟩) are needed for F_HK and E_HK; refer to the discussion in section 8 C.
E. Proof of Proposition 10.1
∆ is completely lsc. Now we lift the restriction to {F ≤ M}. Suppose ((v_n, ρ_n)) ⊂ {∆ ≤ ϵ} is a Cauchy sequence with v_n → v. Some tail of the sequence is in the neighborhood U of Prop. 11.4, hence is F-bounded. So, without assuming F-boundedness, we get it anyway, and recover the situation of item B.